https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=36702
--- Comment #8 from Kyle M Hall (khall) <[email protected]> ---
Created attachment 200997
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=200997&action=edit
Bug 36702: Add ability to retry failed background jobs with a max tries parameter

Currently, when a background job fails, it stops with a status of 'failed' and there is no way to run it again. This is a problem for jobs that fail for transient reasons. Examples include an Elasticsearch index update that fails while the server is briefly overwhelmed, or a plugin job that calls an external API that is temporarily unavailable.

This patch adds the ability to retry a failed job up to a maximum number of times. When a job throws an error, the worker leaves the failed job alone, so its messages and report are kept as history, and enqueues a new job that retries it. The new job points back at the job it is retrying through previous_job_id, so the whole chain of attempts can be followed.

The maximum number of retries comes from, in order of precedence:

* an explicit max_retries passed to enqueue()
* the job type's default_max_retries()
* the BackgroundJobsDefaultMaxRetries system preference (default 3)

A value of 0 disables retries. Jobs that aren't safe to re-run with their original arguments (batch modifications, imports, e-holdings creation, SUSHI harvesting and statistics pseudonymization) override default_max_retries to return 0, so they opt out.

Retries don't all fire at once. The first retry runs immediately, and each following retry waits an extra BackgroundJobsRetryDelay seconds (default 30), tracked through the not_before column. The worker won't process a job before its not_before time. In RabbitMQ mode it requeues the job, and in database (polling) mode it skips the job until the next poll. As a result, retries behave the same whether or not RabbitMQ is used.

Test Plan:
1) Apply all the patches
2) Run updatedatabase.pl
3) Restart all the things!
4) Note the new columns on the background_jobs table (max_retries, retries, previous_job_id, not_before) and the two new system preferences, BackgroundJobsDefaultMaxRetries and BackgroundJobsRetryDelay, under Administration > System preferences > Administration > Jobs
5) Leave BackgroundJobsDefaultMaxRetries at 3 and BackgroundJobsRetryDelay at 30
6) Make sure JobsNotificationMethod is set to 'STOMP'
7) Restart the background jobs workers
8) Enqueue a job that will fail. In the Koha shell, run:
   perl -e 'use Koha::BackgroundJob::TestTransport; Koha::BackgroundJob::TestTransport->new->enqueue( { transport_id => 999999 } );'
9) Browse to Administration > Manage jobs and refresh the page as the job runs
10) Note that the original job ends as 'failed' and a new job is enqueued that links back to it through 'Retry of'
11) Note that the first retry runs immediately (Retries 1 / 3), the second is held back about 30 seconds (Retries 2 / 3) and the third about 60 seconds (Retries 3 / 3)
12) Note that once retries reaches BackgroundJobsDefaultMaxRetries (3), no further retry is made
13) Set JobsNotificationMethod to 'polling'
14) Restart the background jobs workers again
15) Repeat step 8 to enqueue another failing job
16) Note that a similar retry chain is built
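The precedence rule for max_retries can be sketched as follows. This is a language-neutral illustration in Python, not Koha's own Perl code, and the function name resolve_max_retries and its parameters are hypothetical. It assumes that the first *defined* value wins, so a job type whose default_max_retries() returns 0 opts out instead of falling through to the system preference, and that a job type which does not override the method yields no value at all.

```python
from typing import Optional

# Default of the BackgroundJobsDefaultMaxRetries system preference, per the patch description
SYSPREF_DEFAULT_MAX_RETRIES = 3


def resolve_max_retries(
    explicit: Optional[int],
    job_type_default: Optional[int],
    syspref_value: Optional[int] = SYSPREF_DEFAULT_MAX_RETRIES,
) -> int:
    """Return the retry budget for a job (hypothetical helper).

    Checks, in order: max_retries passed to enqueue(), the job type's
    default_max_retries(), then the system preference. None means
    "not set"; 0 is a real value and disables retries.
    """
    for candidate in (explicit, job_type_default, syspref_value):
        if candidate is not None:  # 0 must win, so test for None, not truthiness
            return candidate
    return 0  # assumption: nothing configured means no retries
```

For example, a batch modification job (job type default 0) resolves to 0 retries, while an explicit max_retries of 5 passed to enqueue() still takes precedence over that type default.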
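Worked through for the default settings, the delay before retry n (counting from 1) is (n - 1) × BackgroundJobsRetryDelay seconds. With the default of 30 this gives 0, 30 and 60 seconds for the three retries, matching the Retries 1 / 3, 2 / 3 and 3 / 3 timings expected in the test plan. A minimal Python sketch follows. It assumes the delay is measured from the moment the previous attempt fails, since the description does not say what the offset is anchored to, and the names retry_delay_seconds and not_before_for are hypothetical.

```python
from datetime import datetime, timedelta


def retry_delay_seconds(retry_number: int, retry_delay: int = 30) -> int:
    """Seconds to hold back retry N (1-based): 0, d, 2d, ... (linear, not exponential)."""
    if retry_number < 1:
        raise ValueError("retry_number is 1-based")
    return (retry_number - 1) * retry_delay


def not_before_for(retry_number: int, failed_at: datetime, retry_delay: int = 30) -> datetime:
    """Hypothetical not_before value for a retry enqueued when the previous attempt failed."""
    return failed_at + timedelta(seconds=retry_delay_seconds(retry_number, retry_delay))
```

For example, `[retry_delay_seconds(n) for n in (1, 2, 3)]` evaluates to `[0, 30, 60]`. Because each retry adds one more BackgroundJobsRetryDelay step, the wait grows linearly rather than doubling.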
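The failure path described in the patch keeps the failed job as history and enqueues a new job that points back through previous_job_id. It might look roughly like the following Python sketch, where an in-memory dict stands in for the background_jobs table. The Job class, handle_failure and attempt_chain are hypothetical. The sketch also assumes that a retry copies the original arguments and max_retries, increments retries by one, and uses a linear (retries - 1) × delay offset for not_before.

```python
import itertools
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional

_ids = itertools.count(1)  # stands in for the table's auto-increment id


@dataclass
class Job:
    type: str
    args: dict
    max_retries: int
    retries: int = 0
    previous_job_id: Optional[int] = None
    not_before: Optional[datetime] = None
    status: str = "new"
    id: int = field(default_factory=lambda: next(_ids))


def handle_failure(failed: Job, queue: Dict[int, Job], now: datetime,
                   retry_delay: int = 30) -> Optional[Job]:
    """Mark a job failed and, if its budget allows, enqueue a linked retry."""
    failed.status = "failed"  # the failed row is left in place as history
    if failed.retries >= failed.max_retries:
        return None  # budget exhausted: no further retry
    attempt = failed.retries + 1
    retry = Job(
        type=failed.type,
        args=dict(failed.args),  # same arguments as the original job
        max_retries=failed.max_retries,
        retries=attempt,
        previous_job_id=failed.id,  # link back to the attempt being retried
        not_before=now + timedelta(seconds=(attempt - 1) * retry_delay),
    )
    queue[retry.id] = retry
    return retry


def attempt_chain(job: Job, queue: Dict[int, Job]) -> List[int]:
    """Follow previous_job_id links back to the original job; oldest id first."""
    ids = [job.id]
    while job.previous_job_id is not None:
        job = queue[job.previous_job_id]
        ids.append(job.id)
    return list(reversed(ids))
```

With max_retries of 3, a job that keeps failing produces a chain of four rows (the original plus three retries), all ending as 'failed', which mirrors the 'Retry of' links shown under Manage jobs.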
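On the worker side, the not_before check reduces to a small decision that differs only in what happens to a job that is not yet due. The following Python sketch is hedged accordingly. The names is_due and dispatch and the returned action strings are hypothetical, and the mode strings reuse the JobsNotificationMethod values 'STOMP' and 'polling' from the test plan. It assumes that a job with no not_before set is always due.

```python
from datetime import datetime
from typing import Optional


def is_due(not_before: Optional[datetime], now: datetime) -> bool:
    """A job with no not_before (assumed: never delayed) is always due."""
    return not_before is None or now >= not_before


def dispatch(not_before: Optional[datetime], now: datetime, mode: str) -> str:
    """Decide what a worker does with a job it has just picked up.

    Returns one of the hypothetical actions "process", "requeue" or "skip".
    """
    if is_due(not_before, now):
        return "process"
    if mode == "STOMP":
        # RabbitMQ mode: hand the message back to the broker for later redelivery
        return "requeue"
    if mode == "polling":
        # Database mode: leave the row untouched; a later poll picks it up
        return "skip"
    raise ValueError(f"unknown JobsNotificationMethod: {mode!r}")
```

Because the time check lives in the worker rather than in the broker, both notification methods honour the same retry schedule, which is what the patch description means by retries working the same with or without RabbitMQ.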
