[
https://issues.apache.org/jira/browse/SLING-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stefan Egli updated SLING-7778:
-------------------------------
Description:
The creation of a scheduled job is handled in
[{{JobSchedulerImpl.addScheduledJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/JobSchedulerImpl.java#L501]
which in turn [calls
{{scheduledJobHandler.addOrUpdateJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/JobSchedulerImpl.java#L523].
There, the job is first persisted to the repository via [calling
{{writeScheduledJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L213].
Normally the sequence of events is then as follows:
* [A] the [synchronized block in
{{addOrUpdateJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L216]
is entered and the job is scheduled with quartz.
* [B] then shortly afterwards, observation is triggered and
{{handleAddOrUpdate}} is called. This will, as a *side-effect* [add the
scheduled job to the {{scheduledJobs}}
map|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L446]
* the result of this being that the scheduled job is now in
{{ScheduledJobHandler.scheduledJobs}}
* later on, when the scheduled job is finished, {{handleRemove()}} (or similar)
is called, and it checks if it finds the scheduled job in
{{ScheduledJobHandler.scheduledJobs}}. if it does, all good. And here we're in
the all good case, so all is fine
However, there's the following race-condition:
* first [B] happens, ie first the observation event is triggered, so
{{handleAddOrUpdate}} is called, and again the side-effect hits: the scheduled
job is added to the {{ScheduledJobHandler.scheduledJobs}}. all good so far
* then [A] continues into the synchronized block. *But* now, the [remove of the
scheduled
job|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L217]
actually does damage: the scheduled job ends up *not* being in the
{{ScheduledJobHandler.scheduledJobs}} map.
* so in this scenario, when a later {{handleRemove()}} (or similar) happens, it
can't find the scheduled job (see
[here|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L370]
h being null)
* the result being that
[{{JobScheduler.unscheduleJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L371]
is *not invoked*, *leaving the scheduled job with the JobSchedulerImpl*.
And the result of this race-condition is that
{{JobSchedulerImpl.scheduledJobs}} map has a zombie scheduled job, which it
shouldn't actually have.
So later on for example, a topology event happens, such as a PROPERTIES_EVENT.
That in turn will trigger
[{{JobSchedulerImpl.configurationChanged()}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/JobSchedulerImpl.java#L152]
which will now *re-schedule the previously scheduled job*. And we have a
duplicate scheduled job execution.
was:
The creation of a scheduled job is handled in
[{{JobSchedulerImpl.addScheduledJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/JobSchedulerImpl.java#L501]
which in turn [calls
{{scheduledJobHandler.addOrUpdateJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/JobSchedulerImpl.java#L523].
There, the job is first persisted to the repository via [calling
{{writeScheduledJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L213].
Normally the sequence of events is then as follows:
* [A] the [synchronized block in
{{addOrUpdateJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L216]
is entered and the job is scheduled with quartz.
* [B] then shortly afterwards, observation is triggered and
{{handleAddOrUpdate}} is called. This will, as a *side-effect* [adds the
scheduled job to the {{scheduledJobs}}
map|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L446]
* the result of this being that the scheduled job is now in
{{ScheduledJobHandler.scheduledJobs}}
* later on, when the scheduled job is finished, {{handleRemove()}} (or similar)
is called, and it checks if it finds the scheduled job in
{{ScheduledJobHandler.scheduledJobs}}. if it does, all good. And here we're in
the all good case, so all is fine
However, there's the following race-condition:
* first [B] happens, ie first the observation event is triggered, so
{{handleAddOrUpdate}} is called, and again the side-effect hits: the scheduled
job is added to the {{ScheduledJobHandler.scheduledJobs}}. all good so far
* then [A] continues into the synchronized block. *But* now, the [remove of the
scheduled
job|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L217]
actually does damage: the scheduled job ends up *not* being in the
{{ScheduledJobHandler.scheduledJobs}} map.
* so in this scenario, when a later {{handleRemove()}} (or similar) happens, it
can't find the scheduled job (see
[here|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L370]
h being null)
* the result being that
[{{JobScheduler.unscheduleJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L371]
is *not invoked*, *leaving the scheduled job with the JobSchedulerImpl*.
And the result of this race-condition is that
{{JobSchedulerImpl.scheduledJobs}} map has a zombie scheduled job, which it
shouldn't actually have.
So later on for example, a topology event happens, such as a PROPERTIES_EVENT.
That in turn will trigger
[{{JobSchedulerImpl.configurationChanged()}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/JobSchedulerImpl.java#L152]
which will now *re-schedule the previously scheduled job*. And we have a
duplicate scheduled job execution.
> race-condition in scheduled job creation might keep scheduled job in-memory,
> and cause a duplicate scheduled job execution
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: SLING-7778
> URL: https://issues.apache.org/jira/browse/SLING-7778
> Project: Sling
> Issue Type: Bug
> Components: Event
> Affects Versions: Event 4.2.0
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Priority: Major
> Fix For: Event 4.2.12
>
>
> The creation of a scheduled job is handled in
> [{{JobSchedulerImpl.addScheduledJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/JobSchedulerImpl.java#L501]
> which in turn [calls
> {{scheduledJobHandler.addOrUpdateJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/JobSchedulerImpl.java#L523].
> There, the job is first persisted to the repository via [calling
> {{writeScheduledJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L213].
> Normally the sequence of events is then as follows:
> * [A] the [synchronized block in
> {{addOrUpdateJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L216]
> is entered and the job is scheduled with quartz.
> * [B] then shortly afterwards, observation is triggered and
> {{handleAddOrUpdate}} is called. This will, as a *side-effect* [add the
> scheduled job to the {{scheduledJobs}}
> map|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L446]
> * the result of this being that the scheduled job is now in
> {{ScheduledJobHandler.scheduledJobs}}
> * later on, when the scheduled job is finished, {{handleRemove()}} (or
> similar) is called, and it checks if it finds the scheduled job in
> {{ScheduledJobHandler.scheduledJobs}}. if it does, all good. And here we're
> in the all good case, so all is fine
> However, there's the following race-condition:
> * first [B] happens, ie first the observation event is triggered, so
> {{handleAddOrUpdate}} is called, and again the side-effect hits: the
> scheduled job is added to the {{ScheduledJobHandler.scheduledJobs}}. all good
> so far
> * then [A] continues into the synchronized block. *But* now, the [remove of
> the scheduled
> job|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L217]
> actually does damage: the scheduled job ends up *not* being in the
> {{ScheduledJobHandler.scheduledJobs}} map.
> * so in this scenario, when a later {{handleRemove()}} (or similar) happens,
> it can't find the scheduled job (see
> [here|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L370]
> h being null)
> * the result being that
> [{{JobScheduler.unscheduleJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L371]
> is *not invoked*, *leaving the scheduled job with the JobSchedulerImpl*.
> And the result of this race-condition is that
> {{JobSchedulerImpl.scheduledJobs}} map has a zombie scheduled job, which it
> shouldn't actually have.
> So later on for example, a topology event happens, such as a
> PROPERTIES_EVENT. That in turn will trigger
> [{{JobSchedulerImpl.configurationChanged()}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/JobSchedulerImpl.java#L152]
> which will now *re-schedule the previously scheduled job*. And we have a
> duplicate scheduled job execution.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)