[ https://issues.apache.org/jira/browse/ARTEMIS-2926?focusedWorklogId=499313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499313 ]
ASF GitHub Bot logged work on ARTEMIS-2926:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Oct/20 10:16
            Start Date: 12/Oct/20 10:16
    Worklog Time Spent: 10m
      Work Description: gemmellr commented on pull request #3287:
URL: https://github.com/apache/activemq-artemis/pull/3287#issuecomment-707028383

   I think the changes seem ok, but I think perhaps the PR overlooks another simpler and more important behaviour that may be leading to the observed issue?

   The given period at construction of the scheduled tasks is documented as "the delay between the termination of one execution and the start of the next". That's unsurprisingly consistent with the behaviour of scheduledExecutorService.scheduleWithFixedDelay(), which is what's used by the 'not onDemand' instances of the scheduled tasks. However, the tasks don't actually run in the scheduledExecutorService thread if the additional executor is given during construction. If the second executor is given, the scheduled task is just offloaded by the scheduledExecutorService for execution on the provided executor and entirely forgotten about. That seems like it could be the core of the observed issue to me?

   The above means there is no further tracking by the scheduler of when the task actually runs or how long the given task takes, meaning the periodic contract is somewhat lost from that point forward. Consider a situation:

   1. Say there is a backlog of existing (related or unrelated) things for the executor still to run, so a new 'scheduled offloaded' task may not run for a little while until that is processed. Or instead say that thread scheduling means the second executor doesn't immediately get to executing the task. Whatever the reason, something means there is a small delay, but eventually the task does run.
   2. A second task instance comes along from the scheduledExecutorService at some point, very closely after the configured period, since the scheduler isn't affected by actual execution of the task, which gets offloaded. Maybe now there isn't any or as much backlog on the second executor, or there's a better thread scheduling environment, and this second task may get run relatively quicker than the prior instance actually did.
   3. Due to the 'lastTime' tracking occurring within the task itself, on the second executor, this second task instance, which was offloaded by the scheduledExecutorService at its precise period, will now be observed to have occurred within the configured period of the previous task's 'lastTime' and so get skipped.
   4. This means nothing happened, and nothing will until the scheduledExecutorService comes along after a 3rd period and offloads the task another time, by which point approx double the expected period has elapsed and the task actually executes.

   Rinse and repeat this process over and over.

   If the second executor is provided, its actual execution + 'lastTime' period checks are essentially happening independently of the scheduling, and it seems like the scheduledExecutorService is trying to somewhat blindly throw tasks over a wall such that they land at the right time and actually get to run as opposed to skipping and waiting for next time.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 499313)
    Time Spent: 20m  (was: 10m)

> Scheduled task executions are skipped randomly
> ----------------------------------------------
>
>                 Key: ARTEMIS-2926
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2926
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.13.0
>            Reporter: Apache Dev
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Scheduled tasks extending {{ActiveMQScheduledComponent}} could randomly skip
> an execution, logging:
> {code}
> Execution ignored due to too many simultaneous executions, probably a
> previous delayed execution
> {code}
> The problem is in the "ActiveMQScheduledComponent#runForExecutor" Runnable.
> Times to be compared ({{currentTimeMillis()}} and {{lastTime}}) are taken
> inside the runnable execution itself. So, depending on relative execution
> times, it could happen that the difference is less than the given period
> (e.g. 1 ms), resulting in a skipped execution.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
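
To make the skip pattern described in the comment and the issue concrete, here is a minimal, deterministic sketch in Java. It is not the Artemis code: the class name SkippedExecutionSketch, the simulate method, and the use of 0 as a "never ran" marker are all hypothetical. It assumes hand-offs happen at exact multiples of the period, that each hand-off reaches the second executor after a small delay, and that the task compares "now" against 'lastTime' from inside the task, as the issue describes.

```java
import java.util.ArrayList;
import java.util.List;

public class SkippedExecutionSketch {

    /**
     * Replays hand-offs made at exact multiples of periodMillis. Hand-off i
     * reaches the simulated second executor handoffDelaysMillis[i] later.
     * Returns, per hand-off, whether the task body ran (true) or was skipped.
     */
    static List<Boolean> simulate(long periodMillis, long[] handoffDelaysMillis) {
        List<Boolean> ran = new ArrayList<>();
        long lastTime = 0; // 0 means "never ran"; an assumption of this sketch

        for (int i = 0; i < handoffDelaysMillis.length; i++) {
            // The scheduler hands the task off on time, unaware of what follows.
            long handoffTime = (i + 1) * periodMillis;
            // The task only observes the clock once the second executor runs it.
            long now = handoffTime + handoffDelaysMillis[i];

            // Period check taken inside the task itself, as in the issue text.
            if (lastTime > 0 && now - lastTime < periodMillis) {
                ran.add(false); // "Execution ignored ..." would be logged here
                continue;
            }
            lastTime = now;
            ran.add(true);
        }
        return ran;
    }

    public static void main(String[] args) {
        // First and third hand-offs are delayed 5 ms, the others only 1 ms.
        System.out.println(simulate(1000, new long[] {5, 1, 5, 1}));
    }
}
```

With a 1000 ms period and delays of 5, 1, 5 and 1 ms, the second and fourth hand-offs arrive 996 ms after the previous run and are skipped, so the task effectively runs every second period. With identical delays on every hand-off, nothing is skipped in this model.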