[ https://issues.apache.org/jira/browse/DRILL-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Westin updated DRILL-2502: -------------------------------- Summary: Improve Foreman code safety by providing a generic event delivery mechanism that supports queueing (was: Improve code safety by providing a generic event delivery mechanism that supports queueing) > Improve Foreman code safety by providing a generic event delivery mechanism > that supports queueing > -------------------------------------------------------------------------------------------------- > > Key: DRILL-2502 > URL: https://issues.apache.org/jira/browse/DRILL-2502 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow > Reporter: Chris Westin > Assignee: Jacques Nadeau > Fix For: 0.8.0 > > Attachments: DRILL-2502.1.patch.txt > > > While working on DRILL-2245, I ran into problems because > Foreman.moveToState() can be called recursively. Being synchronized doesn't > prevent the same thread from calling back into this. When this happens, state > transitions are either dropped or broken depending on when this happens. This > event really needs to be handled atomically, but when > queryContext.cancelExecutingFragments() is called, the completion of the > cancellations results in QueryManager.fragmentDone() being called, and it > indiscriminately sends the COMPLETED state, even though we may be in the > middle of handling a move to FAILED or CANCELLATION_REQUESTED. Unless the > target state is first recorded before any possible recursive call happens, > the recursive transition is recorded, and the second transition doesn't > result in any message being sent to the user -- the client hangs. > It's difficult to reason about when its safe for this kind of recursive call > to happen, and even more difficult to avoid inadvertently introducing these. > Problems happen when the delivery of an event to a listener causes cascaded > events that eventually deliver to the same listener. > In order to avoid this, we should create (and use) a generic class that can > receive such events. If nothing is happening when one is received, it is > processed immediately, in line. If an event is received when we're in the > middle of processing another event, the newly received one is queued. When > the current event processing is completed, we then go through any queued > events, and process them one by one. We continue in this way until there are > no more events left, then return. This mechanism would improve code safety by > preventing the delivery of events while in the middle of processing other > events by the same handler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)