Robert Joseph Evans created STORM-3024:
------------------------------------------

             Summary: Allow scheduling for RAS to happen in the background
                 Key: STORM-3024
                 URL: https://issues.apache.org/jira/browse/STORM-3024
             Project: Apache Storm
          Issue Type: New Feature
          Components: storm-server
    Affects Versions: 2.0.0
            Reporter: Robert Joseph Evans
            Assignee: Robert Joseph Evans


We have recently run into issues where a scheduling strategy on a very 
large cluster occasionally takes an unusually long time to finish.  This 
slowness cascades into other problems, such as topologies that cannot be 
killed because the timer thread is still busy running scheduling.

The plan is to make scheduling happen in a thread pool.  The main thread will 
wait up to a configurable amount of time for the topology to be scheduled.  
If scheduling does not complete in that time, it is left running in the 
background thread in the hope that the topology will be scheduled later on.
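
As a rough illustration of the plan above, here is a minimal Java sketch 
built on java.util.concurrent.  The class name BackgroundScheduler, the 
method schedule, and the maxWaitMs parameter are hypothetical and are not 
Storm APIs; the actual change may be structured quite differently.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration only; not part of Storm. */
class BackgroundScheduler<R> {
    private final ExecutorService pool;
    private final long maxWaitMs; // assumed to come from a config setting
    // Scheduling attempts that are still running, keyed by topology id.
    private final ConcurrentMap<String, Future<R>> pending = new ConcurrentHashMap<>();

    BackgroundScheduler(int threads, long maxWaitMs) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.maxWaitMs = maxWaitMs;
    }

    /**
     * Starts (or rejoins) scheduling for a topology and waits at most
     * maxWaitMs.  An empty result means the attempt is still running in
     * the background and can be picked up by a later call.
     */
    Optional<R> schedule(String topoId, Callable<R> strategy) {
        Future<R> f = pending.computeIfAbsent(topoId, id -> pool.submit(strategy));
        try {
            R result = f.get(maxWaitMs, TimeUnit.MILLISECONDS);
            pending.remove(topoId, f);
            return Optional.ofNullable(result);
        } catch (TimeoutException e) {
            // Leave the task running; the caller is no longer blocked.
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            pending.remove(topoId, f);
            throw new IllegalStateException("strategy failed", e.getCause());
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

With this shape the calling thread is blocked for at most maxWaitMs per 
topology, so other work on that thread, such as kills, is not held up by 
a slow strategy.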

If the state of the cluster changes while scheduling is happening in the 
background, we will cancel that scheduling, because any result it produces 
may no longer fit on the cluster.  The next time the scheduler runs it will 
restart the scheduling.  This should allow the cluster to reach a steady 
state, even if that takes a while, without blocking kills and other 
critical operations.
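
The cancel-and-restart behaviour described above could look roughly like 
the following Java sketch.  It assumes the caller can supply some version 
number that changes whenever the cluster state changes; that 
clusterVersion parameter, the class name StateAwareScheduler, and the 
method round are all hypothetical and do not come from Storm.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper for illustration only; not part of Storm. */
class StateAwareScheduler<R> {
    private final ExecutorService pool = Executors.newCachedThreadPool();
    // Attempts started against the cluster state identified by lastVersion.
    private final Map<String, Future<R>> inFlight = new HashMap<>();
    private long lastVersion = Long.MIN_VALUE;

    /**
     * Called once per scheduler run.  If the cluster state changed since
     * the previous run, every attempt started against the old state is
     * cancelled and a fresh one is submitted; otherwise the existing
     * attempt (running or finished) is reused.
     */
    synchronized Future<R> round(long clusterVersion, String topoId, Callable<R> strategy) {
        if (clusterVersion != lastVersion) {
            for (Future<R> stale : inFlight.values()) {
                stale.cancel(true); // interrupt the worker if it is running
            }
            inFlight.clear();
            lastVersion = clusterVersion;
        }
        return inFlight.computeIfAbsent(topoId, id -> pool.submit(strategy));
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

Note that cancel(true) only interrupts the worker thread, so in this 
sketch a strategy has to respond to interruption for the cancellation to 
actually free the thread.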

Note that we are also working on optimizing scheduling so that these 
issues don't happen in the first place.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
