Github user revans2 commented on the issue:

    https://github.com/apache/storm/pull/1764

Just so we don't miss the comment from @jerrypeng

> Couldn't a wrong ordering of events happen since we are locking when calculating a scheduling then unlocking and then locking and uploading the new scheduling and unlocking
> for example:
> T0: submit
> T1: rebalance
> T2: rebalance - calculate new scheduling
> T3: submit - calculate new scheduling
> T4: rebalance - upload new scheduling to zk
> T5: submit - upload new scheduling to zk
>
> even though we should end up with the scheduling calculated by the rebalance but we end up with scheduling calculated from the original submit.

Yes, that is correct. We should do something here, and he suggested that perhaps as part of a refactor of Nimbus we should look at support for long running scheduling. In the short term I think I might make scheduling and writing to ZK atomic, but long term I think I will file a JIRA to look at better scheduling.
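To make the interleaving concrete, here is a rough sketch that replays T2 through T5 in a single thread. It is not Nimbus code: the class `SchedulingRaceSketch`, its methods, and the `stored` field standing in for the assignment kept in ZK are all hypothetical names made up for illustration. It also assumes that, with the atomic variant, the operations take the lock in the order they were issued.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the scheduling path; none of these names come from Storm.
public class SchedulingRaceSketch {
    private final ReentrantLock lock = new ReentrantLock();
    // Stand-in for the scheduling stored in ZK.
    private String stored = "none";

    // Step 1 of the split flow: lock, calculate, unlock.
    String calculate(String op) {
        lock.lock();
        try {
            return "scheduling-from-" + op;
        } finally {
            lock.unlock();
        }
    }

    // Step 2 of the split flow: lock, upload, unlock.
    void upload(String scheduling) {
        lock.lock();
        try {
            stored = scheduling;
        } finally {
            lock.unlock();
        }
    }

    // Short-term idea from the comment: calculate and upload under one lock hold.
    void calculateAndUpload(String op) {
        lock.lock();
        try {
            stored = "scheduling-from-" + op;
        } finally {
            lock.unlock();
        }
    }

    String stored() {
        return stored;
    }

    // Replays the timeline from the quoted comment with the lock released between steps.
    static String replaySplitLocking() {
        SchedulingRaceSketch nimbus = new SchedulingRaceSketch();
        String fromRebalance = nimbus.calculate("rebalance"); // T2
        String fromSubmit = nimbus.calculate("submit");       // T3
        nimbus.upload(fromRebalance);                         // T4
        nimbus.upload(fromSubmit);                            // T5 overwrites T4
        return nimbus.stored();
    }

    // Same two operations, each as one critical section, run in issue order (assumption).
    static String replayAtomic() {
        SchedulingRaceSketch nimbus = new SchedulingRaceSketch();
        nimbus.calculateAndUpload("submit");    // issued at T0
        nimbus.calculateAndUpload("rebalance"); // issued at T1
        return nimbus.stored();
    }

    public static void main(String[] args) {
        System.out.println("split locking: " + replaySplitLocking());
        System.out.println("atomic:        " + replayAtomic());
    }
}
```

With the split locking, the scheduling from the submit is what ends up stored, which is the outcome described in the quoted comment. With one critical section per operation, another operation's upload can no longer land between an operation's own calculation and upload. This says nothing about how long the lock is held while scheduling runs, which is where the long running scheduling question comes in.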