[ https://issues.apache.org/jira/browse/STORM-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352992#comment-15352992 ]
Jungtaek Lim commented on STORM-1933:
-------------------------------------

If there is a race condition, sync-processes recognizes the disallowed worker and removes the directories for that worker, but sync-supervisor can recreate the worker's heartbeat directory just after sync-processes removes the worker root.

sync-processes : shutting down worker
sync-processes : RMR heartbeat directory of worker
sync-supervisor : sync supervisor called
sync-processes : RMR pids directory
sync-supervisor : write new assignment
sync-supervisor : read workers directory to obtain worker list (in kill-existing-workers-with-change-in-components)
sync-processes : RMR root directory (late!)
sync-processes : remove worker-user
sync-processes : read worker heartbeat by creating LocalState, which refers to the heartbeat directory. NOTE: it creates VersionedStore, which creates the directory.

In the next run of sync-processes, sync-processes will read the workers directory to obtain the worker list, and since the heartbeat directory has been created again, the worker will be recognized as "not-started".

> Intermittent test failure on test-multiple-active-storms-multiple-supervisors for supervisor-test
> --------------------------------------------------------------------------------------------------
>
>                 Key: STORM-1933
>                 URL: https://issues.apache.org/jira/browse/STORM-1933
>             Project: Apache Storm
>          Issue Type: Sub-task
>          Components: storm-core
>    Affects Versions: 1.0.0, 2.0.0, 1.0.1
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>         Attachments: only-thread-1362-and-1363-BUG-60850-intermittent-failure-supervisor-test.txt
>
>
> test-multiple-active-storms-multiple-supervisors is failing with a fairly high chance. I've run the unit tests of the 1.x branch 3 times and met this issue, and users report a FileNotFound issue on the supervisor which seems to be related to this.
> I have a log file, so I'll attach it once the issue is created.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
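
For illustration only, the interleaving described in the comment can be replayed deterministically in a single thread. This is a minimal Python sketch, not Storm code: the directory layout, the worker id, and the helper names `read_heartbeat` and `simulate_race` are all hypothetical. The only behavior taken from the comment is that reading a heartbeat creates the heartbeat directory as a side effect.

```python
import os
import shutil
import tempfile


def read_heartbeat(workers_root, worker_id):
    """Hypothetical stand-in for reading a worker heartbeat.

    Mirrors the side effect noted in the comment: opening the store
    creates its directory when the directory is missing.
    """
    hb_dir = os.path.join(workers_root, worker_id, "heartbeats")
    os.makedirs(hb_dir, exist_ok=True)  # side effect: directory is (re)created
    versions = sorted(os.listdir(hb_dir))
    return versions[-1] if versions else None  # None means no heartbeat exists


def simulate_race():
    """Replay the steps from the comment in order, in one thread."""
    workers_root = tempfile.mkdtemp()
    try:
        worker_id = "worker-1"  # hypothetical worker id
        worker_root = os.path.join(workers_root, worker_id)
        os.makedirs(os.path.join(worker_root, "heartbeats"))
        os.makedirs(os.path.join(worker_root, "pids"))

        # Cleanup side: remove the heartbeat and pids directories.
        shutil.rmtree(os.path.join(worker_root, "heartbeats"))
        shutil.rmtree(os.path.join(worker_root, "pids"))

        # Other side: list the workers directory while the root still exists.
        stale_ids = os.listdir(workers_root)

        # Cleanup side: remove the worker root, too late for the listing above.
        shutil.rmtree(worker_root)

        # A heartbeat read for an id from the stale listing recreates the directory.
        heartbeats = {wid: read_heartbeat(workers_root, wid) for wid in stale_ids}

        # The next scan finds the worker directory again, with no heartbeat in it.
        next_listing = os.listdir(workers_root)
        return stale_ids, heartbeats, next_listing
    finally:
        shutil.rmtree(workers_root, ignore_errors=True)


if __name__ == "__main__":
    stale_ids, heartbeats, next_listing = simulate_race()
    print("ids seen before root removal:", stale_ids)
    print("heartbeats read afterwards:", heartbeats)
    print("workers directory on next scan:", next_listing)
```

In this sketch the removed worker reappears in the final listing with an empty heartbeat directory, which corresponds to the state the comment says is then recognized as "not-started".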