[ 
https://issues.apache.org/jira/browse/STORM-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352992#comment-15352992
 ] 

Jungtaek Lim commented on STORM-1933:
-------------------------------------

When the race condition occurs, sync-processes recognizes a disallowed worker 
and removes that worker's directories, but sync-supervisor can recreate the 
worker's heartbeat directory right after sync-processes removes the worker root.

sync-processes : shutting down worker
sync-processes : RMR heartbeat directory of worker
sync-supervisor : sync supervisor called
sync-processes : RMR pids directory
sync-supervisor : write new assignment
sync-supervisor : read workers directory to obtain worker list (in 
kill-existing-workers-with-change-in-components)
sync-processes : RMR root directory (late!)
sync-processes : remove worker-user
sync-processes : read worker heartbeat by creating a LocalState which refers to 
the heartbeat directory. NOTE: this creates a VersionedStore, which creates the 
directory.

In its next run, sync-processes reads the workers directory to obtain the 
worker list, and since the heartbeat directory has been created again, the 
worker is recognized as "not-started".
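
The interleaving above can be replayed deterministically in a single thread. 
The following is only a sketch under assumptions, not Storm's code: 
openHeartbeatState is a hypothetical stand-in for creating a LocalState (it 
only mimics the "creates the directory if missing" side effect of 
VersionedStore), and the directory names (worker-1, heartbeats, pids) are 
illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class HeartbeatDirRace {

    // Hypothetical stand-in for creating a LocalState on a heartbeat directory.
    // Assumption: like the VersionedStore described above, it creates the
    // directory (and any missing parents) as a side effect of being opened.
    static void openHeartbeatState(Path heartbeatDir) throws IOException {
        Files.createDirectories(heartbeatDir);
    }

    // Recursive delete ("RMR"); does nothing if the path is already gone.
    static void rmr(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order so children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Lists the entries of the workers directory, i.e. the "worker list".
    static List<String> listWorkerIds(Path workersDir) throws IOException {
        try (Stream<Path> entries = Files.list(workersDir)) {
            return entries.map(p -> p.getFileName().toString()).collect(Collectors.toList());
        }
    }

    // Replays the steps in the order listed above and returns the worker list
    // that the next sync-processes run would see.
    static List<String> replay() throws IOException {
        Path workers = Files.createTempDirectory("workers");
        Path workerRoot = workers.resolve("worker-1");
        Files.createDirectories(workerRoot.resolve("heartbeats"));
        Files.createDirectories(workerRoot.resolve("pids"));

        rmr(workerRoot.resolve("heartbeats"));       // RMR heartbeat directory
        rmr(workerRoot.resolve("pids"));             // RMR pids directory
        List<String> stale = listWorkerIds(workers); // worker list read before root is gone
        rmr(workerRoot);                             // RMR root directory (late!)

        // Reading heartbeats for the stale worker list recreates the directory.
        for (String id : stale) {
            openHeartbeatState(workers.resolve(id).resolve("heartbeats"));
        }

        List<String> seenByNextRun = listWorkerIds(workers);
        rmr(workers); // clean up the temp directory
        return seenByNextRun;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(replay()); // prints [worker-1]
    }
}
```

The removed worker shows up again only because opening the state creates the 
directory; a read path that did not create missing directories would leave 
the workers directory empty in this replay.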

> Intermittent test failure on test-multiple-active-storms-multiple-supervisors 
> for supervisor-test 
> --------------------------------------------------------------------------------------------------
>
>                 Key: STORM-1933
>                 URL: https://issues.apache.org/jira/browse/STORM-1933
>             Project: Apache Storm
>          Issue Type: Sub-task
>          Components: storm-core
>    Affects Versions: 1.0.0, 2.0.0, 1.0.1
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>         Attachments: 
> only-thread-1362-and-1363-BUG-60850-intermittent-failure-supervisor-test.txt
>
>
> test-multiple-active-storms-multiple-supervisors fails fairly often. I ran 
> the unit tests on the 1.x branch 3 times and hit this issue, and users 
> report a FileNotFound issue on the supervisor which seems to be related to 
> this.
> I have a log file, so I'll attach it once the issue is created.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
