[ 
https://issues.apache.org/jira/browse/STORM-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359847#comment-14359847
 ] 

ASF GitHub Bot commented on STORM-532:
--------------------------------------

Github user caofangkun commented on a diff in the pull request:

    https://github.com/apache/storm/pull/296#discussion_r26362192
  
    --- Diff: storm-core/src/clj/backtype/storm/util.clj ---
    @@ -392,6 +392,15 @@
           (.addArgument command a))
         (.execute (DefaultExecutor.) command)))
     
    +(defn exists-process?
    +   [process-id]
    +   (let [line (if on-windows? (str "cmd /c \"tasklist /FI \"PID eq "  process-id  "\" | findstr "  process-id  "\"" )
    +                              (str "ps -p "  process-id))]
    +        (try-cause
    +           (exec-command! line)
    --- End diff --
    
    I prefer to read /proc/ directly, see:
    https://github.com/caofangkun/apache-storm/commit/2b2d65095402217d0217272b65ac9e1925125152#diff-60bc01aeb7fe37b1dc8ba418ebd627c3R379
    but I am concerned that, as you mentioned before, "/proc also does not
    exist on all unix variants".
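
One way to reconcile the two approaches, shown here only as a rough sketch and not as the code in the pull request: read /proc when it is mounted and fall back to `ps -p` otherwise. The class name `ProcessProbe` and the method `existsProcess` are hypothetical, the Windows `tasklist` branch from the diff is left out, and the sketch assumes that `ps -p <pid>` exits non-zero when no process matches, which holds for the common procps and BSD implementations.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of Storm.
final class ProcessProbe {
    private ProcessProbe() {}

    /** Returns true if a process with the given id appears to exist (Unix only). */
    static boolean existsProcess(long pid) {
        String id = Long.toString(pid);
        // /proc/self is present only when a procfs is mounted.
        if (new File("/proc/self").exists()) {
            return new File("/proc", id).isDirectory();
        }
        // Fallback for Unix variants without /proc.
        try {
            Process ps = new ProcessBuilder("ps", "-p", id)
                    .redirectErrorStream(true)
                    .start();
            try (InputStream out = ps.getInputStream()) {
                while (out.read() != -1) {
                    // Drain the output so the child cannot block on a full pipe.
                }
            }
            // Assumption: exit status 0 means a matching process was listed.
            return ps.waitFor() == 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for ps", e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: ProcessProbe <pid>");
            return;
        }
        System.out.println(existsProcess(Long.parseLong(args[0])));
    }
}
```

The /proc branch avoids forking a process on every check, which matters if the check runs on each sync-processes pass; the `ps` branch keeps the behaviour portable where /proc is absent.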



> Supervisor should restart worker immediately, if the worker process does not 
> exist any more 
> --------------------------------------------------------------------------------------------
>
>                 Key: STORM-532
>                 URL: https://issues.apache.org/jira/browse/STORM-532
>             Project: Apache Storm
>          Issue Type: Improvement
>    Affects Versions: 0.10.0
>            Reporter: caofangkun
>            Assignee: caofangkun
>            Priority: Minor
>
> Currently, if the worker process no longer exists, the supervisor has to
> wait a few seconds for the worker heartbeat to time out before it restarts
> the worker.
> If the supervisor knows the worker's process id and checks whether the
> process exists in the sync-processes thread, it may need less time to
> restart the worker.
> 1: record the worker process id in the worker local heartbeat
> 2: in supervisor sync-processes, get the process id from the worker local
> heartbeat and check whether the process exists
> 3: if not, restart it immediately
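
The three steps above could be sketched roughly as follows. This is an illustration under assumptions, not Storm's supervisor code: `WorkerLiveness`, `Heartbeat`, `Action`, and `decide` are hypothetical names, and the process check is passed in as a predicate so the decision logic does not depend on how existence is tested.

```java
import java.util.function.LongPredicate;

// Hypothetical sketch of the proposed check; names are not taken from Storm.
final class WorkerLiveness {
    private WorkerLiveness() {}

    enum Action { KEEP, RESTART }

    /** Step 1: the local heartbeat also records the worker's process id. */
    static final class Heartbeat {
        final long pid;
        final long timeSecs;

        Heartbeat(long pid, long timeSecs) {
            this.pid = pid;
            this.timeSecs = timeSecs;
        }
    }

    /**
     * Steps 2 and 3: on each sync pass, look up the recorded pid and restart
     * at once if the process is gone; otherwise fall back to the usual
     * heartbeat timeout.
     */
    static Action decide(Heartbeat hb, long nowSecs, long timeoutSecs,
                         LongPredicate processExists) {
        if (!processExists.test(hb.pid)) {
            return Action.RESTART; // process vanished: do not wait for the timeout
        }
        if (nowSecs - hb.timeSecs > timeoutSecs) {
            return Action.RESTART; // process alive but heartbeat is stale
        }
        return Action.KEEP;
    }
}
```

With a heartbeat written at second 100 and a 30 second timeout, `decide(hb, 101, 30, pid -> false)` yields `RESTART` one second after the last heartbeat, whereas the timeout path alone would not trigger until after second 130.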



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
