[ https://issues.apache.org/jira/browse/STORM-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15015233#comment-15015233 ]

ASF GitHub Bot commented on STORM-1155:
---------------------------------------

Github user hustfxj commented on a diff in the pull request:

    https://github.com/apache/storm/pull/849#discussion_r45436295
  
    --- Diff: storm-core/src/clj/backtype/storm/command/healthcheck.clj ---
    @@ -0,0 +1,88 @@
    +;; Licensed to the Apache Software Foundation (ASF) under one
    +;; or more contributor license agreements.  See the NOTICE file
    +;; distributed with this work for additional information
    +;; regarding copyright ownership.  The ASF licenses this file
    +;; to you under the Apache License, Version 2.0 (the
    +;; "License"); you may not use this file except in compliance
    +;; with the License.  You may obtain a copy of the License at
    +;;
    +;; http://www.apache.org/licenses/LICENSE-2.0
    +;;
    +;; Unless required by applicable law or agreed to in writing, software
    +;; distributed under the License is distributed on an "AS IS" BASIS,
    +;; WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +;; See the License for the specific language governing permissions and
    +;; limitations under the License.
    +(ns backtype.storm.command.healthcheck
    +  (:require [backtype.storm
    +             [config :refer :all]
    +             [log :refer :all]]
    +            [clojure.java [io :as io]]
    +            [clojure [string :refer [split]]])
    +  (:gen-class))
    +
    +(defn interrupter
    +  "Interrupt a given thread after ms milliseconds."
    +  [thread ms]
    +  (let [interrupter (Thread.
    +                     (fn []
    +                       (try
    +                         (Thread/sleep ms)
    +                         (.interrupt thread)
    +                         (catch InterruptedException e))))]
    +    (.start interrupter)
    +    interrupter))
    +
    +(defn check-output [lines]
    +  (if (some #(.startsWith % "ERROR") lines)
    +    :failed
    +    :success))
    +
    +(defn process-script [conf script]
    +  (let [script-proc (. (Runtime/getRuntime) (exec script))
    +        curthread (Thread/currentThread)
    +        interrupter-thread (interrupter curthread
    +                                        (conf STORM-HEALTH-CHECK-TIMEOUT-MS))]
    +    (try
    +      (.waitFor script-proc)
    +      (.interrupt interrupter-thread)
    --- End diff --
    
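    @revans2 If script-proc blocks, the interrupter thread interrupts the waiting thread, .waitFor throws InterruptedException, and we println "Script" script "timed out.". But the script process itself is never stopped, so timed-out scripts accumulate, for example: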
    admin    12755     1  0 12:49 pts/0    00:00:00 /bin/sh /home/admin/test/healthCheck.sh
    admin    12978     1  0 12:50 pts/0    00:00:00 /bin/sh /home/admin/test/healthCheck.sh
    admin    13228     1  0 12:51 pts/0    00:00:00 /bin/sh /home/admin/test/healthCheck.sh
    admin    13504     1  0 12:52 pts/0    00:00:00 /bin/sh /home/admin/test/healthCheck.sh
    admin    13644 13465  0 12:52 pts/0    00:00:00 /bin/sh /home/admin/test/healthCheck.sh
    
    Maybe we can stop the process instead? For example:
    
    (defn interrupter
      "Destroy the given process after ms milliseconds."
      [script-proc ms]
      (let [interrupter (Thread.
                         (fn []
                           (try
                             (Thread/sleep ms)
                             (.destroy script-proc)
                             (catch InterruptedException e))))]
        (.start interrupter)
        interrupter))
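A variant of the suggestion above that avoids the separate interrupter thread, sketched under the assumption that Java 8+ is available (Process.waitFor(long, TimeUnit), destroyForcibly and isAlive were added in Java 8). wait-or-destroy is a hypothetical name, not code from the PR.

```clojure
(ns healthcheck-sketch.timeout
  (:import [java.util.concurrent TimeUnit]))

(defn wait-or-destroy
  "Hypothetical helper: wait up to timeout-ms for proc to exit.
  Returns the exit code, or :timeout after forcibly killing the process.
  Requires Java 8+."
  [^Process proc timeout-ms]
  (if (.waitFor proc (long timeout-ms) TimeUnit/MILLISECONDS)
    (.exitValue proc)
    (do
      ;; SIGKILL on Unix; unlike .destroy, the script cannot ignore it.
      (.destroyForcibly proc)
      ;; Reap the killed process so it does not linger as a zombie.
      (.waitFor proc)
      :timeout)))
```

Like .destroy, destroyForcibly only kills the direct child; if healthCheck.sh has started subprocesses of its own, those may keep running.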



> Supervisor recurring health checks
> ----------------------------------
>
>                 Key: STORM-1155
>                 URL: https://issues.apache.org/jira/browse/STORM-1155
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-core
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>             Fix For: 0.11.0
>
>
> Add the ability for the supervisor to call out to health check scripts to 
> allow some validation of the health of the node the supervisor is running on.
> It could regularly run scripts in a directory provided by the cluster admin. 
> If any scripts fail, it should kill the workers and stop itself.
> This could work very much like the Hadoop health scripts: if ERROR is printed on
> stdout, it means the node has some issue and we should shut down.
> If a non-zero exit code is returned, it indicates that the script failed to
> execute properly, so you don't want to mark the node as unhealthy.
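The rules in the description can be sketched as a small classifier. script-status and node-healthy? are hypothetical names; checking the exit code before the output is an assumption, since the description does not say which rule wins when a script both prints ERROR and exits non-zero.

```clojure
(ns healthcheck-sketch.status)

(defn script-status
  "Hypothetical classifier for one script run:
  non-zero exit => the script itself failed, do NOT mark the node unhealthy;
  otherwise any stdout line starting with ERROR => node unhealthy."
  [exit-code lines]
  (cond
    (not (zero? exit-code))                       :script-failed
    (some #(.startsWith ^String % "ERROR") lines) :unhealthy
    :else                                         :healthy))

(defn node-healthy?
  "The node is unhealthy only if some script reported ERROR;
  :script-failed results are ignored."
  [results]
  (not-any? #(= :unhealthy %) results))
```

If node-healthy? returns false, the supervisor would kill its workers and stop itself, as the description proposes.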



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
