[ https://issues.apache.org/jira/browse/STORM-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15015233#comment-15015233 ]
ASF GitHub Bot commented on STORM-1155:
---------------------------------------

Github user hustfxj commented on a diff in the pull request:

    https://github.com/apache/storm/pull/849#discussion_r45436295

    --- Diff: storm-core/src/clj/backtype/storm/command/healthcheck.clj ---
    @@ -0,0 +1,88 @@
    +;; Licensed to the Apache Software Foundation (ASF) under one
    +;; or more contributor license agreements.  See the NOTICE file
    +;; distributed with this work for additional information
    +;; regarding copyright ownership.  The ASF licenses this file
    +;; to you under the Apache License, Version 2.0 (the
    +;; "License"); you may not use this file except in compliance
    +;; with the License.  You may obtain a copy of the License at
    +;;
    +;; http://www.apache.org/licenses/LICENSE-2.0
    +;;
    +;; Unless required by applicable law or agreed to in writing, software
    +;; distributed under the License is distributed on an "AS IS" BASIS,
    +;; WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +;; See the License for the specific language governing permissions and
    +;; limitations under the License.
    +(ns backtype.storm.command.healthcheck
    +  (:require [backtype.storm
    +             [config :refer :all]
    +             [log :refer :all]]
    +            [clojure.java [io :as io]]
    +            [clojure [string :refer [split]]])
    +  (:gen-class))
    +
    +(defn interrupter
    +  "Interrupt a given thread after ms milliseconds."
    +  [thread ms]
    +  (let [interrupter (Thread.
    +                     (fn []
    +                       (try
    +                         (Thread/sleep ms)
    +                         (.interrupt thread)
    +                         (catch InterruptedException e))))]
    +    (.start interrupter)
    +    interrupter))
    +
    +(defn check-output [lines]
    +  (if (some #(.startsWith % "ERROR") lines)
    +    :failed
    +    :success))
    +
    +(defn process-script [conf script]
    +  (let [script-proc (. (Runtime/getRuntime) (exec script))
    +        curthread (Thread/currentThread)
    +        interrupter-thread (interrupter curthread
    +                                        (conf STORM-HEALTH-CHECK-TIMEOUT-MS))]
    +    (try
    +      (.waitFor script-proc)
    +      (.interrupt interrupter-thread)
    --- End diff --

    @revans2 If script-proc blocks, an InterruptedException is thrown and "Script" script "timed out." is printed, but the script process itself is not actually stopped. For example:

    admin    12755     1  0 12:49 pts/0    00:00:00 /bin/sh /home/admin/test/healthCheck.sh
    admin    12978     1  0 12:50 pts/0    00:00:00 /bin/sh /home/admin/test/healthCheck.sh
    admin    13228     1  0 12:51 pts/0    00:00:00 /bin/sh /home/admin/test/healthCheck.sh
    admin    13504     1  0 12:52 pts/0    00:00:00 /bin/sh /home/admin/test/healthCheck.sh
    admin    13644 13465  0 12:52 pts/0    00:00:00 /bin/sh /home/admin/test/healthCheck.sh

    Maybe we can stop the process?

    (defn interrupter
      "Destroy a given process after ms milliseconds."
      [script-proc ms]
      (let [interrupter (Thread.
                         (fn []
                           (try
                             (Thread/sleep ms)
                             (.destroy script-proc)
                             (catch InterruptedException e))))]
        (.start interrupter)
        interrupter))


> Supervisor recurring health checks
> ----------------------------------
>
>                 Key: STORM-1155
>                 URL: https://issues.apache.org/jira/browse/STORM-1155
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-core
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>             Fix For: 0.11.0
>
>
> Add the ability for the supervisor to call out to health check scripts to
> allow some validation of the health of the node the supervisor is running on.
> It could regularly run scripts in a directory provided by the cluster admin.
> If any scripts fail, it should kill the workers and stop itself.
> This could work very much like the Hadoop scripts and if ERROR is returned on
> stdout it means the node has some issue and we should shut down.
> If a non-zero exit code is returned it indicates that the scripts failed to
> execute properly so you don't want to mark the node as unhealthy.
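
On the suggestion in the review comment above to stop the script process when the timeout fires: the following is a minimal sketch in plain Java of the same idea, written against the `java.lang.Process` API that the Clojure code reaches through interop. It is not the code in the pull request. It swaps the separate interrupter thread for `Process.waitFor(long, TimeUnit)` and `destroyForcibly()`, which assumes Java 8 or later. The class name `HealthCheckTimeout` and the method `runWithTimeout` are hypothetical, and `inheritIO()` is used only so the child cannot block on a full output pipe; a real health check would read stdout to look for ERROR lines instead.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, for illustration only; not part of Storm.
class HealthCheckTimeout {

    /**
     * Runs a command and waits up to timeoutMs for it to exit.
     * Returns true if it exited in time, false if it had to be killed.
     */
    static boolean runWithTimeout(String[] command, long timeoutMs)
            throws IOException, InterruptedException {
        // inheritIO() keeps the child from stalling on an unread pipe.
        Process proc = new ProcessBuilder(command).inheritIO().start();

        // waitFor with a timeout returns false if the process is still running.
        if (proc.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) {
            return true;
        }

        // Timed out: kill the process itself, then wait so it is reaped
        // rather than left behind as in the ps listing above.
        proc.destroyForcibly();
        proc.waitFor();
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Assumes a Unix-like system with "sleep" and "true" on the PATH.
        System.out.println(runWithTimeout(new String[] {"sleep", "5"}, 200));
        System.out.println(runWithTimeout(new String[] {"true"}, 5000));
    }
}
```

One caveat with this approach: destroying the process only signals the direct child. If the health check script starts further processes of its own, those may keep running and would need separate handling.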