[ https://issues.apache.org/jira/browse/STORM-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kai Sasaki reassigned STORM-695: -------------------------------- Assignee: Kai Sasaki > storm CLI tool reports zero exit code on error scenario, take 2 > --------------------------------------------------------------- > > Key: STORM-695 > URL: https://issues.apache.org/jira/browse/STORM-695 > Project: Apache Storm > Issue Type: Bug > Affects Versions: 0.9.3 > Reporter: Michael Noll > Assignee: Kai Sasaki > Priority: Minor > > Commands such as "storm kill non-existing-topology" will return an exit code > of zero, indicating success when in fact the command failed. > h3. How to reproduce > Here is but one example where the {{storm}} CLI tool violates shell best > practices: > {code} > # Let's kill a topology that is in fact not running in the cluster. > $ storm kill i-do-not-exist-topo > <snip> > Exception in thread "main" NotAliveException(msg:i-do-not-exist-topo is not > alive) > # Print the exit code of last command. > $ echo $? > 0 # <<< but since the kill command failed this should be non-zero! > {code} > Another example is the "storm jar" command. If you attempt to submit a > topology that has the same name as an existing, running topology, the "storm > jar" command will not submit the topology -- instead it will print an > exception (think: "the topology FooName is already running"), which is ok, > but it will then exit with a return code of zero, which indicates success > (which is wrong). > h3. Impact > This bug prevents automated deployment tools such as Ansible or Puppet as > well as ad-hoc CLI scripting (think: fire-fighting Ops teams) to work > properly because Storm violates shell conventions by not returning non-zero > exit codes in case of failures. > h3. How to fix > From what I understand the solution is two-fold: > # The various Storm commands that are being called by the {{storm}} script > must return proper exit codes. > # The {{storm}} script must store these exit codes and return itself with the > respective exit code of the Storm command it actually ran. > For example, here's the current code that implements the "storm kill" command: > {code} > # In: bin/storm > def kill(*args): > """Syntax: [storm kill topology-name [-w wait-time-secs]] > Kills the topology with the name topology-name. Storm will > first deactivate the topology's spouts for the duration of > the topology's message timeout to allow all messages currently > being processed to finish processing. Storm will then shutdown > the workers and clean up their state. You can override the length > of time Storm waits between deactivation and shutdown with the -w flag. > """ > exec_storm_class( > "backtype.storm.command.kill_topology", > args=args, > jvmtype="-client", > extrajars=[USER_CONF_DIR, STORM_BIN_DIR]) > {code} > which in turn calls the following code in {{kill_topology.clj}}: > {code} > ;; In: backtype.storm.command.kill-topology > (ns backtype.storm.command.kill-topology > (:use [clojure.tools.cli :only [cli]]) > (:use [backtype.storm thrift config log]) > (:import [backtype.storm.generated KillOptions]) > (:gen-class)) > (defn -main [& args] > (let [[{wait :wait} [name] _] (cli args ["-w" "--wait" :default nil > :parse-fn #(Integer/parseInt %)]) > opts (KillOptions.)] > (if wait (.set_wait_secs opts wait)) > (with-configured-nimbus-connection nimbus > (.killTopologyWithOpts nimbus name opts) > (log-message "Killed topology: " name) > ))) > {code} > which in turn calls the following code in {{nimbus.clj}}: > {code} > ;; In: backtype.storm.daemon.nimbus > (^void killTopologyWithOpts [this ^String storm-name ^KillOptions > options] > (check-storm-active! nimbus storm-name true) > (let [topology-conf (try-read-storm-conf-from-name conf storm-name > nimbus)] > (check-authorization! nimbus storm-name topology-conf > "killTopology")) > (let [wait-amt (if (.is_set_wait_secs options) > (.get_wait_secs options) > )] > (transition-name! nimbus storm-name [:kill wait-amt] true) > )) > {code} > As you can see the current implementation does not pass success/failure > information back to the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)