[ 
https://issues.apache.org/jira/browse/STORM-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369278#comment-14369278
 ] 

ASF GitHub Bot commented on STORM-695:
--------------------------------------

Github user Lewuathe commented on the pull request:

    https://github.com/apache/storm/pull/456#issuecomment-83543346
  
    @revans2 @HeartSaVioR Thank you for reviewing. 
    We should not distinguish error types with exit code. So JVM return code 
informs us the occurrence of some exception, I'll update to return exit code 1 
when any exception has been occurred. 


> storm CLI tool reports zero exit code on error scenario, take 2
> ---------------------------------------------------------------
>
>                 Key: STORM-695
>                 URL: https://issues.apache.org/jira/browse/STORM-695
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 0.9.3
>            Reporter: Michael Noll
>            Assignee: Kai Sasaki
>            Priority: Minor
>
> Commands such as "storm kill non-existing-topology" will return an exit code 
> of zero, indicating success when in fact the command failed.
> h3. How to reproduce
> Here is but one example where the {{storm}} CLI tool violates shell best 
> practices:
> {code}
> # Let's kill a topology that is in fact not running in the cluster.
> $ storm kill i-do-not-exist-topo
> <snip>
> Exception in thread "main" NotAliveException(msg:i-do-not-exist-topo is not 
> alive)
> # Print the exit code of last command.
> $ echo $?
> 0  # <<< but since the kill command failed this should be non-zero!
> {code}
> Another example is the "storm jar" command.  If you attempt to submit a 
> topology that has the same name as an existing, running topology, the "storm 
> jar" command will not submit the topology -- instead it will print an 
> exception (think: "the topology FooName is already running"), which is ok, 
> but it will then exit with a return code of zero, which indicates success 
> (which is wrong).
> h3. Impact
> This bug prevents automated deployment tools such as Ansible or Puppet as 
> well as ad-hoc CLI scripting (think: fire-fighting Ops teams) to work 
> properly because Storm violates shell conventions by not returning non-zero 
> exit codes in case of failures.
> h3. How to fix
> From what I understand the solution is two-fold:
> # The various Storm commands that are being called by the {{storm}} script 
> must return proper exit codes.
> # The {{storm}} script must store these exit codes and return itself with the 
> respective exit code of the Storm command it actually ran.
> For example, here's the current code that implements the "storm kill" command:
> {code}
> # In: bin/storm
> def kill(*args):
>     """Syntax: [storm kill topology-name [-w wait-time-secs]]
>     Kills the topology with the name topology-name. Storm will 
>     first deactivate the topology's spouts for the duration of 
>     the topology's message timeout to allow all messages currently 
>     being processed to finish processing. Storm will then shutdown 
>     the workers and clean up their state. You can override the length 
>     of time Storm waits between deactivation and shutdown with the -w flag.
>     """
>     exec_storm_class(
>         "backtype.storm.command.kill_topology", 
>         args=args, 
>         jvmtype="-client", 
>         extrajars=[USER_CONF_DIR, STORM_BIN_DIR])
> {code}
> which in turn calls the following code in {{kill_topology.clj}}:
> {code}
> ;; In: backtype.storm.command.kill-topology
> (ns backtype.storm.command.kill-topology
>   (:use [clojure.tools.cli :only [cli]])
>   (:use [backtype.storm thrift config log])
>   (:import [backtype.storm.generated KillOptions])
>   (:gen-class))
> (defn -main [& args]
>   (let [[{wait :wait} [name] _] (cli args ["-w" "--wait" :default nil 
> :parse-fn #(Integer/parseInt %)])
>         opts (KillOptions.)]
>     (if wait (.set_wait_secs opts wait))
>     (with-configured-nimbus-connection nimbus
>       (.killTopologyWithOpts nimbus name opts)
>       (log-message "Killed topology: " name)
>       )))
> {code}
> which in turn calls the following code in {{nimbus.clj}}:
> {code}
> ;; In: backtype.storm.daemon.nimbus
>       (^void killTopologyWithOpts [this ^String storm-name ^KillOptions 
> options]
>         (check-storm-active! nimbus storm-name true)
>         (let [topology-conf (try-read-storm-conf-from-name conf storm-name 
> nimbus)]
>           (check-authorization! nimbus storm-name topology-conf 
> "killTopology"))
>         (let [wait-amt (if (.is_set_wait_secs options)
>                          (.get_wait_secs options)                         
>                          )]
>           (transition-name! nimbus storm-name [:kill wait-amt] true)
>           ))
> {code}
> As you can see the current implementation does not pass success/failure 
> information back to the caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to