[ 
https://issues.apache.org/jira/browse/STORM-307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196077#comment-14196077
 ] 

Sean Zhong edited comment on STORM-307 at 11/4/14 1:13 PM:
-----------------------------------------------------------

committed to trunk.
Thanks, wurstmeister and Damien Raude-Morvan


was (Author: clockfly):
committed to trunk.
Thanks, wurstmeister

> After host crash, supervisor is unable to restart itself
> --------------------------------------------------------
>
>                 Key: STORM-307
>                 URL: https://issues.apache.org/jira/browse/STORM-307
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 0.9.1-incubating
>         Environment: Debian Linux Wheezy
> Zookeeper 3.3.3
> Java 1.7.0_25
>            Reporter: Damien Raude-Morvan
>             Fix For: 0.9.3-rc2
>
>         Attachments: supeof.tar.bz2
>
>
> Hi,
> I've observed [multiple times|#links] that supervisor state de-serialisation 
> after a host crash or reboot can fail. The supervisor is then unable to come up 
> without manual intervention. AFAICT, the serialized supervisor 
> state is invalid and can't be read at the next start.
> Observed error in the supervisor log:
> {noformat}
> 2014-04-29 19:38:35 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2014-04-29 19:38:35 o.a.z.ZooKeeper [INFO] Initiating client connection, 
> connectString=127.0.0.1:2181/storm sessionTimeout=20000 
> watcher=com.netflix.curator.ConnectionState@18d055e0
> 2014-04-29 19:38:35 o.a.z.ClientCnxn [INFO] Opening socket connection to 
> server /127.0.0.1:2181
> 2014-04-29 19:38:35 o.a.z.ClientCnxn [INFO] Socket connection established to 
> localhost/127.0.0.1:2181, initiating session
> 2014-04-29 19:38:35 o.a.z.ClientCnxn [INFO] Session establishment complete on 
> server localhost/127.0.0.1:2181, sessionid = 0x145a7cc1c7e48b1, negotiated 
> timeout = 20000
> 2014-04-29 19:38:35 b.s.d.supervisor [INFO] Starting supervisor with id 
> 71b01216-9d00-4fb6-8538-6673058ab5ef at host storm
> 2014-04-29 19:38:36 b.s.event [ERROR] Error when processing event
> java.lang.RuntimeException: java.io.EOFException
>         at backtype.storm.utils.Utils.deserialize(Utils.java:86) 
> ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>         at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) 
> ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>         at backtype.storm.utils.LocalState.get(LocalState.java:56) 
> ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>         at 
> backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:207) 
> ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>         at clojure.lang.AFn.applyToHelper(AFn.java:161) 
> ~[clojure-1.4.0.jar:na]
>         at clojure.lang.AFn.applyTo(AFn.java:151) ~[clojure-1.4.0.jar:na]
>         at clojure.core$apply.invoke(core.clj:603) ~[clojure-1.4.0.jar:na]
>         at clojure.core$partial$fn__4070.doInvoke(core.clj:2343) 
> ~[clojure-1.4.0.jar:na]
>         at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.4.0.jar:na]
>         at backtype.storm.event$event_manager$fn__2593.invoke(event.clj:39) 
> ~[na:na]
>         at clojure.lang.AFn.run(AFn.java:24) ~[clojure-1.4.0.jar:na]
>         at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
> Caused by: java.io.EOFException: null
>         at 
> java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2323)
>  ~[na:1.7.0_25]
>         at 
> java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2792)
>  ~[na:1.7.0_25]
>         at 
> java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:799) 
> ~[na:1.7.0_25]
>         at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299) 
> ~[na:1.7.0_25]
>         at backtype.storm.utils.Utils.deserialize(Utils.java:81) 
> ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>         ... 11 common frames omitted
> 2014-04-29 19:38:36 b.s.util [INFO] Halting process: ("Error when processing 
> an event")
> {noformat}
> Current workaround: fully stopping the supervisor daemon and deleting all of 
> Storm's data/supervisor directory helped; after restarting, the Supervisor is 
> now running smoothly. 
> {anchor:links} Here are some references to very similar issues:
> * 
> http://mail-archives.apache.org/mod_mbox/storm-user/201402.mbox/%3c23100d14e7ac4cef947f7236ef896...@by2pr08mb144.namprd08.prod.outlook.com%3E
> * https://groups.google.com/forum/#!topic/storm-user/SL9FK9XeoI8
> * https://groups.google.com/forum/#!topic/storm-user/2gapTYTRrX8
> Regards,
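
For readers following the trace above: the `EOFException` is raised in `ObjectInputStream.readStreamHeader`, which is what plain Java serialization does when the input ends before the stream header, for example when the file is empty. The following is a minimal sketch of that failure mode and of one possible tolerant read. The class `StateFileSketch` and its methods `failsWithEof` and `readOrEmpty` are hypothetical names for illustration; they are not Storm APIs, and this is not necessarily the change that was committed for STORM-307.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Storm.
public class StateFileSketch {

    // Serialize a map with plain Java serialization.
    public static byte[] serialize(Map<String, String> state) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new HashMap<>(state));
        }
        return bytes.toByteArray();
    }

    // Returns true if merely opening an ObjectInputStream on the bytes throws
    // EOFException, which is the failure visible in the stack trace.
    public static boolean failsWithEof(byte[] raw) throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    // Assumed mitigation: treat input that ends prematurely as "no saved state"
    // instead of letting the exception halt the process.
    @SuppressWarnings("unchecked")
    public static Map<String, String> readOrEmpty(byte[] raw)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (Map<String, String>) in.readObject();
        } catch (EOFException e) {
            return new HashMap<>();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] empty = new byte[0]; // stands in for a zero-length state file
        System.out.println("empty input fails with EOF: " + failsWithEof(empty));
        System.out.println("tolerant read of empty input: " + readOrEmpty(empty));

        Map<String, String> state = new HashMap<>();
        state.put("example-key", "example-value");
        System.out.println("round trip: " + readOrEmpty(serialize(state)));
    }
}
```

Whether discarding unreadable state is acceptable depends on what the supervisor can rebuild on its own; the manual workaround described in the issue (deleting the data/supervisor directory) has a similar effect.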



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
