[ https://issues.apache.org/jira/browse/STORM-307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196077#comment-14196077 ]
Sean Zhong edited comment on STORM-307 at 11/4/14 1:13 PM:
-----------------------------------------------------------

committed to trunk. Thanks, wurstmeister and Damien Raude-Morvan

was (Author: clockfly):
committed to trunk. Thanks, wurstmeister

> After host crash, supervisor is unable to restart itself
> --------------------------------------------------------
>
>                 Key: STORM-307
>                 URL: https://issues.apache.org/jira/browse/STORM-307
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 0.9.1-incubating
>         Environment: Debian Linux Wheezy, ZooKeeper 3.3.3, Java 1.7.0_25
>            Reporter: Damien Raude-Morvan
>             Fix For: 0.9.3-rc2
>
>         Attachments: supeof.tar.bz2
>
>
> Hi,
> I've observed [multiple times|#links] that supervisor state deserialisation
> after a host crash or reboot can fail. The supervisor is then unable to come
> up without manual intervention. AFAICT, the serialized supervisor state is
> invalid and can't be read at the next start.
> Observed error in the supervisor log:
> {noformat}
> 2014-04-29 19:38:35 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2014-04-29 19:38:35 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=127.0.0.1:2181/storm sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@18d055e0
> 2014-04-29 19:38:35 o.a.z.ClientCnxn [INFO] Opening socket connection to server /127.0.0.1:2181
> 2014-04-29 19:38:35 o.a.z.ClientCnxn [INFO] Socket connection established to localhost/127.0.0.1:2181, initiating session
> 2014-04-29 19:38:35 o.a.z.ClientCnxn [INFO] Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x145a7cc1c7e48b1, negotiated timeout = 20000
> 2014-04-29 19:38:35 b.s.d.supervisor [INFO] Starting supervisor with id 71b01216-9d00-4fb6-8538-6673058ab5ef at host storm
> 2014-04-29 19:38:36 b.s.event [ERROR] Error when processing event
> java.lang.RuntimeException: java.io.EOFException
> 	at backtype.storm.utils.Utils.deserialize(Utils.java:86) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
> 	at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
> 	at backtype.storm.utils.LocalState.get(LocalState.java:56) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
> 	at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:207) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
> 	at clojure.lang.AFn.applyToHelper(AFn.java:161) ~[clojure-1.4.0.jar:na]
> 	at clojure.lang.AFn.applyTo(AFn.java:151) ~[clojure-1.4.0.jar:na]
> 	at clojure.core$apply.invoke(core.clj:603) ~[clojure-1.4.0.jar:na]
> 	at clojure.core$partial$fn__4070.doInvoke(core.clj:2343) ~[clojure-1.4.0.jar:na]
> 	at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.4.0.jar:na]
> 	at backtype.storm.event$event_manager$fn__2593.invoke(event.clj:39) ~[na:na]
> 	at clojure.lang.AFn.run(AFn.java:24) ~[clojure-1.4.0.jar:na]
> 	at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
> Caused by: java.io.EOFException: null
> 	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2323) ~[na:1.7.0_25]
> 	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2792) ~[na:1.7.0_25]
> 	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:799) ~[na:1.7.0_25]
> 	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299) ~[na:1.7.0_25]
> 	at backtype.storm.utils.Utils.deserialize(Utils.java:81) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
> 	... 11 common frames omitted
> 2014-04-29 19:38:36 b.s.util [INFO] Halting process: ("Error when processing an event")
> {noformat}
> Current workaround: fully stopping the supervisor daemon and deleting Storm's
> whole data/supervisor directory helped; after a restart, the supervisor is now
> running smoothly.
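The root `EOFException` is thrown from `ObjectInputStream.readStreamHeader`, inside the `ObjectInputStream` constructor: the state file ended before even the 4-byte Java serialization header (magic + version) could be read, which is consistent with a file left empty or truncated by the crash. The sketch below is a minimal, hypothetical illustration of that failure and of one common defensive pattern (fsync to a temp file, atomically rename it into place, and treat an empty or truncated file as "no state" on read). The class and method names (`StateFileSketch`, `writeAtomically`, `readOrNull`) are made up for illustration; this is not Storm's `LocalState` API and not the change committed for this issue.

```java
import java.io.EOFException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.StreamCorruptedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: NOT Storm's LocalState code and NOT the committed fix.
public class StateFileSketch {

    // Serialize to a temp file in the same directory, force it to disk, then
    // rename it over the target so readers never see a partially written file.
    // (A fully durable variant would also sync the parent directory.)
    public static void writeAtomically(Path target, Serializable value) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "state", ".tmp");
        try (FileOutputStream fos = new FileOutputStream(tmp.toFile());
             ObjectOutputStream out = new ObjectOutputStream(fos)) {
            out.writeObject(value);
            out.flush();
            fos.getFD().sync(); // fsync the bytes before the rename
        }
        // With ATOMIC_MOVE, replacing an existing target is implementation-specific;
        // on POSIX file systems it maps to rename(2), which replaces the target.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    // Return null (instead of throwing) for a missing, empty, truncated or garbled file.
    public static Object readOrNull(Path file) throws IOException, ClassNotFoundException {
        if (!Files.exists(file) || Files.size(file) == 0) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return in.readObject();
        } catch (EOFException | StreamCorruptedException e) {
            return null; // stream ended early, or the header/contents are not valid
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("supervisor-sketch");

        // Simulate a 0-byte state file left behind by a host crash.
        Path empty = dir.resolve("empty-state");
        Files.createFile(empty);
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(empty))) {
            System.out.println("unexpected: header was read");
        } catch (EOFException e) {
            // Same failure as in the log: the constructor's readStreamHeader hits EOF.
            System.out.println("EOFException from the ObjectInputStream constructor");
        }

        System.out.println(readOrNull(empty)); // prints null instead of halting

        Path state = dir.resolve("state");
        writeAtomically(state, "hello");
        System.out.println(readOrNull(state)); // prints hello
    }
}
```

Running `main` should print the `EOFException` line, then `null`, then `hello`. Returning `null` pushes the decision of what to do with lost state to the caller (for example, starting from a clean state as the manual workaround does), rather than halting the daemon.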
> {anchor:links} Here are some references to very similar issues:
> * http://mail-archives.apache.org/mod_mbox/storm-user/201402.mbox/%3c23100d14e7ac4cef947f7236ef896...@by2pr08mb144.namprd08.prod.outlook.com%3E
> * https://groups.google.com/forum/#!topic/storm-user/SL9FK9XeoI8
> * https://groups.google.com/forum/#!topic/storm-user/2gapTYTRrX8
> Regards,

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)