[ https://issues.apache.org/jira/browse/STORM-388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rick Kellogg updated STORM-388:
-------------------------------
    Component/s: storm-core

> make supervisor more resilient to missing .ser files
> ----------------------------------------------------
>
>                 Key: STORM-388
>                 URL: https://issues.apache.org/jira/browse/STORM-388
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 0.9.2-incubating
>            Reporter: Radim Kolar
>              Labels: supervisor
>
> Currently the supervisor process cannot run without some kind of process
> supervision software such as systemd. It exits too often on a missing .ser
> file error with
> [INFO] Halting process
> Examples:
> a)
> 2014-07-03 20:32:53 b.s.d.supervisor [INFO] Shutting down and clearing state for id efd37b78-eb69-46a1-b317-9b5b4ba00584. Current supervisor time: 1404412373. State: :timed-out, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1404412311, :storm-id "Storm-throughput-test-7-1404411531", :executors #{[2 2] [4 4] [6 6] [-1 -1]}, :port 6702}
> 2014-07-03 20:32:53 b.s.d.supervisor [INFO] Shutting down 55f2b426-c170-4e48-a768-2a82c0f383ce:efd37b78-eb69-46a1-b317-9b5b4ba00584
> 2014-07-03 20:32:54 b.s.d.supervisor [INFO] Removing code for storm id Storm-throughput-test-7-1404411531
> 2014-07-03 20:32:55 b.s.d.supervisor [INFO] Shut down 55f2b426-c170-4e48-a768-2a82c0f383ce:efd37b78-eb69-46a1-b317-9b5b4ba00584
> 2014-07-03 20:32:55 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "Storm-throughput-test-7-1404411531", :executors ([6 6] [4 4] [2 2])} for this supervisor 55f2b426-c170-4e48-a768-2a82c0f383ce on port 6702 with id 6518a348-1fea-4401-8b7b-365b4ac36279
> 2014-07-03 20:32:55 b.s.event [ERROR] Error when processing event
> java.io.FileNotFoundException: File 'storm-local/supervisor/stormdist/Storm-throughput-test-7-1404411531/stormconf.ser' does not exist
> b)
> 2014-07-03 20:32:43 o.a.z.ClientCnxn [INFO] Socket connection established to localhost/127.0.0.1:2181, initiating session
> 2014-07-03 20:32:51 o.a.z.ClientCnxn [INFO] Unable to reconnect to ZooKeeper service, session 0x146fb27b8400027 has expired, closing socket connection
> 2014-07-03 20:32:51 o.a.c.f.s.ConnectionStateManager [INFO] State change: LOST
> 8d-1069-44e3-b3ca-c25390cbf719
> 2014-07-03 10:29:22 b.s.d.supervisor [INFO] Removing code for storm id Storm-throughput-test-1-1404335149
> 2014-07-03 10:29:22 b.s.d.supervisor [INFO] Shut down 167cf900-2ec6-499b-9c09-12c1e48dbc08:f776588d-1069-44e3-b3ca-c25390cbf719
> 2014-07-03 10:29:22 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "Storm-throughput-test-1-1404335149", :executors ([3 3] [5 5] [4 4] [2 2] [1 1])} for this supervisor 167cf900-2ec6-499b-9c09-12c1e48dbc08 on port 6702 with id 1dd28a8e-53cd-4af3-a4ae-7ebae0b9427f
> 2014-07-03 10:29:22 b.s.event [ERROR] Error when processing event
> java.io.FileNotFoundException: File 'storm-local/supervisor/stormdist/Storm-throughput-test-1-1404335149/stormconf.ser' does not exist
>
> In both cases there were ZooKeeper connection problems before the missing
> .ser file error.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)