[ 
https://issues.apache.org/jira/browse/STORM-388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rick Kellogg updated STORM-388:
-------------------------------
    Component/s: storm-core

> make supervisor more resilient to missing .ser files
> ----------------------------------------------------
>
>                 Key: STORM-388
>                 URL: https://issues.apache.org/jira/browse/STORM-388
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 0.9.2-incubating
>            Reporter: Radim Kolar
>              Labels: supervisor
>
> Currently the supervisor process cannot run reliably without some kind of
> process supervision software, such as systemd, to restart it. It exits too
> often on a missing .ser file error, logging
> [INFO] Halting process
> Examples:
> a)
> 2014-07-03 20:32:53 b.s.d.supervisor [INFO] Shutting down and clearing state for id efd37b78-eb69-46a1-b317-9b5b4ba00584. Current supervisor time: 1404412373. State: :timed-out, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1404412311, :storm-id "Storm-throughput-test-7-1404411531", :executors #{[2 2] [4 4] [6 6] [-1 -1]}, :port 6702}
> 2014-07-03 20:32:53 b.s.d.supervisor [INFO] Shutting down 55f2b426-c170-4e48-a768-2a82c0f383ce:efd37b78-eb69-46a1-b317-9b5b4ba00584
> 2014-07-03 20:32:54 b.s.d.supervisor [INFO] Removing code for storm id Storm-throughput-test-7-1404411531
> 2014-07-03 20:32:55 b.s.d.supervisor [INFO] Shut down 55f2b426-c170-4e48-a768-2a82c0f383ce:efd37b78-eb69-46a1-b317-9b5b4ba00584
> 2014-07-03 20:32:55 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "Storm-throughput-test-7-1404411531", :executors ([6 6] [4 4] [2 2])} for this supervisor 55f2b426-c170-4e48-a768-2a82c0f383ce on port 6702 with id 6518a348-1fea-4401-8b7b-365b4ac36279
> 2014-07-03 20:32:55 b.s.event [ERROR] Error when processing event
> java.io.FileNotFoundException: File 'storm-local/supervisor/stormdist/Storm-throughput-test-7-1404411531/stormconf.ser' does not exist
> b)
> 2014-07-03 20:32:43 o.a.z.ClientCnxn [INFO] Socket connection established to localhost/127.0.0.1:2181, initiating session
> 2014-07-03 20:32:51 o.a.z.ClientCnxn [INFO] Unable to reconnect to ZooKeeper service, session 0x146fb27b8400027 has expired, closing socket connection
> 2014-07-03 20:32:51 o.a.c.f.s.ConnectionStateManager [INFO] State change: LOST
> 8d-1069-44e3-b3ca-c25390cbf719
> 2014-07-03 10:29:22 b.s.d.supervisor [INFO] Removing code for storm id Storm-throughput-test-1-1404335149
> 2014-07-03 10:29:22 b.s.d.supervisor [INFO] Shut down 167cf900-2ec6-499b-9c09-12c1e48dbc08:f776588d-1069-44e3-b3ca-c25390cbf719
> 2014-07-03 10:29:22 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "Storm-throughput-test-1-1404335149", :executors ([3 3] [5 5] [4 4] [2 2] [1 1])} for this supervisor 167cf900-2ec6-499b-9c09-12c1e48dbc08 on port 6702 with id 1dd28a8e-53cd-4af3-a4ae-7ebae0b9427f
> 2014-07-03 10:29:22 b.s.event [ERROR] Error when processing event
> java.io.FileNotFoundException: File 'storm-local/supervisor/stormdist/Storm-throughput-test-1-1404335149/stormconf.ser' does not exist
> In both cases, ZooKeeper connection failures occurred before the missing
> .ser file error.
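
A minimal sketch of the kind of resilience the issue title asks for, assuming
the supervisor could treat a missing stormconf.ser as a recoverable condition
and skip the worker launch instead of halting. This is not Storm's actual
supervisor code (the log output above comes from the Clojure namespace
backtype.storm.daemon.supervisor); the class MissingSerSketch and both of its
methods are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Optional;

public class MissingSerSketch {

    /**
     * Hypothetical helper: reads stormconf.ser from a topology's stormdist
     * directory. Returns Optional.empty() when the file is absent instead of
     * letting the exception propagate and take the process down.
     */
    static Optional<byte[]> readConfIfPresent(Path stormDistDir) throws IOException {
        Path conf = stormDistDir.resolve("stormconf.ser");
        try {
            // Reading directly (rather than checking existence first) avoids a
            // race with a concurrent "Removing code for storm id" cleanup.
            return Optional.of(Files.readAllBytes(conf));
        } catch (NoSuchFileException e) {
            return Optional.empty();
        }
    }

    /**
     * Hypothetical launch step: returns true if the worker could be launched,
     * false if the launch was skipped because the conf file is missing.
     */
    static boolean tryLaunchWorker(Path stormDistDir) throws IOException {
        Optional<byte[]> conf = readConfIfPresent(stormDistDir);
        if (!conf.isPresent()) {
            System.err.println("stormconf.ser missing under " + stormDistDir
                    + "; skipping launch instead of halting");
            return false;
        }
        // A real supervisor would deserialize the conf and start the worker here.
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("stormdist-sketch");
        System.out.println(tryLaunchWorker(dir));   // conf file not there yet
        Files.write(dir.resolve("stormconf.ser"), new byte[] {1, 2, 3});
        System.out.println(tryLaunchWorker(dir));   // conf file now present
    }
}
```

Under this assumption, a later synchronization pass could download the
topology code again and retry the launch, so the process would not depend on
an external restart.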



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
