[ 
https://issues.apache.org/jira/browse/STORM-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diogo Monteiro updated STORM-3501:
----------------------------------
    Description: 
I was trying to launch a topology that I'm developing (in 2.0.0) and noticed 
that the worker was getting restarted every ~30 seconds. 
 I placed a breakpoint in the _kill_ method of _LocalContainer_ 
([https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/daemon/supervisor/LocalContainer.java#L66])
 to try and understand why the worker was getting restarted. 
  
 The call stack was:
 {{kill:66, LocalContainer (org.apache.storm.daemon.supervisor)}}
 {{killContainerFor:269, Slot (org.apache.storm.daemon.supervisor)}}
 {{handleRunning:724, Slot (org.apache.storm.daemon.supervisor)}}
 {{stateMachineStep:218, Slot (org.apache.storm.daemon.supervisor)}}
 {{run:931, Slot (org.apache.storm.daemon.supervisor)}}
  
 With this I can understand that the worker is killed because a blob has 
changed 
([https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/daemon/supervisor/Slot.java#L724]).
 In fact, there's a changing blob in the _dynamicState_ at that point.
  
 I checked the _AsyncLocalizer_, which downloads blobs, caches them locally, 
and notifies the Slot state machine of a changing blob.
  
 I noticed this:
 * [https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/AsyncLocalizer.java#L339]
 * [https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/AsyncLocalizer.java#L265]
 * [https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L142]
 * [https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L192]
  

These tell me that (correct me if I'm wrong):
 * The Supervisor tries to update blobs every 30 seconds.
 * The topology jar blob requires extraction of the resources directory (either 
from a jar or directly from a classpath URL). This is done in 
_fetchUnzipToTemp_, and its existence is checked in _isFullyDownloaded_.
 * The Slot is notified of a changing blob if:
 ** the remote version is different from the local version (the code has 
changed),
 ** OR the blob is not fully downloaded (fully downloaded meaning that the jar 
exists and the extracted resources directory exists).
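
To make the condition above concrete, here is a minimal sketch of the check as I understand it. It is a simplified model, not Storm's actual code: the class name, method signatures, and the use of plain version numbers are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Simplified, hypothetical model of the check described above; not Storm's actual code.
final class BlobChangeSketch {

    // Assumed meaning of "fully downloaded": the jar and the extracted
    // resources directory both exist locally.
    static boolean isFullyDownloaded(Path jar, Path resourcesDir) {
        return Files.isRegularFile(jar) && Files.isDirectory(resourcesDir);
    }

    // The Slot is notified when the versions differ OR the local copy is incomplete.
    static boolean shouldNotifySlot(long localVersion, long remoteVersion,
                                    Path jar, Path resourcesDir) {
        return localVersion != remoteVersion || !isFullyDownloaded(jar, resourcesDir);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("blob-sketch");
        Path jar = Files.createFile(dir.resolve("stormjar.jar"));
        Path resources = dir.resolve("resources");

        // Same version, but no resources directory: reported as changed on every check.
        System.out.println(shouldNotifySlot(1L, 1L, jar, resources));

        // Once the directory exists, the same check reports no change.
        Files.createDirectory(resources);
        System.out.println(shouldNotifySlot(1L, 1L, jar, resources));
    }
}
```

Under this model, a jar that never produces a resources directory is never considered fully downloaded, so every periodic check looks like a change even though the version is the same.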

 
 I did not have a resources folder under the root of the classpath, and 
that's why the worker was being restarted every ~30 seconds: the Slot was 
being notified of a changing blob every time _updateBlobs_ ran. 
 I created a resources folder (with dummy files) under the root of the 
classpath and the problem is now solved.
  
 However, if I understand correctly, the resources folder is only required for 
_multilang_. Our topologies do not use _multilang_, and this does not happen in 
Storm 1.1.3, for instance.

 

Happy to submit an MR.


> Local Cluster worker restarts
> -----------------------------
>
>                 Key: STORM-3501
>                 URL: https://issues.apache.org/jira/browse/STORM-3501
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-server
>    Affects Versions: 2.0.0, 2.1.0
>         Environment: Linux
>            Reporter: Diogo Monteiro
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
