[ https://issues.apache.org/jira/browse/STORM-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Diogo Monteiro updated STORM-3501:
----------------------------------
    Description:
I was trying to launch a topology that I'm developing (in 2.0.0) and noticed that the worker was being restarted every ~30 seconds.

I placed a breakpoint in the _kill_ method of _LocalContainer_ ([https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/daemon/supervisor/LocalContainer.java#L66]) to understand why the worker was being restarted.

The call stack was:
{{kill:66, LocalContainer (org.apache.storm.daemon.supervisor)}}
{{killContainerFor:269, Slot (org.apache.storm.daemon.supervisor)}}
{{handleRunning:724, Slot (org.apache.storm.daemon.supervisor)}}
{{stateMachineStep:218, Slot (org.apache.storm.daemon.supervisor)}}
{{run:931, Slot (org.apache.storm.daemon.supervisor)}}

From this I understand that the worker is killed because a blob has changed ([https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/daemon/supervisor/Slot.java#L724]). In fact, there is a changing blob in the _dynamicState_ at that point.

I checked the _AsyncLocalizer_, which downloads blobs, caches them locally, and notifies the Slot state machine of a changing blob.
I noticed this:
* [https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/AsyncLocalizer.java#L339]
* [https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/AsyncLocalizer.java#L265]
* [https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L142]
* [https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L192]

Which tells me that (correct me if I'm wrong):
* The Supervisor tries to update blobs every 30 seconds.
* The topology jar blob requires extraction of the resources directory (either from a jar or directly from a classpath URL). The extraction happens in _fetchUnzipToTemp_, and the directory's existence is checked in _isFullyDownloaded_.
* The Slot is notified of a changing blob if:
  * the remote version is different from the local version (the code has changed),
  * OR the blob is not fully downloaded (fully downloaded means that the jar exists and the extracted resources directory exists).

Well, I did not have a resources folder under the root of the classpath, and that's why the worker was being restarted every ~30 seconds: the Slot was notified of a changing blob every time _updateBlobs_ ran.
I created a resources folder (with dummy files) under the root of the classpath and the problem is now solved.

However, if I understand correctly, the resources folder is only required for _multilang_. Our topologies do not use _multilang_, and this does not happen in Storm 1.1.3, for instance.

Happy to submit MR.

> Local Cluster worker restarts
> -----------------------------
>
>                 Key: STORM-3501
>                 URL: https://issues.apache.org/jira/browse/STORM-3501
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-server
>    Affects Versions: 2.0.0, 2.1.0
>         Environment: Linux
>            Reporter: Diogo Monteiro
>            Priority: Minor
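
The restart condition described in this issue can be illustrated with a minimal sketch. This is not Storm's actual code: the class _ChangingBlobSketch_, both method signatures, the version numbers, and the file names are hypothetical and only model the two conditions listed in the report (version mismatch, or a blob that is not fully downloaded because the extracted resources directory is missing).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChangingBlobSketch {

    // Hypothetical stand-in for the check described in the report:
    // the blob counts as fully downloaded only if the jar exists AND
    // the extracted resources directory exists.
    static boolean isFullyDownloaded(Path jar, Path extractedResources) {
        return Files.exists(jar) && Files.isDirectory(extractedResources);
    }

    // Hypothetical stand-in for the decision to notify the Slot:
    // the version differs, OR the blob is not fully downloaded.
    static boolean shouldNotifySlot(long localVersion, long remoteVersion,
                                    Path jar, Path extractedResources) {
        return localVersion != remoteVersion
                || !isFullyDownloaded(jar, extractedResources);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("blob-sketch");
        Path jar = Files.createFile(dir.resolve("stormjar.jar")); // illustrative name
        Path resources = dir.resolve("resources");                // not created yet

        // Same version, but no resources directory: the blob looks
        // "changing" on every periodic check, so the worker is restarted.
        System.out.println(shouldNotifySlot(1L, 1L, jar, resources)); // true

        // The workaround from the report: once the directory exists,
        // an unchanged blob is no longer reported as changing.
        Files.createDirectory(resources);
        System.out.println(shouldNotifySlot(1L, 1L, jar, resources)); // false
    }
}
```

Under these assumptions, a topology without a resources directory never satisfies the "fully downloaded" condition, so every periodic update reports a change even though the code is identical.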