[ https://issues.apache.org/jira/browse/STORM-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Joseph Evans updated STORM-2017: --------------------------------------- Assignee: Lasse Kiviluoto > ShellBolt stops reporting task ids > ---------------------------------- > > Key: STORM-2017 > URL: https://issues.apache.org/jira/browse/STORM-2017 > Project: Apache Storm > Issue Type: Bug > Components: storm-core > Affects Versions: 1.0.1, 1.0.3 > Reporter: Lasse Kiviluoto > Assignee: Lasse Kiviluoto > Time Spent: 0.5h > Remaining Estimate: 0h > > After running enough flow throw ShellBolt in some cases after tens of minutes > ShellBolt stopped reporting task ids. After this error condition no new task > ids where reported back. When acking of the tuples processed by the bolt > where set in callback related to arrival of the task ids all tuple trees > going through the bolt would fail after reporting stopped. ShellBolt will > continue to operate new tuples and respond to heartbeats. > After running some tests and making some changes to the code. I have > following hypothesis for the reason: > org.apache.storm.utils.ShellBoltMessageQueue has two queues one being for > taskIds and the other for bolt messages. > taskIds queue is implemented by LinkedList and bolt msg queue > LinkedBlockingQueue. Both of the queues are operated similarly. > One major difference between the structures is that LinkedList is not > synchronized. > In the code: > ShellBoltMessageQueue.java:58 add method is used without holding the lock. > Where as ShellBoltMessageQueue.java:110 uses the poll method with the lock. > As in ShellBolt BoltReaderRunnable and BoltWriterRunnable are run > concurrently this can lead to race condition. > If I move the ShellBoltMessageQueue.java:58 inside the lock and run the test > in similar fashion it seems to solve the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)