[ 
https://issues.apache.org/jira/browse/STORM-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved STORM-2017.
----------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.0.3
                   1.1.0
                   2.0.0

Thanks [~kluoto],

I merged this into master, 1.x-branch and 1.0.x-branch.  Keep up the good work.

> ShellBolt stops reporting task ids
> ----------------------------------
>
>                 Key: STORM-2017
>                 URL: https://issues.apache.org/jira/browse/STORM-2017
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.1, 1.0.3
>            Reporter: Lasse Kiviluoto
>            Assignee: Lasse Kiviluoto
>             Fix For: 2.0.0, 1.1.0, 1.0.3
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> After enough data had flowed through ShellBolt, in some cases after tens of 
> minutes, ShellBolt stopped reporting task ids. Once this error condition 
> occurred, no new task ids were reported back. Because acking of the tuples 
> processed by the bolt was done in a callback triggered by the arrival of the 
> task ids, all tuple trees going through the bolt failed after reporting 
> stopped. ShellBolt continued to process new tuples and respond to heartbeats.
> After running some tests and making some changes to the code, I have the 
> following hypothesis for the cause:
> org.apache.storm.utils.ShellBoltMessageQueue has two queues, one for 
> taskIds and the other for bolt messages.
> The taskIds queue is implemented with a LinkedList and the bolt msg queue 
> with a LinkedBlockingQueue. Both queues are operated similarly.
> One major difference between the structures is that LinkedList is not 
> synchronized.
> In the code:
> ShellBoltMessageQueue.java:58 calls the add method without holding the lock, 
> whereas ShellBoltMessageQueue.java:110 calls the poll method with the lock 
> held. Because BoltReaderRunnable and BoltWriterRunnable in ShellBolt run 
> concurrently, this can lead to a race condition.
> If I move the call at ShellBoltMessageQueue.java:58 inside the lock and run 
> the test in a similar fashion, it seems to solve the issue.
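
The hypothesis in the description can be illustrated with a minimal sketch. This is not the actual ShellBoltMessageQueue source: the class name TwoQueueSketch, its method names, and the choice to hand out task ids before bolt messages are assumptions made for illustration. It only shows the shape of the proposed fix, where the add to the unsynchronized LinkedList happens under the same lock that poll already holds.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a two-queue message holder; names are illustrative only.
class TwoQueueSketch {
    // LinkedList is not thread-safe, so every access must hold the lock.
    private final LinkedList<List<Integer>> taskIdsQueue = new LinkedList<>();
    // LinkedBlockingQueue is thread-safe on its own.
    private final LinkedBlockingQueue<String> boltMsgQueue = new LinkedBlockingQueue<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();

    void putTaskIds(List<Integer> taskIds) {
        lock.lock();
        try {
            // The add is inside the lock, matching the lock held by poll().
            taskIdsQueue.add(taskIds);
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    void putBoltMsg(String msg) {
        boltMsgQueue.add(msg); // unbounded queue, so add() does not block
        lock.lock();
        try {
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    // Returns the next item, or null if nothing arrives within the timeout.
    Object poll(long timeout, TimeUnit unit) throws InterruptedException {
        lock.lockInterruptibly();
        try {
            long nanos = unit.toNanos(timeout);
            Object next;
            while ((next = pollUnderLock()) == null && nanos > 0) {
                nanos = notEmpty.awaitNanos(nanos);
            }
            return next;
        } finally {
            lock.unlock();
        }
    }

    // Assumption of this sketch: task ids are handed out before bolt messages.
    private Object pollUnderLock() {
        List<Integer> ids = taskIdsQueue.poll();
        return ids != null ? ids : boltMsgQueue.poll();
    }
}
```

With the add left outside the lock, a writer thread could modify the LinkedList while a reader thread is polling it, which can corrupt the list's internal links and leave later entries unreachable. That would be consistent with task ids silently ceasing to arrive while the bolt message queue keeps working.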



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
