James Xu created STORM-75:
-----------------------------

             Summary: Dead lock between ShellBolt and ShellProcess
                 Key: STORM-75
                 URL: https://issues.apache.org/jira/browse/STORM-75
             Project: Apache Storm (Incubating)
          Issue Type: Bug
            Reporter: James Xu
            Priority: Minor


https://github.com/nathanmarz/storm/issues/423

ShellBolt creates a shell process and reads data from its output stream and 
error stream. The current implementation reads the error stream only once the 
output stream is closed, so messages written to the error stream accumulate in 
its buffer. When that buffer is full, the shell process blocks on its next 
write to the error stream, waiting for buffer space to become available. 
Meanwhile, ShellBolt is blocked waiting for data from the shell process on the 
output stream. Neither side can make progress, so it is a deadlock.
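
One way to avoid the deadlock described above is to read the error stream 
concurrently with the output stream. The following is only a minimal sketch of 
that idea, not the actual ShellBolt or ShellProcess code. The class 
StderrDrainSketch, its methods readAll and run, and the sh command in main are 
hypothetical names chosen for illustration, and the sketch assumes Java 9 or 
later and a POSIX shell.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical illustration; not part of Storm.
class StderrDrainSketch {

    // Reads a stream line by line until end-of-stream and returns the text.
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Runs a command and returns { stdout, stderr }.
    static String[] run(List<String> command) {
        try {
            Process process = new ProcessBuilder(command).start();

            // Drain stderr on its own thread so the child never blocks on a
            // full stderr pipe while this thread is waiting on stdout.
            String[] err = new String[1];
            Thread drainer = new Thread(
                    () -> err[0] = readAll(process.getErrorStream()),
                    "stderr-drainer");
            drainer.setDaemon(true);
            drainer.start();

            String out = readAll(process.getInputStream());
            process.waitFor();
            drainer.join(); // join() makes err[0] visible to this thread
            return new String[] { out, err[0] };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The child writes 200000 bytes to stderr (typically more than an OS
        // pipe buffer holds) before it writes anything to stdout.
        String[] result = run(List.of(
                "sh", "-c", "yes e | head -n 100000 1>&2; echo done"));
        System.out.print(result[0]);
        System.out.println("stderr chars: " + result[1].length());
    }
}
```

If stderr were read only after stdout reached end-of-stream, the child in this 
example would be expected to stall on its stderr write and never reach the 
final echo.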

This behavior is dangerous because the issue can stay hidden. It is rarely seen 
in normal tests, where the error output is usually too small to fill the error 
stream buffer. After the system has been running in production for a while, 
however, the error output can accumulate until the buffer is full, and then the 
deadlock happens. Nothing is written to the log, so it is hard to debug.

Here at Yahoo we use many native libraries that were built a long time ago and 
that sometimes write to the error stream when an error occurs. It is impossible 
for us to inspect all the direct and indirect native library dependencies and 
rebuild them all to remove every write to the error stream.

For now we use a workaround: we redirect the error stream to /dev/null at the 
beginning of our shell process. In the long term, though, I think this should 
be fixed in ShellBolt and ShellProcess.
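
The workaround above is applied inside the shell process itself. For 
comparison, a launcher written in Java could discard the child's error stream 
when it starts the process. This is a hedged sketch under that assumption, not 
a description of how ShellBolt launches processes. The class 
DiscardStderrSketch and its method runDiscardingStderr are hypothetical, and 
ProcessBuilder.Redirect.DISCARD requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical illustration; not part of Storm.
class DiscardStderrSketch {

    // Runs a command with stderr discarded and returns its stdout.
    static String runDiscardingStderr(List<String> command) {
        try {
            ProcessBuilder builder = new ProcessBuilder(command);
            // Send the child's stderr to the platform null device
            // (for example /dev/null) so it can never fill a pipe buffer.
            builder.redirectError(ProcessBuilder.Redirect.DISCARD);
            Process process = builder.start();

            try (InputStream in = process.getInputStream()) {
                String out = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                process.waitFor();
                return out;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same noisy child as a test case: 200000 bytes on stderr, then stdout.
        System.out.print(runDiscardingStderr(List.of(
                "sh", "-c", "yes e | head -n 100000 1>&2; echo done")));
    }
}
```

The trade-off is the same as with the /dev/null redirect: the deadlock is 
avoided, but any diagnostics the native libraries write to the error stream 
are lost.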



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
