James Xu created STORM-75:
-----------------------------
Summary: Deadlock between ShellBolt and ShellProcess
Key: STORM-75
URL: https://issues.apache.org/jira/browse/STORM-75
Project: Apache Storm (Incubating)
Issue Type: Bug
Reporter: James Xu
Priority: Minor
https://github.com/nathanmarz/storm/issues/423
The ShellBolt creates a shell process and reads data from its output stream and
error stream. The current implementation only reads the error stream when the
output stream is closed, so anything the shell process writes to stderr piles up
in the error stream's buffer. Once that buffer is full, the shell process blocks
on its next write to stderr, waiting for space in the error stream buffer.
Meanwhile, ShellBolt is blocked waiting for the next output on the shell
process's output stream. Neither side can make progress, so it's a deadlock.
This behavior seems dangerous because the issue can stay hidden: it is rarely
seen in normal tests. Normally the error output is not large enough to fill the
error stream buffer, but after the system has been running in production for a
while, enough error output can accumulate to fill it, and then the deadlock
happens. Nothing is written to the log, so it is hard to debug.
Here at Yahoo we use many native libraries that were built a long time ago and
sometimes write to the error stream when an error occurs. It's impossible for us
to inspect all the direct and indirect native library dependencies and rebuild
them all to remove every write to the error stream.
For now we use a workaround that redirects the error stream to /dev/null at the
beginning of our shell process. But I think that in the long term this should be
fixed in ShellBolt and ShellProcess.
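The workaround above redirects stderr inside the shell process itself (for a shell script this could be something like `exec 2>/dev/null` on its first line; the report does not say exactly how it was done). A roughly equivalent effect can be obtained from the Java side when launching the child. The sketch below is hypothetical, not ShellBolt's code, and assumes Java 9 or later for `ProcessBuilder.Redirect.DISCARD`, which points the child's stderr at the OS null device so there is no stderr pipe to fill up. The class name DiscardStderrLauncher is illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: launch a child with its stderr discarded (Java 9+). */
public class DiscardStderrLauncher {

    /** Runs a command and returns its stdout lines; stderr is thrown away. */
    public static List<String> run(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Send the child's stderr to the null device (/dev/null on Unix).
        // No stderr pipe is created, so it can never fill up, but every
        // message the native libraries print to stderr is lost.
        pb.redirectError(ProcessBuilder.Redirect.DISCARD);
        Process p = pb.start();
        List<String> out = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.add(line);
            }
        }
        p.waitFor();
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Same noisy child as before: lots of stderr, one stdout line.
        List<String> cmd = List.of("sh", "-c",
            "i=0; while [ $i -lt 20000 ]; do echo 'noise noise noise noise noise' >&2; i=$((i+1)); done; echo done");
        System.out.println(run(cmd));
    }
}
```

Like the /dev/null workaround, this discards every diagnostic the child prints, which is why draining and logging stderr is the better long-term fix. Merging the streams with `redirectErrorStream(true)` would also avoid the deadlock, but for a multilang bolt it would mix stray error text into the JSON messages read from stdout.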
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)