GitHub user dashengju opened a pull request:
https://github.com/apache/storm/pull/305
[STORM-442] multilang ShellBolt/ShellSpout die() can be hang when Exception
happened
@d2r , @harshach , @HeartSaVioR , I closed the old pull request by
accident, so I opened a new one, and move all the comments to here.
------------------------------------------------------------------------------------------------
In ShellBolt, the _readerThread read command from python/shell process, and
handle like this:
try
{
ShellMsg shellMsg = _process.readShellMsg();
...
} catch (InterruptedException e) {
} catch (Throwable t) {
die(t);
}
And in the die function, getProcessTerminationInfoString will read
getErrorsString() from processErrorStream.
private void die(Throwable exception) {
String processInfo = _process.getProcessInfoString() +
_process.getProcessTerminationInfoString();
_exception = new RuntimeException(processInfo, exception);
}
so when ShellBolt got exception(for example, readShellMsg() throw NPE ) ,
but it is not an error from sub process, then getProcessTerminationInfoString
will be hang because processErrorStream have no data to read.
------------------------------------------------------------------------------------------------
harshach commented on 21 Aug
@dashengju Can you clarify "getProcessTerminationInfoString will be hang
because processErrorStream have no data to read". Which case it might hang ,
looking at the code
https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/utils/ShellProcess.java#L135
If there is no data in processErrorStream it returns empty string.
If possible can you add a unit test. Thanks.
------------------------------------------------------------------------------------------------
d2r added a note 29 days ago
I think we should change ShellProcess#getErrorsStream so that it checks
that there is actually something to read.
This would fix a couple of other places where a hang looks possible:
https://github.com/apache/storm/blob/24b5eefb1553ae37951807502a433f24d3e95ec1/storm-core/src/jvm/backtype/storm/utils/ShellProcess.java#L68
https://github.com/apache/storm/blob/24b5eefb1553ae37951807502a433f24d3e95ec1/storm-core/src/jvm/backtype/storm/utils/ShellProcess.java#L70
https://github.com/apache/storm/blob/24b5eefb1553ae37951807502a433f24d3e95ec1/storm-core/src/jvm/backtype/storm/utils/ShellProcess.java#L101
ShellProcess#logErrorStream does this.
------------------------------------------------------------------------------------------------
d2r commented a day ago
Hi @dashengju, any update on this pull request?
------------------------------------------------------------------------------------------------
dashengju commented 15 hours ago
@d2r I had no time work on this pull request those days.
I will verify the problem, make some unit test and submit update on the
pull request next few days.
any suggestion?
------------------------------------------------------------------------------------------------
HeartSaVioR commented 5 hours ago
Is there a relation between parent process's exception and subprocess's
stderr?
If it's not, and we should capture it from parent, I think spout may have
stderr reader with new thread.
Because it may not be right way to capture it when only parent process
raises exception.
------------------------------------------------------------------------------------------------
dashengju commented an hour ago
@HeartSaVioR ,
parent process's exception contains two types: 1) net io error caused; for
example, subprocess report an error to stderr and exit; 2) parent process's
handle logic exception.
as 1), we want to read from stderr to get more information;
as 2), as you say, it may not be right way to capture it when only parent
process raises exception.
@d2r 's suggestion, we should change ShellProcess#getErrorsStream so that
it checks that there is actually something to read.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dashengju/storm Multilang_die_hang
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/305.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #305
----
commit ac5a18dc30ee6e9257eb5c18c0c87e7c7e944b23
Author: JuDasheng <[email protected]>
Date: 2014-10-30T03:46:21Z
change ShellProcess#getErrorsStream so that it checks that there is
actually something to read.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---