[
https://issues.apache.org/jira/browse/STORM-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189590#comment-14189590
]
ASF GitHub Bot commented on STORM-442:
--------------------------------------
GitHub user dashengju opened a pull request:
https://github.com/apache/storm/pull/305
[STORM-442] multilang ShellBolt/ShellSpout die() can be hang when Exception
happened
@d2r , @harshach , @HeartSaVioR , I closed the old pull request by
accident, so I opened a new one, and move all the comments to here.
------------------------------------------------------------------------------------------------
In ShellBolt, the _readerThread read command from python/shell process, and
handle like this:
try
{
ShellMsg shellMsg = _process.readShellMsg();
...
} catch (InterruptedException e) {
} catch (Throwable t) {
die(t);
}
And in the die function, getProcessTerminationInfoString will read
getErrorsString() from processErrorStream.
private void die(Throwable exception) {
String processInfo = _process.getProcessInfoString() +
_process.getProcessTerminationInfoString();
_exception = new RuntimeException(processInfo, exception);
}
so when ShellBolt got exception(for example, readShellMsg() throw NPE ) ,
but it is not an error from sub process, then getProcessTerminationInfoString
will be hang because processErrorStream have no data to read.
------------------------------------------------------------------------------------------------
harshach commented on 21 Aug
@dashengju Can you clarify "getProcessTerminationInfoString will be hang
because processErrorStream have no data to read". Which case it might hang ,
looking at the code
https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/utils/ShellProcess.java#L135
If there is no data in processErrorStream it returns empty string.
If possible can you add a unit test. Thanks.
------------------------------------------------------------------------------------------------
d2r added a note 29 days ago
I think we should change ShellProcess#getErrorsStream so that it checks
that there is actually something to read.
This would fix a couple of other places where a hang looks possible:
https://github.com/apache/storm/blob/24b5eefb1553ae37951807502a433f24d3e95ec1/storm-core/src/jvm/backtype/storm/utils/ShellProcess.java#L68
https://github.com/apache/storm/blob/24b5eefb1553ae37951807502a433f24d3e95ec1/storm-core/src/jvm/backtype/storm/utils/ShellProcess.java#L70
https://github.com/apache/storm/blob/24b5eefb1553ae37951807502a433f24d3e95ec1/storm-core/src/jvm/backtype/storm/utils/ShellProcess.java#L101
ShellProcess#logErrorStream does this.
------------------------------------------------------------------------------------------------
d2r commented a day ago
Hi @dashengju, any update on this pull request?
------------------------------------------------------------------------------------------------
dashengju commented 15 hours ago
@d2r I had no time work on this pull request those days.
I will verify the problem, make some unit test and submit update on the
pull request next few days.
any suggestion?
------------------------------------------------------------------------------------------------
HeartSaVioR commented 5 hours ago
Is there a relation between parent process's exception and subprocess's
stderr?
If it's not, and we should capture it from parent, I think spout may have
stderr reader with new thread.
Because it may not be right way to capture it when only parent process
raises exception.
------------------------------------------------------------------------------------------------
dashengju commented an hour ago
@HeartSaVioR ,
parent process's exception contains two types: 1) net io error caused; for
example, subprocess report an error to stderr and exit; 2) parent process's
handle logic exception.
as 1), we want to read from stderr to get more information;
as 2), as you say, it may not be right way to capture it when only parent
process raises exception.
@d2r 's suggestion, we should change ShellProcess#getErrorsStream so that
it checks that there is actually something to read.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dashengju/storm Multilang_die_hang
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/305.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #305
----
commit ac5a18dc30ee6e9257eb5c18c0c87e7c7e944b23
Author: JuDasheng <[email protected]>
Date: 2014-10-30T03:46:21Z
change ShellProcess#getErrorsStream so that it checks that there is
actually something to read.
----
> multilang ShellBolt/ShellSpout die() can be hang when Exception happened
> ------------------------------------------------------------------------
>
> Key: STORM-442
> URL: https://issues.apache.org/jira/browse/STORM-442
> Project: Apache Storm
> Issue Type: Bug
> Affects Versions: 0.9.3
> Reporter: DashengJu
>
> In ShellBolt, the _readerThread read command from python/shell process, and
> handle like this:
> try {
> ShellMsg shellMsg = _process.readShellMsg();
> ...
> } catch (InterruptedException e) {
> } catch (Throwable t) {
> die(t);
> }
> And in the die function, getProcessTerminationInfoString will read
> getErrorsString() from processErrorStream.
> private void die(Throwable exception) {
>
> String processInfo = _process.getProcessInfoString() +
> _process.getProcessTerminationInfoString();
>
> _exception = new RuntimeException(processInfo, exception);
>
> }
> so when ShellBolt got exception(for example, readShellMsg() throw NPE ) ,
> but it is not an error from sub process, then
> getProcessTerminationInfoString will be hang because processErrorStream have
> no data to read.
> On the other hand, as [~xiaokang] says ShellBolt should fail fast on
> exception ( https://github.com/apache/incubator-storm/pull/46 ) , I think it
> is not a good idea to read error info from stream.
> Because [~xiaokang] 's PR is based old version, so I will move his code to
> this PR, and modify some other place in ShellSpout.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)