[ https://issues.apache.org/jira/browse/STORM-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189590#comment-14189590 ]
ASF GitHub Bot commented on STORM-442: -------------------------------------- GitHub user dashengju opened a pull request: https://github.com/apache/storm/pull/305 [STORM-442] multilang ShellBolt/ShellSpout die() can be hang when Exception happened @d2r , @harshach , @HeartSaVioR , I closed the old pull request by accident, so I opened a new one, and move all the comments to here. ------------------------------------------------------------------------------------------------ In ShellBolt, the _readerThread read command from python/shell process, and handle like this: try { ShellMsg shellMsg = _process.readShellMsg(); ... } catch (InterruptedException e) { } catch (Throwable t) { die(t); } And in the die function, getProcessTerminationInfoString will read getErrorsString() from processErrorStream. private void die(Throwable exception) { String processInfo = _process.getProcessInfoString() + _process.getProcessTerminationInfoString(); _exception = new RuntimeException(processInfo, exception); } so when ShellBolt got exception(for example, readShellMsg() throw NPE ) , but it is not an error from sub process, then getProcessTerminationInfoString will be hang because processErrorStream have no data to read. ------------------------------------------------------------------------------------------------ harshach commented on 21 Aug @dashengju Can you clarify "getProcessTerminationInfoString will be hang because processErrorStream have no data to read". Which case it might hang , looking at the code https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/utils/ShellProcess.java#L135 If there is no data in processErrorStream it returns empty string. If possible can you add a unit test. Thanks. ------------------------------------------------------------------------------------------------ d2r added a note 29 days ago I think we should change ShellProcess#getErrorsStream so that it checks that there is actually something to read. This would fix a couple of other places where a hang looks possible: https://github.com/apache/storm/blob/24b5eefb1553ae37951807502a433f24d3e95ec1/storm-core/src/jvm/backtype/storm/utils/ShellProcess.java#L68 https://github.com/apache/storm/blob/24b5eefb1553ae37951807502a433f24d3e95ec1/storm-core/src/jvm/backtype/storm/utils/ShellProcess.java#L70 https://github.com/apache/storm/blob/24b5eefb1553ae37951807502a433f24d3e95ec1/storm-core/src/jvm/backtype/storm/utils/ShellProcess.java#L101 ShellProcess#logErrorStream does this. ------------------------------------------------------------------------------------------------ d2r commented a day ago Hi @dashengju, any update on this pull request? ------------------------------------------------------------------------------------------------ dashengju commented 15 hours ago @d2r I had no time work on this pull request those days. I will verify the problem, make some unit test and submit update on the pull request next few days. any suggestion? ------------------------------------------------------------------------------------------------ HeartSaVioR commented 5 hours ago Is there a relation between parent process's exception and subprocess's stderr? If it's not, and we should capture it from parent, I think spout may have stderr reader with new thread. Because it may not be right way to capture it when only parent process raises exception. ------------------------------------------------------------------------------------------------ dashengju commented an hour ago @HeartSaVioR , parent process's exception contains two types: 1) net io error caused; for example, subprocess report an error to stderr and exit; 2) parent process's handle logic exception. as 1), we want to read from stderr to get more information; as 2), as you say, it may not be right way to capture it when only parent process raises exception. @d2r 's suggestion, we should change ShellProcess#getErrorsStream so that it checks that there is actually something to read. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dashengju/storm Multilang_die_hang Alternatively you can review and apply these changes as the patch at: https://github.com/apache/storm/pull/305.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #305 ---- commit ac5a18dc30ee6e9257eb5c18c0c87e7c7e944b23 Author: JuDasheng <judash...@meituan.com> Date: 2014-10-30T03:46:21Z change ShellProcess#getErrorsStream so that it checks that there is actually something to read. ---- > multilang ShellBolt/ShellSpout die() can be hang when Exception happened > ------------------------------------------------------------------------ > > Key: STORM-442 > URL: https://issues.apache.org/jira/browse/STORM-442 > Project: Apache Storm > Issue Type: Bug > Affects Versions: 0.9.3 > Reporter: DashengJu > > In ShellBolt, the _readerThread read command from python/shell process, and > handle like this: > try { > ShellMsg shellMsg = _process.readShellMsg(); > ... > } catch (InterruptedException e) { > } catch (Throwable t) { > die(t); > } > And in the die function, getProcessTerminationInfoString will read > getErrorsString() from processErrorStream. > private void die(Throwable exception) { > > String processInfo = _process.getProcessInfoString() + > _process.getProcessTerminationInfoString(); > > _exception = new RuntimeException(processInfo, exception); > > } > so when ShellBolt got exception(for example, readShellMsg() throw NPE ) , > but it is not an error from sub process, then > getProcessTerminationInfoString will be hang because processErrorStream have > no data to read. > On the other hand, as [~xiaokang] says ShellBolt should fail fast on > exception ( https://github.com/apache/incubator-storm/pull/46 ) , I think it > is not a good idea to read error info from stream. > Because [~xiaokang] 's PR is based old version, so I will move his code to > this PR, and modify some other place in ShellSpout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)