[ https://issues.apache.org/jira/browse/DRILL-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147852#comment-16147852 ]
ASF GitHub Bot commented on DRILL-5721:
---------------------------------------

Github user parthchandra commented on the issue:

    https://github.com/apache/drill/pull/919

    +1 LGTM

> Query with only root fragment and no non-root fragment hangs when Drillbit to Drillbit Control Connection has network issues
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5721
>                 URL: https://issues.apache.org/jira/browse/DRILL-5721
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Sorabh Hamirwasia
>            Assignee: Sorabh Hamirwasia
>             Fix For: 1.12.0
>
>
> Recently I found an issue (thanks to [~knguyen] for creating this scenario)
> related to fragment status reporting and would like some feedback on it.
> When a client submits a query to the Foreman, the Foreman plans it and then
> schedules fragments on root and non-root nodes. The Foreman creates a
> DrillbitStatusListener and a FragmentStatusListener to track the health of
> each Drillbit node and of each fragment, respectively. The Foreman sets up
> root and non-root fragments differently:
> Root fragments are set up without any communication over the control channel
> (since they execute locally on the Foreman).
> Non-root fragments are set up by sending a control message
> (REQ_INITIALIZE_FRAGMENTS_VALUE) over the wire. If sending any such control
> message fails during query setup (for example, due to a network hiccup),
> the query is failed and the client is notified.
> Each fragment is executed on its node with the help of a Fragment Executor,
> which has an instance of FragmentStatusReporter. FragmentStatusReporter
> reports the status of a fragment to the Foreman node over a control tunnel
> (connection) using an RPC message (REQ_FRAGMENT_STATUS), for both root and
> non-root fragments.
> Based on the above, when a root fragment is submitted for setup, the setup
> is done locally without any RPC communication, whereas the status of that
> fragment is reported by the fragment executor over the control connection by
> sending an RPC message. For a non-root fragment, both setup and status
> updates happen via RPC messages over the control connection.
> *Issue 1:*
> For a simple query that has only one root fragment running on the Foreman
> node, setup works fine. But when the fragment tries to create a control
> connection for its status update and fails to establish it, the query hangs.
> The root fragment completes execution but cannot report that to the Foreman,
> so the Foreman thinks the query is running forever.
> *Proposed Solution:*
> For a root fragment, setup already happens locally without an RPC message,
> so we can do the same for status updates of root fragments. This avoids RPC
> communication for status updates of fragments running locally on the Foreman
> and hence resolves issue 1.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
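
The proposed solution for issue 1 could be sketched roughly as follows. This is a minimal illustration only, not Drill's actual code: every name in it (RootFragmentStatusSketch, ForemanListener, ControlTunnel, StatusReporter, FragmentState) is a hypothetical stand-in, and the real FragmentStatusReporter and control-tunnel classes may look quite different. The point is that a reporter for a fragment running on the Foreman node hands the status directly to the local listener, so an unreachable control connection cannot block the update.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types are Drill's real classes.
class RootFragmentStatusSketch {

    /** Assumed, simplified set of fragment states. */
    enum FragmentState { RUNNING, FINISHED, FAILED }

    /** Stand-in for the Foreman-side listener that tracks fragment status. */
    static class ForemanListener {
        final List<FragmentState> seen = new ArrayList<>();

        void statusUpdate(FragmentState state) {
            seen.add(state);
        }
    }

    /** Stand-in for the Drillbit-to-Drillbit control connection. */
    static class ControlTunnel {
        private final boolean reachable;
        private final ForemanListener remote;

        ControlTunnel(boolean reachable, ForemanListener remote) {
            this.reachable = reachable;
            this.remote = remote;
        }

        /** Models sending a status RPC; fails when the connection is down. */
        void sendFragmentStatus(FragmentState state) {
            if (!reachable) {
                throw new IllegalStateException("control connection unavailable");
            }
            remote.statusUpdate(state);
        }
    }

    /** Picks the local path for fragments that run on the Foreman node. */
    static class StatusReporter {
        private final boolean runsOnForeman;
        private final ForemanListener local;
        private final ControlTunnel tunnel;

        StatusReporter(boolean runsOnForeman, ForemanListener local, ControlTunnel tunnel) {
            this.runsOnForeman = runsOnForeman;
            this.local = local;
            this.tunnel = tunnel;
        }

        void report(FragmentState state) {
            if (runsOnForeman) {
                local.statusUpdate(state);        // no RPC involved
            } else {
                tunnel.sendFragmentStatus(state); // remote fragments still use RPC
            }
        }
    }

    public static void main(String[] args) {
        ForemanListener foreman = new ForemanListener();
        ControlTunnel broken = new ControlTunnel(false, foreman);

        // Root fragment: the update arrives even though the tunnel is down.
        new StatusReporter(true, foreman, broken).report(FragmentState.FINISHED);
        System.out.println(foreman.seen); // prints [FINISHED]
    }
}
```

In this sketch a non-root fragment still depends on the control connection, which matches the description above: only the root fragment's status path changes.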