Timothy Farkas created DRILL-6143:
-------------------------------------

             Summary: Queries Fail Due To Aggressive RPC Timeout
                 Key: DRILL-6143
                 URL: https://issues.apache.org/jira/browse/DRILL-6143
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Timothy Farkas
            Assignee: Timothy Farkas


Queries fail sporadically on some clusters with the following error:

{code}
oadd.org.apache.drill.common.exceptions.UserRemoteException: CONNECTION ERROR: 
Exceeded timeout (25000) while waiting send intermediate work fragments to 
remote nodes. Sent 5 and only heard response back from 4 nodes.
{code}

This error happens because FragmentsRunner has a hardcoded timeout, 
RPC_WAIT_IN_MSECS_PER_FRAGMENT, which is set to 5 seconds. Increasing the 
timeout to 10 seconds resolved the sporadic failures that were observed. The 
default should be changed to 10 seconds and should also be made configurable 
via the SystemOptionManager.
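
The 25000 in the error message is consistent with 5 fragments at 5000 ms 
each. As a rough sketch of the proposed change, assuming the total wait is 
the per-fragment timeout multiplied by the number of fragments sent, the 
calculation could look like the following. The class and method names here 
are hypothetical and are not taken from the Drill code base; the real value 
would come from the SystemOptionManager rather than a constant.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative only: not the actual FragmentsRunner implementation.
class FragmentTimeoutSketch {

    // Proposed new default (10 seconds); assumed to be overridable
    // through a system option in the real fix.
    static final long DEFAULT_RPC_WAIT_MS_PER_FRAGMENT = 10_000L;

    // Total time to wait for acknowledgements, assuming the wait scales
    // linearly with the number of fragments sent to remote nodes.
    static long totalTimeoutMs(int fragmentCount, long perFragmentMs) {
        if (fragmentCount < 0 || perFragmentMs <= 0) {
            throw new IllegalArgumentException("invalid fragment count or timeout");
        }
        return Math.multiplyExact((long) fragmentCount, perFragmentMs);
    }

    // Waits until every remote node has acknowledged its fragment or the
    // total timeout elapses. Returns false on timeout.
    static boolean awaitAcks(CountDownLatch acks, int fragmentCount, long perFragmentMs)
            throws InterruptedException {
        return acks.await(totalTimeoutMs(fragmentCount, perFragmentMs), TimeUnit.MILLISECONDS);
    }
}
```

Under that assumption, 5 fragments with the old 5 second value give a 25000 
ms budget, and the proposed 10 second default doubles it to 50000 ms.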




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
