[ https://issues.apache.org/jira/browse/DRILL-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359170#comment-16359170 ]
ASF GitHub Bot commented on DRILL-6143:
---------------------------------------

Github user vrozov commented on the issue:

    https://github.com/apache/drill/pull/1119

    LGTM

> Make Fragment Runner's RPC Timeout a SystemOption
> -------------------------------------------------
>
>                 Key: DRILL-6143
>                 URL: https://issues.apache.org/jira/browse/DRILL-6143
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.13.0
>
>
> Queries fail sporadically on some clusters with the following error:
> {code}
> oadd.org.apache.drill.common.exceptions.UserRemoteException: CONNECTION
> ERROR: Exceeded timeout (25000) while waiting send intermediate work
> fragments to remote nodes. Sent 5 and only heard response back from 4 nodes.
> {code}
> This error happens because the FragmentsRunner has a hardcoded timeout,
> RPC_WAIT_IN_MSECS_PER_FRAGMENT, which is set to 5 seconds. Increasing the
> timeout to 10 seconds resolved the sporadic failures that were observed. The
> timeout should be changed to 10 seconds and should also be configurable via
> the SystemOptionManager.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
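
The timeout arithmetic described in the issue can be sketched as follows. This is a minimal sketch, assuming the total wait is the per-fragment timeout multiplied by the number of remote nodes, which matches the reported numbers (5 nodes x 5000 ms = 25000 ms). The class name, the option key, and the map standing in for an option store are hypothetical and are not Drill's actual API.

```java
import java.util.HashMap;
import java.util.Map;

public class FragmentTimeoutSketch {
    // Hypothetical option key, chosen for illustration only.
    static final String OPTION_NAME = "example.rpc.fragment_timeout_ms";

    // Proposed default from the issue: 10 seconds per fragment.
    static final long DEFAULT_PER_FRAGMENT_MS = 10_000L;

    // Stand-in for a system option store (an assumption, not SystemOptionManager).
    private final Map<String, Long> options = new HashMap<>();

    // Override the per-fragment timeout, as a configurable option would allow.
    void setOption(String name, long valueMs) {
        if (valueMs <= 0) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        options.put(name, valueMs);
    }

    // Read the configured value, falling back to the default when unset.
    long perFragmentTimeoutMs() {
        return options.getOrDefault(OPTION_NAME, DEFAULT_PER_FRAGMENT_MS);
    }

    // Assumed relationship: total wait grows linearly with the node count.
    static long totalTimeoutMs(int remoteNodes, long perFragmentMs) {
        return Math.multiplyExact((long) remoteNodes, perFragmentMs);
    }

    public static void main(String[] args) {
        FragmentTimeoutSketch sketch = new FragmentTimeoutSketch();

        // Old hardcoded value from the issue: 5 seconds per fragment.
        System.out.println("old total ms: " + totalTimeoutMs(5, 5_000L));

        // Proposed default of 10 seconds per fragment.
        System.out.println("new total ms: "
            + totalTimeoutMs(5, sketch.perFragmentTimeoutMs()));

        // A cluster that needs more headroom could raise the option.
        sketch.setOption(OPTION_NAME, 15_000L);
        System.out.println("tuned total ms: "
            + totalTimeoutMs(5, sketch.perFragmentTimeoutMs()));
    }
}
```

Reading the value through an option lookup with a default, rather than a constant, is what would let an operator tune the timeout per cluster without a rebuild.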