[ https://issues.apache.org/jira/browse/DRILL-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358913#comment-16358913 ]
ASF GitHub Bot commented on DRILL-6143:
---------------------------------------

Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1119#discussion_r167338469

    --- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
    @@ -413,6 +413,7 @@ drill.exec.options: {
         # to start at least 2 partitions then HashAgg fallbacks to this case. It can be
         # enabled by setting this flag to true. By default it's set to false such that
         # query will fail if there is not enough memory
    +    drill.exec.rpc.fragrunner.timeout: 30000,
    --- End diff --

    Thanks for catching the ordering. I reduced the default to 10000.


> Make Fragment Runner's RPC Timeout a SystemOption
> -------------------------------------------------
>
>                 Key: DRILL-6143
>                 URL: https://issues.apache.org/jira/browse/DRILL-6143
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>             Fix For: 1.13.0
>
>
> Queries fail sporadically on some clusters with the following error:
> {code}
> oadd.org.apache.drill.common.exceptions.UserRemoteException: CONNECTION
> ERROR: Exceeded timeout (25000) while waiting send intermediate work
> fragments to remote nodes. Sent 5 and only heard response back from 4 nodes.
> {code}
> This error happens because the FragmentsRunner has a hardcoded timeout,
> RPC_WAIT_IN_MSECS_PER_FRAGMENT, which is set to 5 seconds. Increasing the
> timeout to 10 seconds resolved the sporadic failures that were observed. The
> default should be raised to 10 seconds, and the timeout should also be
> configurable via the SystemOptionManager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
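
The relationship between the per-fragment timeout and the total reported in the error can be sketched as below. This is a minimal illustration, not Drill's actual FragmentsRunner code: the class `FragmentTimeoutSketch` and the methods `totalWaitMillis` and `awaitAcks` are hypothetical names, and the assumption that the total wait is the per-fragment timeout multiplied by the number of remote nodes is inferred only from the error message above (25000 ms reported with 5 nodes and a 5 second setting).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the Drill code base.
final class FragmentTimeoutSketch {

    // Total wait budget in milliseconds.
    // ASSUMPTION: the budget scales linearly with the number of remote nodes.
    public static long totalWaitMillis(long perFragmentTimeoutMillis, int remoteNodeCount) {
        if (perFragmentTimeoutMillis <= 0 || remoteNodeCount < 0) {
            throw new IllegalArgumentException("timeout must be > 0 and node count >= 0");
        }
        return Math.multiplyExact(perFragmentTimeoutMillis, (long) remoteNodeCount);
    }

    // Waits until every remote node has acknowledged (latch reaches zero)
    // or the total budget expires. Returns false on timeout.
    public static boolean awaitAcks(CountDownLatch acks,
                                    long perFragmentTimeoutMillis,
                                    int remoteNodeCount) throws InterruptedException {
        long budget = totalWaitMillis(perFragmentTimeoutMillis, remoteNodeCount);
        return acks.await(budget, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // Old hardcoded value (5 s) versus the proposed default (10 s), 5 nodes each.
        System.out.println(totalWaitMillis(5_000L, 5));
        System.out.println(totalWaitMillis(10_000L, 5));

        // Simulate 5 sends where only 4 nodes respond, with a tiny timeout
        // so the demonstration finishes quickly.
        CountDownLatch acks = new CountDownLatch(5);
        for (int i = 0; i < 4; i++) {
            acks.countDown();
        }
        System.out.println(awaitAcks(acks, 10L, 5));
    }
}
```

Under the same assumption, reading the per-fragment value from a configurable option rather than a constant only changes the first argument: a 10000 ms setting would give the 5-node case a 50000 ms budget instead of 25000 ms.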