[ https://issues.apache.org/jira/browse/DRILL-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358955#comment-16358955 ]
ASF GitHub Bot commented on DRILL-6143:
---------------------------------------

Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1119#discussion_r167346417

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java ---
    @@ -212,7 +212,8 @@
           new OptionDefinition(ExecConstants.CPU_LOAD_AVERAGE),
           new OptionDefinition(ExecConstants.ENABLE_VECTOR_VALIDATOR),
           new OptionDefinition(ExecConstants.ENABLE_ITERATOR_VALIDATOR),
    -      new OptionDefinition(ExecConstants.OUTPUT_BATCH_SIZE_VALIDATOR, new OptionMetaData(OptionValue.AccessibleScopes.SYSTEM, true, false))
    +      new OptionDefinition(ExecConstants.OUTPUT_BATCH_SIZE_VALIDATOR, new OptionMetaData(OptionValue.AccessibleScopes.SYSTEM, true, false)),
    +      new OptionDefinition(ExecConstants.FRAG_RUNNER_RPC_TIMEOUT_VALIDATOR, new OptionMetaData(OptionValue.AccessibleScopes.SYSTEM, false, true)),
    --- End diff --

    internal should be true since we want this to show up in the internal options table and not the standard system options table. Changing to adminOnly = true seems reasonable.


> Make Fragment Runner's RPC Timeout a SystemOption
> -------------------------------------------------
>
>                 Key: DRILL-6143
>                 URL: https://issues.apache.org/jira/browse/DRILL-6143
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>             Fix For: 1.13.0
>
>
> Queries fail sporadically on some clusters with the following error:
> {code}
> oadd.org.apache.drill.common.exceptions.UserRemoteException: CONNECTION
> ERROR: Exceeded timeout (25000) while waiting send intermediate work
> fragments to remote nodes. Sent 5 and only heard response back from 4 nodes.
> {code}
> This error happens because the FragmentsRunner has a hardcoded timeout,
> RPC_WAIT_IN_MSECS_PER_FRAGMENT, which is set to 5 seconds. Increasing the
> timeout to 10 seconds resolved the sporadic failures that were observed. This
> timeout should be changed to 10 seconds and should also be configurable via
> the SystemOptionManager.
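
For readers following the numbers in the issue description, here is a minimal sketch of the timeout arithmetic. It assumes the total wait budget is the per-fragment timeout multiplied by the number of fragments sent, which is inferred from the figures in the error message (5 sent, 5 seconds each, 25000 ms total) and not taken from the Drill source. The class name, method names, and constants are hypothetical and are not part of Drill.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class FragmentTimeoutSketch {
    // Hypothetical stand-in for the hardcoded RPC_WAIT_IN_MSECS_PER_FRAGMENT (5 seconds per the issue).
    static final long OLD_WAIT_MS_PER_FRAGMENT = 5_000L;
    // Value proposed in the issue: 10 seconds.
    static final long NEW_WAIT_MS_PER_FRAGMENT = 10_000L;

    // Assumption: the overall budget scales linearly with the number of fragments sent.
    static long totalTimeoutMs(long perFragmentMs, int fragmentsSent) {
        return perFragmentMs * fragmentsSent;
    }

    // Waits until every acknowledgement has been counted down or the budget runs out.
    // Returns false on timeout, which is the case the reported error corresponds to.
    static boolean awaitAcks(CountDownLatch acks, long perFragmentMs, int fragmentsSent)
            throws InterruptedException {
        return acks.await(totalTimeoutMs(perFragmentMs, fragmentsSent), TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // Matches the "Exceeded timeout (25000)" figure for 5 fragments.
        System.out.println(totalTimeoutMs(OLD_WAIT_MS_PER_FRAGMENT, 5)); // prints 25000
        // With the proposed 10-second value the same 5 fragments get twice the budget.
        System.out.println(totalTimeoutMs(NEW_WAIT_MS_PER_FRAGMENT, 5)); // prints 50000

        // All acknowledgements already received: returns immediately with true.
        System.out.println(awaitAcks(new CountDownLatch(0), NEW_WAIT_MS_PER_FRAGMENT, 5)); // prints true
    }
}
```

Passing the per-fragment value as a parameter, instead of reading a constant inside the method, mirrors the request in the issue that the timeout be configurable.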