[jira] [Commented] (DRILL-6143) Make Fragment Runner's RPC Timeout a SystemOption

ASF GitHub Bot (JIRA) Fri, 09 Feb 2018 12:30:12 -0800

    [ https://issues.apache.org/jira/browse/DRILL-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358913#comment-16358913 ]


ASF GitHub Bot commented on DRILL-6143:
---------------------------------------

Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1119#discussion_r167338469
  
    --- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
    @@ -413,6 +413,7 @@ drill.exec.options: {
         # to start at least 2 partitions then HashAgg fallbacks to this case. It can be
         # enabled by setting this flag to true. By default it's set to false such that
         # query will fail if there is not enough memory
    +    drill.exec.rpc.fragrunner.timeout: 30000,
    --- End diff --
    
    Thanks for catching the ordering. I reduced the default to 10000. 


> Make Fragment Runner's RPC Timeout a SystemOption
> -------------------------------------------------
>
>                 Key: DRILL-6143
>                 URL: https://issues.apache.org/jira/browse/DRILL-6143
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>             Fix For: 1.13.0
>
>
> Queries fail sporadically on some clusters with the following error:
> {code}
> oadd.org.apache.drill.common.exceptions.UserRemoteException: CONNECTION 
> ERROR: Exceeded timeout (25000) while waiting send intermediate work 
> fragments to remote nodes. Sent 5 and only heard response back from 4 nodes.
> {code}
> This error happens because the FragmentsRunner has a hardcoded timeout, 
> RPC_WAIT_IN_MSECS_PER_FRAGMENT, which is set to 5 seconds. Increasing the 
> timeout to 10 seconds resolved the sporadic failures that were observed. The 
> default should be raised to 10 seconds, and the timeout should also be 
> configurable via the SystemOptionManager.
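
For illustration, the 25000 ms in the error above equals 5 seconds times the 5 fragments sent, which suggests the total wait scales with the fragment count. The sketch below shows that idea with the per-fragment value passed in rather than hardcoded. It is not Drill's actual FragmentsRunner code: the class name, method names, and the use of a CountDownLatch are assumptions made for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a configurable per-fragment RPC wait; not Drill source code. */
final class FragmentTimeoutSketch {

    /** Per-fragment default of 10 seconds, the value proposed in the issue. */
    static final long DEFAULT_TIMEOUT_MS_PER_FRAGMENT = 10_000L;

    private FragmentTimeoutSketch() {
    }

    /** Total wait = per-fragment timeout multiplied by the number of fragments sent. */
    static long totalTimeoutMillis(long perFragmentMillis, int fragmentCount) {
        if (perFragmentMillis <= 0 || fragmentCount < 0) {
            throw new IllegalArgumentException("timeout must be positive and count non-negative");
        }
        return Math.multiplyExact(perFragmentMillis, (long) fragmentCount);
    }

    /**
     * Waits until every remote node has acknowledged its fragment (the latch
     * reaches zero) or the scaled timeout elapses. Returns true only if all
     * acknowledgements arrived in time.
     */
    static boolean awaitAcks(CountDownLatch acks, long perFragmentMillis, int fragmentCount) {
        try {
            return acks.await(totalTimeoutMillis(perFragmentMillis, fragmentCount),
                    TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt for the caller and report the wait as failed.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Under these assumptions, a caller would read the per-fragment value from an option (falling back to DEFAULT_TIMEOUT_MS_PER_FRAGMENT) and pass it to awaitAcks, so a cluster with slow RPC responses can raise the limit without a code change.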



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
