[jira] [Commented] (DRILL-6143) Queries Fail Due To Aggressive Hardcoded RPC Timeout

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357929#comment-16357929
 ] 

ASF GitHub Bot commented on DRILL-6143:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1119
  
@arina-ielchiieva 


> Queries Fail Due To Aggressive Hardcoded RPC Timeout
> 
>
> Key: DRILL-6143
> URL: https://issues.apache.org/jira/browse/DRILL-6143
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> Queries fail sporadically on some clusters due to the following 
> error:
> {code}
> oadd.org.apache.drill.common.exceptions.UserRemoteException: CONNECTION 
> ERROR: Exceeded timeout (25000) while waiting send intermediate work 
> fragments to remote nodes. Sent 5 and only heard response back from 4 nodes.
> {code}
> This error happens because FragmentsRunner has a hardcoded timeout, 
> RPC_WAIT_IN_MSECS_PER_FRAGMENT, which is set to 5 seconds. Increasing the 
> timeout to 10 seconds resolved the sporadic failures that were observed. This 
> timeout should be raised to 10 seconds and should also be configurable via the 
> SystemOptionManager.
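The numbers in the error message suggest how the wait is computed: 25000 ms is the 5 second per-fragment constant multiplied by the 5 nodes the fragments were sent to. The sketch below illustrates that arithmetic and a latch-based wait whose per-fragment value comes from a supplier standing in for a system option. It assumes the multiply-by-node-count behaviour inferred from the message; the class, method names and the supplier are hypothetical and are not Drill's actual FragmentsRunner code or option API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of the wait described in DRILL-6143.
 * Names are illustrative assumptions, not Drill's real API.
 */
final class FragmentRpcTimeout {

  // Old behaviour described in the issue: a hardcoded 5 s wait per fragment.
  static final long HARDCODED_WAIT_MS_PER_FRAGMENT = 5_000L;

  // Assumption: the total wait scales with the number of remote nodes sent to.
  static long totalTimeoutMs(long perFragmentMs, int nodesSent) {
    return perFragmentMs * nodesSent;
  }

  /**
   * Waits for one acknowledgement per remote node. The per-fragment wait is read
   * from a supplier (standing in for a system option), so it can change at runtime.
   * Returns true only if every node responded before the timeout expired.
   */
  static boolean awaitAcks(CountDownLatch acks, int nodesSent, LongSupplier perFragmentOption)
      throws InterruptedException {
    long timeout = totalTimeoutMs(perFragmentOption.getAsLong(), nodesSent);
    return acks.await(timeout, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    // 5 s per node x 5 nodes = 25000 ms, matching the reported error message.
    System.out.println(totalTimeoutMs(HARDCODED_WAIT_MS_PER_FRAGMENT, 5)); // 25000
    // With the proposed 10 s value, the same query would wait 50000 ms.
    System.out.println(totalTimeoutMs(10_000L, 5)); // 50000

    // Simulate 5 nodes where only 4 respond; a tiny per-fragment value keeps the demo fast.
    CountDownLatch acks = new CountDownLatch(5);
    for (int i = 0; i < 4; i++) {
      acks.countDown();
    }
    boolean allHeard = awaitAcks(acks, 5, () -> 10L);
    long heard = 5 - acks.getCount();
    System.out.println(allHeard + " " + heard); // false 4
  }
}
```

Reading the value through a supplier on each call, rather than from a static constant, is what lets an operator raise the timeout on a slow cluster without a rebuild, which is the configurability the issue asks for.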



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6143) Queries Fail Due To Aggressive Hardcoded RPC Timeout

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357927#comment-16357927
 ] 

ASF GitHub Bot commented on DRILL-6143:
---

GitHub user ilooner opened a pull request:

https://github.com/apache/drill/pull/1119

DRILL-6143: Made FragmentsRunner's rpc timeout larger to reduce random failures and made it configurable as a SystemOption.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilooner/drill DRILL-6143

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1119.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1119


commit d918265c4d5caee11c0b707ff49a49c547c8dc8a
Author: Timothy Farkas 
Date:   2018-02-08T23:25:59Z

DRILL-6143: Made FragmentsRunner's rpc timeout larger to reduce random 
failures and made it configurable as a SystemOption.






