[jira] [Updated] (DRILL-6143) Make Fragment Runner's RPC Timeout a SystemOption
     [ https://issues.apache.org/jira/browse/DRILL-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Farkas updated DRILL-6143:
----------------------------------
    Reviewer: Boaz Ben-Zvi  (was: Arina Ielchiieva)

> Make Fragment Runner's RPC Timeout a SystemOption
> -------------------------------------------------
>
>                 Key: DRILL-6143
>                 URL: https://issues.apache.org/jira/browse/DRILL-6143
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.13.0
>
>
> Queries fail sporadically on some clusters with the following error:
> {code}
> oadd.org.apache.drill.common.exceptions.UserRemoteException: CONNECTION
> ERROR: Exceeded timeout (25000) while waiting send intermediate work
> fragments to remote nodes. Sent 5 and only heard response back from 4 nodes.
> {code}
> This error happens because the FragmentsRunner uses a hardcoded per-fragment
> timeout, RPC_WAIT_IN_MSECS_PER_FRAGMENT, which is set to 5 seconds. That is
> consistent with the 25000 ms limit reported above for 5 fragments. Increasing
> the timeout to 10 seconds resolved the sporadic failures that were observed.
> The default should be raised to 10 seconds and the timeout should also be
> configurable via the SystemOptionManager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (DRILL-6143) Make Fragment Runner's RPC Timeout a SystemOption
     [ https://issues.apache.org/jira/browse/DRILL-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Farkas updated DRILL-6143:
----------------------------------
    Labels: ready-to-commit  (was: )
[jira] [Updated] (DRILL-6143) Make Fragment Runner's RPC Timeout a SystemOption
     [ https://issues.apache.org/jira/browse/DRILL-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Farkas updated DRILL-6143:
----------------------------------
    Reviewer: Arina Ielchiieva
[jira] [Updated] (DRILL-6143) Make Fragment Runner's RPC Timeout a SystemOption
     [ https://issues.apache.org/jira/browse/DRILL-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Farkas updated DRILL-6143:
----------------------------------
    Summary: Make Fragment Runner's RPC Timeout a SystemOption  (was: Queries Fail Due To Aggressive Hardcoded RPC Timeout)
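
The arithmetic behind the reported error can be sketched in a few lines of Java. This is only an illustration of the idea in the issue description, not Drill's actual FragmentsRunner code: the class name, the method names, and the `sketch.rpc.timeout.ms` system property are all hypothetical, and a plain JVM system property stands in for a value that would really come from the SystemOptionManager. It assumes the total wait is the per-fragment timeout multiplied by the number of fragments sent, which matches the 25000 ms reported for 5 fragments at 5 seconds each.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names exist in Apache Drill.
class FragmentTimeoutSketch {

    // Total wait, assuming it scales linearly with the number of fragments sent.
    static long totalTimeoutMs(long perFragmentTimeoutMs, int fragmentCount) {
        return perFragmentTimeoutMs * fragmentCount;
    }

    // Waits until every fragment has been acknowledged or the total timeout elapses.
    // Returns true if all acknowledgements arrived in time.
    static boolean awaitAcks(CountDownLatch acks, long perFragmentTimeoutMs, int fragmentCount)
            throws InterruptedException {
        return acks.await(totalTimeoutMs(perFragmentTimeoutMs, fragmentCount), TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // Old hardcoded value: 5 s per fragment, 5 fragments.
        System.out.println(totalTimeoutMs(5_000L, 5));   // 25000

        // Proposed default of 10 s, overridable here through an assumed property
        // (a stand-in for a real system option lookup).
        long perFragmentMs = Long.getLong("sketch.rpc.timeout.ms", 10_000L);
        System.out.println(totalTimeoutMs(perFragmentMs, 5));

        // Simulate the failure: 5 fragments sent, only 4 acknowledgements received.
        // A tiny per-fragment timeout keeps the demonstration fast.
        int sent = 5;
        CountDownLatch acks = new CountDownLatch(sent);
        for (int i = 0; i < 4; i++) {
            acks.countDown();
        }
        boolean allAcked = awaitAcks(acks, 10L, sent);
        System.out.println("Sent " + sent + ", heard back from " + (sent - acks.getCount())
                + " nodes, all acknowledged: " + allAcked);
    }
}
```

Reading the per-fragment value at call time rather than from a compile-time constant is what would allow a cluster that needs a longer wait to raise it without a rebuild.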