[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out
[ https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368399#comment-16368399 ]

ASF GitHub Bot commented on DRILL-5902:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1113#discussion_r168935308

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryStateProcessor.java ---
@@ -125,20 +125,17 @@ public void cancel() {
       case PREPARING:
       case PLANNING:
       case ENQUEUED:
-        moveToState(QueryState.CANCELLATION_REQUESTED, null);
-        return;
-
       case STARTING:
       case RUNNING:
-        addToEventQueue(QueryState.CANCELLATION_REQUESTED, null);
-        return;
+        moveToState(QueryState.CANCELLATION_REQUESTED, null);
--- End diff --

@arina-ielchiieva Please update JIRA title.

> Regression: Queries encounter random failure due to RPC connection timed out
>
> Key: DRILL-5902
> URL: https://issues.apache.org/jira/browse/DRILL-5902
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - RPC
> Affects Versions: 1.11.0
> Reporter: Robert Hou
> Assignee: Vlad Rozov
> Priority: Critical
> Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, node196.drillbit.log
>
> Multiple random failures (25) occurred with the latest Functional-Baseline-88.193 run. Here is a sample query:
> {noformat}
> /root/drillAutomation/prasadns14/framework/resources/Functional/window_functions/multiple_partitions/q27.sql
> -- Kitchen sink
> -- Use all supported functions
> select
> rank() over W,
> dense_rank()over W,
> percent_rank() over W,
> cume_dist() over W,
> avg(c_integer + c_integer) over W,
> sum(c_integer/100) over W,
> count(*)over W,
> min(c_integer) over W,
> max(c_integer) over W,
> row_number()over W
> from
> j7
> where
> c_boolean is not null
> window W as (partition by c_bigint, c_date, c_time, c_boolean order by c_integer)
> {noformat}
> From the logs:
> {noformat}
> 2017-10-23 04:14:36,536 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler - Dropping request for early fragment termination for path 261230e8-d03e-9ca9-91bf-c1039deecde2:1:1 -> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler - Dropping request for early fragment termination for path 261230e8-d03e-9ca9-91bf-c1039deecde2:1:5 -> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler - Dropping request for early fragment termination for path 261230e8-d03e-9ca9-91bf-c1039deecde2:1:9 -> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler - Dropping request for early fragment termination for path 261230e8-d03e-9ca9-91bf-c1039deecde2:1:13 -> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler - Dropping request for early fragment termination for path 261230e8-d03e-9ca9-91bf-c1039deecde2:1:17 -> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler - Dropping request for early fragment termination for path 261230e8-d03e-9ca9-91bf-c1039deecde2:1:21 -> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler - Dropping request for early fragment termination for path 261230e8-d03e-9ca9-91bf-c1039deecde2:1:25 -> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> {noformat}
> {noformat}
> 2017-10-23 04:14:53,941 [UserServer-1] INFO o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.196:31010 <--> /10.10.88.193:38281 (user server) timed out. Timeout was set to 30 seconds. Closing connection.
> 2017-10-23 04:14:53,952 [UserServer-1] INFO o.a.d.e.w.fragment.FragmentExecutor - 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested RUNNING --> FAILED
> 2017-10-23 04:14:53,952 [261230f8-2698-15b2-952f-d4ade8d6b180:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested FAILED --> FINISHED
> 2017-10-23 04:14:53,956 [UserServer-1] WARN o.apache.drill.exec.rpc.RequestIdMap - Failure while attempting to fail rpc response.
> java.lang.IllegalArgumentException: Self-suppression not permitted
>     at java.lang.Throwable.addSuppressed(Throwable.java:1043) ~[na:1.7.0_45]
>     at org.apache.drill.common.DeferredException.addException(DeferredException.java:88) ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>     at org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:97)
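The issue description's log also records a secondary failure, `java.lang.IllegalArgumentException: Self-suppression not permitted`, raised by `Throwable.addSuppressed` when called from `DeferredException.addException`. The JDK throws this whenever a throwable is asked to suppress itself. The sketch below shows that JDK behavior and one way an exception accumulator can guard against it. `SuppressionSketch` is a hypothetical class, not Drill's `DeferredException`, and the idea that the same instance reached it twice is an assumption inferred from the message, not something the thread confirms.

```java
/**
 * Hypothetical exception accumulator (not Drill's DeferredException): the first
 * exception is kept and later ones are attached to it as suppressed exceptions.
 */
public class SuppressionSketch {
  private Exception first;

  /** Records an exception, skipping null and the already-recorded instance. */
  void add(Exception e) {
    if (e == null) {
      return;
    }
    if (first == null) {
      first = e;
    } else if (e != first) {
      // Guard: first.addSuppressed(first) would throw IllegalArgumentException.
      first.addSuppressed(e);
    }
  }

  Exception get() {
    return first;
  }

  public static void main(String[] args) {
    Exception failure = new Exception("rpc failure");
    try {
      failure.addSuppressed(failure); // the JDK rejects self-suppression
    } catch (IllegalArgumentException iae) {
      System.out.println(iae.getMessage()); // Self-suppression not permitted
    }

    SuppressionSketch acc = new SuppressionSketch();
    acc.add(failure);
    acc.add(failure); // same instance: ignored instead of throwing
    acc.add(new Exception("second failure"));
    System.out.println(acc.get().getSuppressed().length); // 1
  }
}
```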
[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out
[ https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366729#comment-16366729 ]

ASF GitHub Bot commented on DRILL-5902:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1113#discussion_r168699838

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryStateProcessor.java ---
@@ -125,20 +125,17 @@ public void cancel() {
       case PREPARING:
       case PLANNING:
       case ENQUEUED:
-        moveToState(QueryState.CANCELLATION_REQUESTED, null);
-        return;
-
       case STARTING:
       case RUNNING:
-        addToEventQueue(QueryState.CANCELLATION_REQUESTED, null);
-        return;
+        moveToState(QueryState.CANCELLATION_REQUESTED, null);
--- End diff --

1. Your point makes sense. In that case, could you please update the Javadoc for the `cancel` method to be consistent with the new changes?
2. Maybe we should remove the word `regression` from the commit message to avoid confusion?
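The change under review collapses the `PREPARING`, `PLANNING`, `ENQUEUED`, `STARTING` and `RUNNING` cases into a single branch that calls `moveToState`. As one possible shape for the updated Javadoc requested above, here is a toy model of that switch. `CancelModel`, its enum, and the behavior of the `default` branch are hypothetical simplifications, not Drill's `QueryStateProcessor`, and the real `moveToState` takes a second argument that is omitted here.

```java
/**
 * Toy model of the cancel() switch after the change in PR #1113.
 * Hypothetical: not Drill's QueryStateProcessor; the enum and the default
 * branch are assumptions made for illustration.
 */
public class CancelModel {
  enum QueryState {
    PREPARING, PLANNING, ENQUEUED, STARTING, RUNNING,
    CANCELLATION_REQUESTED, COMPLETED, CANCELED, FAILED
  }

  private QueryState state;

  CancelModel(QueryState initial) {
    this.state = initial;
  }

  QueryState state() {
    return state;
  }

  /**
   * Requests cancellation of the query.
   *
   * <p>In PREPARING, PLANNING, ENQUEUED, STARTING and RUNNING the query moves
   * directly to CANCELLATION_REQUESTED. Per the review discussion,
   * starting() already calls addToEventQueue(), which ensures a cancellation
   * is processed after the transition to RUNNING; that is not modeled here.
   *
   * <p>Assumption for this toy model: in any other state the request is ignored.
   */
  void cancel() {
    switch (state) {
      case PREPARING:
      case PLANNING:
      case ENQUEUED:
      case STARTING:
      case RUNNING:
        moveToState(QueryState.CANCELLATION_REQUESTED);
        return;
      default:
        return; // already cancelling or finished (assumed no-op)
    }
  }

  private void moveToState(QueryState next) {
    state = next;
  }

  public static void main(String[] args) {
    CancelModel query = new CancelModel(QueryState.STARTING);
    query.cancel();
    System.out.println(query.state()); // CANCELLATION_REQUESTED
  }
}
```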
[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out
[ https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366219#comment-16366219 ]

ASF GitHub Bot commented on DRILL-5902:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1113#discussion_r168599100

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryStateProcessor.java ---
@@ -125,20 +125,17 @@ public void cancel() {
       case PREPARING:
       case PLANNING:
       case ENQUEUED:
-        moveToState(QueryState.CANCELLATION_REQUESTED, null);
-        return;
-
       case STARTING:
       case RUNNING:
-        addToEventQueue(QueryState.CANCELLATION_REQUESTED, null);
-        return;
+        moveToState(QueryState.CANCELLATION_REQUESTED, null);
--- End diff --

1. As I understand it, processing of `CANCELLATION_REQUESTED` needs to be delayed only during the transition from `STARTING` to `RUNNING`, until the requests have been submitted to remote nodes for execution. Once the state has transitioned to `RUNNING`, remote drillbits are ready to process a cancellation request, so no delay is necessary for the `RUNNING` state. For `STARTING`, there is already a call to `addToEventQueue()` inside `QueryStateProcessor.starting()` that ensures the cancellation will be processed after the transition.
2. I can't say why the JIRA title states that it is a regression. As far as I can tell, any delay in saving the query profile may cause the issue. One possibility is that the issue is amplified by [DRILL-6053](https://issues.apache.org/jira/browse/DRILL-6053), a regression caused by the fix for [DRILL-4963](https://issues.apache.org/jira/browse/DRILL-4963), which introduces an even longer delay (due to synchronization).
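The deferral described in point 1, where a cancellation that arrives during `STARTING` is handled only after the transition completes, follows a common pattern: events raised while another event is being handled are queued and drained afterwards. The sketch below is a single-threaded, hypothetical model of that pattern. `DeferredCancelSketch` and its event names are illustrative assumptions, not Drill's `addToEventQueue()` implementation, and in Drill the cancellation request would come from another thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/**
 * Hypothetical sketch: an event submitted while another event is being
 * processed is queued and handled only after the current one finishes.
 * Single-threaded for brevity; not thread-safe.
 */
public class DeferredCancelSketch {
  enum Event { START, CANCEL }

  private final Queue<Event> queue = new ArrayDeque<>();
  private boolean processing = false;
  final List<String> log = new ArrayList<>();

  /** Submits an event; if another event is in progress, it is deferred. */
  void submit(Event e) {
    queue.add(e);
    if (processing) {
      return; // the drain loop below will pick it up later
    }
    processing = true;
    try {
      Event next;
      while ((next = queue.poll()) != null) {
        handle(next);
      }
    } finally {
      processing = false;
    }
  }

  private void handle(Event e) {
    switch (e) {
      case START:
        log.add("STARTING");
        // Simulate a cancel request arriving before fragments are sent out.
        submit(Event.CANCEL);
        log.add("fragments sent");
        log.add("RUNNING");
        break;
      case CANCEL:
        log.add("CANCELLATION_REQUESTED");
        break;
    }
  }

  public static void main(String[] args) {
    DeferredCancelSketch processor = new DeferredCancelSketch();
    processor.submit(Event.START);
    System.out.println(processor.log);
    // [STARTING, fragments sent, RUNNING, CANCELLATION_REQUESTED]
  }
}
```

Because `CANCEL` is submitted while `START` is still being handled, it is recorded only after `RUNNING`, which is the ordering the `STARTING` case relies on.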
[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out
[ https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365278#comment-16365278 ]

ASF GitHub Bot commented on DRILL-5902:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1113#discussion_r168415470

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryStateProcessor.java ---
@@ -125,20 +125,17 @@ public void cancel() {
       case PREPARING:
       case PLANNING:
       case ENQUEUED:
-        moveToState(QueryState.CANCELLATION_REQUESTED, null);
-        return;
-
       case STARTING:
       case RUNNING:
-        addToEventQueue(QueryState.CANCELLATION_REQUESTED, null);
-        return;
+        moveToState(QueryState.CANCELLATION_REQUESTED, null);
--- End diff --

1. Could you please explain why `addToEventQueue` was changed to `moveToState`? As I understand it, `addToEventQueue` is used to ensure that cancellation is requested only after all fragments have been sent out, to avoid hanging in the cancellation-requested state. For the preparing, planning and enqueued states we cancel immediately, since these states are handled locally.
2. Why does this Jira title state that it's a regression? Do we know what caused the regression?
[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out
[ https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357524#comment-16357524 ]

ASF GitHub Bot commented on DRILL-5902:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1113

K thx @vrozov
[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out
[ https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357489#comment-16357489 ]

Vlad Rozov commented on DRILL-5902:
---

The connection is forcibly terminated by a Drillbit (the foreman) because of the naive flow control that Drill implements. Drill uses an "idle" timeout of 15 seconds to detect "bad" connections: if a request is processed on the thread that handles communication for the connection, and processing takes longer than 15 seconds, the connection is considered "bad" and is forcibly terminated. In this case, while processing the cancellation request, Drill writes the query profile, and that takes longer than 15 seconds (especially when many profiles have already been written to the profiles directory). To fix the issue, foreman cancellation is processed asynchronously.
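The root cause above is long-running work (writing the query profile) executed on the thread that services the connection. A minimal sketch of the idea behind the fix, assuming a plain `ExecutorService` stands in for whatever mechanism Drill actually uses: the blocking variant keeps the caller busy for the whole profile write, while the asynchronous variant hands the write to a worker and returns immediately. `AsyncCancelSketch`, `writeProfile`, and the 200 ms delay are hypothetical; Drill's RPC timeout handling itself is not modeled.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: move slow cancellation work off the connection thread. */
public class AsyncCancelSketch {
  // Worker standing in for an asynchronous executor (assumption).
  private final ExecutorService worker = Executors.newSingleThreadExecutor();

  /** Stand-in for writing the query profile; the delay is a made-up value. */
  static void writeProfile(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Blocking handler: the calling (I/O) thread waits for the whole write. Returns elapsed ms. */
  long handleCancelBlocking(long profileWriteMillis) {
    long start = System.nanoTime();
    writeProfile(profileWriteMillis);
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  /** Asynchronous handler: schedules the write on the worker and returns at once. */
  CompletableFuture<Void> handleCancelAsync(long profileWriteMillis) {
    return CompletableFuture.runAsync(() -> writeProfile(profileWriteMillis), worker);
  }

  void shutdown() {
    worker.shutdown();
  }

  public static void main(String[] args) {
    AsyncCancelSketch sketch = new AsyncCancelSketch();
    long blockedMillis = sketch.handleCancelBlocking(200);

    long start = System.nanoTime();
    CompletableFuture<Void> pending = sketch.handleCancelAsync(200);
    long asyncMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

    pending.join(); // the profile write still completes, just not on this thread
    sketch.shutdown();
    System.out.println("blocking: " + blockedMillis + " ms, async: " + asyncMillis + " ms");
  }
}
```

The slow write still happens in both variants; the design choice is only about which thread pays for it, so that the thread the timeout check watches stays responsive.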
[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out
[ https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357478#comment-16357478 ]

ASF GitHub Bot commented on DRILL-5902:
---

Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1113

@ilooner DRILL-6143 is not related to DRILL-5902. DRILL-6143 requires a separate RCA. See DRILL-5902 for details.
[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out
[ https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357459#comment-16357459 ] ASF GitHub Bot commented on DRILL-5902: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1113 @vrozov Can you provide a brief description of the issue? I have recently filed https://issues.apache.org/jira/browse/DRILL-6143 and I want to verify that these are two separate issues. DRILL-6143 causes a premature timeout when fragments are sent to drillbits in the FragmentsRunner. The issue you fixed here seems to involve a timeout when a query is cancelled. So my initial guess is that these two issues are unrelated. Please let me know if they are related.
[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out
[ https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353431#comment-16353431 ] ASF GitHub Bot commented on DRILL-5902: --- Github user vrozov commented on the issue: https://github.com/apache/drill/pull/1113 @arina-ielchiieva Please review
[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out
[ https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353428#comment-16353428 ] ASF GitHub Bot commented on DRILL-5902: --- GitHub user vrozov opened a pull request: https://github.com/apache/drill/pull/1113 DRILL-5902: Regression: Queries encounter random failure due to RPC connection timed out

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vrozov/drill DRILL-5902

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1113.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1113

commit fe329c2517710bb9fdec273de24321717fc954e6
Author: Vlad Rozov
Date: 2018-02-06T03:15:56Z

DRILL-5902: Regression: Queries encounter random failure due to RPC connection timed out
[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out
[ https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216354#comment-16216354 ] Robert Hou commented on DRILL-5902: --- The previous run, using the following commit, was clean:
{noformat}
1.12.0-SNAPSHOT  f1d1945b3772bb782039fd6811e34a7de66441c8  DRILL-5582: C++ Client: [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to data being written to the attacker's target instead of Drillbit  19.10.2017 @ 17:13:05 PDT  Unknown  19.10.2017 @ 18:37:19 PDT
{noformat}
Because these failures are random, a clean previous run does not necessarily mean that one of the later commits caused the problem.