[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out

2018-02-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368399#comment-16368399
 ] 

ASF GitHub Bot commented on DRILL-5902:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1113#discussion_r168935308
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryStateProcessor.java
 ---
@@ -125,20 +125,17 @@ public void cancel() {
   case PREPARING:
   case PLANNING:
   case ENQUEUED:
-moveToState(QueryState.CANCELLATION_REQUESTED, null);
-return;
-
   case STARTING:
   case RUNNING:
-addToEventQueue(QueryState.CANCELLATION_REQUESTED, null);
-return;
+moveToState(QueryState.CANCELLATION_REQUESTED, null);
--- End diff --

@arina-ielchiieva Please update JIRA title.


> Regression: Queries encounter random failure due to RPC connection timed out
> 
>
> Key: DRILL-5902
> URL: https://issues.apache.org/jira/browse/DRILL-5902
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Vlad Rozov
>Priority: Critical
> Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, 
> node196.drillbit.log
>
>
> Multiple random failures (25) occurred with the latest 
> Functional-Baseline-88.193 run.  Here is a sample query:
> {noformat}
> /root/drillAutomation/prasadns14/framework/resources/Functional/window_functions/multiple_partitions/q27.sql
> -- Kitchen sink
> -- Use all supported functions
> select
> rank()  over W,
> dense_rank()over W,
> percent_rank()  over W,
> cume_dist() over W,
> avg(c_integer + c_integer)  over W,
> sum(c_integer/100)  over W,
> count(*)over W,
> min(c_integer)  over W,
> max(c_integer)  over W,
> row_number()over W
> from
> j7
> where
> c_boolean is not null
> window  W as (partition by c_bigint, c_date, c_time, c_boolean order by 
> c_integer)
> {noformat}
> From the logs:
> {noformat}
> 2017-10-23 04:14:36,536 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:1 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:5 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:9 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:13 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:17 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:21 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:25 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> {noformat}
> {noformat}
> 2017-10-23 04:14:53,941 [UserServer-1] INFO  
> o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.196:31010 <--> 
> /10.10.88.193:38281 (user server) timed out.  Timeout was set to 30 seconds. 
> Closing connection.
> 2017-10-23 04:14:53,952 [UserServer-1] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested RUNNING --> 
> FAILED
> 2017-10-23 04:14:53,952 [261230f8-2698-15b2-952f-d4ade8d6b180:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 

[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366729#comment-16366729
 ] 

ASF GitHub Bot commented on DRILL-5902:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1113#discussion_r168699838
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryStateProcessor.java
 ---
@@ -125,20 +125,17 @@ public void cancel() {
   case PREPARING:
   case PLANNING:
   case ENQUEUED:
-moveToState(QueryState.CANCELLATION_REQUESTED, null);
-return;
-
   case STARTING:
   case RUNNING:
-addToEventQueue(QueryState.CANCELLATION_REQUESTED, null);
-return;
+moveToState(QueryState.CANCELLATION_REQUESTED, null);
--- End diff --

1. Your point makes sense. In this case could you please update java doc 
for the `cancel` method to be consistent with new changes?
2. Maybe we should remove word `regression` from the commit message to 
avoid confusion?


> Regression: Queries encounter random failure due to RPC connection timed out
> 
>
> Key: DRILL-5902
> URL: https://issues.apache.org/jira/browse/DRILL-5902
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Vlad Rozov
>Priority: Critical
> Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, 
> node196.drillbit.log
>
>
> Multiple random failures (25) occurred with the latest 
> Functional-Baseline-88.193 run.  Here is a sample query:
> {noformat}
> /root/drillAutomation/prasadns14/framework/resources/Functional/window_functions/multiple_partitions/q27.sql
> -- Kitchen sink
> -- Use all supported functions
> select
> rank()  over W,
> dense_rank()over W,
> percent_rank()  over W,
> cume_dist() over W,
> avg(c_integer + c_integer)  over W,
> sum(c_integer/100)  over W,
> count(*)over W,
> min(c_integer)  over W,
> max(c_integer)  over W,
> row_number()over W
> from
> j7
> where
> c_boolean is not null
> window  W as (partition by c_bigint, c_date, c_time, c_boolean order by 
> c_integer)
> {noformat}
> From the logs:
> {noformat}
> 2017-10-23 04:14:36,536 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:1 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:5 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:9 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:13 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:17 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:21 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:25 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> {noformat}
> {noformat}
> 2017-10-23 04:14:53,941 [UserServer-1] INFO  
> o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.196:31010 <--> 
> /10.10.88.193:38281 (user server) timed out.  Timeout was set to 30 seconds. 
> Closing connection.
> 2017-10-23 04:14:53,952 [UserServer-1] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 

[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out

2018-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366219#comment-16366219
 ] 

ASF GitHub Bot commented on DRILL-5902:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1113#discussion_r168599100
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryStateProcessor.java
 ---
@@ -125,20 +125,17 @@ public void cancel() {
   case PREPARING:
   case PLANNING:
   case ENQUEUED:
-moveToState(QueryState.CANCELLATION_REQUESTED, null);
-return;
-
   case STARTING:
   case RUNNING:
-addToEventQueue(QueryState.CANCELLATION_REQUESTED, null);
-return;
+moveToState(QueryState.CANCELLATION_REQUESTED, null);
--- End diff --

1. Per my understanding, only during a transition from `STARTING` to 
`RUNNING` it is necessary to delay the processing of `CANCELLATION_REQUESTED` 
till requests are submitted to remote nodes for execution. Once the state is 
transitioned to `RUNNING`, remote drillbits are ready to start processing 
cancellation request, so no delay is necessary for the `RUNNING` state. In case 
of `STARTING` there is already a call to `addToEventQueue()` inside 
`QueryStateProcessor.starting()` that ensures that cancellation will be 
processed after the transition.
1. I can't say why JIRA title states that it is a regression, as far as I 
can tell, any delay in saving query profile may cause the issue. One 
possibility is that the issue is amplified by 
[DRILL-6053](https://issues.apache.org/jira/browse/DRILL-6053) that is a 
regression caused by the fix for 
[DRILL-4963](https://issues.apache.org/jira/browse/DRILL-4963) that introduces 
even longer delay (due to synchronization).


> Regression: Queries encounter random failure due to RPC connection timed out
> 
>
> Key: DRILL-5902
> URL: https://issues.apache.org/jira/browse/DRILL-5902
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Vlad Rozov
>Priority: Critical
> Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, 
> node196.drillbit.log
>
>
> Multiple random failures (25) occurred with the latest 
> Functional-Baseline-88.193 run.  Here is a sample query:
> {noformat}
> /root/drillAutomation/prasadns14/framework/resources/Functional/window_functions/multiple_partitions/q27.sql
> -- Kitchen sink
> -- Use all supported functions
> select
> rank()  over W,
> dense_rank()over W,
> percent_rank()  over W,
> cume_dist() over W,
> avg(c_integer + c_integer)  over W,
> sum(c_integer/100)  over W,
> count(*)over W,
> min(c_integer)  over W,
> max(c_integer)  over W,
> row_number()over W
> from
> j7
> where
> c_boolean is not null
> window  W as (partition by c_bigint, c_date, c_time, c_boolean order by 
> c_integer)
> {noformat}
> From the logs:
> {noformat}
> 2017-10-23 04:14:36,536 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:1 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:5 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:9 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:13 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:17 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 

[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out

2018-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365278#comment-16365278
 ] 

ASF GitHub Bot commented on DRILL-5902:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1113#discussion_r168415470
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryStateProcessor.java
 ---
@@ -125,20 +125,17 @@ public void cancel() {
   case PREPARING:
   case PLANNING:
   case ENQUEUED:
-moveToState(QueryState.CANCELLATION_REQUESTED, null);
-return;
-
   case STARTING:
   case RUNNING:
-addToEventQueue(QueryState.CANCELLATION_REQUESTED, null);
-return;
+moveToState(QueryState.CANCELLATION_REQUESTED, null);
--- End diff --

1. Could you please explain why `addToEventQueue` was changed to 
`moveToState`. Per my understanding `addToEventQueue` is used to ensure that 
cancellation will be requested only when all fragments are sent out to avoid 
hanging in cancellation requested state. For preparing, planning and enqueued 
states we cancel immediately since these states are done locally.
2. Why this Jira title states it's a regression? Do we know what has caused 
the regression?


> Regression: Queries encounter random failure due to RPC connection timed out
> 
>
> Key: DRILL-5902
> URL: https://issues.apache.org/jira/browse/DRILL-5902
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Vlad Rozov
>Priority: Critical
> Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, 
> node196.drillbit.log
>
>
> Multiple random failures (25) occurred with the latest 
> Functional-Baseline-88.193 run.  Here is a sample query:
> {noformat}
> /root/drillAutomation/prasadns14/framework/resources/Functional/window_functions/multiple_partitions/q27.sql
> -- Kitchen sink
> -- Use all supported functions
> select
> rank()  over W,
> dense_rank()over W,
> percent_rank()  over W,
> cume_dist() over W,
> avg(c_integer + c_integer)  over W,
> sum(c_integer/100)  over W,
> count(*)over W,
> min(c_integer)  over W,
> max(c_integer)  over W,
> row_number()over W
> from
> j7
> where
> c_boolean is not null
> window  W as (partition by c_bigint, c_date, c_time, c_boolean order by 
> c_integer)
> {noformat}
> From the logs:
> {noformat}
> 2017-10-23 04:14:36,536 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:1 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:5 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:9 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:13 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:17 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:21 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:25 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> {noformat}
> {noformat}
> 2017-10-23 04:14:53,941 [UserServer-1] INFO  
> o.a.drill.exec.rpc.user.UserServer - RPC connection 

[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357524#comment-16357524
 ] 

ASF GitHub Bot commented on DRILL-5902:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1113
  
K thx @vrozov 


> Regression: Queries encounter random failure due to RPC connection timed out
> 
>
> Key: DRILL-5902
> URL: https://issues.apache.org/jira/browse/DRILL-5902
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Vlad Rozov
>Priority: Critical
> Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, 
> node196.drillbit.log
>
>
> Multiple random failures (25) occurred with the latest 
> Functional-Baseline-88.193 run.  Here is a sample query:
> {noformat}
> /root/drillAutomation/prasadns14/framework/resources/Functional/window_functions/multiple_partitions/q27.sql
> -- Kitchen sink
> -- Use all supported functions
> select
> rank()  over W,
> dense_rank()over W,
> percent_rank()  over W,
> cume_dist() over W,
> avg(c_integer + c_integer)  over W,
> sum(c_integer/100)  over W,
> count(*)over W,
> min(c_integer)  over W,
> max(c_integer)  over W,
> row_number()over W
> from
> j7
> where
> c_boolean is not null
> window  W as (partition by c_bigint, c_date, c_time, c_boolean order by 
> c_integer)
> {noformat}
> From the logs:
> {noformat}
> 2017-10-23 04:14:36,536 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:1 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:5 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:9 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:13 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:17 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:21 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:25 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> {noformat}
> {noformat}
> 2017-10-23 04:14:53,941 [UserServer-1] INFO  
> o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.196:31010 <--> 
> /10.10.88.193:38281 (user server) timed out.  Timeout was set to 30 seconds. 
> Closing connection.
> 2017-10-23 04:14:53,952 [UserServer-1] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested RUNNING --> 
> FAILED
> 2017-10-23 04:14:53,952 [261230f8-2698-15b2-952f-d4ade8d6b180:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested FAILED --> 
> FINISHED
> 2017-10-23 04:14:53,956 [UserServer-1] WARN  
> o.apache.drill.exec.rpc.RequestIdMap - Failure while attempting to fail rpc 
> response.
> java.lang.IllegalArgumentException: Self-suppression not permitted
> at java.lang.Throwable.addSuppressed(Throwable.java:1043) 
> ~[na:1.7.0_45]
> at 
> org.apache.drill.common.DeferredException.addException(DeferredException.java:88)
>  ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:97)

[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out

2018-02-08 Thread Vlad Rozov (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357489#comment-16357489
 ] 

Vlad Rozov commented on DRILL-5902:
---

The connection is forcibly terminated by a Drillbit (foreman) due to naive flow 
control that Drill implements. It uses "idle" timeout of 15 seconds to detect 
"bad" connections and in case processing is done on the thread that handles 
connection communication and it takes longer than 15 seconds to process a 
request, the connection is considered "bad" and is forcibly terminated. In this 
case, while processing cancellation request, Drill writes to query profile and 
it takes longer than 15 seconds (especially if there are a lot of profiles 
already written to profiles directory). To fix the issue, foreman cancellation 
is processed asynchronously. 

> Regression: Queries encounter random failure due to RPC connection timed out
> 
>
> Key: DRILL-5902
> URL: https://issues.apache.org/jira/browse/DRILL-5902
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Vlad Rozov
>Priority: Critical
> Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, 
> node196.drillbit.log
>
>
> Multiple random failures (25) occurred with the latest 
> Functional-Baseline-88.193 run.  Here is a sample query:
> {noformat}
> /root/drillAutomation/prasadns14/framework/resources/Functional/window_functions/multiple_partitions/q27.sql
> -- Kitchen sink
> -- Use all supported functions
> select
> rank()  over W,
> dense_rank()over W,
> percent_rank()  over W,
> cume_dist() over W,
> avg(c_integer + c_integer)  over W,
> sum(c_integer/100)  over W,
> count(*)over W,
> min(c_integer)  over W,
> max(c_integer)  over W,
> row_number()over W
> from
> j7
> where
> c_boolean is not null
> window  W as (partition by c_bigint, c_date, c_time, c_boolean order by 
> c_integer)
> {noformat}
> From the logs:
> {noformat}
> 2017-10-23 04:14:36,536 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:1 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:5 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:9 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:13 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:17 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:21 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:25 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> {noformat}
> {noformat}
> 2017-10-23 04:14:53,941 [UserServer-1] INFO  
> o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.196:31010 <--> 
> /10.10.88.193:38281 (user server) timed out.  Timeout was set to 30 seconds. 
> Closing connection.
> 2017-10-23 04:14:53,952 [UserServer-1] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested RUNNING --> 
> FAILED
> 2017-10-23 04:14:53,952 [261230f8-2698-15b2-952f-d4ade8d6b180:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested FAILED --> 
> FINISHED
> 

[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357478#comment-16357478
 ] 

ASF GitHub Bot commented on DRILL-5902:
---

Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1113
  
@ilooner DRILL-6143 is not related to DRILL-5902. DRILL-6143 requires a 
separate RCA. See DRILL-5902 for details.


> Regression: Queries encounter random failure due to RPC connection timed out
> 
>
> Key: DRILL-5902
> URL: https://issues.apache.org/jira/browse/DRILL-5902
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Vlad Rozov
>Priority: Critical
> Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, 
> node196.drillbit.log
>
>
> Multiple random failures (25) occurred with the latest 
> Functional-Baseline-88.193 run.  Here is a sample query:
> {noformat}
> /root/drillAutomation/prasadns14/framework/resources/Functional/window_functions/multiple_partitions/q27.sql
> -- Kitchen sink
> -- Use all supported functions
> select
> rank()  over W,
> dense_rank()over W,
> percent_rank()  over W,
> cume_dist() over W,
> avg(c_integer + c_integer)  over W,
> sum(c_integer/100)  over W,
> count(*)over W,
> min(c_integer)  over W,
> max(c_integer)  over W,
> row_number()over W
> from
> j7
> where
> c_boolean is not null
> window  W as (partition by c_bigint, c_date, c_time, c_boolean order by 
> c_integer)
> {noformat}
> From the logs:
> {noformat}
> 2017-10-23 04:14:36,536 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:1 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:5 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:9 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:13 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:17 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:21 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:25 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> {noformat}
> {noformat}
> 2017-10-23 04:14:53,941 [UserServer-1] INFO  
> o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.196:31010 <--> 
> /10.10.88.193:38281 (user server) timed out.  Timeout was set to 30 seconds. 
> Closing connection.
> 2017-10-23 04:14:53,952 [UserServer-1] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested RUNNING --> 
> FAILED
> 2017-10-23 04:14:53,952 [261230f8-2698-15b2-952f-d4ade8d6b180:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested FAILED --> 
> FINISHED
> 2017-10-23 04:14:53,956 [UserServer-1] WARN  
> o.apache.drill.exec.rpc.RequestIdMap - Failure while attempting to fail rpc 
> response.
> java.lang.IllegalArgumentException: Self-suppression not permitted
> at java.lang.Throwable.addSuppressed(Throwable.java:1043) 
> ~[na:1.7.0_45]
> at 
> org.apache.drill.common.DeferredException.addException(DeferredException.java:88)
>  

[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357459#comment-16357459
 ] 

ASF GitHub Bot commented on DRILL-5902:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1113
  
@vrozov Can you provide a brief description of the issue? I have recently 
filed https://issues.apache.org/jira/browse/DRILL-6143 and I want to verify 
that these are two separate issues. 

DRILL-6143 causes a premature timeout when fragments are sent to drillbits 
in the FragmentsRunner. The issue you fixed here seems to involve a timeout 
when a query is cancelled. So my initial guess is that these two issues are 
unrelated. Please let me know if they are related.


> Regression: Queries encounter random failure due to RPC connection timed out
> 
>
> Key: DRILL-5902
> URL: https://issues.apache.org/jira/browse/DRILL-5902
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Vlad Rozov
>Priority: Critical
> Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, 
> node196.drillbit.log
>
>
> Multiple random failures (25) occurred with the latest 
> Functional-Baseline-88.193 run.  Here is a sample query:
> {noformat}
> /root/drillAutomation/prasadns14/framework/resources/Functional/window_functions/multiple_partitions/q27.sql
> -- Kitchen sink
> -- Use all supported functions
> select
> rank()  over W,
> dense_rank()over W,
> percent_rank()  over W,
> cume_dist() over W,
> avg(c_integer + c_integer)  over W,
> sum(c_integer/100)  over W,
> count(*)over W,
> min(c_integer)  over W,
> max(c_integer)  over W,
> row_number()over W
> from
> j7
> where
> c_boolean is not null
> window  W as (partition by c_bigint, c_date, c_time, c_boolean order by 
> c_integer)
> {noformat}
> From the logs:
> {noformat}
> 2017-10-23 04:14:36,536 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:1 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:5 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:9 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:13 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:17 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:21 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:25 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> {noformat}
> {noformat}
> 2017-10-23 04:14:53,941 [UserServer-1] INFO  
> o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.196:31010 <--> 
> /10.10.88.193:38281 (user server) timed out.  Timeout was set to 30 seconds. 
> Closing connection.
> 2017-10-23 04:14:53,952 [UserServer-1] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested RUNNING --> 
> FAILED
> 2017-10-23 04:14:53,952 [261230f8-2698-15b2-952f-d4ade8d6b180:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested FAILED --> 
> FINISHED
> 2017-10-23 04:14:53,956 [UserServer-1] WARN  
> 

[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out

2018-02-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353431#comment-16353431
 ] 

ASF GitHub Bot commented on DRILL-5902:
---

Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1113
  
@arina-ielchiieva Please review


> Regression: Queries encounter random failure due to RPC connection timed out
> 
>
> Key: DRILL-5902
> URL: https://issues.apache.org/jira/browse/DRILL-5902
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Vlad Rozov
>Priority: Critical
> Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, 
> node196.drillbit.log
>
>
> Multiple random failures (25) occurred with the latest 
> Functional-Baseline-88.193 run.  Here is a sample query:
> {noformat}
> /root/drillAutomation/prasadns14/framework/resources/Functional/window_functions/multiple_partitions/q27.sql
> -- Kitchen sink
> -- Use all supported functions
> select
> rank()  over W,
> dense_rank()over W,
> percent_rank()  over W,
> cume_dist() over W,
> avg(c_integer + c_integer)  over W,
> sum(c_integer/100)  over W,
> count(*)over W,
> min(c_integer)  over W,
> max(c_integer)  over W,
> row_number()over W
> from
> j7
> where
> c_boolean is not null
> window  W as (partition by c_bigint, c_date, c_time, c_boolean order by 
> c_integer)
> {noformat}
> From the logs:
> {noformat}
> 2017-10-23 04:14:36,536 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:1 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:5 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:9 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:13 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:17 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:21 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:25 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> {noformat}
> {noformat}
> 2017-10-23 04:14:53,941 [UserServer-1] INFO  
> o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.196:31010 <--> 
> /10.10.88.193:38281 (user server) timed out.  Timeout was set to 30 seconds. 
> Closing connection.
> 2017-10-23 04:14:53,952 [UserServer-1] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested RUNNING --> 
> FAILED
> 2017-10-23 04:14:53,952 [261230f8-2698-15b2-952f-d4ade8d6b180:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested FAILED --> 
> FINISHED
> 2017-10-23 04:14:53,956 [UserServer-1] WARN  
> o.apache.drill.exec.rpc.RequestIdMap - Failure while attempting to fail rpc 
> response.
> java.lang.IllegalArgumentException: Self-suppression not permitted
> at java.lang.Throwable.addSuppressed(Throwable.java:1043) 
> ~[na:1.7.0_45]
> at 
> org.apache.drill.common.DeferredException.addException(DeferredException.java:88)
>  ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> 

[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out

2018-02-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353428#comment-16353428
 ] 

ASF GitHub Bot commented on DRILL-5902:
---

GitHub user vrozov opened a pull request:

https://github.com/apache/drill/pull/1113

DRILL-5902: Regression: Queries encounter random failure due to RPC 
connection timed out



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vrozov/drill DRILL-5902

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1113.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1113


commit fe329c2517710bb9fdec273de24321717fc954e6
Author: Vlad Rozov 
Date:   2018-02-06T03:15:56Z

DRILL-5902: Regression: Queries encounter random failure due to RPC 
connection timed out




> Regression: Queries encounter random failure due to RPC connection timed out
> 
>
> Key: DRILL-5902
> URL: https://issues.apache.org/jira/browse/DRILL-5902
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Vlad Rozov
>Priority: Critical
> Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, 
> node196.drillbit.log
>
>
> Multiple random failures (25) occurred with the latest 
> Functional-Baseline-88.193 run.  Here is a sample query:
> {noformat}
> /root/drillAutomation/prasadns14/framework/resources/Functional/window_functions/multiple_partitions/q27.sql
> -- Kitchen sink
> -- Use all supported functions
> select
> rank()  over W,
> dense_rank()over W,
> percent_rank()  over W,
> cume_dist() over W,
> avg(c_integer + c_integer)  over W,
> sum(c_integer/100)  over W,
> count(*)over W,
> min(c_integer)  over W,
> max(c_integer)  over W,
> row_number()over W
> from
> j7
> where
> c_boolean is not null
> window  W as (partition by c_bigint, c_date, c_time, c_boolean order by 
> c_integer)
> {noformat}
> From the logs:
> {noformat}
> 2017-10-23 04:14:36,536 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:1 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:5 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:9 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:13 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:17 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:21 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:25 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> {noformat}
> {noformat}
> 2017-10-23 04:14:53,941 [UserServer-1] INFO  
> o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.196:31010 <--> 
> /10.10.88.193:38281 (user server) timed out.  Timeout was set to 30 seconds. 
> Closing connection.
> 2017-10-23 04:14:53,952 [UserServer-1] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested RUNNING --> 
> FAILED
> 2017-10-23 04:14:53,952 

[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out

2017-10-23 Thread Robert Hou (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216354#comment-16216354
 ] 

Robert Hou commented on DRILL-5902:
---

The previous run using commit:
{noformat}
1.12.0-SNAPSHOT f1d1945b3772bb782039fd6811e34a7de66441c8DRILL-5582: C++ 
Client: [Threat Modeling] Drillbit may be spoofed by an attacker and this may 
lead to data being written to the attacker's target instead of Drillbit   
19.10.2017 @ 17:13:05 PDT   Unknown 19.10.2017 @ 18:37:19 PDT
{noformat}
was clean.  This does not mean that one of the later commits caused the problem 
because these are random failures.

> Regression: Queries encounter random failure due to RPC connection timed out
> 
>
> Key: DRILL-5902
> URL: https://issues.apache.org/jira/browse/DRILL-5902
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Priority: Critical
> Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, 
> node196.drillbit.log
>
>
> Multiple random failures (25) occurred with the latest 
> Functional-Baseline-88.193 run.  Here is a sample query:
> {noformat}
> -- Kitchen sink
> -- Use all supported functions
> select
> rank()  over W,
> dense_rank()over W,
> percent_rank()  over W,
> cume_dist() over W,
> avg(c_integer + c_integer)  over W,
> sum(c_integer/100)  over W,
> count(*)over W,
> min(c_integer)  over W,
> max(c_integer)  over W,
> row_number()over W
> from
> j7
> where
> c_boolean is not null
> window  W as (partition by c_bigint, c_date, c_time, c_boolean order by 
> c_integer)
> {noformat}
> From the logs:
> {noformat}
> 2017-10-23 04:14:36,536 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:1 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:5 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:9 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:13 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:17 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:21 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:25 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> {noformat}
> {noformat}
> 2017-10-23 04:14:53,941 [UserServer-1] INFO  
> o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.196:31010 <--> 
> /10.10.88.193:38281 (user server) timed out.  Timeout was set to 30 seconds. 
> Closing connection.
> 2017-10-23 04:14:53,952 [UserServer-1] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested RUNNING --> 
> FAILED
> 2017-10-23 04:14:53,952 [261230f8-2698-15b2-952f-d4ade8d6b180:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested FAILED --> 
> FINISHED
> 2017-10-23 04:14:53,956 [UserServer-1] WARN  
> o.apache.drill.exec.rpc.RequestIdMap - Failure while attempting to fail rpc 
> response.
> java.lang.IllegalArgumentException: Self-suppression not permitted
> at java.lang.Throwable.addSuppressed(Throwable.java:1043) 
> ~[na:1.7.0_45]
> at 
>