[ 
https://issues.apache.org/jira/browse/DRILL-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288715#comment-15288715
 ] 

Khurram Faraaz edited comment on DRILL-3743 at 5/18/16 10:30 AM:
-----------------------------------------------------------------

I tried to verify and see if this is fixed, and here is what we see,
The same long running query was submitted and the foreman was killed (kill -9 
PID)

Two observations, we do not see the query used in the test, anywhere on any of 
the profiles on web UI on any of the 4 nodes, after the foreman was killed. And 
when another query was submitted after the foreman was killed, we see this 
message "IllegalArgumentException: Attempted to send a message when connection 
is no longer valid.", please confirm if this is the expected behavior and I 
will go ahead and mark this as verified and close the JIRA.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `twoKeyJsn.json` limit 2000000;
...
| 8.12071244667E8  | d    |
| 5.4410470977E8   | l    |
| 1.27114395985E7  | c    |
| 2.07284907136E8  | i    |
| 4.36519831895E8  | a    |
| 7.74844464533E7  | b    |
| 4.61974126986E8  | i    |
Error: CONNECTION ERROR: Connection /10.10.100.201:52906 <--> 
/10.10.100.204:31010 (user client) closed unexpectedly. Drillbit down?


[Error Id: e4905313-64cc-4e95-918c-6fde9148f337 ] (state=,code=0)
0: jdbc:drill:schema=dfs.tmp> select * from  `twoKeyJsn.json` limit 20;
Error: SYSTEM ERROR: IllegalArgumentException: Attempted to send a message when 
connection is no longer valid.

Query submission to Drillbit failed.

[Error Id: 7ea3690e-5506-45fc-be55-9d2a23c22443 ] (state=,code=0)
0: jdbc:drill:schema=dfs.tmp>
{noformat}

Also, I could not find the error ID in drillbit.log returned once the foreman 
was killed on any of the nodes
root@centos-01 logs]# clush -g khurram grep 
"e4905313-64cc-4e95-918c-6fde9148f337" 
/opt/mapr/drill/drill-1.7.0/logs/drillbit.log
clush: 10.10.100.203: exited with exit code 1
clush: 10.10.100.204: exited with exit code 1
clush: 10.10.100.201: exited with exit code 1
clush: 10.10.100.202: exited with exit code 1


was (Author: khfaraaz):
I tried to verify and see if this is fixed, and here is what we see,
The same long running query was submitted and the foreman was killed (kill -9 
PID)

Two observations, we do not see the query used in the test, anywhere on any of 
the profiles on web UI on any of the 4 nodes, after the foreman was killed. And 
when another query was submitted after the foreman was killed, we see this 
message "IllegalArgumentException: Attempted to send a message when connection 
is no longer valid.", please confirm if this is the expected behavior and I 
will go ahead and mark this as verified and close the JIRA.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `twoKeyJsn.json` limit 2000000;
...
| 8.12071244667E8  | d    |
| 5.4410470977E8   | l    |
| 1.27114395985E7  | c    |
| 2.07284907136E8  | i    |
| 4.36519831895E8  | a    |
| 7.74844464533E7  | b    |
| 4.61974126986E8  | i    |
Error: CONNECTION ERROR: Connection /10.10.100.201:52906 <--> 
/10.10.100.204:31010 (user client) closed unexpectedly. Drillbit down?


[Error Id: e4905313-64cc-4e95-918c-6fde9148f337 ] (state=,code=0)
0: jdbc:drill:schema=dfs.tmp> select * from  `twoKeyJsn.json` limit 20;
Error: SYSTEM ERROR: IllegalArgumentException: Attempted to send a message when 
connection is no longer valid.

Query submission to Drillbit failed.

[Error Id: 7ea3690e-5506-45fc-be55-9d2a23c22443 ] (state=,code=0)
0: jdbc:drill:schema=dfs.tmp>
{noformat}

> query hangs on sqlline once Drillbit on foreman node is killed
> --------------------------------------------------------------
>
>                 Key: DRILL-3743
>                 URL: https://issues.apache.org/jira/browse/DRILL-3743
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>         Environment: 4 node cluster CentOS
>            Reporter: Khurram Faraaz
>            Assignee: Sudheesh Katkam
>            Priority: Critical
>             Fix For: 1.7.0
>
>
> sqlline/query hangs once Drillbit (on Foreman node) is killed. (kill -9 <PID>)
> query was issued from the Foreman node. The query returns many records, and 
> it is a long running query.
> Steps to reproduce the problem.
> set planner.slice_target=1
> 1.  clush -g khurram service mapr-warden stop
> 2.  clush -g khurram service mapr-warden start
> 3.  ./sqlline -u "jdbc:drill:schema=dfs.tmp"
> 0: jdbc:drill:schema=dfs.tmp> select * from `twoKeyJsn.json` limit 2000000;
> 4.  Immediately from another console do a jps and kill the Drillbit process 
> (in this case foreman) while the query is being run on sqlline. You will 
> notice that sqlline just hangs, we do not see any exceptions or errors being 
> reported on sqlline prompt or in drillbit.log or drillbit.out
> I do see this Exception in sqlline.log on the node from where sqlline was 
> started
> {code}
> 2015-09-04 18:45:12,069 [Client-1] INFO  o.a.d.e.rpc.user.QueryResultHandler 
> - User Error Occurred
> org.apache.drill.common.exceptions.UserException: CONNECTION ERROR: 
> Connection /10.10.100.201:53425 <--> /10.10.100.201:31010 (user client) 
> closed unexpectedly.
> [Error Id: ec316cfd-c9a5-4905-98e3-da20cb799ba5 ]
>         at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:524)
>  ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.rpc.user.QueryResultHandler$SubmissionListener$ChannelClosedListener.operationComplete(QueryResultHandler.java:298)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> 2015-09-04 18:45:12,069 [Client-1] INFO  
> o.a.d.j.i.DrillResultSetImpl$ResultsListener - [#7] Query failed:
> org.apache.drill.common.exceptions.UserException: CONNECTION ERROR: 
> Connection /10.10.100.201:53425 <--> /10.10.100.201:31010 (user client) 
> closed unexpectedly.
> [Error Id: ec316cfd-c9a5-4905-98e3-da20cb799ba5 ]
>         at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:524)
>  ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.rpc.user.QueryResultHandler$SubmissionListener$ChannelClosedListener.operationComplete(QueryResultHandler.java:298)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> 2015-09-04 18:45:12,071 [Client-1] ERROR o.a.d.e.rpc.user.QueryResultHandler 
> - SYSTEM ERROR: ChannelClosedException
> [Error Id: c53c477f-f1cf-4458-8620-b1e11ba31701 ]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> ChannelClosedException
> [Error Id: c53c477f-f1cf-4458-8620-b1e11ba31701 ]
>         at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:524)
>  ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.rpc.user.QueryResultHandler$SubmissionListener.failed(QueryResultHandler.java:312)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.setException(CoordinationQueue.java:103)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.operationComplete(CoordinationQueue.java:89)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.operationComplete(CoordinationQueue.java:67)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) 
> [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:788)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:689)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1114)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:705)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:980)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1032)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:965)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> Caused by: org.apache.drill.exec.rpc.ChannelClosedException: null
>         ... 18 common frames omitted
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to