Khurram Faraaz created DRILL-3751:
-------------------------------------
Summary: Query hang when zookeeper is stopped
Key: DRILL-3751
URL: https://issues.apache.org/jira/browse/DRILL-3751
Project: Apache Drill
Issue Type: Bug
Components: Execution - Flow
Affects Versions: 1.2.0
Environment: 4 node cluster on CentOS
Reporter: Khurram Faraaz
Assignee: Chris Westin
Fix For: 1.2.0
I see an indefinite hang at the sqlline prompt when I issue a long-running query
and then stop the zookeeper process while the query is still executing. The
sqlline prompt is never returned; it hangs after showing the stack trace below.
I am on master.
Steps to reproduce the problem:
clush -g khurram service mapr-warden stop
clush -g khurram service mapr-warden start
Issue a long-running query from sqlline.
While the query is running, stop zookeeper using the zkServer.sh script.
To stop zookeeper:
{code}
[root@centos-01 bin]# ./zkServer.sh stop
JMX enabled by default
Using config: /opt/mapr/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
{code}
Issue the long-running query below from sqlline:
{code}
./sqlline -u "jdbc:drill:schema=dfs.tmp"
0: jdbc:drill:schema=dfs.tmp> select * from `twoKeyJsn.json` limit 8000000;
...
| 7.40907649723E8 | g |
| 1.12378007695E9 | d |
03:03:28.482 [CuratorFramework-0] ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (10.10.100.201:5181) and timeout (5000) / elapsed (5013)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198) [curator-client-2.5.0.jar:na]
    at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) [curator-client-2.5.0.jar:na]
    at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [curator-client-2.5.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:807) [curator-framework-2.5.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:793) [curator-framework-2.5.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:57) [curator-framework-2.5.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:275) [curator-framework-2.5.0.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_45]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
{code}
Here is the jstack output for the sqlline process:
{code}
[root@centos-01 bin]# /usr/java/jdk1.7.0_45/bin/jstack 32136
2015-09-05 03:21:52
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.45-b08 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007f8328003800 nid=0x27f1 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CuratorFramework-0-EventThread" daemon prio=10 tid=0x00000000012fd800 nid=0x26e1 waiting on condition [0x00007f8317c2e000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000007e2117798> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:491)

"CuratorFramework-0-SendThread(centos-01.qa.lab:5181)" daemon prio=10 tid=0x0000000001109800 nid=0x26e0 waiting on condition [0x00007f8317b2d000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:86)
    at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:937)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:995)

"threadDeathWatcher-2-1" daemon prio=10 tid=0x00007f833043b800 nid=0x7e16 waiting on condition [0x00007f831751f000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at io.netty.util.ThreadDeathWatcher$Watcher.run(ThreadDeathWatcher.java:137)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
    at java.lang.Thread.run(Thread.java:744)

"Client-1" daemon prio=10 tid=0x00007f8378df7000 nid=0x7e15 runnable [0x00007f8317620000]
   java.lang.Thread.State: RUNNABLE
    at io.netty.channel.epoll.Native.epollWait0(Native Method)
    at io.netty.channel.epoll.Native.epollWait(Native.java:148)
    at io.netty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:180)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:205)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:744)

"ServiceCache-0" daemon prio=10 tid=0x00007f8378d22000 nid=0x7e13 waiting on condition [0x00007f831792b000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000006fff9c658> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

"CuratorFramework-0" daemon prio=10 tid=0x00007f8378c95800 nid=0x7e12 waiting on condition [0x00007f8317a2c000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000006fff9ebd0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at java.util.concurrent.DelayQueue.take(DelayQueue.java:220)
    at java.util.concurrent.DelayQueue.take(DelayQueue.java:68)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:781)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:57)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:275)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

"ConnectionStateManager-0" daemon prio=10 tid=0x00007f8378c60800 nid=0x7e0f waiting on condition [0x00007f8317d2f000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000006fffb2288> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:374)
    at org.apache.curator.framework.state.ConnectionStateManager.processEvents(ConnectionStateManager.java:208)
    at org.apache.curator.framework.state.ConnectionStateManager.access$000(ConnectionStateManager.java:42)
    at org.apache.curator.framework.state.ConnectionStateManager$1.call(ConnectionStateManager.java:110)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

"NonBlockingInputStreamThread" daemon prio=10 tid=0x00007f8378836000 nid=0x7de0 in Object.wait() [0x00007f83186ab000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000006fffb2438> (a jline.internal.NonBlockingInputStream)
    at jline.internal.NonBlockingInputStream.run(NonBlockingInputStream.java:278)
    - locked <0x00000006fffb2438> (a jline.internal.NonBlockingInputStream)
    at java.lang.Thread.run(Thread.java:744)

"Service Thread" daemon prio=10 tid=0x00007f83780c1000 nid=0x7dcd runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007f83780be800 nid=0x7dcc waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007f83780bb800 nid=0x7dcb waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007f83780b1800 nid=0x7dca runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007f837809a800 nid=0x7dc9 in Object.wait() [0x00007f832c574000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000006fffb2668> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
    - locked <0x00000006fffb2668> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)

"Reference Handler" daemon prio=10 tid=0x00007f8378091000 nid=0x7dc8 in Object.wait() [0x00007f832c675000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000006fffb2700> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:503)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
    - locked <0x00000006fffb2700> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x00007f8378011000 nid=0x7db4 waiting on condition [0x00007f837cac2000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000700d3a210> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:519)
    at java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:682)
    at org.apache.drill.jdbc.impl.DrillResultSetImpl$ResultsListener.getNext(DrillResultSetImpl.java:1536)
    at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:175)
    at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:320)
    at net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187)
    at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:161)
    at sqlline.IncrementalRows.hasNext(IncrementalRows.java:62)
    at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
    at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
    at sqlline.SqlLine.print(SqlLine.java:1583)
    at sqlline.Commands.execute(Commands.java:852)
    at sqlline.Commands.sql(Commands.java:751)
    at sqlline.SqlLine.dispatch(SqlLine.java:738)
    at sqlline.SqlLine.begin(SqlLine.java:612)
    at sqlline.SqlLine.start(SqlLine.java:366)
    at sqlline.SqlLine.main(SqlLine.java:259)
{code}
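In the dump above, the "main" thread is in TIMED_WAITING inside LinkedBlockingDeque.poll, called from DrillResultSetImpl$ResultsListener.getNext. That is consistent with a loop that polls with a timeout and retries when nothing arrives, although the dump alone does not prove it. The sketch below is not Drill's code. It is a minimal illustration, using hypothetical names (PollLoopSketch, markCompleted, markConnectionLost), of how a loop of that shape could give up once the client learns the connection is gone, instead of retrying forever.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; this is NOT Drill's ResultsListener.
class PollLoopSketch {
    private final LinkedBlockingDeque<String> batches = new LinkedBlockingDeque<>();
    private volatile boolean completed = false;       // assumed "query finished" signal
    private volatile boolean connectionLost = false;  // assumed "connection lost" signal

    void offerBatch(String batch) { batches.add(batch); }
    void markCompleted() { completed = true; }
    void markConnectionLost() { connectionLost = true; }

    /**
     * Returns the next batch, or null once the query has completed and the
     * queue is drained. Throws IllegalStateException if the connection was
     * lost and no batch is available.
     */
    String getNext() {
        while (true) {
            String batch;
            try {
                // Timed poll: the caller shows up as TIMED_WAITING (parking) in jstack.
                batch = batches.poll(50, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for results", e);
            }
            if (batch != null) {
                return batch;
            }
            // Without an exit check like this one, the loop would spin forever
            // when no more batches arrive and no completion is signalled.
            if (connectionLost) {
                throw new IllegalStateException("connection lost while waiting for results");
            }
            if (completed) {
                // Final non-blocking poll picks up a batch queued just before completion.
                return batches.poll();
            }
        }
    }

    public static void main(String[] args) {
        PollLoopSketch listener = new PollLoopSketch();
        listener.offerBatch("batch-1");
        System.out.println(listener.getNext());
        listener.markConnectionLost();
        try {
            listener.getNext();
        } catch (IllegalStateException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

Already queued batches are still returned before the connection-lost check fires, so a caller would see all delivered rows and then an error rather than a silent hang. Whether that is the right behaviour for sqlline is a design choice this sketch does not settle.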