Khurram Faraaz created DRILL-3751:
-------------------------------------

             Summary: Query hang when zookeeper is stopped
                 Key: DRILL-3751
                 URL: https://issues.apache.org/jira/browse/DRILL-3751
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
    Affects Versions: 1.2.0
         Environment: 4 node cluster on CentOS
            Reporter: Khurram Faraaz
            Assignee: Chris Westin
             Fix For: 1.2.0


Sqlline hangs indefinitely if the ZooKeeper process is stopped while a 
long-running query is still executing. The sqlline prompt never returns, and 
the console shows the stack trace below. I am on master.

Steps to reproduce the problem:
1. Restart mapr-warden on the cluster nodes:
   clush -g khurram service mapr-warden stop
   clush -g khurram service mapr-warden start
2. Issue a long-running query from sqlline.
3. While the query is running, stop ZooKeeper using the zkServer.sh script.

To stop ZooKeeper:
{code}
[root@centos-01 bin]# ./zkServer.sh stop
JMX enabled by default
Using config: /opt/mapr/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
{code}

Issue the long-running query below from sqlline:
{code}
./sqlline -u "jdbc:drill:schema=dfs.tmp"
0: jdbc:drill:schema=dfs.tmp> select * from `twoKeyJsn.json` limit 8000000;
...
| 7.40907649723E8  | g    |
| 1.12378007695E9  | d    |
03:03:28.482 [CuratorFramework-0] ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (10.10.100.201:5181) and timeout (5000) / elapsed (5013)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198) [curator-client-2.5.0.jar:na]
        at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) [curator-client-2.5.0.jar:na]
        at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [curator-client-2.5.0.jar:na]
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:807) [curator-framework-2.5.0.jar:na]
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:793) [curator-framework-2.5.0.jar:na]
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:57) [curator-framework-2.5.0.jar:na]
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:275) [curator-framework-2.5.0.jar:na]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_45]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
        at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
{code}

Here is the jstack thread dump of the sqlline process:

{code}
[root@centos-01 bin]# /usr/java/jdk1.7.0_45/bin/jstack 32136
2015-09-05 03:21:52
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.45-b08 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007f8328003800 nid=0x27f1 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CuratorFramework-0-EventThread" daemon prio=10 tid=0x00000000012fd800 nid=0x26e1 waiting on condition [0x00007f8317c2e000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007e2117798> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:491)

"CuratorFramework-0-SendThread(centos-01.qa.lab:5181)" daemon prio=10 tid=0x0000000001109800 nid=0x26e0 waiting on condition [0x00007f8317b2d000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:86)
        at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:937)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:995)

"threadDeathWatcher-2-1" daemon prio=10 tid=0x00007f833043b800 nid=0x7e16 waiting on condition [0x00007f831751f000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at io.netty.util.ThreadDeathWatcher$Watcher.run(ThreadDeathWatcher.java:137)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:744)

"Client-1" daemon prio=10 tid=0x00007f8378df7000 nid=0x7e15 runnable [0x00007f8317620000]
   java.lang.Thread.State: RUNNABLE
        at io.netty.channel.epoll.Native.epollWait0(Native Method)
        at io.netty.channel.epoll.Native.epollWait(Native.java:148)
        at io.netty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:180)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:205)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Thread.java:744)

"ServiceCache-0" daemon prio=10 tid=0x00007f8378d22000 nid=0x7e13 waiting on condition [0x00007f831792b000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006fff9c658> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

"CuratorFramework-0" daemon prio=10 tid=0x00007f8378c95800 nid=0x7e12 waiting on condition [0x00007f8317a2c000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006fff9ebd0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
        at java.util.concurrent.DelayQueue.take(DelayQueue.java:220)
        at java.util.concurrent.DelayQueue.take(DelayQueue.java:68)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:781)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:57)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:275)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

"ConnectionStateManager-0" daemon prio=10 tid=0x00007f8378c60800 nid=0x7e0f waiting on condition [0x00007f8317d2f000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006fffb2288> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:374)
        at org.apache.curator.framework.state.ConnectionStateManager.processEvents(ConnectionStateManager.java:208)
        at org.apache.curator.framework.state.ConnectionStateManager.access$000(ConnectionStateManager.java:42)
        at org.apache.curator.framework.state.ConnectionStateManager$1.call(ConnectionStateManager.java:110)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

"NonBlockingInputStreamThread" daemon prio=10 tid=0x00007f8378836000 nid=0x7de0 in Object.wait() [0x00007f83186ab000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000006fffb2438> (a jline.internal.NonBlockingInputStream)
        at jline.internal.NonBlockingInputStream.run(NonBlockingInputStream.java:278)
        - locked <0x00000006fffb2438> (a jline.internal.NonBlockingInputStream)
        at java.lang.Thread.run(Thread.java:744)

"Service Thread" daemon prio=10 tid=0x00007f83780c1000 nid=0x7dcd runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007f83780be800 nid=0x7dcc waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007f83780bb800 nid=0x7dcb waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007f83780b1800 nid=0x7dca runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007f837809a800 nid=0x7dc9 in Object.wait() [0x00007f832c574000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000006fffb2668> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
        - locked <0x00000006fffb2668> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)

"Reference Handler" daemon prio=10 tid=0x00007f8378091000 nid=0x7dc8 in Object.wait() [0x00007f832c675000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000006fffb2700> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:503)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
        - locked <0x00000006fffb2700> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x00007f8378011000 nid=0x7db4 waiting on condition [0x00007f837cac2000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000700d3a210> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
        at java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:519)
        at java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:682)
        at org.apache.drill.jdbc.impl.DrillResultSetImpl$ResultsListener.getNext(DrillResultSetImpl.java:1536)
        at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:175)
        at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:320)
        at net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187)
        at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:161)
        at sqlline.IncrementalRows.hasNext(IncrementalRows.java:62)
        at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
        at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
        at sqlline.SqlLine.print(SqlLine.java:1583)
        at sqlline.Commands.execute(Commands.java:852)
        at sqlline.Commands.sql(Commands.java:751)
        at sqlline.SqlLine.dispatch(SqlLine.java:738)
        at sqlline.SqlLine.begin(SqlLine.java:612)
        at sqlline.SqlLine.start(SqlLine.java:366)
        at sqlline.SqlLine.main(SqlLine.java:259)
{code}
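
In the dump above, the "main" thread is TIMED_WAITING inside 
LinkedBlockingDeque.poll, called from DrillResultSetImpl$ResultsListener.getNext. 
One plausible reading is that the client polls for the next result batch in a 
loop and keeps polling because nothing tells it the query can no longer 
complete. The Java sketch below illustrates that pattern and one way to bound 
the wait. It is not Drill's code: the class PollingListenerSketch, its methods, 
the connectionLost flag, and the overall deadline are all hypothetical names 
and assumptions made for illustration.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of a polling results listener; NOT Drill's implementation. */
class PollingListenerSketch {
    // Batches handed over by a (hypothetical) network thread.
    private final LinkedBlockingDeque<String> batches = new LinkedBlockingDeque<>();
    // Assumed flag that some connection-state callback would set.
    private final AtomicBoolean connectionLost = new AtomicBoolean(false);

    void batchArrived(String batch) {
        batches.add(batch);
    }

    void markConnectionLost() {
        connectionLost.set(true);
    }

    /**
     * Polls in short intervals, like the TIMED_WAITING frame in the dump, but
     * leaves the loop when the connection is flagged as lost or when the
     * overall deadline passes. Without those two checks the loop never ends
     * if no further batch is delivered.
     */
    String getNext(long pollMillis, long deadlineMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(deadlineMillis);
        while (true) {
            String batch;
            try {
                batch = batches.poll(pollMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                throw new IllegalStateException("interrupted while waiting for results", e);
            }
            if (batch != null) {
                return batch;
            }
            if (connectionLost.get()) {
                throw new IllegalStateException("connection lost while waiting for results");
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("timed out waiting for results");
            }
        }
    }

    public static void main(String[] args) {
        PollingListenerSketch listener = new PollingListenerSketch();
        listener.batchArrived("batch-1");
        System.out.println(listener.getNext(50, 1000));

        // Simulate the coordinator going away before the next batch arrives.
        listener.markConnectionLost();
        try {
            listener.getNext(50, 1000);
        } catch (IllegalStateException e) {
            System.out.println("query aborted: " + e.getMessage());
        }
    }
}
```

Whether the real fix belongs in the JDBC client, in the Drillbit, or in both 
cannot be determined from the dump alone. The sketch only shows that a timed 
poll loop needs some exit condition other than the arrival of data.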



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
