Hi Christopher,

I have talked through this issue with Keith internally but haven't raised it in an official channel, because I suspected it could be related to the ZK / Netty framework: we only see it after enabling TLS. Maybe I should have opened an issue in the Accumulo Git repository first. I'm doing this now.
Thanks,
Karthick

On Wed, 1 Apr 2020 at 08:51, Christopher <ctubb...@apache.org> wrote:

> Karthick, I haven't seen this discussed in the Accumulo community. Can you
> point me to the conversation there?
>
> On Wed, Apr 1, 2020 at 2:19 AM Andor Molnar <an...@apache.org> wrote:
>
> > Why would they need to be daemon threads?
> > I’m not an expert in Java threading, but AFAIK I/O threads should not be
> > daemon threads in most cases.
> >
> > Also, those threads are Netty internal threads, so this question is better
> > asked in the Netty community.
> > The ZK threads reported in jstack are just waiting for input to send/receive.
> > Do you know at which point Accumulo gets stuck?
> >
> > Andor
> >
> > > On 2020. Mar 31., at 14:27, karthick rn <karthick.narend...@gmail.com> wrote:
> > >
> > > Hi Enrico,
> > >
> > > Yes, I have already run this past the Accumulo folks. They looked at the
> > > jstack output and advised checking with the ZK devs whether those 2 threads
> > > (#27 & #30) are expected to be non-daemon threads.
> > > Also, in this cluster we have wire encryption enabled only for ZK, and
> > > when we disable it we don't encounter this issue.
> > > There are no error messages in the ZK server log. Below are the INFO
> > > messages logged when running the "accumulo-service master start" command:
> > >
> > > 2020-03-31 12:16:28,626 [myid:2] - INFO [nioEventLoopGroup-7-5:X509AuthenticationProvider@172] - Authenticated Id 'CN=host2' for Scheme 'x509'
> > > 2020-03-31 12:16:28,676 [myid:2] - INFO [nioEventLoopGroup-7-5:ZooKeeperServer@1095] - got auth packet /<host2 IP>:46332
> > > 2020-03-31 12:16:28,676 [myid:2] - INFO [nioEventLoopGroup-7-5:ZooKeeperServer@1113] - auth success /<host2 IP>:46332
> > >
> > > This issue is reproducible every time I start the Accumulo master. Let me
> > > know if you need any further details.
> > >
> > > Many thanks
> > >
> > > Regards,
> > > Karthick
> > >
> > > On Tue, 31 Mar 2020 at 07:23, Enrico Olivelli <eolive...@gmail.com> wrote:
> > >
> > >> Hi,
> > >> Did you check with the Accumulo community?
> > >> Do you see errors or informational messages in the ZK server logs?
> > >>
> > >> Enrico
> > >>
> > >> On Tue, 31 Mar 2020, 01:12 karthick rn <karthick.narend...@gmail.com> wrote:
> > >>
> > >>> Hello dev team,
> > >>>
> > >>> We are using Hadoop, Accumulo & ZooKeeper in our environment. After
> > >>> enabling TLS for ZK, we noticed that starting the Accumulo master service
> > >>> hangs in an intermediate process, as shown below, and we have to kill that
> > >>> process in order for the Accumulo master to start.
> > >>>
> > >>> [user1@host1 ~]$ jps -m
> > >>> 23314 JournalNode
> > >>> 23011 NameNode
> > >>> 23539 DFSZKFailoverController
> > >>> *84118 Main org.apache.accumulo.master.state.SetGoalState NORMAL*
> > >>> 22590 QuorumPeerMain
> > >>> 89790 Jps -m
> > >>>
> > >>> [user1@host1 ~]$ *kill -9 84118*
> > >>> [user1@host1 ~]$ jps -m
> > >>> 23314 JournalNode
> > >>> 23011 NameNode
> > >>> 23539 DFSZKFailoverController
> > >>> 89892 Jps -m
> > >>> *89847 Main master*
> > >>> 22590 QuorumPeerMain
> > >>> [user1@host1 ~]$
> > >>>
> > >>> The jstack output collected during the hang shows 2 non-daemon threads
> > >>> (#27 & #30), while the rest are daemon threads. I would like to check with
> > >>> the dev team whether the "nioEventLoopGroup" threads are expected to be
> > >>> non-daemon. If so, any thoughts on what else might be causing the issue?
> > >>> I have copied only a portion of the jstack output; let me know in case you
> > >>> need the full output. FYI, I'm using Apache ZooKeeper 3.5.7, Hadoop 3.2.1 &
> > >>> Accumulo 2.0. Let me know if you need any further details. Many thanks
> > >>>
> > >>> "org.apache.accumulo.master.state.SetGoalState-SendThread(host1:2281)" #25 daemon prio=5 os_prio=0 cpu=127.90ms elapsed=95.38s tid=0x0000000003a10800 nid=0x1624e waiting on condition [0x00007f5c7bd67000]
> > >>>    java.lang.Thread.State: TIMED_WAITING (parking)
> > >>>     at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > >>>     - parking to wait for <0x000000070f0acf38> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > >>>     at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6/LockSupport.java:234)
> > >>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6/AbstractQueuedSynchronizer.java:2123)
> > >>>     at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6/LinkedBlockingDeque.java:513)
> > >>>     at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6/LinkedBlockingDeque.java:675)
> > >>>     at org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
> > >>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
> > >>>
> > >>>    Locked ownable synchronizers:
> > >>>     - None
> > >>>
> > >>> "org.apache.accumulo.master.state.SetGoalState-EventThread" #26 daemon prio=5 os_prio=0 cpu=0.88ms elapsed=95.38s tid=0x0000000003a16800 nid=0x1624f waiting on condition [0x00007f5c7bc66000]
> > >>>    java.lang.Thread.State: WAITING (parking)
> > >>>     at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > >>>     - parking to wait for <0x000000070f0f8af0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > >>>     at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6/LockSupport.java:194)
> > >>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6/AbstractQueuedSynchronizer.java:2081)
> > >>>     at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6/LinkedBlockingQueue.java:433)
> > >>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> > >>>
> > >>>    Locked ownable synchronizers:
> > >>>     - None
> > >>>
> > >>> "*nioEventLoopGroup-2-1*" #27 prio=10 os_prio=0 cpu=718.22ms elapsed=95.28s tid=0x0000000003cd2800 nid=0x16250 runnable [0x00007f5c7d6b9000]
> > >>>    java.lang.Thread.State: RUNNABLE
> > >>>     at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
> > >>>     at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6/EPollSelectorImpl.java:120)
> > >>>     at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6/SelectorImpl.java:124)
> > >>>     - locked <0x000000070f079a28> (a io.netty.channel.nio.SelectedSelectionKeySet)
> > >>>     - locked <0x000000070f06ca80> (a sun.nio.ch.EPollSelectorImpl)
> > >>>     at sun.nio.ch.SelectorImpl.select(java.base@11.0.6/SelectorImpl.java:141)
> > >>>     at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
> > >>>     at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
> > >>>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
> > >>>     at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> > >>>     at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> > >>>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> > >>>     at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
> > >>>
> > >>>    Locked ownable synchronizers:
> > >>>     - None
> > >>>
> > >>> "org.apache.accumulo.master.state.SetGoalState-SendThread(host2:2281)" #28 daemon prio=5 os_prio=0 cpu=8.27ms elapsed=94.31s tid=0x0000000005942800 nid=0x16259 waiting on condition [0x00007f5c7b145000]
> > >>>    java.lang.Thread.State: TIMED_WAITING (parking)
> > >>>     at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > >>>     - parking to wait for <0x000000070f2acbc0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > >>>     at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6/LockSupport.java:234)
> > >>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6/AbstractQueuedSynchronizer.java:2123)
> > >>>     at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6/LinkedBlockingDeque.java:513)
> > >>>     at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6/LinkedBlockingDeque.java:675)
> > >>>     at org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
> > >>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
> > >>>
> > >>>    Locked ownable synchronizers:
> > >>>     - None
> > >>>
> > >>> "org.apache.accumulo.master.state.SetGoalState-EventThread" #29 daemon prio=5 os_prio=0 cpu=0.25ms elapsed=94.31s tid=0x0000000005943800 nid=0x1625a waiting on condition [0x00007f5c7b044000]
> > >>>    java.lang.Thread.State: WAITING (parking)
> > >>>     at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > >>>     - parking to wait for <0x000000070f2adff8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > >>>     at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6/LockSupport.java:194)
> > >>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6/AbstractQueuedSynchronizer.java:2081)
> > >>>     at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6/LinkedBlockingQueue.java:433)
> > >>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> > >>>
> > >>>    Locked ownable synchronizers:
> > >>>     - None
> > >>>
> > >>> "*nioEventLoopGroup-3-1*" #30 prio=10 os_prio=0 cpu=297.70ms elapsed=94.30s tid=0x00000000044af800 nid=0x1625b runnable [0x00007f5c7b646000]
> > >>>    java.lang.Thread.State: RUNNABLE
> > >>>     at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
> > >>>     at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6/EPollSelectorImpl.java:120)
> > >>>     at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6/SelectorImpl.java:124)
> > >>>     - locked <0x000000070f2ab868> (a io.netty.channel.nio.SelectedSelectionKeySet)
> > >>>     - locked <0x000000070f2ab640> (a sun.nio.ch.EPollSelectorImpl)
> > >>>     at sun.nio.ch.SelectorImpl.select(java.base@11.0.6/SelectorImpl.java:141)
> > >>>     at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
> > >>>     at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
> > >>>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
> > >>>     at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> > >>>     at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> > >>>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> > >>>     at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
> > >>>
> > >>>    Locked ownable synchronizers:
> > >>>     - None
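
On the daemon question raised in this thread: a JVM exits on its own only once every non-daemon thread has finished, so a non-daemon I/O thread that is still running could keep a short-lived tool alive after its main work is done. The sketch below uses only the JDK, with no Netty or ZooKeeper classes; DaemonThreadDemo, daemonFactory and workerIsDaemon are hypothetical names chosen for illustration. It shows that executor worker threads are non-daemon by default and how a custom ThreadFactory changes that. Whether the ZooKeeper Netty client behaves the same way in this case is an assumption, not something the thread confirms.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

final class DaemonThreadDemo {

    // Hypothetical helper: a factory that marks every thread it creates as a daemon.
    static ThreadFactory daemonFactory(String name) {
        return runnable -> {
            Thread t = new Thread(runnable, name);
            t.setDaemon(true); // daemon threads do not block JVM exit
            return t;
        };
    }

    // Runs one task on the pool, reports whether its worker thread is a daemon,
    // and always shuts the pool down so a non-daemon worker cannot linger.
    static boolean workerIsDaemon(ExecutorService pool) {
        Callable<Boolean> probe = () -> Thread.currentThread().isDaemon();
        try {
            return pool.submit(probe).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("probe task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean byDefault = workerIsDaemon(Executors.newSingleThreadExecutor());
        boolean withFactory = workerIsDaemon(
                Executors.newSingleThreadExecutor(daemonFactory("io-daemon")));
        System.out.println("default worker is daemon: " + byDefault);
        System.out.println("custom-factory worker is daemon: " + withFactory);
    }
}
```

If the pool.shutdown() call were removed, the default (non-daemon) worker would keep this program running after main returns. In Netty the analogous cleanup call is EventLoopGroup.shutdownGracefully().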