Hi Christopher, I have talked through this issue with Keith internally but
haven't raised an official channel for discussion, because I suspected the
issue could be related to ZK / Netty framework as only after enabling TLS
we are seeing this.
May be I should have opened an issue on Accumulo git first. I'm doing this
now.

Thanks,
Karthick

On Wed, 1 Apr 2020 at 08:51, Christopher <ctubb...@apache.org> wrote:

> Karthick, I haven't seen this discussed in the Accumulo community. Can you
> point me to the conversation there?
>
> On Wed, Apr 1, 2020 at 2:19 AM Andor Molnar <an...@apache.org> wrote:
>
> > Why would they need to be daemon threads?
> > I’m not an expert of Java threading, but afaik I/O threads should not be
> > daemon threads in most cases.
> >
> > Also those threads are Netty internal threads, so this question is better
> > to be asked in Netty community.
> > ZK threads reported in jstack are just waiting for input to send/receive.
> > Do you know at which point Accumulo does stuck?
> >
> > Andor
> >
> >
> >
> > > On 2020. Mar 31., at 14:27, karthick rn <karthick.narend...@gmail.com>
> > wrote:
> > >
> > > Hi Enrico,
> > >
> > > Yes, I have already run this through Accumulo folks they have looked at
> > the
> > > jstack output & advised to check with ZK devs if those 2 threads (#27 &
> > > #30) are expected to be non-daemon threads.
> > > Also, in this cluster we have wire encryption enabled only for ZK and
> by
> > > disabling it we don't encounter this issue.
> > > There are no error messages reported on the ZK server log, below are
> the
> > > INFO messages when running the "accumulo-service master start" command
> > >
> > > 2020-03-31 12:16:28,626 [myid:2] - INFO
> > > [nioEventLoopGroup-7-5:X509AuthenticationProvider@172] - Authenticated
> > Id
> > > 'CN=host2' for Scheme 'x509'
> > >
> > > 2020-03-31 12:16:28,676 [myid:2] - INFO
> > > [nioEventLoopGroup-7-5:ZooKeeperServer@1095] - got auth packet /<host2
> > > IP>:46332
> > >
> > > 2020-03-31 12:16:28,676 [myid:2] - INFO
> > > [nioEventLoopGroup-7-5:ZooKeeperServer@1113] - auth success /<host2
> > > IP>:46332
> > >
> > > This issue is reproducible everytime I start Accumulo master. Let me
> know
> > > for any further details?
> > >
> > > Many thanks
> > >
> > > Regards,
> > > Karthick
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Tue, 31 Mar 2020 at 07:23, Enrico Olivelli <eolive...@gmail.com>
> > wrote:
> > >
> > >> Hi,
> > >> Did you check with Accumulo community?
> > >> Do you see errors or informational messages in ZK server logs?
> > >>
> > >> Enrico
> > >>
> > >> Il Mar 31 Mar 2020, 01:12 karthick rn <karthick.narend...@gmail.com>
> ha
> > >> scritto:
> > >>
> > >>> Hello dev team,
> > >>>
> > >>> We are using Hadoop, Accumulo & Zookeeper in our environment, after
> > >>> enabling TLS for ZK we noticed that starting Accumulo master service
> > >> hangs
> > >>> in an intermediate process as shown below and require to kill the
> > process
> > >>> in-order for Accumulo master to start.
> > >>>
> > >>> [user1@host1 ~]$ jps -m
> > >>>
> > >>> 23314 JournalNode
> > >>>
> > >>> 23011 NameNode
> > >>>
> > >>> 23539 DFSZKFailoverController
> > >>>
> > >>> *84118 Main org.apache.accumulo.master.state.SetGoalState NORMAL*
> > >>>
> > >>> 22590 QuorumPeerMain
> > >>>
> > >>> 89790 Jps -m
> > >>>
> > >>>
> > >>> [user1@host1 ~]$ *kill -9 84118*
> > >>>
> > >>> [user1@host1 ~]$ jps -m
> > >>>
> > >>> 23314 JournalNode
> > >>>
> > >>> 23011 NameNode
> > >>>
> > >>> 23539 DFSZKFailoverController
> > >>>
> > >>> 89892 Jps -m
> > >>>
> > >>> *89847 Main master*
> > >>>
> > >>> 22590 QuorumPeerMain
> > >>>
> > >>> [user1@host1 ~]$
> > >>>
> > >>> Jstack collected during the hang shows 2 non-daemon threads (#27 &
> #30)
> > >>> while the rest are daemon threads. Would like to check with the dev
> > team
> > >> if
> > >>> "nioEventLoopGroup" threads are expected to be non-daemon? If so, any
> > >>> thoughts on what else might be causing the issue?
> > >>> I have copied only a portion of the jstack output, let me know
> in-case
> > >> you
> > >>> need the full output. Fyi, I'm using Apache Zookeeper 3.5.7, Hadoop
> > >> 3.2.1 &
> > >>> Accumulo 2.0. Let me know if you need any further details? Many
> thanks
> > >>>
> > >>>
> "org.apache.accumulo.master.state.SetGoalState-SendThread(host1:2281)"
> > >> #25
> > >>> daemon prio=5 os_prio=0 cpu=127.90ms elapsed=95.38s
> > >> tid=0x0000000003a10800
> > >>> nid=0x1624e waiting on condition  [0x00007f5c7bd67000]
> > >>>   java.lang.Thread.State: TIMED_WAITING (parking)
> > >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > >>> - parking to wait for  <0x000000070f0acf38> (a
> > >>>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > >>> at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
> > >>> /LockSupport.java:234)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6
> > >>> /AbstractQueuedSynchronizer.java:2123)
> > >>> at
> java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6
> > >>> /LinkedBlockingDeque.java:513)
> > >>> at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6
> > >>> /LinkedBlockingDeque.java:675)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
> > >>> at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
> > >>>
> > >>>   Locked ownable synchronizers:
> > >>> - None
> > >>>
> > >>> "org.apache.accumulo.master.state.SetGoalState-EventThread" #26
> daemon
> > >>> prio=5 os_prio=0 cpu=0.88ms elapsed=95.38s tid=0x0000000003a16800
> > >>> nid=0x1624f waiting on condition  [0x00007f5c7bc66000]
> > >>>   java.lang.Thread.State: WAITING (parking)
> > >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > >>> - parking to wait for  <0x000000070f0f8af0> (a
> > >>>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > >>> at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6
> > >>> /LockSupport.java:194)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6
> > >>> /AbstractQueuedSynchronizer.java:2081)
> > >>> at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6
> > >>> /LinkedBlockingQueue.java:433)
> > >>> at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> > >>>
> > >>>   Locked ownable synchronizers:
> > >>> - None
> > >>>
> > >>> "*nioEventLoopGroup-2-1*" #27 prio=10 os_prio=0 cpu=718.22ms
> > >> elapsed=95.28s
> > >>> tid=0x0000000003cd2800 nid=0x16250 runnable  [0x00007f5c7d6b9000]
> > >>>   java.lang.Thread.State: RUNNABLE
> > >>> at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
> > >>> at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6
> > >>> /EPollSelectorImpl.java:120)
> > >>> at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6
> > >>> /SelectorImpl.java:124)
> > >>> - locked <0x000000070f079a28> (a
> > >>> io.netty.channel.nio.SelectedSelectionKeySet)
> > >>> - locked <0x000000070f06ca80> (a sun.nio.ch.EPollSelectorImpl)
> > >>> at sun.nio.ch.SelectorImpl.select(java.base@11.0.6
> > >> /SelectorImpl.java:141)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
> > >>> at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
> > >>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> > >>> at
> > >>>
> > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> > >>> at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
> > >>>
> > >>>   Locked ownable synchronizers:
> > >>> - None
> > >>>
> > >>>
> "org.apache.accumulo.master.state.SetGoalState-SendThread(host2:2281)"
> > >> #28
> > >>> daemon prio=5 os_prio=0 cpu=8.27ms elapsed=94.31s
> > tid=0x0000000005942800
> > >>> nid=0x16259 waiting on condition  [0x00007f5c7b145000]
> > >>>   java.lang.Thread.State: TIMED_WAITING (parking)
> > >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > >>> - parking to wait for  <0x000000070f2acbc0> (a
> > >>>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > >>> at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
> > >>> /LockSupport.java:234)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6
> > >>> /AbstractQueuedSynchronizer.java:2123)
> > >>> at
> java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6
> > >>> /LinkedBlockingDeque.java:513)
> > >>> at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6
> > >>> /LinkedBlockingDeque.java:675)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
> > >>> at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
> > >>>
> > >>>   Locked ownable synchronizers:
> > >>> - None
> > >>>
> > >>> "org.apache.accumulo.master.state.SetGoalState-EventThread" #29
> daemon
> > >>> prio=5 os_prio=0 cpu=0.25ms elapsed=94.31s tid=0x0000000005943800
> > >>> nid=0x1625a waiting on condition  [0x00007f5c7b044000]
> > >>>   java.lang.Thread.State: WAITING (parking)
> > >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > >>> - parking to wait for  <0x000000070f2adff8> (a
> > >>>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > >>> at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6
> > >>> /LockSupport.java:194)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6
> > >>> /AbstractQueuedSynchronizer.java:2081)
> > >>> at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6
> > >>> /LinkedBlockingQueue.java:433)
> > >>> at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> > >>>
> > >>>   Locked ownable synchronizers:
> > >>> - None
> > >>>
> > >>> "*nioEventLoopGroup-3-1*" #30 prio=10 os_prio=0 cpu=297.70ms
> > >> elapsed=94.30s
> > >>> tid=0x00000000044af800 nid=0x1625b runnable  [0x00007f5c7b646000]
> > >>>   java.lang.Thread.State: RUNNABLE
> > >>> at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
> > >>> at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6
> > >>> /EPollSelectorImpl.java:120)
> > >>> at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6
> > >>> /SelectorImpl.java:124)
> > >>> - locked <0x000000070f2ab868> (a
> > >>> io.netty.channel.nio.SelectedSelectionKeySet)
> > >>> - locked <0x000000070f2ab640> (a sun.nio.ch.EPollSelectorImpl)
> > >>> at sun.nio.ch.SelectorImpl.select(java.base@11.0.6
> > >> /SelectorImpl.java:141)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
> > >>> at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
> > >>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> > >>> at
> > >>>
> > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> > >>> at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
> > >>>
> > >>>   Locked ownable synchronizers:
> > >>> - None
> > >>>
> > >>
> >
> >
>

Reply via email to