I did! Thanks for clarifying :)
The client that is part of Zookeeper itself actually does support timeouts.

On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <wangg...@gmail.com> wrote:
> Hi Jaikiran,
>
> I think Gwen was talking about contributing to ZkClient project:
>
> https://github.com/sgroschupf/zkclient
>
> Guozhang
>
>
> On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:
>
>> Hi Gwen,
>>
>> Yes, the KafkaZkClient is a wrapper around ZkClient and not a complete
>> replacement.
>>
>> As for contributing to Zookeeper, yes that is indeed on my mind, but I
>> haven't yet had a chance to really look deeper into Zookeeper or get in
>> touch with their dev team to try and explain this potential improvement to
>> them. I have no objection to contributing this or something similar to
>> Zookeeper directly. I think I should be able to bring this up in the
>> Zookeeper dev forum sometime soon, in the next few weekends.
>>
>> -Jaikiran
>>
>>
>> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
>>
>>> It looks like the new KafkaZkClient is a wrapper around ZkClient, but
>>> not a replacement. Did I get it right?
>>>
>>> I think a wrapper for ZkClient can be useful - for example KAFKA-1664
>>> can also use one.
>>>
>>> However, I'm wondering why not contribute the fix directly to the ZKClient
>>> project and ask for a release that contains the fix?
>>> This will benefit other users of the project who may also need a
>>> timeout (that's pretty basic...)
>>>
>>> As an alternative, if we don't want to collaborate with ZKClient for
>>> some reason, forking the project into Kafka will probably give us more
>>> control than wrappers and without much downside.
>>>
>>> Just a thought.
>>>
>>> Gwen
>>>
>>>
>>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:
>>>
>>>> Neha, Ewen (and others), my initial attempt to solve this is uploaded
>>>> here: https://reviews.apache.org/r/30477/. It solves the shutdown problem,
>>>> and now the server shuts down even when Zookeeper has gone down before the
>>>> Kafka server.
>>>>
>>>> I went with the approach of introducing a custom (enhanced) ZkClient which
>>>> for now allows timeouts to be optionally specified for certain operations.
>>>> I intentionally haven't forced the use of this new KafkaZkClient all over
>>>> the code and instead for now have just used it in the KafkaServer.
>>>>
>>>> Does this patch look like something worth using?
>>>>
>>>> -Jaikiran
>>>>
>>>>
>>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
>>>>
>>>>> Ewen is right. ZkClient APIs are blocking and the right fix for this seems
>>>>> to be patching ZkClient. At some point, if we find ourselves fiddling too
>>>>> much with ZkClient, it wouldn't hurt to write our own little zookeeper
>>>>> client wrapper.
>>>>>
>>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
>>>>>
>>>>>> Looks like a bug to me -- the underlying ZK library wraps a lot of blocking
>>>>>> method implementations with waitUntilConnected() calls without any
>>>>>> timeouts. Ideally we could just add a version of ZkUtils.getController()
>>>>>> with a timeout, but I don't see an easy way to accomplish that with
>>>>>> ZkClient.
>>>>>>
>>>>>> There's at least one other call to ZkUtils besides the one in the
>>>>>> stacktrace you gave that would cause the same issue, possibly more that
>>>>>> aren't directly called in that method. One ugly solution would be to use an
>>>>>> extra thread during shutdown to trigger timeouts, but I'd imagine we
>>>>>> probably have other threads that could end up blocking in similar ways.
>>>>>>
>>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to track the
>>>>>> issue.
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:
>>>>>>
>>>>>>> The main culprit is this thread, which goes into "forever retry connection
>>>>>>> to a closed zookeeper" when I shut down Kafka (via Ctrl + C) after
>>>>>>> zookeeper has already been shut down. I have attached the complete thread
>>>>>>> dump, but I don't know if it will be delivered to the mailing list.
>>>>>>>
>>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition [0x6ad69000]
>>>>>>>    java.lang.Thread.State: TIMED_WAITING (parking)
>>>>>>>     at sun.misc.Unsafe.park(Native Method)
>>>>>>>     - parking to wait for <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>>>>>>     at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
>>>>>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
>>>>>>>     at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
>>>>>>>     at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
>>>>>>>     at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
>>>>>>>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
>>>>>>>     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>>>>>>>     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>>>>>>>     at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
>>>>>>>     at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
>>>>>>>     at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
>>>>>>>     at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
>>>>>>>     at kafka.utils.Utils$.swallow(Utils.scala:172)
>>>>>>>     at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
>>>>>>>     at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
>>>>>>>     at kafka.utils.Logging$class.swallow(Logging.scala:94)
>>>>>>>     at kafka.utils.Utils$.swallow(Utils.scala:45)
>>>>>>>     at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
>>>>>>>     at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
>>>>>>>     at kafka.Kafka$$anon$1.run(Kafka.scala:42)
>>>>>>>
>>>>>>> -Jaikiran
>>>>>>>
>>>>>>>
>>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
>>>>>>>
>>>>>>>> For a clean shutdown, the broker tries to talk to the controller and also
>>>>>>>> issues reads to zookeeper. Possibly that is where it tries to reconnect
>>>>>>>> to zk. It will help to look at the thread dump.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Neha
>>>>>>>>
>>>>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I was just playing around with the RC2 of 0.8.2 and noticed that if I
>>>>>>>>> shut down zookeeper first, I can't shut down the Kafka server at all since
>>>>>>>>> it goes into a never-ending attempt to reconnect with zookeeper. I had to
>>>>>>>>> kill the Kafka process to stop it. I tried it against trunk too, and there
>>>>>>>>> too I see the same issue. Should I file a JIRA for this and see if I can
>>>>>>>>> come up with a patch?
>>>>>>>>>
>>>>>>>>> FWIW, here are the unending (and IMO too frequent) attempts at trying to
>>>>>>>>> reconnect. I have a thread dump too, which shows that the other thread,
>>>>>>>>> which is trying to complete a controlled shutdown of Kafka, is blocked
>>>>>>>>> forever waiting for zookeeper to be up. I can attach it to the JIRA.
>>>>>>>>>
>>>>>>>>> [2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> [2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>>
>>>>>>>>> -Jaikiran
>>>>>>
>>>>>> --
>>>>>> Thanks,
>>>>>> Ewen
>
>
> --
> -- Guozhang