I did!

Thanks for clarifying :)

The client that is part of Zookeeper itself actually does support timeouts.
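
For anyone skimming the thread, here is a rough sketch of the "extra thread" idea mentioned further down: run the blocking call on a helper thread and give up after a deadline. This is only an illustration using plain java.util.concurrent. It is not the patch under review, and BoundedCall / callWithTimeout are hypothetical names that do not exist in Kafka, ZkClient, or ZooKeeper.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not part of Kafka, ZkClient, or ZooKeeper.
public final class BoundedCall {
    private BoundedCall() {}

    // Runs `call` on a daemon thread and waits at most `timeoutMs` for the result.
    // Throws TimeoutException if the call has not completed by then.
    public static <T> T callWithTimeout(Callable<T> call, long timeoutMs)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; this only helps if the call honors interrupts.
            future.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A shutdown path could wrap a blocking read in something like callWithTimeout(() -> someBlockingRead(), 5000), where someBlockingRead is a placeholder, and carry on with the shutdown when a TimeoutException is thrown. As noted downthread, this only covers the one wrapped call; other threads can still block in the same way.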

On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <wangg...@gmail.com> wrote:
> Hi Jaikiran,
>
> I think Gwen was talking about contributing to ZkClient project:
>
> https://github.com/sgroschupf/zkclient
>
> Guozhang
>
>
> On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <jai.forums2...@gmail.com>
> wrote:
>
>> Hi Gwen,
>>
>> Yes, the KafkaZkClient is a wrapper around ZkClient and not a complete
>> replacement.
>>
>> As for contributing to Zookeeper, yes, that is indeed on my mind, but I
>> haven't yet had a chance to really look deeper into Zookeeper or get in
>> touch with their dev team to try and explain this potential improvement to
>> them. I have no objection to contributing this or something similar to
>> Zookeeper directly. I think I should be able to bring this up in the
>> Zookeeper dev forum, sometime soon in the next few weekends.
>>
>> -Jaikiran
>>
>>
>> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
>>
>>> It looks like the new KafkaZkClient is a wrapper around ZkClient, but
>>> not a replacement. Did I get it right?
>>>
>>> I think a wrapper for ZkClient can be useful - for example KAFKA-1664
>>> can also use one.
>>>
>>> However, I'm wondering why not contribute the fix directly to ZKClient
>>> project and ask for a release that contains the fix?
>>> This will benefit other users of the project who may also need a
>>> timeout (that's pretty basic...)
>>>
>>> As an alternative, if we don't want to collaborate with ZKClient for
>>> some reason, forking the project into Kafka will probably give us more
>>> control than wrappers and without much downside.
>>>
>>> Just a thought.
>>>
>>> Gwen
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai <jai.forums2...@gmail.com>
>>> wrote:
>>>
>>>> Neha, Ewen (and others), my initial attempt to solve this is uploaded
>>>> here: https://reviews.apache.org/r/30477/. It solves the shutdown
>>>> problem and now the server shuts down even when Zookeeper has gone down
>>>> before the Kafka server.
>>>>
>>>> I went with the approach of introducing a custom (enhanced) ZkClient
>>>> which for now allows timeouts to be optionally specified for certain
>>>> operations. I intentionally haven't forced the use of this new
>>>> KafkaZkClient all over the code and instead for now have just used it
>>>> in the KafkaServer.
>>>>
>>>> Does this patch look like something worth using?
>>>>
>>>> -Jaikiran
>>>>
>>>>
>>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
>>>>
>>>>> Ewen is right. ZkClient APIs are blocking and the right fix for this
>>>>> seems to be patching ZkClient. At some point, if we find ourselves
>>>>> fiddling too much with ZkClient, it wouldn't hurt to write our own
>>>>> little zookeeper client wrapper.
>>>>>
>>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava
>>>>> <e...@confluent.io> wrote:
>>>>>
>>>>>> Looks like a bug to me -- the underlying ZK library wraps a lot of
>>>>>> blocking method implementations with waitUntilConnected() calls
>>>>>> without any timeouts. Ideally we could just add a version of
>>>>>> ZkUtils.getController() with a timeout, but I don't see an easy way to
>>>>>> accomplish that with ZkClient.
>>>>>>
>>>>>> There's at least one other call to ZkUtils besides the one in the
>>>>>> stacktrace you gave that would cause the same issue, possibly more that
>>>>>> aren't directly called in that method. One ugly solution would be to
>>>>>> use an extra thread during shutdown to trigger timeouts, but I'd
>>>>>> imagine we probably have other threads that could end up blocking in
>>>>>> similar ways.
>>>>>>
>>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to track the
>>>>>> issue.
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <jai.forums2...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> The main culprit is this thread which goes into "forever retry
>>>>>>> connection to a closed zookeeper" when I shutdown Kafka (via a
>>>>>>> Ctrl + C) after zookeeper has already been shutdown. I have attached
>>>>>>> the complete thread dump, but I don't know if it will be delivered to
>>>>>>> the mailing list.
>>>>>>>
>>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition [0x6ad69000]
>>>>>>>      java.lang.Thread.State: TIMED_WAITING (parking)
>>>>>>>       at sun.misc.Unsafe.park(Native Method)
>>>>>>>       - parking to wait for  <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>>>>>>       at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
>>>>>>>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
>>>>>>>       at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
>>>>>>>       at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
>>>>>>>       at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
>>>>>>>       at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
>>>>>>>       at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>>>>>>>       at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>>>>>>>       at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
>>>>>>>       at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
>>>>>>>       at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
>>>>>>>       at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
>>>>>>>       at kafka.utils.Utils$.swallow(Utils.scala:172)
>>>>>>>       at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
>>>>>>>       at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
>>>>>>>       at kafka.utils.Logging$class.swallow(Logging.scala:94)
>>>>>>>       at kafka.utils.Utils$.swallow(Utils.scala:45)
>>>>>>>       at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
>>>>>>>       at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
>>>>>>>       at kafka.Kafka$$anon$1.run(Kafka.scala:42)
>>>>>>>
>>>>>>> -Jaikiran
>>>>>>>
>>>>>>>
>>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
>>>>>>>
>>>>>>>> For a clean shutdown, the broker tries to talk to the controller and
>>>>>>>> also issues reads to zookeeper. Possibly that is where it tries to
>>>>>>>> reconnect to zk. It will help to look at the thread dump.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Neha
>>>>>>>>
>>>>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai
>>>>>>>> <jai.forums2...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I was just playing around with the RC2 of 0.8.2 and noticed that if I
>>>>>>>>> shutdown zookeeper first I can't shutdown Kafka server at all since it
>>>>>>>>> goes into a never ending attempt to reconnect with zookeeper. I had to
>>>>>>>>> kill the Kafka process to stop it. I tried it against trunk too and
>>>>>>>>> there too I see the same issue. Should I file a JIRA for this and see
>>>>>>>>> if I can come up with a patch?
>>>>>>>>>
>>>>>>>>> FWIW, here's the unending (and IMO too frequent) attempts at trying to
>>>>>>>>> reconnect. I've a thread dump too which shows that the other thread
>>>>>>>>> which is trying to complete a controlled shutdown of Kafka is blocked
>>>>>>>>> forever for the zookeeper to be up. I can attach it to the JIRA.
>>>>>>>>>
>>>>>>>>> [2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> [2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Jaikiran
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  --
>>>>>> Thanks,
>>>>>> Ewen
>>>>>>
>>>>>>
>>>>>
>>
>
>
> --
> -- Guozhang
