So I think the current plan is:
1. Add timeout in zkclient
2. Ask zkclient to release a new version (we need it for a few other things too)
3. Rebase on new zkclient
4. Fix this jira and the few others that were waiting for the new zkclient
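
For step 1, here is a minimal sketch of what a bounded wait could look like, assuming a lock/condition-based connection state along the lines of what the stack trace further down shows. The names `ConnectionGate`, `markConnected` and `awaitConnected` are hypothetical and only for illustration; this is not ZkClient's actual API or the eventual patch.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper, not part of ZkClient: tracks a "connected" flag and
// lets callers wait for it with an upper bound instead of waiting forever.
class ConnectionGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition stateChanged = lock.newCondition();
    private boolean connected = false;

    // Called by whatever observes the connection coming up.
    void markConnected() {
        lock.lock();
        try {
            connected = true;
            stateChanged.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Returns true if connected within the timeout; false if the timeout
    // elapsed or the waiting thread was interrupted.
    boolean awaitConnected(long timeout, TimeUnit unit) {
        long remainingNanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (!connected) {
                if (remainingNanos <= 0L) {
                    return false; // give up instead of retrying forever
                }
                // awaitNanos returns an estimate of the time still left.
                remainingNanos = stateChanged.awaitNanos(remainingNanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        ConnectionGate gate = new ConnectionGate();
        // Nothing ever connects in this demo, so the call gives up after
        // roughly 200 ms rather than blocking shutdown indefinitely.
        boolean up = gate.awaitConnected(200, TimeUnit.MILLISECONDS);
        System.out.println("connected=" + up);
    }
}
```

The point of returning a boolean rather than throwing is that a shutdown path could then decide for itself whether to skip the controlled shutdown step when zookeeper is unreachable.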

Does that make sense?

Gwen

On Mon, Feb 2, 2015 at 8:33 PM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:
> I just heard back from Stefan, who manages the ZkClient repo, and he seems to
> be open to having these changes be part of the ZkClient project. I'll be
> creating a pull request for that project to have it reviewed and merged.
> Although I haven't heard of exact release plans, Stefan's reply did indicate
> that the project could be released after this change is merged.
>
> -Jaikiran
>
> On Tuesday 03 February 2015 09:03 AM, Jaikiran Pai wrote:
>>
>> Thanks for pointing to that repo!
>>
>> I just had a look at it and it appears that the project isn't very active
>> (going by the lack of activity). The latest contribution is from Gwen and
>> that was around 3 months back. I haven't found release plans for that
>> project or a place to ask about it (filing an issue doesn't seem right to
>> ask this question). So I'll get in touch with the repo owner and see what
>> his plans for the project are.
>>
>> -Jaikiran
>>
>> On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:
>>>
>>> I did!
>>>
>>> Thanks for clarifying :)
>>>
>>> The client that is part of Zookeeper itself actually does support
>>> timeouts.
>>>
>>> On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>>>>
>>>> Hi Jaikiran,
>>>>
>>>> I think Gwen was talking about contributing to ZkClient project:
>>>>
>>>> https://github.com/sgroschupf/zkclient
>>>>
>>>> Guozhang
>>>>
>>>>
>>>> On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <jai.forums2...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Gwen,
>>>>>
>>>>> Yes, the KafkaZkClient is a wrapper around ZkClient and not a complete
>>>>> replacement.
>>>>>
>>>>> As for contributing to Zookeeper, yes, that is indeed on my mind, but I
>>>>> haven't yet had a chance to really look deeper into Zookeeper or get in
>>>>> touch with their dev team to try and explain this potential improvement
>>>>> to them. I have no objection to contributing this or something similar to
>>>>> Zookeeper directly. I think I should be able to bring this up in the
>>>>> Zookeeper dev forum sometime soon, in the next few weekends.
>>>>>
>>>>> -Jaikiran
>>>>>
>>>>>
>>>>> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
>>>>>
>>>>>> It looks like the new KafkaZkClient is a wrapper around ZkClient, but
>>>>>> not a replacement. Did I get it right?
>>>>>>
>>>>>> I think a wrapper for ZkClient can be useful - for example KAFKA-1664
>>>>>> can also use one.
>>>>>>
>>>>>> However, I'm wondering why not contribute the fix directly to the
>>>>>> ZkClient project and ask for a release that contains the fix?
>>>>>> This will benefit other users of the project who may also need a
>>>>>> timeout (that's pretty basic...)
>>>>>>
>>>>>> As an alternative, if we don't want to collaborate with ZKClient for
>>>>>> some reason, forking the project into Kafka will probably give us more
>>>>>> control than wrappers and without much downside.
>>>>>>
>>>>>> Just a thought.
>>>>>>
>>>>>> Gwen
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai
>>>>>> <jai.forums2...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Neha, Ewen (and others), my initial attempt to solve this is uploaded
>>>>>>> here: https://reviews.apache.org/r/30477/. It solves the shutdown
>>>>>>> problem and now the server shuts down even when Zookeeper has gone
>>>>>>> down before the Kafka server.
>>>>>>>
>>>>>>> I went with the approach of introducing a custom (enhanced) ZkClient
>>>>>>> which, for now, allows timeouts to be optionally specified for certain
>>>>>>> operations. I intentionally haven't forced the use of this new
>>>>>>> KafkaZkClient all over the code and instead, for now, have just used
>>>>>>> it in the KafkaServer.
>>>>>>>
>>>>>>> Does this patch look like something worth using?
>>>>>>>
>>>>>>> -Jaikiran
>>>>>>>
>>>>>>>
>>>>>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
>>>>>>>
>>>>>>>> Ewen is right. ZkClient APIs are blocking and the right fix for this
>>>>>>>> seems to be patching ZkClient. At some point, if we find ourselves
>>>>>>>> fiddling too much with ZkClient, it wouldn't hurt to write our own
>>>>>>>> little zookeeper client wrapper.
>>>>>>>>
>>>>>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava
>>>>>>>> <e...@confluent.io>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Looks like a bug to me -- the underlying ZK library wraps a lot of
>>>>>>>>> blocking method implementations with waitUntilConnected() calls
>>>>>>>>> without any timeouts. Ideally we could just add a version of
>>>>>>>>> ZkUtils.getController() with a timeout, but I don't see an easy way
>>>>>>>>> to accomplish that with ZkClient.
>>>>>>>>>
>>>>>>>>> There's at least one other call to ZkUtils besides the one in the
>>>>>>>>> stacktrace you gave that would cause the same issue, possibly more
>>>>>>>>> that aren't directly called in that method. One ugly solution would
>>>>>>>>> be to use an extra thread during shutdown to trigger timeouts, but
>>>>>>>>> I'd imagine we probably have other threads that could end up
>>>>>>>>> blocking in similar ways.
>>>>>>>>>
>>>>>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to track
>>>>>>>>> the issue.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <
>>>>>>>>> jai.forums2...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> The main culprit is this thread, which goes into "forever retry
>>>>>>>>>> connection to a closed zookeeper" when I shut down Kafka (via a
>>>>>>>>>> Ctrl + C) after zookeeper has already been shut down. I have
>>>>>>>>>> attached the complete thread dump, but I don't know if it will be
>>>>>>>>>> delivered to the mailing list.
>>>>>>>>>>
>>>>>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition [0x6ad69000]
>>>>>>>>>>       java.lang.Thread.State: TIMED_WAITING (parking)
>>>>>>>>>>        at sun.misc.Unsafe.park(Native Method)
>>>>>>>>>>        - parking to wait for  <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>>>>>>>>>        at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
>>>>>>>>>>        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
>>>>>>>>>>        at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
>>>>>>>>>>        at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
>>>>>>>>>>        at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
>>>>>>>>>>        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
>>>>>>>>>>        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>>>>>>>>>>        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>>>>>>>>>>        at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
>>>>>>>>>>        at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
>>>>>>>>>>        at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
>>>>>>>>>>        at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
>>>>>>>>>>        at kafka.utils.Utils$.swallow(Utils.scala:172)
>>>>>>>>>>        at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
>>>>>>>>>>        at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
>>>>>>>>>>        at kafka.utils.Logging$class.swallow(Logging.scala:94)
>>>>>>>>>>        at kafka.utils.Utils$.swallow(Utils.scala:45)
>>>>>>>>>>        at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
>>>>>>>>>>        at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
>>>>>>>>>>        at kafka.Kafka$$anon$1.run(Kafka.scala:42)
>>>>>>>>>>
>>>>>>>>>> -Jaikiran
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
>>>>>>>>>>
>>>>>>>>>>> For a clean shutdown, the broker tries to talk to the controller
>>>>>>>>>>> and also issues reads to zookeeper. Possibly that is where it
>>>>>>>>>>> tries to reconnect to zk. It will help to look at the thread dump.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Neha
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <
>>>>>>>>>>> jai.forums2...@gmail.com
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I was just playing around with the RC2 of 0.8.2 and noticed that
>>>>>>>>>>>> if I shut down zookeeper first I can't shut down the Kafka server
>>>>>>>>>>>> at all, since it goes into a never-ending attempt to reconnect
>>>>>>>>>>>> with zookeeper. I had to kill the Kafka process to stop it. I
>>>>>>>>>>>> tried it against trunk too and there too I see the same issue.
>>>>>>>>>>>> Should I file a JIRA for this and see if I can come up with a
>>>>>>>>>>>> patch?
>>>>>>>>>>>>
>>>>>>>>>>>> FWIW, here are the unending (and IMO too frequent) attempts at
>>>>>>>>>>>> trying to reconnect. I've a thread dump too which shows that the
>>>>>>>>>>>> other thread, which is trying to complete a controlled shutdown
>>>>>>>>>>>> of Kafka, is blocked forever waiting for zookeeper to be up. I
>>>>>>>>>>>> can attach it to the JIRA.
>>>>>>>>>>>>
>>>>>>>>>>>> [2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>> [2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Jaikiran
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>   --
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Ewen
>>>>>>>>>
>>>>>>>>>
>>>>
>>>> --
>>>> -- Guozhang
>>
>>
>
