It looks like the new KafkaZkClient is a wrapper around ZkClient, but
not a replacement. Did I get it right?

I think a wrapper for ZkClient can be useful - for example, KAFKA-1664
could also use one.
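
To make the wrapper idea concrete, here is a rough sketch of how a
timeout could be layered on top of a blocking call without changing
ZkClient itself. This is not the code from the review request:
TimedCall and callWithTimeout are made-up names, and the blocked
ZooKeeper read is simulated with a latch instead of a real ZkClient
call.

```java
import java.util.concurrent.*;

// Hypothetical helper, not part of Kafka or ZkClient.
class TimedCall {

    // Runs a blocking call on a separate daemon thread and gives up
    // after timeoutMs instead of waiting forever.
    static <T> T callWithTimeout(Callable<T> blockingCall, long timeoutMs)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timed-call");
            t.setDaemon(true); // must not keep the JVM alive on shutdown
            return t;
        });
        Future<T> future = executor.submit(blockingCall);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck call
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A call that returns in time behaves as usual.
        String data = callWithTimeout(() -> "controller-data", 1000);
        System.out.println(data);

        // Stand-in for a read that blocks because ZooKeeper is down.
        try {
            callWithTimeout(() -> {
                new CountDownLatch(1).await();
                return null;
            }, 200);
        } catch (TimeoutException e) {
            System.out.println("gave up waiting, continuing shutdown");
        }
    }
}
```

The obvious cost is an extra thread per call, which is one more reason
a timeout inside the client itself would be cleaner.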

However, I'm wondering why we don't contribute the fix directly to the
ZkClient project and ask for a release that contains it?
This would benefit other users of the project who may also need a
timeout (that's pretty basic...)

As an alternative, if we don't want to collaborate with ZkClient for
some reason, forking the project into Kafka would probably give us more
control than a wrapper, without much downside.

Just a thought.

Gwen





On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:
> Neha, Ewen (and others), my initial attempt to solve this is uploaded here
> https://reviews.apache.org/r/30477/. It solves the shutdown problem and now
> the server shuts down even when Zookeeper has gone down before the Kafka
> server.
>
> I went with the approach of introducing a custom (enhanced) ZkClient which
> for now allows timeouts to be optionally specified for certain operations.
> I intentionally haven't forced the use of this new KafkaZkClient all over
> the code and instead for now have just used it in the KafkaServer.
>
> Does this patch look like something worth using?
>
> -Jaikiran
>
>
> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
>>
>> Ewen is right. ZkClient APIs are blocking and the right fix for this seems
>> to be patching ZkClient. At some point, if we find ourselves fiddling too
>> much with ZkClient, it wouldn't hurt to write our own little zookeeper
>> client wrapper.
>>
>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
>>
>>> Looks like a bug to me -- the underlying ZK library wraps a lot of blocking
>>> method implementations with waitUntilConnected() calls without any
>>> timeouts. Ideally we could just add a version of ZkUtils.getController()
>>> with a timeout, but I don't see an easy way to accomplish that with
>>> ZkClient.
>>>
>>> There's at least one other call to ZkUtils besides the one in the
>>> stacktrace you gave that would cause the same issue, possibly more that
>>> aren't directly called in that method. One ugly solution would be to use an
>>> extra thread during shutdown to trigger timeouts, but I'd imagine we
>>> probably have other threads that could end up blocking in similar ways.
>>>
>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to track the
>>> issue.
>>>
>>>
>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <jai.forums2...@gmail.com>
>>> wrote:
>>>
>>>> The main culprit is this thread which goes into "forever retry connection
>>>> to a closed zookeeper" when I shutdown Kafka (via a Ctrl + C) after
>>>> zookeeper has already been shutdown. I have attached the complete thread
>>>> dump, but I don't know if it will be delivered to the mailing list.
>>>>
>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition [0x6ad69000]
>>>>     java.lang.Thread.State: TIMED_WAITING (parking)
>>>>      at sun.misc.Unsafe.park(Native Method)
>>>>      - parking to wait for  <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>>>      at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
>>>>      at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
>>>>      at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
>>>>      at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
>>>>      at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
>>>>      at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
>>>>      at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>>>>      at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>>>>      at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
>>>>      at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
>>>>      at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
>>>>      at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
>>>>      at kafka.utils.Utils$.swallow(Utils.scala:172)
>>>>      at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
>>>>      at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
>>>>      at kafka.utils.Logging$class.swallow(Logging.scala:94)
>>>>      at kafka.utils.Utils$.swallow(Utils.scala:45)
>>>>      at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
>>>>      at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
>>>>      at kafka.Kafka$$anon$1.run(Kafka.scala:42)
>>>>
>>>> -Jaikiran
>>>>
>>>>
>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
>>>>
>>>>> For a clean shutdown, the broker tries to talk to the controller and also
>>>>> issues reads to zookeeper. Possibly that is where it tries to reconnect to
>>>>> zk. It will help to look at the thread dump.
>>>>>
>>>>> Thanks
>>>>> Neha
>>>>>
>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <jai.forums2...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I was just playing around with the RC2 of 0.8.2 and noticed that if I
>>>>>> shutdown zookeeper first I can't shutdown Kafka server at all since it goes
>>>>>> into a never ending attempt to reconnect with zookeeper. I had to kill the
>>>>>> Kafka process to stop it. I tried it against trunk too and there too I see
>>>>>> the same issue. Should I file a JIRA for this and see if I can come up with
>>>>>> a patch?
>>>>>>
>>>>>> FWIW, here's the unending (and IMO too frequent) attempts at trying to
>>>>>> reconnect. I've a thread dump too which shows that the other thread which
>>>>>> is trying to complete a controlled shutdown of Kafka is blocked forever for
>>>>>> the zookeeper to be up. I can attach it to the JIRA.
>>>>>>
>>>>>> [2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>> java.net.ConnectException: Connection refused
>>>>>>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>       at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>       at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>> [2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>> java.net.ConnectException: Connection refused
>>>>>>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>       at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>       at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>> java.net.ConnectException: Connection refused
>>>>>>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>       at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>       at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>> java.net.ConnectException: Connection refused
>>>>>>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>       at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>       at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Jaikiran
>>>>>>
>>>>>>
>>>>>
>>>
>>> --
>>> Thanks,
>>> Ewen
>>>
>>
>>
>
