Yes, that's the plan :)

-Jaikiran
On Wednesday 04 February 2015 12:33 AM, Gwen Shapira wrote:
So I think the current plan is:
1. Add timeout in zkclient
2. Ask zkclient to release a new version (we need it for a few other things too)
3. Rebase on the new zkclient
4. Fix this JIRA and the few others that were waiting for the new zkclient

Does that make sense?

Gwen

On Mon, Feb 2, 2015 at 8:33 PM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:
I just heard back from Stefan, who manages the ZkClient repo, and he seems
open to having these changes become part of the ZkClient project. I'll be
creating a pull request for that project to have it reviewed and merged.
Although I haven't heard of exact release plans, Stefan's reply did indicate
that the project could be released after this change is merged.

-Jaikiran

On Tuesday 03 February 2015 09:03 AM, Jaikiran Pai wrote:
Thanks for pointing to that repo!

I just had a look at it, and the project doesn't appear to be very active.
The latest contribution is from Gwen, around 3 months ago. I haven't found
release plans for the project or a place to ask about them (filing an issue
doesn't seem like the right way to ask this question). So I'll get in touch
with the repo owner and ask what his plans for the project are.

-Jaikiran

On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:
I did!

Thanks for clarifying :)

The client that is part of Zookeeper itself actually does support
timeouts.

On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <wangg...@gmail.com> wrote:
Hi Jaikiran,

I think Gwen was talking about contributing to ZkClient project:

https://github.com/sgroschupf/zkclient

Guozhang


On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:

Hi Gwen,

Yes, the KafkaZkClient is a wrapper around ZkClient and not a complete
replacement.

As for contributing to Zookeeper, yes, that is indeed on my mind, but I
haven't yet had a chance to look deeper into Zookeeper or to get in touch
with their dev team and explain this potential improvement to them. I have
no objection to contributing this, or something similar, to Zookeeper
directly. I should be able to bring this up in the Zookeeper dev forum over
the next few weekends.

-Jaikiran


On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:

It looks like the new KafkaZkClient is a wrapper around ZkClient, but
not a replacement. Did I get it right?

I think a wrapper for ZkClient can be useful - for example KAFKA-1664
can also use one.

However, I'm wondering: why not contribute the fix directly to the ZkClient
project and ask for a release that contains the fix?
This would benefit other users of the project who may also need a
timeout (that's pretty basic...)

As an alternative, if we don't want to collaborate with ZkClient for
some reason, forking the project into Kafka will probably give us more
control than wrappers would, without much downside.

Just a thought.

Gwen





On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:

Neha, Ewen (and others), my initial attempt to solve this is uploaded here:
https://reviews.apache.org/r/30477/. It solves the shutdown problem, and
the server now shuts down even when Zookeeper has gone down before the
Kafka server.

I went with the approach of introducing a custom (enhanced) ZkClient which,
for now, allows timeouts to be optionally specified for certain operations.
I intentionally haven't forced the use of this new KafkaZkClient all over
the code; for now I have only used it in KafkaServer.
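For illustration only, here is a minimal sketch of the general idea of an optional timeout around a blocking call. It is not the code in the review request. The blocking call runs on a separate thread, and the caller stops waiting after the timeout. `TimeoutCaller` and `callWithTimeout` are hypothetical names. The `Thread.sleep` stands in for a ZkClient read that blocks because Zookeeper is down.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper illustrating "optional timeouts for certain operations";
// it is not the KafkaZkClient from the review request.
final class TimeoutCaller {
    // Daemon worker threads, so a call that stays stuck cannot keep the JVM alive.
    private static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timeout-caller");
        t.setDaemon(true);
        return t;
    });

    private TimeoutCaller() {}

    // Runs a blocking call and gives up after the timeout instead of waiting forever.
    static <T> T callWithTimeout(Callable<T> call, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        Future<T> future = EXECUTOR.submit(call);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupt the worker; this only helps if the blocked call honors interrupts.
            future.cancel(true);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a read that blocks because Zookeeper is already gone.
        Callable<String> stuckRead = () -> {
            Thread.sleep(60_000);
            return "never";
        };
        try {
            callWithTimeout(stuckRead, 200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("read timed out, continuing with shutdown");
        }
        String value = callWithTimeout(() -> "ok", 200, TimeUnit.MILLISECONDS);
        System.out.println(value);
    }
}
```

The trade-off is one extra thread hop per call. The worker thread may also stay blocked after the caller has moved on, which is why the sketch uses daemon threads.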

Does this patch look like something worth using?

-Jaikiran


On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:

Ewen is right. ZkClient APIs are blocking and the right fix for this seems
to be patching ZkClient. At some point, if we find ourselves fiddling too
much with ZkClient, it wouldn't hurt to write our own little zookeeper
client wrapper.

On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava <e...@confluent.io> wrote:

Looks like a bug to me -- the underlying ZK library wraps a lot of blocking
method implementations with waitUntilConnected() calls without any
timeouts. Ideally we could just add a version of ZkUtils.getController()
with a timeout, but I don't see an easy way to accomplish that with
ZkClient.
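To make the core problem concrete: the hang comes from waiting for a connection with no upper bound. Below is a sketch of a deadline-bounded alternative using a hypothetical `ConnectionGate` class. It is not ZkClient's code. It assumes some event thread reports connect and disconnect events. The wait returns false when the deadline passes instead of blocking forever.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not ZkClient code: tracks whether the session is connected
// and lets callers wait for a connection with an upper bound on the wait.
final class ConnectionGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition stateChanged = lock.newCondition();
    private boolean connected = false;

    // Would be called from the watcher/event thread on connect and disconnect events.
    void setConnected(boolean value) {
        lock.lock();
        try {
            connected = value;
            stateChanged.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Returns true once connected, or false if the timeout expires first.
    boolean waitUntilConnected(long timeout, TimeUnit unit) throws InterruptedException {
        long remaining = unit.toNanos(timeout);
        lock.lock();
        try {
            while (!connected) {
                if (remaining <= 0) {
                    return false; // give up instead of waiting forever
                }
                // awaitNanos returns an estimate of the time still left to wait.
                remaining = stateChanged.awaitNanos(remaining);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

With something like this, a timeout-aware `getController()` variant could check `gate.waitUntilConnected(5, TimeUnit.SECONDS)` and fail fast when it returns false. A controlled shutdown could then fall back to an unclean one rather than hang.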

There's at least one other call to ZkUtils besides the one in the
stacktrace you gave that would cause the same issue, possibly more that
aren't directly called in that method. One ugly solution would be to use an
extra thread during shutdown to trigger timeouts, but I'd imagine we
probably have other threads that could end up blocking in similar ways.

I filed https://issues.apache.org/jira/browse/KAFKA-1907 to track the issue.


On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:

The main culprit is this thread, which goes into a "forever retry connection
to a closed zookeeper" loop when I shut down Kafka (via Ctrl + C) after
zookeeper has already been shut down. I have attached the complete thread
dump, but I don't know if it will be delivered to the mailing list.

"Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition [0x6ad69000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
        at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
        at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
        at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
        at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
        at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
        at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
        at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
        at kafka.utils.Utils$.swallow(Utils.scala:172)
        at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
        at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
        at kafka.utils.Logging$class.swallow(Logging.scala:94)
        at kafka.utils.Utils$.swallow(Utils.scala:45)
        at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
        at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
        at kafka.Kafka$$anon$1.run(Kafka.scala:42)

-Jaikiran


On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:

For a clean shutdown, the broker tries to talk to the controller and also
issues reads to zookeeper. Possibly that is where it tries to reconnect to
zk. It will help to look at the thread dump.
Thanks
Neha

On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:

I was just playing around with the RC2 of 0.8.2 and noticed that if I
shut down zookeeper first, I can't shut down the Kafka server at all, since
it goes into a never-ending attempt to reconnect with zookeeper. I had to
kill the Kafka process to stop it. I tried it against trunk too and I see
the same issue there. Should I file a JIRA for this and see if I can come up
with a patch?

FWIW, here are the unending (and IMO too frequent) attempts to reconnect.
I have a thread dump too, which shows that the other thread, the one trying
to complete a controlled shutdown of Kafka, is blocked forever waiting for
zookeeper to be up. I can attach it to the JIRA.

[2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[2015-01-24 10:15:47,437] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[2015-01-24 10:15:49,056] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[2015-01-24 10:15:50,801] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)




-Jaikiran


--
Thanks,
Ewen


--
-- Guozhang

