Agree with Stephane that it's worth at least taking a shot at getting
ZOOKEEPER-2184 fixed rather than adding a config that will be deprecated in
the not-too-distant future.

I know ZooKeeper development feels more like the turtle than the hare these
days, but Kafka is a high-visibility project, so there's a decent chance
you'll be able to get the attention of the ZooKeeper maintainers to get a
patch merged and possibly even a new release cut incorporating this fix.

On Tue, Oct 31, 2017 at 3:28 PM, Stephane Maarek <
steph...@simplemachines.com.au> wrote:

> Hi Jun,
>
> Thanks for the reply.
>
> 1) The reason I'm asking about it is I wonder if it's not worth focusing
> the development efforts on taking ownership of the existing PR (
> https://github.com/apache/zookeeper/pull/150)  to fix ZOOKEEPER-2184,
> rebase it and have it merged into the ZK codebase shortly.  I feel this KIP
> might introduce a setting that could be deprecated shortly and confuse the
> end user a bit further with one more knob to turn.
>
> 3) I'm not sure if I fully understand, sorry for the beginner's question:
> if the default timeout is infinite, then it won't change anything to how
> Kafka works from today, does it? (unless I'm missing something sorry). If
> not set to infinite, then we introduce the risk of a whole cluster shutting
> down at once?
>
> Thanks,
> Stephane
>
> On 31/10/17, 1:00 pm, "Jun Rao" <j...@confluent.io> wrote:
>
>     Hi, Stephane,
>
>     Thanks for the reply.
>
>     1) Fixing the issue in ZK will be ideal. Not sure when it will happen
>     though. Once it's fixed, we can probably deprecate this config.
>
>     2) That could be useful. Is there a java api to do that at runtime? Also,
>     invalidating DNS cache doesn't always fix the issue of unresolved host. In
>     some of the cases, human intervention is needed.
>
>     3) The default timeout is infinite though.
>
>     Jun
>
>
>     On Sat, Oct 28, 2017 at 11:48 PM, Stephane Maarek <
>     steph...@simplemachines.com.au> wrote:
>
>     > Hi Jun,
>     >
>     > I think this is very helpful. Restarting Kafka brokers in case of zookeeper
>     > host change is not a well known operation.
>     >
>     > Few questions:
>     > 1) would it not be worth fixing the problem at the source ? This has been
>     > stuck for a while though, maybe a little push would help :
>     > https://issues.apache.org/jira/plugins/servlet/mobile#issue/ZOOKEEPER-2184
>     >
>     > 2) upon recreating the zookeeper object , is it not possible to invalidate
>     > the DNS cache so that it resolves the new hostname?
>     >
>     > 3) could the cluster be down in this situation: one migrates an entire
>     > zookeeper cluster to new machines (one by one). The quorum is still alive
>     > without downtime, but now every broker in a cluster can't resolve zookeeper
>     > at the same time. They all shut down at the same time after the new
>     > time-out setting.
>     >
>     > Thanks !
>     > Stéphane
>     >
>     > On 28 Oct. 2017 9:42 am, "Jun Rao" <j...@confluent.io> wrote:
>     >
>     > > Hi, Everyone,
>     > >
>     > > We created "KIP-217: Expose a timeout to allow an expired ZK session to be
>     > > re-created".
>     > >
>     > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-217%3A+Expose+a+timeout+to+allow+an+expired+ZK+session+to+be+re-created
>     > >
>     > > Please take a look and provide your feedback.
>     > >
>     > > Thanks,
>     > >
>     > > Jun
>     > >


-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><
