The following JIRA provides some background on why upgrading immediately
following a new release may not be prudent (though I expect this to be rare):

ZOOKEEPER-2347

On Thu, Nov 2, 2017 at 3:00 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Stephane:
> bq. hasn't acted in over a year
>
> The above fact implies some reluctance in the zookeeper community to
> fully solve the issue (perhaps for technical reasons).
> Either way, we should plan on not relying on the fix landing in the near
> future.
>
> As for Jun's latest suggestion, I think we should add periodic logging
> indicating the retry.
>
> A KIP is not needed if we go that route.
>
> Cheers
>
> On Thu, Nov 2, 2017 at 2:54 PM, Stephane Maarek <
> steph...@simplemachines.com.au> wrote:
>
>> Hi Jun
>>
>> I think this is a better option. Would that change still require a KIP,
>> given that it's not a change in public API?
>>
>> @ted it was marked as a blocker for 3.4.11 but they pushed it back. It seems
>> that the owner of the PR hasn't acted in over a year and I think someone
>> needs to take ownership of that. Additionally, this would be a change in
>> Kafka's zookeeper client dependency, so there would be no need to update
>> your zookeeper quorum to benefit from the change.
>>
>> Thanks
>> Stéphane
>>
>>
>> On 3 Nov. 2017 8:45 am, "Jun Rao" <j...@confluent.io> wrote:
>>
>> Stephane, Jeff,
>>
>> Another option is to not expose the reconnect timeout config and just
>> retry the creation of the Zookeeper object forever. This is an improvement
>> over the current situation, and if ZOOKEEPER-2184 is fixed in the future,
>> there will be no config to deprecate.
>>
>> Thanks,
>>
>> Jun
>>
>> On Thu, Nov 2, 2017 at 9:02 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> > ZOOKEEPER-2184 is scheduled for 3.4.12, whose release date is unknown.
>> >
>> > I think adding the session recreation on the Kafka side should benefit
>> > Kafka users, especially those who don't plan to move to 3.4.12+ in the
>> > near future.
>> >
>> > On Wed, Nov 1, 2017 at 6:34 PM, Jun Rao <j...@confluent.io> wrote:
>> >
>> > > Hi, Stephane,
>> > >
>> > > 3) The difference is that currently, there is no retry when re-creating
>> > > the Zookeeper object when a ZK session expires. So, if the re-creation of
>> > > Zookeeper fails, the broker just logs the error and the Zookeeper object
>> > > will never be created again. With this KIP, we will keep retrying the
>> > > creation of Zookeeper until it succeeds.
>> > >
>> > > Thanks,
>> > >
>> > > Jun
>> > >
>> > > On Tue, Oct 31, 2017 at 3:28 PM, Stephane Maarek <
>> > > steph...@simplemachines.com.au> wrote:
>> > >
>> > > > Hi Jun,
>> > > >
>> > > > Thanks for the reply.
>> > > >
>> > > > 1) The reason I'm asking is that I wonder whether it would be better to
>> > > > focus development effort on taking ownership of the existing PR
>> > > > (https://github.com/apache/zookeeper/pull/150) to fix ZOOKEEPER-2184,
>> > > > rebasing it and getting it merged into the ZK codebase soon. I feel this
>> > > > KIP might introduce a setting that could be deprecated shortly and
>> > > > confuse the end user further with one more knob to turn.
>> > > >
>> > > > 3) I'm not sure I fully understand, sorry for the beginner's question:
>> > > > if the default timeout is infinite, then it won't change anything about
>> > > > how Kafka works today, will it? (Unless I'm missing something, sorry.)
>> > > > If it is not set to infinite, don't we introduce the risk of a whole
>> > > > cluster shutting down at once?
>> > > >
>> > > > Thanks,
>> > > > Stephane
>> > > >
>> > > > On 31/10/17, 1:00 pm, "Jun Rao" <j...@confluent.io> wrote:
>> > > >
>> > > >     Hi, Stephane,
>> > > >
>> > > >     Thanks for the reply.
>> > > >
>> > > >     1) Fixing the issue in ZK would be ideal. Not sure when it will
>> > > >     happen, though. Once it's fixed, we can probably deprecate this config.
>> > > >
>> > > >     2) That could be useful. Is there a Java API to do that at runtime?
>> > > >     Also, invalidating the DNS cache doesn't always fix the issue of an
>> > > >     unresolved host. In some cases, human intervention is needed.
>> > > >
>> > > >     3) The default timeout is infinite though.
>> > > >
>> > > >     Jun
>> > > >
>> > > >
>> > > >     On Sat, Oct 28, 2017 at 11:48 PM, Stephane Maarek <
>> > > >     steph...@simplemachines.com.au> wrote:
>> > > >
>> > > >     > Hi Jun,
>> > > >     >
>> > > >     > I think this is very helpful. Restarting Kafka brokers after a
>> > > >     > zookeeper host change is not a well-known operation.
>> > > >     >
>> > > >     > A few questions:
>> > > >     > 1) Would it not be worth fixing the problem at the source? This has
>> > > >     > been stuck for a while, though; maybe a little push would help:
>> > > >     > https://issues.apache.org/jira/plugins/servlet/mobile#issue/ZOOKEEPER-2184
>> > > >     >
>> > > >     > 2) Upon recreating the zookeeper object, is it not possible to
>> > > >     > invalidate the DNS cache so that it resolves the new hostname?
>> > > >     >
>> > > >     > 3) Could the cluster go down in this situation: one migrates an
>> > > >     > entire zookeeper cluster to new machines (one by one). The quorum is
>> > > >     > still alive without downtime, but now no broker in the cluster can
>> > > >     > resolve zookeeper, all at the same time. They would all shut down at
>> > > >     > the same time once the new timeout expires.
>> > > >     >
>> > > >     > Thanks !
>> > > >     > Stéphane
>> > > >     >
>> > > >     > On 28 Oct. 2017 9:42 am, "Jun Rao" <j...@confluent.io> wrote:
>> > > >     >
>> > > >     > > Hi, Everyone,
>> > > >     > >
>> > > >     > > We created "KIP-217: Expose a timeout to allow an expired ZK
>> > > >     > > session to be re-created".
>> > > >     > >
>> > > >     > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-217%3A+Expose+a+timeout+to+allow+an+expired+ZK+session+to+be+re-created
>> > > >     > >
>> > > >     > > Please take a look and provide your feedback.
>> > > >     > >
>> > > >     > > Thanks,
>> > > >     > >
>> > > >     > > Jun
>> > > >     > >
>> > > >     >
>> > > >
>> > > >
>> > > >
>> > > >
>> > >
>> >
>>
>
>
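
Jun's alternative in the thread above (no new config; keep retrying the creation of the Zookeeper object until it succeeds), combined with Ted's suggestion of periodic logging, could look roughly like the minimal Java sketch below. It is an illustration under stated assumptions, not Kafka's implementation: `RetryForever`, `createWithRetry`, `backoffMs` and `logEveryN` are hypothetical names, and the `Callable` stands in for whatever code actually constructs the client (for example `new ZooKeeper(connectString, sessionTimeoutMs, watcher)`).

```java
import java.util.concurrent.Callable;
import java.util.logging.Logger;

// Hypothetical helper (not a Kafka or ZooKeeper API): keeps calling a factory,
// e.g. one that constructs a new ZooKeeper client, until it succeeds.
final class RetryForever {
    private static final Logger LOG = Logger.getLogger(RetryForever.class.getName());

    private RetryForever() {}

    /**
     * @param factory   creates the object; may throw, e.g. UnknownHostException
     *                  when a zookeeper host cannot be resolved
     * @param backoffMs fixed sleep between attempts
     * @param logEveryN warn on the first failed attempt and on every attempt
     *                  number divisible by logEveryN
     * @return the first successfully created object
     */
    static <T> T createWithRetry(Callable<T> factory, long backoffMs, int logEveryN)
            throws InterruptedException {
        if (logEveryN < 1) {
            throw new IllegalArgumentException("logEveryN must be >= 1");
        }
        long attempt = 0;
        while (true) { // no upper bound: "retry forever"
            attempt++;
            try {
                T result = factory.call();
                if (attempt > 1) {
                    LOG.info("Created after " + attempt + " attempts");
                }
                return result;
            } catch (InterruptedException e) {
                throw e; // let shutdown interrupt the retry loop
            } catch (Exception e) {
                // Periodic logging: visible progress without flooding the log.
                if (attempt == 1 || attempt % logEveryN == 0) {
                    LOG.warning("Attempt " + attempt + " failed, still retrying: " + e);
                }
                Thread.sleep(backoffMs);
            }
        }
    }
}
```

For example, a factory that throws `UnknownHostException` twice and then returns a client makes `createWithRetry` return on the third attempt, with a single warning logged for the first failure. The fixed backoff keeps the sketch short; a real broker would probably want a bounded exponential backoff instead.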
