Tom,

On the implications you are referring to: to me they seem the same as for
the __consumer_offsets and __transaction_state topics.

So I am wondering if we can rely on the same solutions for them, like
providing a *.replication.factor config option.
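
For the two existing internal topics the broker already exposes exactly
such options (offsets.topic.replication.factor and
transaction.state.log.replication.factor are real broker configs), so a
hypothetical analogue for the new topic might look like this (the last
option name below is made up, purely to illustrate the idea):

    # existing broker configs for the current internal topics
    offsets.topic.replication.factor=3
    transaction.state.log.replication.factor=3
    # hypothetical analogue for __partition_reassignments
    reassignments.topic.replication.factor=3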


Best regards,


   Steven

On Mon, 19 Mar 2018 at 14:30, Tom Bentley <t.j.bent...@gmail.com> wrote:

> Last week I was able to spend a bit of time working on KIP-236 again and,
> based on the discussion about that with Jun back in December, I refactored
> the controller to store the reassignment state in /brokers/topics/${topic}
> instead of introducing new ZK nodes. This morning I was wondering what to
> do as a next step, as these changes are more or less useless on their own,
> without APIs for discovering the current reassignments and/or reassigning
> partitions. I started thinking again about this KIP, and realised that
> using an internal compacted topic (say __partition_reassignments), as
> suggested by Steven and Colin, would require changes in basically the same
> places.
>
> Thinking through some of the failure modes ("what if I update ZK, but
> can't produce to the topic?") I realised that it would actually be
> possible to simply stop storing this info in ZK entirely and just store
> this state in the __partition_reassignments topic. Doing it that way would
> eliminate those failure modes and would allow clients interested in
> reassignment completion to consume from this topic and respond to records
> published with a null value (indicating completion of a reassignment), as
> sketched below.
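>
> For example, a client waiting for a reassignment to finish could do
> something like the following (a minimal sketch: the topic name is from
> this proposal, but the assumption that the record key is a
> "topic-partition" string is mine, not part of any design):
>
>     import java.util.Collections;
>     import java.util.Properties;
>     import org.apache.kafka.clients.consumer.ConsumerRecord;
>     import org.apache.kafka.clients.consumer.KafkaConsumer;
>
>     public class ReassignmentWatcher {
>         public static void main(String[] args) {
>             Properties props = new Properties();
>             props.put("bootstrap.servers", "localhost:9092");
>             props.put("group.id", "reassignment-watcher");
>             props.put("key.deserializer",
>                 "org.apache.kafka.common.serialization.StringDeserializer");
>             props.put("value.deserializer",
>                 "org.apache.kafka.common.serialization.StringDeserializer");
>             try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
>                 consumer.subscribe(Collections.singletonList("__partition_reassignments"));
>                 while (true) {
>                     for (ConsumerRecord<String, String> record : consumer.poll(1000)) {
>                         // A null value (a tombstone, later compacted away) would
>                         // signal that the reassignment identified by the key
>                         // (assumed format: "<topic>-<partition>") has completed.
>                         if (record.value() == null) {
>                             System.out.println("Reassignment finished: " + record.key());
>                         }
>                     }
>                 }
>             }
>         }
>     }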
>
> There are some interesting implications to doing this:
>
> 1. This __partition_reassignments topic would need to be replicated in
> order for reassignment to remain available (if the leader of a partition
> of __partition_reassignments were unavailable, then the partitions whose
> state is held by that partition could not be reassigned).
> 2. We would want to avoid unclean leader election for this topic; a
> sketch of the settings this implies follows below.
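>
> Concretely, point 1 means creating the topic with a replication factor
> of, say, 3, and point 2 maps onto existing topic-level configs (the
> config names below are real, the values are my assumptions):
>
>     cleanup.policy=compact                # keyed, log-compacted state
>     unclean.leader.election.enable=false  # point 2 above
>     min.insync.replicas=2                 # don't lose acknowledged state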
>
> But I am interested in what other people think about this approach.
>
> Cheers,
>
> Tom
>
>
> On 9 January 2018 at 21:18, Colin McCabe <cmcc...@apache.org> wrote:
>
> > What if we had an internal topic which watchers could listen to for
> > information about partition reassignments?  The information could be in
> > JSON, so if we want to add new fields later, we always could.
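> >
> > For illustration, a record in such a topic might look like this (every
> > field name here is hypothetical, purely to show how JSON would let us
> > evolve the schema later):
> >
> >     {
> >       "version": 1,
> >       "topic": "my-topic",
> >       "partition": 0,
> >       "oldReplicas": [1, 2, 3],
> >       "newReplicas": [1, 2, 4]
> >     }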
> >
> > This avoids introducing a new AdminClient API.  For clients that want to
> > be notified about partition reassignments in a timely fashion, this
> > avoids the "polling an AdminClient API in a tight loop" antipattern.  It
> > allows watchers to be notified in a simple and natural way about what is
> > going on.  Access can be controlled by the existing topic ACL mechanisms.
> >
> > best,
> > Colin
> >
> >
> > On Fri, Dec 22, 2017, at 06:48, Tom Bentley wrote:
> > > Hi Steven,
> > >
> > > I must admit that I didn't really consider that option. I can see how
> > > attractive it is from your perspective. In practice it would come with
> > lots
> > > of edge cases which would need to be thought through:
> > >
> > > 1. What happens if the controller can't produce a record to this topic
> > > because the partition's leader is unavailable?
> > > 2. One solution to that is for the topic to be replicated on every
> > > broker, so that the controller could elect itself leader on controller
> > > failover. But that raises another problem: what if, upon controller
> > > failover, the controller is ineligible for leader election because it's
> > > not in the ISR?
> > > 3. The above questions suggest the controller might not always be able
> > > to produce to the topic, but the controller can't control when other
> > > brokers catch up replicating moved partitions and has to deal with
> > > those events as they happen. The controller would have to record (in
> > > memory) that a reassignment was complete but hadn't been published, and
> > > publish later, when it was able to.
> > > 4. Further to 3, we would need to recover the in-memory state of
> > > reassignments on controller failover. But now we have to consider what
> > > happens if the controller cannot *consume* from the topic.
> > >
> > > This seems pretty complicated to me. I think each of the above points
> > > has alternatives (or compromises) which might make the problem more
> > > tractable, so I'd welcome hearing from anyone who has ideas on that. In
> > > particular there are parallels with consumer offsets which might be
> > > worth thinking about some more.
> > >
> > > It would be useful to define better the use case we're trying to cater
> > > to here.
> > >
> > > * Is it just a notification that a given reassignment has finished that
> > > you're interested in?
> > > * What are the consequences if such a notification is delayed, or
> > > dropped entirely?
> > >
> > > Regards,
> > >
> > > Tom
> > >
> > >
> > >
> > > On 19 December 2017 at 20:34, Steven Aerts <steven.ae...@gmail.com> wrote:
> > >
> > > > Hello Tom,
> > > >
> > > >
> > > > when you were working out KIP-236, did you consider migrating the
> > > > reassignment state from ZooKeeper to an internal Kafka topic, keyed
> > > > by partition and log compacted?
> > > >
> > > > It would allow an admin client and the controller to easily subscribe
> > > > to those changes, without the need to extend the network protocol as
> > > > discussed in KIP-240.
> > > >
> > > > This is just a theoretical idea I wanted to share, as I can't find a
> > > > reason why it would be a stupid idea. But I assume that in practice,
> > > > this will imply too much change to the code base to be viable.
> > > >
> > > >
> > > > Regards,
> > > >
> > > >
> > > >    Steven
> > > >
> > > >
> > > > 2017-12-18 11:49 GMT+01:00 Tom Bentley <t.j.bent...@gmail.com>:
> > > > > Hi Steven,
> > > > >
> > > > >> I think it would be useful to be able to subscribe to updates of
> > > > >> reassignment changes.
> > > > >
> > > > >
> > > > > I agree this would be really useful, but, to the extent I understand
> > > > > the networking underpinnings of the admin client, it might be
> > > > > difficult to do well in practice. Part of the problem is that you
> > > > > might "set a watch" (to borrow the ZK terminology) via one broker
> > > > > (or the controller), only for that broker to fail (or the controller
> > > > > be re-elected). Obviously you can detect the loss of connection and
> > > > > set a new watch via a different broker (or the new controller), but
> > > > > that couldn't be transparent to the user, because the AdminClient
> > > > > doesn't know what changed while it was disconnected/not watching.
> > > > >
> > > > > Another issue is that to avoid races you really need to combine
> > > > > fetching the current state with setting the watch (as is done in
> > > > > the native ZooKeeper API). I think there are lots of subtle issues
> > > > > of this sort which would need to be addressed to make something
> > > > > reliable.
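> > > > >
> > > > > For comparison, the native ZooKeeper client makes that atomic
> > > > > pairing explicit; a minimal sketch (the znode path follows the
> > > > > KIP-236 naming discussed below; connection details are placeholders):
> > > > >
> > > > >     import org.apache.zookeeper.ZooKeeper;
> > > > >
> > > > >     public class WatchOneReassignment {
> > > > >         public static void main(String[] args) throws Exception {
> > > > >             ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, e -> {});
> > > > >             // getData fetches the current state AND registers the watch
> > > > >             // in a single call, so no update can slip in between the two.
> > > > >             byte[] state = zk.getData("/admin/reassignments/my-topic/0",
> > > > >                     e -> System.out.println("changed: " + e.getPath()),
> > > > >                     null);
> > > > >         }
> > > > >     }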
> > > > >
> > > > > In the meantime, ZooKeeper already has a (proven and mature) API for
> > > > > watches, so there is, in principle, a good workaround. I say "in
> > > > > principle" because in the KIP-236 proposal right now the
> > > > > /admin/reassign_partitions znode is legacy and a reassignment is
> > > > > represented by /admin/reassignments/$topic/$partition. That naming
> > > > > scheme for the znode would make it harder for ZooKeeper clients like
> > > > > yours, because such clients would need to set a child watch per
> > > > > topic. The original proposal for the naming scheme was
> > > > > /admin/reassignments/$topic-$partition, which would mean clients
> > > > > like yours would need only one child watch, as sketched below. The
> > > > > advantage of /admin/reassignments/$topic/$partition is that it
> > > > > scales better. I don't currently know how well ZooKeeper copes with
> > > > > nodes with many children, so it's difficult for me to weigh those
> > > > > two options, but I would be happy to switch back to
> > > > > /admin/reassignments/$topic-$partition if we could reassure
> > > > > ourselves it would scale OK to the reassignment sizes people would
> > > > > need in practice.
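> > > > >
> > > > > With the flat $topic-$partition naming, that single child watch
> > > > > would just be the following fragment (continuing the earlier
> > > > > sketch's zk handle):
> > > > >
> > > > >     // One child watch covers every in-flight reassignment:
> > > > >     List<String> inFlight = zk.getChildren("/admin/reassignments",
> > > > >             e -> System.out.println("set of reassignments changed"));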
> > > > >
> > > > > Overall I would prefer not to tackle something like this in *this*
> > > > > KIP, though it could be something for a future KIP. Of course I'm
> > > > > happy to hear more discussion about this too!
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Tom
> > > > >
> > > > >
> > > > > On 15 December 2017 at 18:51, Steven Aerts <steven.ae...@gmail.com> wrote:
> > > > >
> > > > >> Tom,
> > > > >>
> > > > >>
> > > > >> I think it would be useful to be able to subscribe to updates of
> > > > >> reassignment changes.
> > > > >> Our internal Kafka supervisor and monitoring tools are currently
> > > > >> subscribed to these changes in ZooKeeper so they can babysit our
> > > > >> clusters.
> > > > >>
> > > > >> I think it would be nice if we could receive these events through
> > > > >> the AdminClient.
> > > > >> In the API proposal, you can only poll for changes.
> > > > >>
> > > > >> No clue how difficult it would be to implement; maybe you can
> > > > >> piggyback on some version number in the repartition messages or on
> > > > >> ZooKeeper.
> > > > >>
> > > > >> This is just an idea, not a must-have feature for me.  We can
> > > > >> always poll over the proposed API.
> > > > >>
> > > > >>
> > > > >> Regards,
> > > > >>
> > > > >>
> > > > >>    Steven
> > > > >>
> > > > >>
> > > > >> On Fri, 15 Dec 2017 at 19:16, Tom Bentley <t.j.bent...@gmail.com> wrote:
> > > > >>
> > > > >> > Hi,
> > > > >> >
> > > > >> > KIP-236 lays the foundations for AdminClient APIs to do with
> > > > >> > partition reassignment. I'd now like to start discussing KIP-240,
> > > > >> > which adds APIs to the AdminClient to list and describe the
> > > > >> > current reassignments.
> > > > >> >
> > > > >> >
> > > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-240%3A+AdminClient.listReassignments+AdminClient.describeReassignments
> > > > >> >
> > > > >> > Aside: I have fairly developed ideas for the API for starting a
> > > > >> > reassignment, but I intend to put that in a third KIP.
> > > > >> >
> > > > >> > Cheers,
> > > > >> >
> > > > >> > Tom
> > > > >> >
> > > > >>
> > > >
> >
>
