Hi Steven,

I must admit that I hadn't really considered that option. I can see how
attractive it is from your perspective. In practice it would come with lots
of edge cases which would need to be thought through:

1. What happens if the controller can't produce a record to this topic
because the partition's leader is unavailable?
2. One solution to that is for the topic to be replicated on every broker,
so that the controller could elect itself leader on controller failover.
But that raises another problem: What if, upon controller failover, the
controller is ineligible for leader election because it's not in the ISR?
3. The above questions suggest the controller might not always be able to
produce to the topic. Yet it cannot control when other brokers finish
catching up on moved partitions, and it has to handle those events as they
happen. The controller would have to record (in memory) that the
reassignment was complete but hadn't been published, and publish it later,
when it was able to.
4. Further to 3, we would need to recover the in-memory state of
reassignments on controller failover. But now we have to consider what
happens if the controller cannot *consume* from the topic.
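
To make points 3 and 4 a bit more concrete, here is a rough sketch in plain
Python. It does not use any Kafka client: CompactedLog, Controller and the
"orders" topic are all hypothetical names made up for illustration, and the
"topic" is just an in-memory stand-in that keeps the latest value per key.

```python
from typing import Dict, Optional, Tuple

TopicPartition = Tuple[str, int]


class CompactedLog:
    """In-memory stand-in for a log-compacted topic keyed by partition (hypothetical)."""

    def __init__(self) -> None:
        # False simulates "the partition's leader is unavailable" (point 1).
        self.available = True
        self._latest: Dict[TopicPartition, Optional[str]] = {}

    def produce(self, key: TopicPartition, value: Optional[str]) -> bool:
        """Store the latest value for a key; return False if the write failed."""
        if not self.available:
            return False
        if value is None:
            self._latest.pop(key, None)  # a None value acts as a tombstone
        else:
            self._latest[key] = value
        return True

    def snapshot(self) -> Dict[TopicPartition, Optional[str]]:
        """Return what a consumer would see after compaction."""
        return dict(self._latest)


class Controller:
    """Buffers completed reassignments it could not publish yet (point 3)."""

    def __init__(self, log: CompactedLog) -> None:
        self.log = log
        # In-memory only: this is the state point 4 says is lost on failover.
        self.unpublished: Dict[TopicPartition, str] = {}

    def on_reassignment_complete(self, tp: TopicPartition) -> None:
        self.unpublished[tp] = "complete"
        self.flush()

    def flush(self) -> int:
        """Try to publish everything pending; return how many records went out."""
        published = 0
        for tp, state in list(self.unpublished.items()):
            if self.log.produce(tp, state):
                del self.unpublished[tp]
                published += 1
        return published


log = CompactedLog()
controller = Controller(log)

log.available = False                      # leader unavailable
controller.on_reassignment_complete(("orders", 0))
print(controller.unpublished)              # the completion is stuck in memory

log.available = True                       # leader is back
controller.flush()
print(log.snapshot())                      # the completion is now in the topic
```

The awkward part is the unpublished dict: if the controller fails over while
it is non-empty, the new controller has no way to rebuild it from the topic.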

This seems pretty complicated to me. I think each of the above points has
alternatives (or compromises) which might make the problem more tractable,
so I'd welcome hearing from anyone who has ideas on that. In particular
there are parallels with consumer offsets which might be worth thinking
about some more.

It would be useful to define better the use case we're trying to cater to
here.

* Is it just a notification that a given reassignment has finished that
you're interested in?
* What are the consequences if such a notification is delayed, or dropped
entirely?

Regards,

Tom



On 19 December 2017 at 20:34, Steven Aerts <steven.ae...@gmail.com> wrote:

> Hello Tom,
>
>
> when you were working out KIP-236, did you consider migrating the
> reassignment
> state from zookeeper to an internal kafka topic, keyed by partition
> and log compacted?
>
> It would allow an admin client and controller to easily subscribe for
> those changes,
> without the need to extend the network protocol as discussed in KIP-240.
>
> This is just a theoretical idea I wanted to share, as I can't find a
> reason why it would
> be a stupid idea.
> But I assume that in practice, this will imply too much change to the
> code base to be
> viable.
>
>
> Regards,
>
>
>    Steven
>
>
> 2017-12-18 11:49 GMT+01:00 Tom Bentley <t.j.bent...@gmail.com>:
> > Hi Steven,
> >
> > I think it would be useful to be able to subscribe yourself on updates of
> >> reassignment changes.
> >
> >
> > I agree this would be really useful, but, to the extent I understand the
> > networking underpinnings of the admin client, it might be difficult to do
> > well in practice. Part of the problem is that you might "set a watch" (to
> > borrow the ZK terminology) via one broker (or the controller), only for
> > that broker to fail (or the controller be re-elected). Obviously you can
> > detect the loss of connection and set a new watch via a different broker
> > (or the new controller), but that couldn't be transparent to the user,
> > because the AdminClient doesn't know what changed while it was
> > disconnected/not watching.
> >
> > Another issue is that to avoid races you really need to combine fetching
> > the current state with setting the watch (as is done in the native
> > ZooKeeper API). I think there are lots of subtle issues of this sort
> which
> > would need to be addressed to make something reliable.
> >
> > In the meantime, ZooKeeper already has a (proven and mature) API for
> > watches, so there is, in principle, a good workaround. I say "in principle"
> > because in the KIP-236 proposal right now the /admin/reassign_partitions
> > znode is legacy and the reassignment is represented by
> > /admin/reassigments/$topic/$partition. That naming scheme for the znode
> > would make it harder for ZooKeeper clients like yours because such
> clients
> > would need to set a child watch per topic. The original proposal for the
> > naming scheme was /admin/reassigments/$topic-$partition, which would
> mean
> > clients like yours would need only 1 child watch. The advantage of
> > /admin/reassigments/$topic/$partition is it scales better. I don't
> > currently know how well ZooKeeper copes with nodes with many children, so
> > it's difficult for me to weigh those two options, but I would be happy to
> > switch back to /admin/reassigments/$topic-$partition if we could reassure
> > ourselves it would scale OK to the reassignment sizes people would need in
> > practice.
> >
> > Overall I would prefer not to tackle something like this in *this* KIP,
> > though it could be something for a future KIP. Of course I'm happy to
> hear
> > more discussion about this too!
> >
> > Cheers,
> >
> > Tom
> >
> >
> > On 15 December 2017 at 18:51, Steven Aerts <steven.ae...@gmail.com>
> wrote:
> >
> >> Tom,
> >>
> >>
> >> I think it would be useful to be able to subscribe yourself on updates
> of
> >> reassignment changes.
> >> Our internal kafka supervisor and monitoring tools are currently
> subscribed
> >> to these changes in zookeeper so they can babysit our clusters.
> >>
> >> I think it would be nice if we could receive these events through the
> >> adminclient.
> >> In the api proposal, you can only poll for changes.
> >>
> >> No clue how difficult it would be to implement, maybe you can piggyback
> on
> >> some version number in the repartition messages or on zookeeper.
> >>
> >> This is just an idea, not a must have feature for me.  We can always
> poll
> >> over
> >> the proposed api.
> >>
> >>
> >> Regards,
> >>
> >>
> >>    Steven
> >>
> >>
> >> Op vr 15 dec. 2017 om 19:16 schreef Tom Bentley <t.j.bent...@gmail.com
> >:
> >>
> >> > Hi,
> >> >
> >> > KIP-236 lays the foundations for AdminClient APIs to do with partition
> >> > reassignment. I'd now like to start discussing KIP-240, which adds
> APIs
> >> to
> >> > the AdminClient to list and describe the current reassignments.
> >> >
> >> >
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-240%3A+AdminClient.listReassignments+AdminClient.describeReassignments
> >> >
> >> > Aside: I have fairly developed ideas for the API for starting a
> >> > reassignment, but I intend to put that in a third KIP.
> >> >
> >> > Cheers,
> >> >
> >> > Tom
> >> >
> >>
>
