Re: [DISCUSS] KIP-240: AdminClient.listReassignments AdminClient.describeReassignments

2018-03-20 Thread Steven Aerts
Tom,

Regarding the implications you are referring to: to me they seem the same as
for the __consumer_offsets and the __transaction_state topics.

So I am wondering if we can rely on the same solutions for them, like
providing a *.replication.factor config option.
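
For comparison, the existing internal topics already have such options on the
broker side, so something analogous could presumably be added. The last key
below is invented purely for illustration and is not an existing Kafka config:

    # existing broker configs for the other internal topics
    offsets.topic.replication.factor=3
    transaction.state.log.replication.factor=3
    # hypothetical analogue for the proposed reassignment topic
    reassignments.topic.replication.factor=3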


Best regards,


   Steven

On Mon, 19 Mar 2018 at 14:30, Tom Bentley wrote:

> Last week I was able to spend a bit of time working on KIP-236 again and,
> based on the discussion about that with Jun back in December, I refactored
> the controller to store the reassignment state in /brokers/topics/${topic}
> instead of introducing new ZK nodes. This morning I was wondering what to
> do as a next step, as these changes are more or less useless on their own,
> without APIs for discovering the current partitions and/or reassigning
> partitions. I started thinking again about this KIP, and realised that
> using an internal compacted topic (say __partition_reassignments), as
> suggested by Steven and Colin, would require changes in basically the same
> places.
>
> Thinking through some of the failure modes ("what if I update ZK, but can't
> then produce to the topic?") I realised that it would actually be possible
> to stop storing this info in ZK entirely and just store this state in the
> __partition_reassignments topic. Doing it that way would eliminate those
> failure modes and would allow clients interested in reassignment completion
> to consume from this topic and respond to records published with a null
> value (indicating completion of a reassignment).
>
> There are some interesting implications to doing this:
>
> 1. This __partition_reassignments topic would need to be replicated in
> order to keep reassignment available (if the leader of a partition of
> __partition_reassignments were unavailable, then the partitions whose state
> is held by that partition could not be reassigned).
> 2. We would want to avoid unclean leader election for this topic.
>
> But I am interested in what other people think about this approach.
>
> Cheers,
>
> Tom
>
>
> On 9 January 2018 at 21:18, Colin McCabe  wrote:
>
> > What if we had an internal topic which watchers could listen to for
> > information about partition reassignments?  The information could be in
> > JSON, so if we want to add new fields later, we always could.
> >
> > This avoids introducing a new AdminClient API.  For clients that want to
> > be notified about partition reassignments in a timely fashion, this
> avoids
> > the "polling an AdminClient API in a tight loop" antipattern.  It allows
> > watchers to be notified in a simple and natural way about what is going
> > on.  Access can be controlled by the existing topic ACL mechanisms.
> >
> > best,
> > Colin
> >
> >
> > On Fri, Dec 22, 2017, at 06:48, Tom Bentley wrote:
> > > Hi Steven,
> > >
> > > I must admit that I didn't really consider that option. I can see how
> > > attractive it is from your perspective. In practice it would come with
> > lots
> > > of edge cases which would need to be thought through:
> > >
> > > 1. What happens if the controller can't produce a record to this topic
> > > because the partition's leader is unavailable?
> > > 2. One solution to that is for the topic to be replicated on every
> > broker,
> > > so that the controller could elect itself leader on controller
> failover.
> > > But that raises another problem: What if, upon controller failover, the
> > > controller is ineligible for leader election because it's not in the
> ISR?
> > > 3. The above questions suggest the controller might not always be able
> to
> > > produce to the topic, but the controller isn't able to control when
> other
> > > brokers catch up replicating moved partitions and has to deal with
> those
> > > events. The controller would have to record (in memory) that the
> > > reassignment was complete, but hadn't been published, and publish
> later,
> > > when it was able to.
> > > 4. Further to 3, we would need to recover the in-memory state of
> > > reassignments on controller failover. But now we have to consider what
> > > happens if the controller cannot *consume* from the topic.
> > >
> > > This seems pretty complicated to me. I think each of the above points
> has
> > > alternatives (or compromises) which might make the problem more
> > tractable,
> > > so I'd welcome hearing from anyone who has ideas on that. In particular
> > > there are parallels with consumer offsets which might be worth thinking
> > > about some more.
> > >
> > > It would be useful to define better the use case we're trying to cater
> to
> > > here.
> > >
> > > * Is it just a notification that a given reassignment has finished that
> > > you're interested in?
> > > * What are the consequences if such a notification is delayed, or
> dropped
> > > entirely?
> > >
> > > Regards,
> > >
> > > Tom
> > >
> > >
> > >
> > > On 19 December 2017 at 20:34, Steven Aerts 
> > wrote:
> > >
> > > > 

Re: [DISCUSS] KIP-240: AdminClient.listReassignments AdminClient.describeReassignments

2018-03-19 Thread Tom Bentley
Last week I was able to spend a bit of time working on KIP-236 again and,
based on the discussion about that with Jun back in December, I refactored
the controller to store the reassignment state in /brokers/topics/${topic}
instead of introducing new ZK nodes. This morning I was wondering what to
do as a next step, as these changes are more or less useless on their own,
without APIs for discovering the current partitions and/or reassigning
partitions. I started thinking again about this KIP, and realised that
using an internal compacted topic (say __partition_reassignments), as
suggested by Steven and Colin, would require changes in basically the same
places.

Thinking through some of the failure modes ("what if I update ZK, but can't
then produce to the topic?") I realised that it would actually be possible
to stop storing this info in ZK entirely and just store this state in the
__partition_reassignments topic. Doing it that way would eliminate those
failure modes and would allow clients interested in reassignment completion
to consume from this topic and respond to records published with a null
value (indicating completion of a reassignment).
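
As a rough sketch of what such a watcher might look like (this assumes the
proposed __partition_reassignments topic existed and, purely for
illustration, that keys were "topic-partition" strings and values were JSON
strings; none of that is decided):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ReassignmentWatcher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "reassignment-watcher");
            props.put("auto.offset.reset", "earliest");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Subscribe to the (proposed) internal topic holding reassignment state.
                consumer.subscribe(Collections.singletonList("__partition_reassignments"));
                while (true) {
                    for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                        if (record.value() == null) {
                            // A null value (tombstone) would indicate the reassignment completed.
                            System.out.println("Reassignment finished for " + record.key());
                        } else {
                            System.out.println("Reassignment in progress for " + record.key()
                                    + ": " + record.value());
                        }
                    }
                }
            }
        }
    }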

There are some interesting implications to doing this:

1. This __partition_reassignments topic would need to be replicated in
order to keep reassignment available (if the leader of a partition of
__partition_reassignments were unavailable, then the partitions whose state
is held by that partition could not be reassigned).
2. We would want to avoid unclean leader election for this topic.
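
To make those two points concrete, the topic-level settings might look
something like the sketch below (values are illustrative only; the
replication factor itself would be fixed at topic creation rather than via a
topic-level config):

    # proposed __partition_reassignments topic, created with e.g. replication factor 3
    cleanup.policy=compact
    unclean.leader.election.enable=false
    min.insync.replicas=2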

But I am interested in what other people think about this approach.

Cheers,

Tom


On 9 January 2018 at 21:18, Colin McCabe  wrote:

> What if we had an internal topic which watchers could listen to for
> information about partition reassignments?  The information could be in
> JSON, so if we want to add new fields later, we always could.
>
> This avoids introducing a new AdminClient API.  For clients that want to
> be notified about partition reassignments in a timely fashion, this avoids
> the "polling an AdminClient API in a tight loop" antipattern.  It allows
> watchers to be notified in a simple and natural way about what is going
> on.  Access can be controlled by the existing topic ACL mechanisms.
>
> best,
> Colin
>
>
> On Fri, Dec 22, 2017, at 06:48, Tom Bentley wrote:
> > Hi Steven,
> >
> > I must admit that I didn't really consider that option. I can see how
> > attractive it is from your perspective. In practice it would come with
> lots
> > of edge cases which would need to be thought through:
> >
> > 1. What happens if the controller can't produce a record to this topic
> > because the partition's leader is unavailable?
> > 2. One solution to that is for the topic to be replicated on every
> broker,
> > so that the controller could elect itself leader on controller failover.
> > But that raises another problem: What if, upon controller failover, the
> > controller is ineligible for leader election because it's not in the ISR?
> > 3. The above questions suggest the controller might not always be able to
> > produce to the topic, but the controller isn't able to control when other
> > brokers catch up replicating moved partitions and has to deal with those
> > events. The controller would have to record (in memory) that the
> > reassignment was complete, but hadn't been published, and publish later,
> > when it was able to.
> > 4. Further to 3, we would need to recover the in-memory state of
> > reassignments on controller failover. But now we have to consider what
> > happens if the controller cannot *consume* from the topic.
> >
> > This seems pretty complicated to me. I think each of the above points has
> > alternatives (or compromises) which might make the problem more
> tractable,
> > so I'd welcome hearing from anyone who has ideas on that. In particular
> > there are parallels with consumer offsets which might be worth thinking
> > about some more.
> >
> > It would be useful to define better the use case we're trying to cater to
> > here.
> >
> > * Is it just a notification that a given reassignment has finished that
> > you're interested in?
> > * What are the consequences if such a notification is delayed, or dropped
> > entirely?
> >
> > Regards,
> >
> > Tom
> >
> >
> >
> > On 19 December 2017 at 20:34, Steven Aerts 
> wrote:
> >
> > > Hello Tom,
> > >
> > >
> > > when you were working out KIP-236, did you consider migrating the
> > > reassignment
> > > state from zookeeper to an internal kafka topic, keyed by partition
> > > and log compacted?
> > >
> > > It would allow an admin client and controller to easily subscribe for
> > > those changes,
> > > without the need to extend the network protocol as discussed in
> KIP-240.
> > >
> > > This is just a theoretical idea I wanted to share, as I can't find a
> > > reason why it would
> > > be a stupid idea.
> > > But I assume tha

Re: [DISCUSS] KIP-240: AdminClient.listReassignments AdminClient.describeReassignments

2018-01-09 Thread Colin McCabe
What if we had an internal topic which watchers could listen to for information 
about partition reassignments?  The information could be in JSON, so if we want 
to add new fields later, we always could.  

This avoids introducing a new AdminClient API.  For clients that want to be 
notified about partition reassignments in a timely fashion, this avoids the 
"polling an AdminClient API in a tight loop" antipattern.  It allows watchers 
to be notified in a simple and natural way about what is going on.  Access can 
be controlled by the existing topic ACL mechanisms.
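
Purely as an illustration of the idea (the key format and field names below
are invented, not a proposal), a record on such a topic might look like:

    key:   "my-topic-3"
    value: {"replicas": [1, 2, 3], "targetReplicas": [1, 4, 5], "state": "IN_PROGRESS"}

with a null value published once the reassignment has completed.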

best,
Colin


On Fri, Dec 22, 2017, at 06:48, Tom Bentley wrote:
> Hi Steven,
> 
> I must admit that I didn't really consider that option. I can see how
> attractive it is from your perspective. In practice it would come with lots
> of edge cases which would need to be thought through:
> 
> 1. What happens if the controller can't produce a record to this topic
> because the partition's leader is unavailable?
> 2. One solution to that is for the topic to be replicated on every broker,
> so that the controller could elect itself leader on controller failover.
> But that raises another problem: What if, upon controller failover, the
> controller is ineligible for leader election because it's not in the ISR?
> 3. The above questions suggest the controller might not always be able to
> produce to the topic, but the controller isn't able to control when other
> brokers catch up replicating moved partitions and has to deal with those
> events. The controller would have to record (in memory) that the
> reassignment was complete, but hadn't been published, and publish later,
> when it was able to.
> 4. Further to 3, we would need to recover the in-memory state of
> reassignments on controller failover. But now we have to consider what
> happens if the controller cannot *consume* from the topic.
> 
> This seems pretty complicated to me. I think each of the above points has
> alternatives (or compromises) which might make the problem more tractable,
> so I'd welcome hearing from anyone who has ideas on that. In particular
> there are parallels with consumer offsets which might be worth thinking
> about some more.
> 
> It would be useful to define better the use case we're trying to cater to
> here.
> 
> * Is it just a notification that a given reassignment has finished that
> you're interested in?
> * What are the consequences if such a notification is delayed, or dropped
> entirely?
> 
> Regards,
> 
> Tom
> 
> 
> 
> On 19 December 2017 at 20:34, Steven Aerts  wrote:
> 
> > Hello Tom,
> >
> >
> > when you were working out KIP-236, did you consider migrating the
> > reassignment
> > state from zookeeper to an internal kafka topic, keyed by partition
> > and log compacted?
> >
> > It would allow an admin client and controller to easily subscribe for
> > those changes,
> > without the need to extend the network protocol as discussed in KIP-240.
> >
> > This is just a theoretical idea I wanted to share, as I can't find a
> > reason why it would
> > be a stupid idea.
> > But I assume that in practice, this will imply too much change to the
> > code base to be
> > viable.
> >
> >
> > Regards,
> >
> >
> >Steven
> >
> >
> > 2017-12-18 11:49 GMT+01:00 Tom Bentley :
> > > Hi Steven,
> > >
> > > I think it would be useful to be able to subscribe yourself on updates of
> > >> reassignment changes.
> > >
> > >
> > > I agree this would be really useful, but, to the extent I understand the
> > > networking underpinnings of the admin client, it might be difficult to do
> > > well in practice. Part of the problem is that you might "set a watch" (to
> > > borrow the ZK terminology) via one broker (or the controller), only for
> > > that broker to fail (or the controller be re-elected). Obviously you can
> > > detect the loss of connection and set a new watch via a different broker
> > > (or the new controller), but that couldn't be transparent to the user,
> > > because the AdminClient doesn't know what changed while it was
> > > disconnected/not watching.
> > >
> > > Another issue is that to avoid races you really need to combine fetching
> > > the current state with setting the watch (as is done in the native
> > > ZooKeeper API). I think there are lots of subtle issues of this sort
> > which
> > > would need to be addressed to make something reliable.
> > >
> > > In the mean time, ZooKeeper already has a (proven and mature) API for
> > > watches, so there is, in principle, a good workaround. I say "in
> > principle"
> > > because in the KIP-236 proposal right now the /admin/reassign_partitions
> > > znode is legacy and the reassignment is represented by
> > > /admin/reassigments/$topic/$partition. That naming scheme for the znode
> > > would make it harder for ZooKeeper clients like yours because such
> > clients
> > > would need to set a child watch per topic. The original proposal for the
> > > naming scheme was /admin/reassigments/$topic-$partition, which would
>

Re: [DISCUSS] KIP-240: AdminClient.listReassignments AdminClient.describeReassignments

2017-12-22 Thread Tom Bentley
Hi Steven,

I must admit that I didn't really consider that option. I can see how
attractive it is from your perspective. In practice it would come with lots
of edge cases which would need to be thought through:

1. What happens if the controller can't produce a record to this topic
because the partition's leader is unavailable?
2. One solution to that is for the topic to be replicated on every broker,
so that the controller could elect itself leader on controller failover.
But that raises another problem: What if, upon controller failover, the
controller is ineligible for leader election because it's not in the ISR?
3. The above questions suggest the controller might not always be able to
produce to the topic, but the controller isn't able to control when other
brokers catch up replicating moved partitions and has to deal with those
events. The controller would have to record (in memory) that the
reassignment was complete, but hadn't been published, and publish later,
when it was able to.
4. Further to 3, we would need to recover the in-memory state of
reassignments on controller failover. But now we have to consider what
happens if the controller cannot *consume* from the topic.

This seems pretty complicated to me. I think each of the above points has
alternatives (or compromises) which might make the problem more tractable,
so I'd welcome hearing from anyone who has ideas on that. In particular
there are parallels with consumer offsets which might be worth thinking
about some more.

It would be useful to define better the use case we're trying to cater to
here.

* Is it just a notification that a given reassignment has finished that
you're interested in?
* What are the consequences if such a notification is delayed, or dropped
entirely?

Regards,

Tom



On 19 December 2017 at 20:34, Steven Aerts  wrote:

> Hello Tom,
>
>
> when you were working out KIP-236, did you consider migrating the
> reassignment
> state from zookeeper to an internal kafka topic, keyed by partition
> and log compacted?
>
> It would allow an admin client and controller to easily subscribe for
> those changes,
> without the need to extend the network protocol as discussed in KIP-240.
>
> This is just a theoretical idea I wanted to share, as I can't find a
> reason why it would
> be a stupid idea.
> But I assume that in practice, this will imply too much change to the
> code base to be
> viable.
>
>
> Regards,
>
>
>Steven
>
>
> 2017-12-18 11:49 GMT+01:00 Tom Bentley :
> > Hi Steven,
> >
> > I think it would be useful to be able to subscribe yourself on updates of
> >> reassignment changes.
> >
> >
> > I agree this would be really useful, but, to the extent I understand the
> > networking underpinnings of the admin client, it might be difficult to do
> > well in practice. Part of the problem is that you might "set a watch" (to
> > borrow the ZK terminology) via one broker (or the controller), only for
> > that broker to fail (or the controller be re-elected). Obviously you can
> > detect the loss of connection and set a new watch via a different broker
> > (or the new controller), but that couldn't be transparent to the user,
> > because the AdminClient doesn't know what changed while it was
> > disconnected/not watching.
> >
> > Another issue is that to avoid races you really need to combine fetching
> > the current state with setting the watch (as is done in the native
> > ZooKeeper API). I think there are lots of subtle issues of this sort
> which
> > would need to be addressed to make something reliable.
> >
> > In the mean time, ZooKeeper already has a (proven and mature) API for
> > watches, so there is, in principle, a good workaround. I say "in
> principle"
> > because in the KIP-236 proposal right now the /admin/reassign_partitions
> > znode is legacy and the reassignment is represented by
> > /admin/reassigments/$topic/$partition. That naming scheme for the znode
> > would make it harder for ZooKeeper clients like yours because such
> clients
> > would need to set a child watch per topic. The original proposal for the
> > naming scheme was /admin/reassigments/$topic-$partition, which would
> mean
> > clients like yours would need only 1 child watch. The advantage of
> > /admin/reassigments/$topic/$partition is that it scales better. I don't
> > currently know how well ZooKeeper copes with nodes with many children, so
> > it's difficult for me to weigh those two options, but I would be happy to
> > switch back to /admin/reassigments/$topic-$partition if we could
> reassure
> > ourselves it would scale OK to the reassignment sizes people would need
> in
> > practice.
> >
> > Overall I would prefer not to tackle something like this in *this* KIP,
> > though it could be something for a future KIP. Of course I'm happy to
> hear
> > more discussion about this too!
> >
> > Cheers,
> >
> > Tom
> >
> >
> > On 15 December 2017 at 18:51, Steven Aerts 
> wrote:
> >
> >> Tom,
> >>
> >>
> >> I think it would be useful to be able to subscrib

Re: [DISCUSS] KIP-240: AdminClient.listReassignments AdminClient.describeReassignments

2017-12-19 Thread Steven Aerts
Hello Tom,


when you were working out KIP-236, did you consider migrating the reassignment
state from zookeeper to an internal kafka topic, keyed by partition
and log compacted?

It would allow an admin client and controller to easily subscribe for
those changes,
without the need to extend the network protocol as discussed in KIP-240.
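
As a sketch of what writing that state could look like (the topic name and
the key/value formats below are purely illustrative, and in practice the
controller rather than an ordinary client would do this):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ReassignmentStateWriter {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Key by topic-partition so compaction keeps only the latest state per partition.
                producer.send(new ProducerRecord<>("__partition_reassignments",
                        "my-topic-3", "{\"targetReplicas\": [1, 4, 5]}"));
                // A null value (tombstone) removes the key after compaction, i.e.
                // "no reassignment in flight" once it has completed.
                producer.send(new ProducerRecord<>("__partition_reassignments",
                        "my-topic-3", null));
            }
        }
    }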

This is just a theoretical idea I wanted to share, as I can't find a
reason why it would
be a stupid idea.
But I assume that in practice, this will imply too much change to the
code base to be
viable.


Regards,


   Steven


2017-12-18 11:49 GMT+01:00 Tom Bentley :
> Hi Steven,
>
> I think it would be useful to be able to subscribe yourself on updates of
>> reassignment changes.
>
>
> I agree this would be really useful, but, to the extent I understand the
> networking underpinnings of the admin client, it might be difficult to do
> well in practice. Part of the problem is that you might "set a watch" (to
> borrow the ZK terminology) via one broker (or the controller), only for
> that broker to fail (or the controller be re-elected). Obviously you can
> detect the loss of connection and set a new watch via a different broker
> (or the new controller), but that couldn't be transparent to the user,
> because the AdminClient doesn't know what changed while it was
> disconnected/not watching.
>
> Another issue is that to avoid races you really need to combine fetching
> the current state with setting the watch (as is done in the native
> ZooKeeper API). I think there are lots of subtle issues of this sort which
> would need to be addressed to make something reliable.
>
> In the mean time, ZooKeeper already has a (proven and mature) API for
> watches, so there is, in principle, a good workaround. I say "in principle"
> because in the KIP-236 proposal right now the /admin/reassign_partitions
> znode is legacy and the reassignment is represented by
> /admin/reassigments/$topic/$partition. That naming scheme for the znode
> would make it harder for ZooKeeper clients like yours because such clients
> would need to set a child watch per topic. The original proposal for the
> naming scheme was /admin/reassigments/$topic-$partition, which would mean
> clients like yours would need only 1 child watch. The advantage of
> /admin/reassigments/$topic/$partition is that it scales better. I don't
> currently know how well ZooKeeper copes with nodes with many children, so
> it's difficult for me to weigh those two options, but I would be happy to
> switch back to /admin/reassigments/$topic-$partition if we could reassure
> ourselves it would scale OK to the reassignment sizes people would need in
> practice.
>
> Overall I would prefer not to tackle something like this in *this* KIP,
> though it could be something for a future KIP. Of course I'm happy to hear
> more discussion about this too!
>
> Cheers,
>
> Tom
>
>
> On 15 December 2017 at 18:51, Steven Aerts  wrote:
>
>> Tom,
>>
>>
>> I think it would be useful to be able to subscribe yourself on updates of
>> reassignment changes.
>> Our internal kafka supervisor and monitoring tools are currently subscribed
>> to these changes in zookeeper so they can babysit our clusters.
>>
>> I think it would be nice if we could receive these events through the
>> adminclient.
>> In the api proposal, you can only poll for changes.
>>
>> No clue how difficult it would be to implement, maybe you can piggyback on
>> some version number in the repartition messages or on zookeeper.
>>
>> This is just an idea, not a must have feature for me.  We can always poll
>> over
>> the proposed api.
>>
>>
>> Regards,
>>
>>
>>Steven
>>
>>
>> On Fri, 15 Dec 2017 at 19:16, Tom Bentley wrote:
>>
>> > Hi,
>> >
>> > KIP-236 lays the foundations for AdminClient APIs to do with partition
>> > reassignment. I'd now like to start discussing KIP-240, which adds APIs
>> to
>> > the AdminClient to list and describe the current reassignments.
>> >
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 240%3A+AdminClient.listReassignments+AdminClient.describeReassignments
>> >
>> > Aside: I have fairly developed ideas for the API for starting a
>> > reassignment, but I intend to put that in a third KIP.
>> >
>> > Cheers,
>> >
>> > Tom
>> >
>>


Re: [DISCUSS] KIP-240: AdminClient.listReassignments AdminClient.describeReassignments

2017-12-18 Thread Tom Bentley
I've removed the option to pass a null reassignments argument to
AdminClient.describeReassignments(), because to support this would require
passing the topic and partition of each reassignment in the response, for
what might be many partitions. It just seems like unnecessary bloat.

AdminClient.listPartitions() already provides for passing a null collection
of partitions to discover all the reassignments, so overall there is no
loss of functionality. I'm happy to add it back if people really think it's
necessary.

On 18 December 2017 at 10:49, Tom Bentley  wrote:

> Hi Ted,
>
> For class Reassignment, it seems you forgot to include set of brokers.
>
>
> I omitted a set of brokers intentionally, because the Reassignment is an
> immutable reference to a (mutable) reassignment (in ZooKeeper). See KIP-236
> to understand the background.
>
> I've fixed the other two issues, thanks for noticing them.
>
> Cheers,
>
> Tom
>
> On 15 December 2017 at 18:53, Ted Yu  wrote:
>
>> Please create corresponding JIRA.
>>
>> For class Reassignment, it seems you forgot to include set of brokers.
>>
>> For class DescribeReassignmentsResult:
>> public KafkaFuture reassignments();
>> the return value should be a Collection.
>>
>>
>> On Fri, Dec 15, 2017 at 10:16 AM, Tom Bentley 
>> wrote:
>>
>> > Hi,
>> >
>> > KIP-236 lays the foundations for AdminClient APIs to do with partition
>> > reassignment. I'd now like to start discussing KIP-240, which adds APIs
>> to
>> > the AdminClient to list and describe the current reassignments.
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-240%
>> 3A+AdminClient.
>> > listReassignments+AdminClient.describeReassignments
>> >
>> > Aside: I have fairly developed ideas for the API for starting a
>> > reassignment, but I intend to put that in a third KIP.
>> >
>> > Cheers,
>> >
>> > Tom
>> >
>>
>
>


Re: [DISCUSS] KIP-240: AdminClient.listReassignments AdminClient.describeReassignments

2017-12-18 Thread Tom Bentley
Hi Ted,

For class Reassignment, it seems you forgot to include set of brokers.


I omitted a set of brokers intentionally, because the Reassignment is an
immutable reference to a (mutable) reassignment (in ZooKeeper). See KIP-236
to understand the background.

I've fixed the other two issues, thanks for noticing them.

Cheers,

Tom

On 15 December 2017 at 18:53, Ted Yu  wrote:

> Please create corresponding JIRA.
>
> For class Reassignment, it seems you forgot to include set of brokers.
>
> For class DescribeReassignmentsResult:
> public KafkaFuture reassignments();
> the return value should be a Collection.
>
>
> On Fri, Dec 15, 2017 at 10:16 AM, Tom Bentley 
> wrote:
>
> > Hi,
> >
> > KIP-236 lays the foundations for AdminClient APIs to do with partition
> > reassignment. I'd now like to start discussing KIP-240, which adds APIs
> to
> > the AdminClient to list and describe the current reassignments.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-240%3A+AdminClient
> .
> > listReassignments+AdminClient.describeReassignments
> >
> > Aside: I have fairly developed ideas for the API for starting a
> > reassignment, but I intend to put that in a third KIP.
> >
> > Cheers,
> >
> > Tom
> >
>


Re: [DISCUSS] KIP-240: AdminClient.listReassignments AdminClient.describeReassignments

2017-12-18 Thread Tom Bentley
Hi Steven,

I think it would be useful to be able to subscribe yourself on updates of
> reassignment changes.


I agree this would be really useful, but, to the extent I understand the
networking underpinnings of the admin client, it might be difficult to do
well in practice. Part of the problem is that you might "set a watch" (to
borrow the ZK terminology) via one broker (or the controller), only for
that broker to fail (or the controller be re-elected). Obviously you can
detect the loss of connection and set a new watch via a different broker
(or the new controller), but that couldn't be transparent to the user,
because the AdminClient doesn't know what changed while it was
disconnected/not watching.

Another issue is that to avoid races you really need to combine fetching
the current state with setting the watch (as is done in the native
ZooKeeper API). I think there are lots of subtle issues of this sort which
would need to be addressed to make something reliable.

In the mean time, ZooKeeper already has a (proven and mature) API for
watches, so there is, in principle, a good workaround. I say "in principle"
because in the KIP-236 proposal right now the /admin/reassign_partitions
znode is legacy and the reassignment is represented by
/admin/reassigments/$topic/$partition. That naming scheme for the znode
would make it harder for ZooKeeper clients like yours because such clients
would need to set a child watch per topic. The original proposal for the
naming scheme was /admin/reassigments/$topic-$partition, which would mean
clients like yours would need only 1 child watch. The advantage of
/admin/reassigments/$topic/$partition is that it scales better. I don't
currently know how well ZooKeeper copes with nodes with many children, so
it's difficult for me to weigh those two options, but I would be happy to
switch back to /admin/reassigments/$topic-$partition if we could reassure
ourselves it would scale OK to the reassignment sizes people would need in
practice.
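
As a sketch of that workaround with the flat naming scheme (the parent path
is taken from this discussion rather than a finished KIP, and error handling
is omitted), a tool could combine the fetch and the watch in a single call:

    import java.util.List;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ReassignmentZkWatcher implements Watcher {
        private final ZooKeeper zk;

        public ReassignmentZkWatcher(ZooKeeper zk) {
            this.zk = zk;
        }

        public void watchReassignments() throws Exception {
            // getChildren() atomically returns the current children *and* sets a
            // child watch, so nothing can change unnoticed between read and watch.
            List<String> inFlight = zk.getChildren("/admin/reassigments", this);
            System.out.println("Reassignments in flight: " + inFlight);
        }

        @Override
        public void process(WatchedEvent event) {
            try {
                // ZooKeeper watches are one-shot: re-read the children and re-arm.
                watchReassignments();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }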

Overall I would prefer not to tackle something like this in *this* KIP,
though it could be something for a future KIP. Of course I'm happy to hear
more discussion about this too!

Cheers,

Tom


On 15 December 2017 at 18:51, Steven Aerts  wrote:

> Tom,
>
>
> I think it would be useful to be able to subscribe yourself on updates of
> reassignment changes.
> Our internal kafka supervisor and monitoring tools are currently subscribed
> to these changes in zookeeper so they can babysit our clusters.
>
> I think it would be nice if we could receive these events through the
> adminclient.
> In the api proposal, you can only poll for changes.
>
> No clue how difficult it would be to implement, maybe you can piggyback on
> some version number in the repartition messages or on zookeeper.
>
> This is just an idea, not a must have feature for me.  We can always poll
> over
> the proposed api.
>
>
> Regards,
>
>
>Steven
>
>
> On Fri, 15 Dec 2017 at 19:16, Tom Bentley wrote:
>
> > Hi,
> >
> > KIP-236 lays the foundations for AdminClient APIs to do with partition
> > reassignment. I'd now like to start discussing KIP-240, which adds APIs
> to
> > the AdminClient to list and describe the current reassignments.
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 240%3A+AdminClient.listReassignments+AdminClient.describeReassignments
> >
> > Aside: I have fairly developed ideas for the API for starting a
> > reassignment, but I intend to put that in a third KIP.
> >
> > Cheers,
> >
> > Tom
> >
>


Re: [DISCUSS] KIP-240: AdminClient.listReassignments AdminClient.describeReassignments

2017-12-15 Thread Ted Yu
Please create corresponding JIRA.

For class Reassignment, it seems you forgot to include set of brokers.

For class DescribeReassignmentsResult:
public KafkaFuture reassignments();
the return value should be a Collection.
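
In other words, just as a sketch of the intended shape (Reassignment here is
only a placeholder for whatever class the KIP ends up defining):

    import java.util.Collection;
    import org.apache.kafka.common.KafkaFuture;

    // Placeholder standing in for the Reassignment class proposed in KIP-240.
    class Reassignment {}

    public class DescribeReassignmentsResult {
        private final KafkaFuture<Collection<Reassignment>> future;

        DescribeReassignmentsResult(KafkaFuture<Collection<Reassignment>> future) {
            this.future = future;
        }

        // The suggestion: return a Collection of reassignments, not a single value.
        public KafkaFuture<Collection<Reassignment>> reassignments() {
            return future;
        }
    }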


On Fri, Dec 15, 2017 at 10:16 AM, Tom Bentley  wrote:

> Hi,
>
> KIP-236 lays the foundations for AdminClient APIs to do with partition
> reassignment. I'd now like to start discussing KIP-240, which adds APIs to
> the AdminClient to list and describe the current reassignments.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-240%3A+AdminClient.
> listReassignments+AdminClient.describeReassignments
>
> Aside: I have fairly developed ideas for the API for starting a
> reassignment, but I intend to put that in a third KIP.
>
> Cheers,
>
> Tom
>


Re: [DISCUSS] KIP-240: AdminClient.listReassignments AdminClient.describeReassignments

2017-12-15 Thread Steven Aerts
Tom,


I think it would be useful to be able to subscribe yourself on updates of
reassignment changes.
Our internal kafka supervisor and monitoring tools are currently subscribed
to these changes in zookeeper so they can babysit our clusters.

I think it would be nice if we could receive these events through the
adminclient.
In the api proposal, you can only poll for changes.

No clue how difficult it would be to implement, maybe you can piggyback on
some version number in the repartition messages or on zookeeper.

This is just an idea, not a must have feature for me.  We can always poll
over
the proposed api.


Regards,


   Steven


On Fri, 15 Dec 2017 at 19:16, Tom Bentley wrote:

> Hi,
>
> KIP-236 lays the foundations for AdminClient APIs to do with partition
> reassignment. I'd now like to start discussing KIP-240, which adds APIs to
> the AdminClient to list and describe the current reassignments.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-240%3A+AdminClient.listReassignments+AdminClient.describeReassignments
>
> Aside: I have fairly developed ideas for the API for starting a
> reassignment, but I intend to put that in a third KIP.
>
> Cheers,
>
> Tom
>


[DISCUSS] KIP-240: AdminClient.listReassignments AdminClient.describeReassignments

2017-12-15 Thread Tom Bentley
Hi,

KIP-236 lays the foundations for AdminClient APIs to do with partition
reassignment. I'd now like to start discussing KIP-240, which adds APIs to
the AdminClient to list and describe the current reassignments.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-240%3A+AdminClient.listReassignments+AdminClient.describeReassignments

Aside: I have fairly developed ideas for the API for starting a
reassignment, but I intend to put that in a third KIP.

Cheers,

Tom