Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-10-17 Thread Viktor Somogyi-Vass
Hi Ismael,
By Jason's suggestion we finally went with the originally voted proposal
that is to include reassignment bytes in/out in replication bytes in/out
and we discuss this when throttling is changed. Sorry for not updating this
thread, I was busy with the code review. I'll get the KIP in shape to
reflect the final version.

Thanks,
Viktor

On Wed, Oct 16, 2019 at 10:06 PM Ismael Juma  wrote:

> The current proposal says that replication throughput would change not to
> include reassignment though.
>
> Ismael
>
> On Wed, Oct 16, 2019, 11:53 AM Colin McCabe  wrote:
>
> > Hi Ismael,
> >
> > I think every replica is doing replication, by definition.  But not every
> > replica is undergoing reassignment.
> >
> > If the broker that died was in the set of new replicas being added, its
> > death will not add a new under-replicated partition.  Otherwise, it will
> > add a new URP.
> >
> > best,
> > Colin
> >
> >
> > On Wed, Oct 16, 2019, at 11:26, Ismael Juma wrote:
> > > If a broker dies and loses the disk, is it replication or reassignment?
> > >
> > > Ismael
> > >
> > > On Thu, Oct 10, 2019 at 3:30 AM Viktor Somogyi-Vass <
> > viktorsomo...@gmail.com>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > During the code review it came up that we shouldn't count replication
> > bytes
> > > > together with reassignment bytes so they count to a different
> metrics.
> > This
> > > > is a change in the semantics of ReplicationBytesInPerSec and
> > > > ReplicationBytesOutPerSec metrics but since we plan to separate
> > > > reassignment from replication in terms of throttling, it makes sense
> to
> > > > record metrics separately as well.
> > > > If there are no objections we'll proceed with this interpretation
> but I
> > > > wanted to send a shoutout here as well.
> > > >
> > > > Also the ReassignmentMaxLag will go in a different JIRA as it
> requires
> > more
> > > > discussion.
> > > >
> > > > Thanks,
> > > > Viktor
> > > >
> > > > On Mon, Aug 26, 2019 at 6:30 PM Jason Gustafson 
> > > > wrote:
> > > >
> > > > > Closing this vote. The final result is +9 with 4 binding votes.
> > > > >
> > > > > @Satish Sorry, I missed your question above. Good point about
> > updating
> > > > > documentation. I will create a separate jira to make sure this gets
> > done.
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Fri, Aug 23, 2019 at 11:23 AM Jason Gustafson <
> ja...@confluent.io
> > >
> > > > > wrote:
> > > > >
> > > > > > Thanks Stan, good catch. I have updated the KIP. I will plan to
> > close
> > > > the
> > > > > > vote Monday if there are no objections.
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > > On Fri, Aug 23, 2019 at 11:14 AM Colin McCabe <
> cmcc...@apache.org>
> > > > > wrote:
> > > > > >
> > > > > >> On Fri, Aug 23, 2019, at 11:08, Stanislav Kozlovski wrote:
> > > > > >> > Thanks for the KIP, this is very helpful
> > > > > >> >
> > > > > >> > I had an offline discussion with Jason and we discussed the
> > > > semantics
> > > > > of
> > > > > >> > the underMinIsr/atMinIsr metrics. The current proposal would
> > expose
> > > > a
> > > > > >> gap
> > > > > >> > where we could report URP but no MinIsr.
> > > > > >> > A brief example:
> > > > > >> > original replica set = [0,1,2]
> > > > > >> > new replica set = [3,4,5]
> > > > > >> > isr = [0, 3, 4]
> > > > > >> > config.minIsr = 3
> > > > > >> >
> > > > > >> > As the KIP said
> > > > > >> > > In other words, we will subtract the AddingReplica from both
> > the
> > > > > total
> > > > > >> > replicas and the current ISR when determining URP
> satisfaction.
> > > > > >> > We would report URP=2 (1 and 2 are not in ISR) but not
> > underMinIsr,
> > > > as
> > > > > >> we
> > > > > >> > have an ISR of 3.
> > > > > >> > Technically, any produce requests with acks=all would succeed,
> > so it
> > > > > >> would
> > > > > >> > be false to report `underMinIsr`. We thought it'd be good to
> > keep
> > > > both
> > > > > >> > metrics consistent, so a new proposal is to use the following
> > > > > algorithm:
> > > > > >> > ```
> > > > > >> > isUrp == size(original replicas) - size(isr) > 0
> > > > > >> > ```
> > > > > >>
> > > > > >> Hi Stan,
> > > > > >>
> > > > > >> That's a good point.  Basically we should regard the size of the
> > > > > original
> > > > > >> replica set as the desired replication factor, and calculate the
> > URPs
> > > > > based
> > > > > >> on that.  +1 for this.  (I assume Jason will update the KIP...)
> > > > > >>
> > > > > >> best,
> > > > > >> Colin
> > > > > >>
> > > > > >>
> > > > > >> >
> > > > > >> > Taking that into account, +1 from me! (non-binding)
> > > > > >> >
> > > > > >> > On Fri, Aug 23, 2019 at 7:00 PM Colin McCabe <
> > cmcc...@apache.org>
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > +1 (binding).
> > > > > >> > >
> > > > > >> > > cheers,
> > > > > >> > > Colin
> > > > > >> > >
> > > > > >> > > On Tue, Aug 20, 2019, at 10:55, Jason Gustafson wrote:
> > > > > >> > > > Hi All,
> > > > > >> > > >
> > > > > >> > > > I'

Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-10-16 Thread Ismael Juma
The current proposal says that replication throughput would change not to
include reassignment though.

Ismael

On Wed, Oct 16, 2019, 11:53 AM Colin McCabe  wrote:

> Hi Ismael,
>
> I think every replica is doing replication, by definition.  But not every
> replica is undergoing reassignment.
>
> If the broker that died was in the set of new replicas being added, its
> death will not add a new under-replicated partition.  Otherwise, it will
> add a new URP.
>
> best,
> Colin
>
>
> On Wed, Oct 16, 2019, at 11:26, Ismael Juma wrote:
> > If a broker dies and loses the disk, is it replication or reassignment?
> >
> > Ismael
> >
> > On Thu, Oct 10, 2019 at 3:30 AM Viktor Somogyi-Vass <
> viktorsomo...@gmail.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > During the code review it came up that we shouldn't count replication
> bytes
> > > together with reassignment bytes so they count to a different metrics.
> This
> > > is a change in the semantics of ReplicationBytesInPerSec and
> > > ReplicationBytesOutPerSec metrics but since we plan to separate
> > > reassignment from replication in terms of throttling, it makes sense to
> > > record metrics separately as well.
> > > If there are no objections we'll proceed with this interpretation but I
> > > wanted to send a shoutout here as well.
> > >
> > > Also the ReassignmentMaxLag will go in a different JIRA as it requires
> more
> > > discussion.
> > >
> > > Thanks,
> > > Viktor
> > >
> > > On Mon, Aug 26, 2019 at 6:30 PM Jason Gustafson 
> > > wrote:
> > >
> > > > Closing this vote. The final result is +9 with 4 binding votes.
> > > >
> > > > @Satish Sorry, I missed your question above. Good point about
> updating
> > > > documentation. I will create a separate jira to make sure this gets
> done.
> > > >
> > > > -Jason
> > > >
> > > > On Fri, Aug 23, 2019 at 11:23 AM Jason Gustafson  >
> > > > wrote:
> > > >
> > > > > Thanks Stan, good catch. I have updated the KIP. I will plan to
> close
> > > the
> > > > > vote Monday if there are no objections.
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Fri, Aug 23, 2019 at 11:14 AM Colin McCabe 
> > > > wrote:
> > > > >
> > > > >> On Fri, Aug 23, 2019, at 11:08, Stanislav Kozlovski wrote:
> > > > >> > Thanks for the KIP, this is very helpful
> > > > >> >
> > > > >> > I had an offline discussion with Jason and we discussed the
> > > semantics
> > > > of
> > > > >> > the underMinIsr/atMinIsr metrics. The current proposal would
> expose
> > > a
> > > > >> gap
> > > > >> > where we could report URP but no MinIsr.
> > > > >> > A brief example:
> > > > >> > original replica set = [0,1,2]
> > > > >> > new replica set = [3,4,5]
> > > > >> > isr = [0, 3, 4]
> > > > >> > config.minIsr = 3
> > > > >> >
> > > > >> > As the KIP said
> > > > >> > > In other words, we will subtract the AddingReplica from both
> the
> > > > total
> > > > >> > replicas and the current ISR when determining URP satisfaction.
> > > > >> > We would report URP=2 (1 and 2 are not in ISR) but not
> underMinIsr,
> > > as
> > > > >> we
> > > > >> > have an ISR of 3.
> > > > >> > Technically, any produce requests with acks=all would succeed,
> so it
> > > > >> would
> > > > >> > be false to report `underMinIsr`. We thought it'd be good to
> keep
> > > both
> > > > >> > metrics consistent, so a new proposal is to use the following
> > > > algorithm:
> > > > >> > ```
> > > > >> > isUrp == size(original replicas) - size(isr) > 0
> > > > >> > ```
> > > > >>
> > > > >> Hi Stan,
> > > > >>
> > > > >> That's a good point.  Basically we should regard the size of the
> > > > original
> > > > >> replica set as the desired replication factor, and calculate the
> URPs
> > > > based
> > > > >> on that.  +1 for this.  (I assume Jason will update the KIP...)
> > > > >>
> > > > >> best,
> > > > >> Colin
> > > > >>
> > > > >>
> > > > >> >
> > > > >> > Taking that into account, +1 from me! (non-binding)
> > > > >> >
> > > > >> > On Fri, Aug 23, 2019 at 7:00 PM Colin McCabe <
> cmcc...@apache.org>
> > > > >> wrote:
> > > > >> >
> > > > >> > > +1 (binding).
> > > > >> > >
> > > > >> > > cheers,
> > > > >> > > Colin
> > > > >> > >
> > > > >> > > On Tue, Aug 20, 2019, at 10:55, Jason Gustafson wrote:
> > > > >> > > > Hi All,
> > > > >> > > >
> > > > >> > > > I'd like to start a vote on KIP-352, which is a follow-up to
> > > > >> KIP-455 to
> > > > >> > > > fix
> > > > >> > > > a long-known shortcoming of URP reporting and to improve
> > > > >> reassignment
> > > > >> > > > monitoring:
> > > > >> > > >
> > > > >> > >
> > > > >>
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> > > > >> > > > .
> > > > >> > > >
> > > > >> > > > Note that I have added one new metric following the
> discussion.
> > > It
> > > > >> seemed
> > > > >> > > > useful to have a lag metric for reassigning partitions.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Jason
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >>

Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-10-16 Thread Colin McCabe
Hi Ismael,

I think every replica is doing replication, by definition.  But not every 
replica is undergoing reassignment.

If the broker that died was in the set of new replicas being added, its death 
will not add a new under-replicated partition.  Otherwise, it will add a new 
URP.

best,
Colin


On Wed, Oct 16, 2019, at 11:26, Ismael Juma wrote:
> If a broker dies and loses the disk, is it replication or reassignment?
> 
> Ismael
> 
> On Thu, Oct 10, 2019 at 3:30 AM Viktor Somogyi-Vass 
> wrote:
> 
> > Hi All,
> >
> > During the code review it came up that we shouldn't count replication bytes
> > together with reassignment bytes so they count to a different metrics. This
> > is a change in the semantics of ReplicationBytesInPerSec and
> > ReplicationBytesOutPerSec metrics but since we plan to separate
> > reassignment from replication in terms of throttling, it makes sense to
> > record metrics separately as well.
> > If there are no objections we'll proceed with this interpretation but I
> > wanted to send a shoutout here as well.
> >
> > Also the ReassignmentMaxLag will go in a different JIRA as it requires more
> > discussion.
> >
> > Thanks,
> > Viktor
> >
> > On Mon, Aug 26, 2019 at 6:30 PM Jason Gustafson 
> > wrote:
> >
> > > Closing this vote. The final result is +9 with 4 binding votes.
> > >
> > > @Satish Sorry, I missed your question above. Good point about updating
> > > documentation. I will create a separate jira to make sure this gets done.
> > >
> > > -Jason
> > >
> > > On Fri, Aug 23, 2019 at 11:23 AM Jason Gustafson 
> > > wrote:
> > >
> > > > Thanks Stan, good catch. I have updated the KIP. I will plan to close
> > the
> > > > vote Monday if there are no objections.
> > > >
> > > > -Jason
> > > >
> > > > On Fri, Aug 23, 2019 at 11:14 AM Colin McCabe 
> > > wrote:
> > > >
> > > >> On Fri, Aug 23, 2019, at 11:08, Stanislav Kozlovski wrote:
> > > >> > Thanks for the KIP, this is very helpful
> > > >> >
> > > >> > I had an offline discussion with Jason and we discussed the
> > semantics
> > > of
> > > >> > the underMinIsr/atMinIsr metrics. The current proposal would expose
> > a
> > > >> gap
> > > >> > where we could report URP but no MinIsr.
> > > >> > A brief example:
> > > >> > original replica set = [0,1,2]
> > > >> > new replica set = [3,4,5]
> > > >> > isr = [0, 3, 4]
> > > >> > config.minIsr = 3
> > > >> >
> > > >> > As the KIP said
> > > >> > > In other words, we will subtract the AddingReplica from both the
> > > total
> > > >> > replicas and the current ISR when determining URP satisfaction.
> > > >> > We would report URP=2 (1 and 2 are not in ISR) but not underMinIsr,
> > as
> > > >> we
> > > >> > have an ISR of 3.
> > > >> > Technically, any produce requests with acks=all would succeed, so it
> > > >> would
> > > >> > be false to report `underMinIsr`. We thought it'd be good to keep
> > both
> > > >> > metrics consistent, so a new proposal is to use the following
> > > algorithm:
> > > >> > ```
> > > >> > isUrp == size(original replicas) - size(isr) > 0
> > > >> > ```
> > > >>
> > > >> Hi Stan,
> > > >>
> > > >> That's a good point.  Basically we should regard the size of the
> > > original
> > > >> replica set as the desired replication factor, and calculate the URPs
> > > based
> > > >> on that.  +1 for this.  (I assume Jason will update the KIP...)
> > > >>
> > > >> best,
> > > >> Colin
> > > >>
> > > >>
> > > >> >
> > > >> > Taking that into account, +1 from me! (non-binding)
> > > >> >
> > > >> > On Fri, Aug 23, 2019 at 7:00 PM Colin McCabe 
> > > >> wrote:
> > > >> >
> > > >> > > +1 (binding).
> > > >> > >
> > > >> > > cheers,
> > > >> > > Colin
> > > >> > >
> > > >> > > On Tue, Aug 20, 2019, at 10:55, Jason Gustafson wrote:
> > > >> > > > Hi All,
> > > >> > > >
> > > >> > > > I'd like to start a vote on KIP-352, which is a follow-up to
> > > >> KIP-455 to
> > > >> > > > fix
> > > >> > > > a long-known shortcoming of URP reporting and to improve
> > > >> reassignment
> > > >> > > > monitoring:
> > > >> > > >
> > > >> > >
> > > >>
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> > > >> > > > .
> > > >> > > >
> > > >> > > > Note that I have added one new metric following the discussion.
> > It
> > > >> seemed
> > > >> > > > useful to have a lag metric for reassigning partitions.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Jason
> > > >> > > >
> > > >> > >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > Best,
> > > >> > Stanislav
> > > >> >
> > > >>
> > > >
> > >
> >
>


Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-10-16 Thread Ismael Juma
If a broker dies and loses the disk, is it replication or reassignment?

Ismael

On Thu, Oct 10, 2019 at 3:30 AM Viktor Somogyi-Vass 
wrote:

> Hi All,
>
> During the code review it came up that we shouldn't count replication bytes
> together with reassignment bytes so they count to a different metrics. This
> is a change in the semantics of ReplicationBytesInPerSec and
> ReplicationBytesOutPerSec metrics but since we plan to separate
> reassignment from replication in terms of throttling, it makes sense to
> record metrics separately as well.
> If there are no objections we'll proceed with this interpretation but I
> wanted to send a shoutout here as well.
>
> Also the ReassignmentMaxLag will go in a different JIRA as it requires more
> discussion.
>
> Thanks,
> Viktor
>
> On Mon, Aug 26, 2019 at 6:30 PM Jason Gustafson 
> wrote:
>
> > Closing this vote. The final result is +9 with 4 binding votes.
> >
> > @Satish Sorry, I missed your question above. Good point about updating
> > documentation. I will create a separate jira to make sure this gets done.
> >
> > -Jason
> >
> > On Fri, Aug 23, 2019 at 11:23 AM Jason Gustafson 
> > wrote:
> >
> > > Thanks Stan, good catch. I have updated the KIP. I will plan to close
> the
> > > vote Monday if there are no objections.
> > >
> > > -Jason
> > >
> > > On Fri, Aug 23, 2019 at 11:14 AM Colin McCabe 
> > wrote:
> > >
> > >> On Fri, Aug 23, 2019, at 11:08, Stanislav Kozlovski wrote:
> > >> > Thanks for the KIP, this is very helpful
> > >> >
> > >> > I had an offline discussion with Jason and we discussed the
> semantics
> > of
> > >> > the underMinIsr/atMinIsr metrics. The current proposal would expose
> a
> > >> gap
> > >> > where we could report URP but no MinIsr.
> > >> > A brief example:
> > >> > original replica set = [0,1,2]
> > >> > new replica set = [3,4,5]
> > >> > isr = [0, 3, 4]
> > >> > config.minIsr = 3
> > >> >
> > >> > As the KIP said
> > >> > > In other words, we will subtract the AddingReplica from both the
> > total
> > >> > replicas and the current ISR when determining URP satisfaction.
> > >> > We would report URP=2 (1 and 2 are not in ISR) but not underMinIsr,
> as
> > >> we
> > >> > have an ISR of 3.
> > >> > Technically, any produce requests with acks=all would succeed, so it
> > >> would
> > >> > be false to report `underMinIsr`. We thought it'd be good to keep
> both
> > >> > metrics consistent, so a new proposal is to use the following
> > algorithm:
> > >> > ```
> > >> > isUrp == size(original replicas) - size(isr) > 0
> > >> > ```
> > >>
> > >> Hi Stan,
> > >>
> > >> That's a good point.  Basically we should regard the size of the
> > original
> > >> replica set as the desired replication factor, and calculate the URPs
> > based
> > >> on that.  +1 for this.  (I assume Jason will update the KIP...)
> > >>
> > >> best,
> > >> Colin
> > >>
> > >>
> > >> >
> > >> > Taking that into account, +1 from me! (non-binding)
> > >> >
> > >> > On Fri, Aug 23, 2019 at 7:00 PM Colin McCabe 
> > >> wrote:
> > >> >
> > >> > > +1 (binding).
> > >> > >
> > >> > > cheers,
> > >> > > Colin
> > >> > >
> > >> > > On Tue, Aug 20, 2019, at 10:55, Jason Gustafson wrote:
> > >> > > > Hi All,
> > >> > > >
> > >> > > > I'd like to start a vote on KIP-352, which is a follow-up to
> > >> KIP-455 to
> > >> > > > fix
> > >> > > > a long-known shortcoming of URP reporting and to improve
> > >> reassignment
> > >> > > > monitoring:
> > >> > > >
> > >> > >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> > >> > > > .
> > >> > > >
> > >> > > > Note that I have added one new metric following the discussion.
> It
> > >> seemed
> > >> > > > useful to have a lag metric for reassigning partitions.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Jason
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> > Best,
> > >> > Stanislav
> > >> >
> > >>
> > >
> >
>


Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-10-11 Thread Stanislav Kozlovski
+1 (non-binding), thanks for bringing it up

On Thu, Oct 10, 2019 at 5:48 PM Colin McCabe  wrote:

> +1.  Thanks, Viktor.
>
> Colin
>
> On Thu, Oct 10, 2019, at 03:30, Viktor Somogyi-Vass wrote:
> > Hi All,
> >
> > During the code review it came up that we shouldn't count replication
> bytes
> > together with reassignment bytes so they count to a different metrics.
> This
> > is a change in the semantics of ReplicationBytesInPerSec and
> > ReplicationBytesOutPerSec metrics but since we plan to separate
> > reassignment from replication in terms of throttling, it makes sense to
> > record metrics separately as well.
> > If there are no objections we'll proceed with this interpretation but I
> > wanted to send a shoutout here as well.
> >
> > Also the ReassignmentMaxLag will go in a different JIRA as it requires
> more
> > discussion.
> >
> > Thanks,
> > Viktor
> >
> > On Mon, Aug 26, 2019 at 6:30 PM Jason Gustafson 
> wrote:
> >
> > > Closing this vote. The final result is +9 with 4 binding votes.
> > >
> > > @Satish Sorry, I missed your question above. Good point about updating
> > > documentation. I will create a separate jira to make sure this gets
> done.
> > >
> > > -Jason
> > >
> > > On Fri, Aug 23, 2019 at 11:23 AM Jason Gustafson 
> > > wrote:
> > >
> > > > Thanks Stan, good catch. I have updated the KIP. I will plan to
> close the
> > > > vote Monday if there are no objections.
> > > >
> > > > -Jason
> > > >
> > > > On Fri, Aug 23, 2019 at 11:14 AM Colin McCabe 
> > > wrote:
> > > >
> > > >> On Fri, Aug 23, 2019, at 11:08, Stanislav Kozlovski wrote:
> > > >> > Thanks for the KIP, this is very helpful
> > > >> >
> > > >> > I had an offline discussion with Jason and we discussed the
> semantics
> > > of
> > > >> > the underMinIsr/atMinIsr metrics. The current proposal would
> expose a
> > > >> gap
> > > >> > where we could report URP but no MinIsr.
> > > >> > A brief example:
> > > >> > original replica set = [0,1,2]
> > > >> > new replica set = [3,4,5]
> > > >> > isr = [0, 3, 4]
> > > >> > config.minIsr = 3
> > > >> >
> > > >> > As the KIP said
> > > >> > > In other words, we will subtract the AddingReplica from both the
> > > total
> > > >> > replicas and the current ISR when determining URP satisfaction.
> > > >> > We would report URP=2 (1 and 2 are not in ISR) but not
> underMinIsr, as
> > > >> we
> > > >> > have an ISR of 3.
> > > >> > Technically, any produce requests with acks=all would succeed, so
> it
> > > >> would
> > > >> > be false to report `underMinIsr`. We thought it'd be good to keep
> both
> > > >> > metrics consistent, so a new proposal is to use the following
> > > algorithm:
> > > >> > ```
> > > >> > isUrp == size(original replicas) - size(isr) > 0
> > > >> > ```
> > > >>
> > > >> Hi Stan,
> > > >>
> > > >> That's a good point.  Basically we should regard the size of the
> > > original
> > > >> replica set as the desired replication factor, and calculate the
> URPs
> > > based
> > > >> on that.  +1 for this.  (I assume Jason will update the KIP...)
> > > >>
> > > >> best,
> > > >> Colin
> > > >>
> > > >>
> > > >> >
> > > >> > Taking that into account, +1 from me! (non-binding)
> > > >> >
> > > >> > On Fri, Aug 23, 2019 at 7:00 PM Colin McCabe 
> > > >> wrote:
> > > >> >
> > > >> > > +1 (binding).
> > > >> > >
> > > >> > > cheers,
> > > >> > > Colin
> > > >> > >
> > > >> > > On Tue, Aug 20, 2019, at 10:55, Jason Gustafson wrote:
> > > >> > > > Hi All,
> > > >> > > >
> > > >> > > > I'd like to start a vote on KIP-352, which is a follow-up to
> > > >> KIP-455 to
> > > >> > > > fix
> > > >> > > > a long-known shortcoming of URP reporting and to improve
> > > >> reassignment
> > > >> > > > monitoring:
> > > >> > > >
> > > >> > >
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> > > >> > > > .
> > > >> > > >
> > > >> > > > Note that I have added one new metric following the
> discussion. It
> > > >> seemed
> > > >> > > > useful to have a lag metric for reassigning partitions.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Jason
> > > >> > > >
> > > >> > >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > Best,
> > > >> > Stanislav
> > > >> >
> > > >>
> > > >
> > >
> >
>


-- 
Best,
Stanislav


Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-10-10 Thread Colin McCabe
+1.  Thanks, Viktor.

Colin

On Thu, Oct 10, 2019, at 03:30, Viktor Somogyi-Vass wrote:
> Hi All,
> 
> During the code review it came up that we shouldn't count replication bytes
> together with reassignment bytes so they count to a different metrics. This
> is a change in the semantics of ReplicationBytesInPerSec and
> ReplicationBytesOutPerSec metrics but since we plan to separate
> reassignment from replication in terms of throttling, it makes sense to
> record metrics separately as well.
> If there are no objections we'll proceed with this interpretation but I
> wanted to send a shoutout here as well.
> 
> Also the ReassignmentMaxLag will go in a different JIRA as it requires more
> discussion.
> 
> Thanks,
> Viktor
> 
> On Mon, Aug 26, 2019 at 6:30 PM Jason Gustafson  wrote:
> 
> > Closing this vote. The final result is +9 with 4 binding votes.
> >
> > @Satish Sorry, I missed your question above. Good point about updating
> > documentation. I will create a separate jira to make sure this gets done.
> >
> > -Jason
> >
> > On Fri, Aug 23, 2019 at 11:23 AM Jason Gustafson 
> > wrote:
> >
> > > Thanks Stan, good catch. I have updated the KIP. I will plan to close the
> > > vote Monday if there are no objections.
> > >
> > > -Jason
> > >
> > > On Fri, Aug 23, 2019 at 11:14 AM Colin McCabe 
> > wrote:
> > >
> > >> On Fri, Aug 23, 2019, at 11:08, Stanislav Kozlovski wrote:
> > >> > Thanks for the KIP, this is very helpful
> > >> >
> > >> > I had an offline discussion with Jason and we discussed the semantics
> > of
> > >> > the underMinIsr/atMinIsr metrics. The current proposal would expose a
> > >> gap
> > >> > where we could report URP but no MinIsr.
> > >> > A brief example:
> > >> > original replica set = [0,1,2]
> > >> > new replica set = [3,4,5]
> > >> > isr = [0, 3, 4]
> > >> > config.minIsr = 3
> > >> >
> > >> > As the KIP said
> > >> > > In other words, we will subtract the AddingReplica from both the
> > total
> > >> > replicas and the current ISR when determining URP satisfaction.
> > >> > We would report URP=2 (1 and 2 are not in ISR) but not underMinIsr, as
> > >> we
> > >> > have an ISR of 3.
> > >> > Technically, any produce requests with acks=all would succeed, so it
> > >> would
> > >> > be false to report `underMinIsr`. We thought it'd be good to keep both
> > >> > metrics consistent, so a new proposal is to use the following
> > algorithm:
> > >> > ```
> > >> > isUrp == size(original replicas) - size(isr) > 0
> > >> > ```
> > >>
> > >> Hi Stan,
> > >>
> > >> That's a good point.  Basically we should regard the size of the
> > original
> > >> replica set as the desired replication factor, and calculate the URPs
> > based
> > >> on that.  +1 for this.  (I assume Jason will update the KIP...)
> > >>
> > >> best,
> > >> Colin
> > >>
> > >>
> > >> >
> > >> > Taking that into account, +1 from me! (non-binding)
> > >> >
> > >> > On Fri, Aug 23, 2019 at 7:00 PM Colin McCabe 
> > >> wrote:
> > >> >
> > >> > > +1 (binding).
> > >> > >
> > >> > > cheers,
> > >> > > Colin
> > >> > >
> > >> > > On Tue, Aug 20, 2019, at 10:55, Jason Gustafson wrote:
> > >> > > > Hi All,
> > >> > > >
> > >> > > > I'd like to start a vote on KIP-352, which is a follow-up to
> > >> KIP-455 to
> > >> > > > fix
> > >> > > > a long-known shortcoming of URP reporting and to improve
> > >> reassignment
> > >> > > > monitoring:
> > >> > > >
> > >> > >
> > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> > >> > > > .
> > >> > > >
> > >> > > > Note that I have added one new metric following the discussion. It
> > >> seemed
> > >> > > > useful to have a lag metric for reassigning partitions.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Jason
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> > Best,
> > >> > Stanislav
> > >> >
> > >>
> > >
> >
>


Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-10-10 Thread Viktor Somogyi-Vass
Hi All,

During the code review it came up that we shouldn't count replication bytes
together with reassignment bytes so they count to a different metrics. This
is a change in the semantics of ReplicationBytesInPerSec and
ReplicationBytesOutPerSec metrics but since we plan to separate
reassignment from replication in terms of throttling, it makes sense to
record metrics separately as well.
If there are no objections we'll proceed with this interpretation but I
wanted to send a shoutout here as well.

Also the ReassignmentMaxLag will go in a different JIRA as it requires more
discussion.

Thanks,
Viktor

On Mon, Aug 26, 2019 at 6:30 PM Jason Gustafson  wrote:

> Closing this vote. The final result is +9 with 4 binding votes.
>
> @Satish Sorry, I missed your question above. Good point about updating
> documentation. I will create a separate jira to make sure this gets done.
>
> -Jason
>
> On Fri, Aug 23, 2019 at 11:23 AM Jason Gustafson 
> wrote:
>
> > Thanks Stan, good catch. I have updated the KIP. I will plan to close the
> > vote Monday if there are no objections.
> >
> > -Jason
> >
> > On Fri, Aug 23, 2019 at 11:14 AM Colin McCabe 
> wrote:
> >
> >> On Fri, Aug 23, 2019, at 11:08, Stanislav Kozlovski wrote:
> >> > Thanks for the KIP, this is very helpful
> >> >
> >> > I had an offline discussion with Jason and we discussed the semantics
> of
> >> > the underMinIsr/atMinIsr metrics. The current proposal would expose a
> >> gap
> >> > where we could report URP but no MinIsr.
> >> > A brief example:
> >> > original replica set = [0,1,2]
> >> > new replica set = [3,4,5]
> >> > isr = [0, 3, 4]
> >> > config.minIsr = 3
> >> >
> >> > As the KIP said
> >> > > In other words, we will subtract the AddingReplica from both the
> total
> >> > replicas and the current ISR when determining URP satisfaction.
> >> > We would report URP=2 (1 and 2 are not in ISR) but not underMinIsr, as
> >> we
> >> > have an ISR of 3.
> >> > Technically, any produce requests with acks=all would succeed, so it
> >> would
> >> > be false to report `underMinIsr`. We thought it'd be good to keep both
> >> > metrics consistent, so a new proposal is to use the following
> algorithm:
> >> > ```
> >> > isUrp == size(original replicas) - size(isr) > 0
> >> > ```
> >>
> >> Hi Stan,
> >>
> >> That's a good point.  Basically we should regard the size of the
> original
> >> replica set as the desired replication factor, and calculate the URPs
> based
> >> on that.  +1 for this.  (I assume Jason will update the KIP...)
> >>
> >> best,
> >> Colin
> >>
> >>
> >> >
> >> > Taking that into account, +1 from me! (non-binding)
> >> >
> >> > On Fri, Aug 23, 2019 at 7:00 PM Colin McCabe 
> >> wrote:
> >> >
> >> > > +1 (binding).
> >> > >
> >> > > cheers,
> >> > > Colin
> >> > >
> >> > > On Tue, Aug 20, 2019, at 10:55, Jason Gustafson wrote:
> >> > > > Hi All,
> >> > > >
> >> > > > I'd like to start a vote on KIP-352, which is a follow-up to
> >> KIP-455 to
> >> > > > fix
> >> > > > a long-known shortcoming of URP reporting and to improve
> >> reassignment
> >> > > > monitoring:
> >> > > >
> >> > >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> >> > > > .
> >> > > >
> >> > > > Note that I have added one new metric following the discussion. It
> >> seemed
> >> > > > useful to have a lag metric for reassigning partitions.
> >> > > >
> >> > > > Thanks,
> >> > > > Jason
> >> > > >
> >> > >
> >> >
> >> >
> >> > --
> >> > Best,
> >> > Stanislav
> >> >
> >>
> >
>


Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-08-26 Thread Jason Gustafson
Closing this vote. The final result is +9 with 4 binding votes.

@Satish Sorry, I missed your question above. Good point about updating
documentation. I will create a separate jira to make sure this gets done.

-Jason

On Fri, Aug 23, 2019 at 11:23 AM Jason Gustafson  wrote:

> Thanks Stan, good catch. I have updated the KIP. I will plan to close the
> vote Monday if there are no objections.
>
> -Jason
>
> On Fri, Aug 23, 2019 at 11:14 AM Colin McCabe  wrote:
>
>> On Fri, Aug 23, 2019, at 11:08, Stanislav Kozlovski wrote:
>> > Thanks for the KIP, this is very helpful
>> >
>> > I had an offline discussion with Jason and we discussed the semantics of
>> > the underMinIsr/atMinIsr metrics. The current proposal would expose a
>> gap
>> > where we could report URP but no MinIsr.
>> > A brief example:
>> > original replica set = [0,1,2]
>> > new replica set = [3,4,5]
>> > isr = [0, 3, 4]
>> > config.minIsr = 3
>> >
>> > As the KIP said
>> > > In other words, we will subtract the AddingReplica from both the total
>> > replicas and the current ISR when determining URP satisfaction.
>> > We would report URP=2 (1 and 2 are not in ISR) but not underMinIsr, as
>> we
>> > have an ISR of 3.
>> > Technically, any produce requests with acks=all would succeed, so it
>> would
>> > be false to report `underMinIsr`. We thought it'd be good to keep both
>> > metrics consistent, so a new proposal is to use the following algorithm:
>> > ```
>> > isUrp == size(original replicas) - size(isr) > 0
>> > ```
>>
>> Hi Stan,
>>
>> That's a good point.  Basically we should regard the size of the original
>> replica set as the desired replication factor, and calculate the URPs based
>> on that.  +1 for this.  (I assume Jason will update the KIP...)
>>
>> best,
>> Colin
>>
>>
>> >
>> > Taking that into account, +1 from me! (non-binding)
>> >
>> > On Fri, Aug 23, 2019 at 7:00 PM Colin McCabe 
>> wrote:
>> >
>> > > +1 (binding).
>> > >
>> > > cheers,
>> > > Colin
>> > >
>> > > On Tue, Aug 20, 2019, at 10:55, Jason Gustafson wrote:
>> > > > Hi All,
>> > > >
>> > > > I'd like to start a vote on KIP-352, which is a follow-up to
>> KIP-455 to
>> > > > fix
>> > > > a long-known shortcoming of URP reporting and to improve
>> reassignment
>> > > > monitoring:
>> > > >
>> > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
>> > > > .
>> > > >
>> > > > Note that I have added one new metric following the discussion. It
>> seemed
>> > > > useful to have a lag metric for reassigning partitions.
>> > > >
>> > > > Thanks,
>> > > > Jason
>> > > >
>> > >
>> >
>> >
>> > --
>> > Best,
>> > Stanislav
>> >
>>
>


Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-08-23 Thread Jason Gustafson
Thanks Stan, good catch. I have updated the KIP. I will plan to close the
vote Monday if there are no objections.

-Jason

On Fri, Aug 23, 2019 at 11:14 AM Colin McCabe  wrote:

> On Fri, Aug 23, 2019, at 11:08, Stanislav Kozlovski wrote:
> > Thanks for the KIP, this is very helpful
> >
> > I had an offline discussion with Jason and we discussed the semantics of
> > the underMinIsr/atMinIsr metrics. The current proposal would expose a gap
> > where we could report URP but no MinIsr.
> > A brief example:
> > original replica set = [0,1,2]
> > new replica set = [3,4,5]
> > isr = [0, 3, 4]
> > config.minIsr = 3
> >
> > As the KIP said
> > > In other words, we will subtract the AddingReplica from both the total
> > replicas and the current ISR when determining URP satisfaction.
> > We would report URP=2 (1 and 2 are not in ISR) but not underMinIsr, as we
> > have an ISR of 3.
> > Technically, any produce requests with acks=all would succeed, so it
> would
> > be false to report `underMinIsr`. We thought it'd be good to keep both
> > metrics consistent, so a new proposal is to use the following algorithm:
> > ```
> > isUrp == size(original replicas) - size(isr) > 0
> > ```
>
> Hi Stan,
>
> That's a good point.  Basically we should regard the size of the original
> replica set as the desired replication factor, and calculate the URPs based
> on that.  +1 for this.  (I assume Jason will update the KIP...)
>
> best,
> Colin
>
>
> >
> > Taking that into account, +1 from me! (non-binding)
> >
> > On Fri, Aug 23, 2019 at 7:00 PM Colin McCabe  wrote:
> >
> > > +1 (binding).
> > >
> > > cheers,
> > > Colin
> > >
> > > On Tue, Aug 20, 2019, at 10:55, Jason Gustafson wrote:
> > > > Hi All,
> > > >
> > > > I'd like to start a vote on KIP-352, which is a follow-up to KIP-455
> to
> > > > fix
> > > > a long-known shortcoming of URP reporting and to improve reassignment
> > > > monitoring:
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> > > > .
> > > >
> > > > Note that I have added one new metric following the discussion. It
> seemed
> > > > useful to have a lag metric for reassigning partitions.
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>


Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-08-23 Thread Colin McCabe
On Fri, Aug 23, 2019, at 11:08, Stanislav Kozlovski wrote:
> Thanks for the KIP, this is very helpful
> 
> I had an offline discussion with Jason and we discussed the semantics of
> the underMinIsr/atMinIsr metrics. The current proposal would expose a gap
> where we could report URP but no MinIsr.
> A brief example:
> original replica set = [0,1,2]
> new replica set = [3,4,5]
> isr = [0, 3, 4]
> config.minIsr = 3
> 
> As the KIP said
> > In other words, we will subtract the AddingReplica from both the total
> replicas and the current ISR when determining URP satisfaction.
> We would report URP=2 (1 and 2 are not in ISR) but not underMinIsr, as we
> have an ISR of 3.
> Technically, any produce requests with acks=all would succeed, so it would
> be false to report `underMinIsr`. We thought it'd be good to keep both
> metrics consistent, so a new proposal is to use the following algorithm:
> ```
> isUrp == size(original replicas) - size(isr) > 0
> ```

Hi Stan,

That's a good point.  Basically we should regard the size of the original 
replica set as the desired replication factor, and calculate the URPs based on 
that.  +1 for this.  (I assume Jason will update the KIP...)

best,
Colin


> 
> Taking that into account, +1 from me! (non-binding)
> 
> On Fri, Aug 23, 2019 at 7:00 PM Colin McCabe  wrote:
> 
> > +1 (binding).
> >
> > cheers,
> > Colin
> >
> > On Tue, Aug 20, 2019, at 10:55, Jason Gustafson wrote:
> > > Hi All,
> > >
> > > I'd like to start a vote on KIP-352, which is a follow-up to KIP-455 to
> > > fix
> > > a long-known shortcoming of URP reporting and to improve reassignment
> > > monitoring:
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> > > .
> > >
> > > Note that I have added one new metric following the discussion. It seemed
> > > useful to have a lag metric for reassigning partitions.
> > >
> > > Thanks,
> > > Jason
> > >
> >
> 
> 
> -- 
> Best,
> Stanislav
>


Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-08-23 Thread Stanislav Kozlovski
Thanks for the KIP, this is very helpful

I had an offline discussion with Jason and we discussed the semantics of
the underMinIsr/atMinIsr metrics. The current proposal would expose a gap
where we could report URP but no MinIsr.
A brief example:
original replica set = [0,1,2]
new replica set = [3,4,5]
isr = [0, 3, 4]
config.minIsr = 3

As the KIP said
> In other words, we will subtract the AddingReplica from both the total
replicas and the current ISR when determining URP satisfaction.
We would report URP=2 (1 and 2 are not in ISR) but not underMinIsr, as we
have an ISR of 3.
Technically, any produce requests with acks=all would succeed, so it would
be false to report `underMinIsr`. We thought it'd be good to keep both
metrics consistent, so a new proposal is to use the following algorithm:
```
isUrp == size(original replicas) - size(isr) > 0
```

Taking that into account, +1 from me! (non-binding)

On Fri, Aug 23, 2019 at 7:00 PM Colin McCabe  wrote:

> +1 (binding).
>
> cheers,
> Colin
>
> On Tue, Aug 20, 2019, at 10:55, Jason Gustafson wrote:
> > Hi All,
> >
> > I'd like to start a vote on KIP-352, which is a follow-up to KIP-455 to
> > fix
> > a long-known shortcoming of URP reporting and to improve reassignment
> > monitoring:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> > .
> >
> > Note that I have added one new metric following the discussion. It seemed
> > useful to have a lag metric for reassigning partitions.
> >
> > Thanks,
> > Jason
> >
>


-- 
Best,
Stanislav


Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-08-23 Thread Colin McCabe
+1 (binding).

cheers,
Colin

On Tue, Aug 20, 2019, at 10:55, Jason Gustafson wrote:
> Hi All,
> 
> I'd like to start a vote on KIP-352, which is a follow-up to KIP-455 to 
> fix
> a long-known shortcoming of URP reporting and to improve reassignment
> monitoring:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> .
> 
> Note that I have added one new metric following the discussion. It seemed
> useful to have a lag metric for reassigning partitions.
> 
> Thanks,
> Jason
>


Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-08-22 Thread Vahid Hashemian
+1 (binding)

Thanks Jason. This is super useful.

--Vahid

On Tue, Aug 20, 2019 at 10:55 AM Jason Gustafson  wrote:

> Hi All,
>
> I'd like to start a vote on KIP-352, which is a follow-up to KIP-455 to fix
> a long-known shortcoming of URP reporting and to improve reassignment
> monitoring:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> .
>
> Note that I have added one new metric following the discussion. It seemed
> useful to have a lag metric for reassigning partitions.
>
> Thanks,
> Jason
>


-- 

Thanks!
--Vahid


Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-08-22 Thread Harsha Ch
+1 (binding)

Thanks,

Harsha

On Wed, Aug 21, 2019 at 4:23 PM, Robert Barrett < bob.barr...@confluent.io > 
wrote:

> 
> 
> 
> +1 (non-binding)
> 
> 
> 
> This will be great to have, thanks Jason!
> 
> 
> 
> On Wed, Aug 21, 2019 at 4:29 AM Manikumar < manikumar. reddy@ gmail. com (
> manikumar.re...@gmail.com ) > wrote:
> 
> 
>> 
>> 
>> +1 (binding).
>> 
>> 
>> 
>> Thanks for the KIP. LGTM.
>> 
>> 
>> 
>> On Wed, Aug 21, 2019 at 3:12 PM Satish Duggana < satish. duggana@ gmail. com
>> ( satish.dugg...@gmail.com ) > wrote:
>> 
>> 
>>> 
>>> 
>>> Hi Jason,
>>> +1 (non binding) Thanks for the KIP!
>>> 
>>> 
>>> 
>>> Do we need to have a separate JIRA to update the docs as it introduces
>>> 
>>> 
>> 
>> 
>> 
>> new
>> 
>> 
>>> 
>>> 
>>> metrics and a change in behavior for the existing metric?
>>> 
>>> 
>>> 
>>> On Wed, Aug 21, 2019 at 2:41 PM Mickael Maison < mickael. maison@ gmail. com
>>> ( mickael.mai...@gmail.com )
>>> 
>>> 
>>> 
>>> wrote:
>>> 
>>> 
 
 
 +1 (non binding)
 Thanks Jason
 
 
 
 On Wed, Aug 21, 2019 at 8:15 AM David Jacot < djacot@ confluent. io (
 dja...@confluent.io ) >
 
 
>>> 
>>> 
>> 
>> 
>> 
>> wrote:
>> 
>> 
>>> 
 
> 
> 
> +1 (non-binding)
> 
> 
> 
> Thanks for the KIP!
> 
> 
> 
> Best,
> David
> 
> 
> 
> On Tue, Aug 20, 2019 at 7:55 PM Jason Gustafson < jason@ confluent. io (
> ja...@confluent.io ) >
> 
> 
 
 
 
 wrote:
 
 
> 
>> 
>> 
>> Hi All,
>> 
>> 
>> 
>> I'd like to start a vote on KIP-352, which is a follow-up to
>> 
>> 
> 
> 
 
 
>>> 
>>> 
>> 
>> 
>> 
>> KIP-455
>> 
>> 
>>> 
 
 
 to fix
 
 
> 
>> 
>> 
>> a long-known shortcoming of URP reporting and to improve
>> 
>> 
> 
> 
 
 
>>> 
>>> 
>> 
>> 
>> 
>> reassignment
>> 
>> 
>>> 
 
> 
>> 
>> 
>> monitoring:
>> 
>> 
> 
> 
 
 
>>> 
>>> 
>> 
>> 
>> 
>> https:/ / cwiki. apache. org/ confluence/ display/ KAFKA/ 
>> KIP-352%3A+Distinguish+URPs+caused+by+reassignment
>> (
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
>> )
>> 
>> 
>>> 
 
> 
>> 
>> 
>> .
>> 
>> 
>> 
>> Note that I have added one new metric following the discussion. It
>> 
>> 
> 
> 
 
 
 
 seemed
 
 
> 
>> 
>> 
>> useful to have a lag metric for reassigning partitions.
>> 
>> 
>> 
>> Thanks,
>> Jason
>> 
>> 
> 
> 
 
 
>>> 
>>> 
>> 
>> 
> 
> 
>

Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-08-21 Thread Robert Barrett
+1 (non-binding)

This will be great to have, thanks Jason!

On Wed, Aug 21, 2019 at 4:29 AM Manikumar  wrote:

> +1 (binding).
>
> Thanks for the KIP. LGTM.
>
>
> On Wed, Aug 21, 2019 at 3:12 PM Satish Duggana 
> wrote:
>
> > Hi Jason,
> > +1 (non binding) Thanks for the KIP!
> >
> > Do we need to have a separate JIRA to update the docs as it introduces
> new
> > metrics and a change in behavior for the existing metric?
> >
> >
> >
> > On Wed, Aug 21, 2019 at 2:41 PM Mickael Maison  >
> > wrote:
> >
> > > +1 (non binding)
> > > Thanks Jason
> > >
> > > On Wed, Aug 21, 2019 at 8:15 AM David Jacot 
> wrote:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Thanks for the KIP!
> > > >
> > > > Best,
> > > > David
> > > >
> > > > On Tue, Aug 20, 2019 at 7:55 PM Jason Gustafson 
> > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I'd like to start a vote on KIP-352, which is a follow-up to
> KIP-455
> > > to fix
> > > > > a long-known shortcoming of URP reporting and to improve
> reassignment
> > > > > monitoring:
> > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> > > > > .
> > > > >
> > > > > Note that I have added one new metric following the discussion. It
> > > seemed
> > > > > useful to have a lag metric for reassigning partitions.
> > > > >
> > > > > Thanks,
> > > > > Jason
> > > > >
> > >
> >
>


Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-08-21 Thread Manikumar
+1 (binding).

Thanks for the KIP. LGTM.


On Wed, Aug 21, 2019 at 3:12 PM Satish Duggana 
wrote:

> Hi Jason,
> +1 (non binding) Thanks for the KIP!
>
> Do we need to have a separate JIRA to update the docs as it introduces new
> metrics and a change in behavior for the existing metric?
>
>
>
> On Wed, Aug 21, 2019 at 2:41 PM Mickael Maison 
> wrote:
>
> > +1 (non binding)
> > Thanks Jason
> >
> > On Wed, Aug 21, 2019 at 8:15 AM David Jacot  wrote:
> > >
> > > +1 (non-binding)
> > >
> > > Thanks for the KIP!
> > >
> > > Best,
> > > David
> > >
> > > On Tue, Aug 20, 2019 at 7:55 PM Jason Gustafson 
> > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I'd like to start a vote on KIP-352, which is a follow-up to KIP-455
> > to fix
> > > > a long-known shortcoming of URP reporting and to improve reassignment
> > > > monitoring:
> > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> > > > .
> > > >
> > > > Note that I have added one new metric following the discussion. It
> > seemed
> > > > useful to have a lag metric for reassigning partitions.
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> >
>


Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-08-21 Thread Satish Duggana
Hi Jason,
+1 (non binding) Thanks for the KIP!

Do we need to have a separate JIRA to update the docs as it introduces new
metrics and a change in behavior for the existing metric?



On Wed, Aug 21, 2019 at 2:41 PM Mickael Maison 
wrote:

> +1 (non binding)
> Thanks Jason
>
> On Wed, Aug 21, 2019 at 8:15 AM David Jacot  wrote:
> >
> > +1 (non-binding)
> >
> > Thanks for the KIP!
> >
> > Best,
> > David
> >
> > On Tue, Aug 20, 2019 at 7:55 PM Jason Gustafson 
> wrote:
> >
> > > Hi All,
> > >
> > > I'd like to start a vote on KIP-352, which is a follow-up to KIP-455
> to fix
> > > a long-known shortcoming of URP reporting and to improve reassignment
> > > monitoring:
> > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> > > .
> > >
> > > Note that I have added one new metric following the discussion. It
> seemed
> > > useful to have a lag metric for reassigning partitions.
> > >
> > > Thanks,
> > > Jason
> > >
>


Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-08-21 Thread Mickael Maison
+1 (non binding)
Thanks Jason

On Wed, Aug 21, 2019 at 8:15 AM David Jacot  wrote:
>
> +1 (non-binding)
>
> Thanks for the KIP!
>
> Best,
> David
>
> On Tue, Aug 20, 2019 at 7:55 PM Jason Gustafson  wrote:
>
> > Hi All,
> >
> > I'd like to start a vote on KIP-352, which is a follow-up to KIP-455 to fix
> > a long-known shortcoming of URP reporting and to improve reassignment
> > monitoring:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> > .
> >
> > Note that I have added one new metric following the discussion. It seemed
> > useful to have a lag metric for reassigning partitions.
> >
> > Thanks,
> > Jason
> >


Re: [VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-08-21 Thread David Jacot
+1 (non-binding)

Thanks for the KIP!

Best,
David

On Tue, Aug 20, 2019 at 7:55 PM Jason Gustafson  wrote:

> Hi All,
>
> I'd like to start a vote on KIP-352, which is a follow-up to KIP-455 to fix
> a long-known shortcoming of URP reporting and to improve reassignment
> monitoring:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> .
>
> Note that I have added one new metric following the discussion. It seemed
> useful to have a lag metric for reassigning partitions.
>
> Thanks,
> Jason
>


[VOTE] KIP-352: Distinguish URPs caused by reassignment

2019-08-20 Thread Jason Gustafson
Hi All,

I'd like to start a vote on KIP-352, which is a follow-up to KIP-455 to fix
a long-known shortcoming of URP reporting and to improve reassignment
monitoring:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
.

Note that I have added one new metric following the discussion. It seemed
useful to have a lag metric for reassigning partitions.

Thanks,
Jason


Re: KIP-352: Distinguish URPs caused by reassignment

2019-08-08 Thread George Li
 Hi Jason,

Can KIP-352 split the two metrics MaxLag and TotalLag for reassignment 
replication as well?  From the dashboard of these 2 metrics, one can see 
whether the replication is stuck (flat line) and estimate how long the 
reassignments will complete (how fast the Lag line is going down).

Thanks,
George


On Thursday, August 8, 2019, 01:41:49 PM PDT, Jason Gustafson 
 wrote:  
 
 Hey Stan,

Thanks for the suggestion. I have updated the proposal to include two new
meters for reassignment traffic inbound and outbound.

-Jason

On Thu, Aug 8, 2019 at 12:07 PM Stanislav Kozlovski 
wrote:

> Agreed on not totally spitting the replication incoming/outgoing metric -
> that could cause confusion. Just adding a new metric sounds good to me!
>
> The throttle follow-up is mentioned as part of future work in KIP-455 and I
> agree that it is way out of scope for this one.
>
>
> On Thu, Aug 8, 2019 at 8:03 PM Jason Gustafson  wrote:
>
> > Hi Stan,
> >
> > That's an interesting thought. I'm wondering if it would be better to
> leave
> > the current replication metrics counting for the total replication
> traffic
> > and add a new metric for reassignment traffic?
> >
> > By the way, a further KIP-455 follow-up that I won't attempt here would
> be
> > to have a separate throttle for reassignment traffic.
> >
> > -Jason
> >
> > On Thu, Aug 8, 2019 at 11:34 AM Stanislav Kozlovski <
> > stanis...@confluent.io>
> > wrote:
> >
> > > Hi Jason,
> > >
> > > I like the new ReassigningPartitions metric. Would it be easy to expand
> > the
> > > scope of the KIP to split the ReplicationIncoming/Outgoing metric to
> > > distringuish between reassigning/non-reassigning traffic, or do you
> > prefer
> > > to keep this KIP nice and small?
> > >
> > > On Thu, Aug 8, 2019 at 12:08 AM Jason Gustafson 
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > Since KIP-455 is passed, I would like to revive this proposal. I have
> > > > reduced the scope so that it is just changing the way we compute URP
> > and
> > > > adding a new metric for the number of reassigning partitions. Please
> > > take a
> > > > look:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> > > > .
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Sun, Aug 5, 2018 at 3:41 PM Dong Lin  wrote:
> > > >
> > > > > Hey Jason,
> > > > >
> > > > > Yes, I also think the right solution is probably to include the
> > > > > expected_replication_factor in the per-topic znode and include this
> > > > > information in the UpdateMetadataRequest.
> > > > >
> > > > > And I agree with Gwen and Ismael that it is reasonable to redefine
> > URP
> > > as
> > > > > those partitions whose ISR set size < expected replication factor.
> > > Given
> > > > > that URP is being used for monitoring cluster availability and and
> > > > > reassignment progress, it seems that one of them will break
> depending
> > > on
> > > > > the URP definition. It seems better to keep the legitimate
> use-case,
> > > i.e.
> > > > > monitoring cluster availability. Users who want to monitor
> > reassignment
> > > > > progress probably should use the "kafka-reassign-partitions.sh
> > > --verify".
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Dong
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Aug 3, 2018 at 2:57 PM, Jason Gustafson <
> ja...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > @Dong
> > > > > >
> > > > > > Ugh, you are right. This is indeed trickier than I imagined. You
> > > could
> > > > > > argue that the problem here is that there is no single source of
> > > truth
> > > > > for
> > > > > > the expected replication factor. While a reassignment is in
> > progress,
> > > > the
> > > > > > reassignment itself has the expected replica set. Otherwise, it
> is
> > > > stored
> > > > > > partition metadata itself. This is why manually deleting the
> > > > reassignment
> > > > > > is problematic. We lose the expected replica set and we depend on
> > > users
> > > > > to
> > > > > > reinstall it. I guess I'm starting to think that the way we track
> > > > > > reassignment state in the controller is problematic. In addition
> to
> > > the
> > > > > > problems caused by deletion, we cannot easily change an existing
> > > > > > reassignment.
> > > > > >
> > > > > > High level what I'm thinking is that we need to move the pending
> > > > > > reassignment state out of the single znode and into the
> individual
> > > > > metadata
> > > > > > of the reassigned partitions so that there is a single source of
> > > truth
> > > > > for
> > > > > > the expected replicas (and hence the replication factor). This
> > would
> > > > also
> > > > > > give us an easier mechanism to manage the batching of multiple
> > > > > > reassignments. Let me think on it a bit and see if I can come up
> > > with a
> > > > > > proposal.
> > > > > >
> > > > > > @Gwen, @Ismael
> > > > > >
> > > > > > That is fair. I also prefer to redefine URP if we

Re: KIP-352: Distinguish URPs caused by reassignment

2019-08-08 Thread Jason Gustafson
Hey Stan,

Thanks for the suggestion. I have updated the proposal to include two new
meters for reassignment traffic inbound and outbound.

-Jason

On Thu, Aug 8, 2019 at 12:07 PM Stanislav Kozlovski 
wrote:

> Agreed on not totally spitting the replication incoming/outgoing metric -
> that could cause confusion. Just adding a new metric sounds good to me!
>
> The throttle follow-up is mentioned as part of future work in KIP-455 and I
> agree that it is way out of scope for this one.
>
>
> On Thu, Aug 8, 2019 at 8:03 PM Jason Gustafson  wrote:
>
> > Hi Stan,
> >
> > That's an interesting thought. I'm wondering if it would be better to
> leave
> > the current replication metrics counting for the total replication
> traffic
> > and add a new metric for reassignment traffic?
> >
> > By the way, a further KIP-455 follow-up that I won't attempt here would
> be
> > to have a separate throttle for reassignment traffic.
> >
> > -Jason
> >
> > On Thu, Aug 8, 2019 at 11:34 AM Stanislav Kozlovski <
> > stanis...@confluent.io>
> > wrote:
> >
> > > Hi Jason,
> > >
> > > I like the new ReassigningPartitions metric. Would it be easy to expand
> > the
> > > scope of the KIP to split the ReplicationIncoming/Outgoing metric to
> > > distringuish between reassigning/non-reassigning traffic, or do you
> > prefer
> > > to keep this KIP nice and small?
> > >
> > > On Thu, Aug 8, 2019 at 12:08 AM Jason Gustafson 
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > Since KIP-455 is passed, I would like to revive this proposal. I have
> > > > reduced the scope so that it is just changing the way we compute URP
> > and
> > > > adding a new metric for the number of reassigning partitions. Please
> > > take a
> > > > look:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> > > > .
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Sun, Aug 5, 2018 at 3:41 PM Dong Lin  wrote:
> > > >
> > > > > Hey Jason,
> > > > >
> > > > > Yes, I also think the right solution is probably to include the
> > > > > expected_replication_factor in the per-topic znode and include this
> > > > > information in the UpdateMetadataRequest.
> > > > >
> > > > > And I agree with Gwen and Ismael that it is reasonable to redefine
> > URP
> > > as
> > > > > those partitions whose ISR set size < expected replication factor.
> > > Given
> > > > > that URP is being used for monitoring cluster availability and and
> > > > > reassignment progress, it seems that one of them will break
> depending
> > > on
> > > > > the URP definition. It seems better to keep the legitimate
> use-case,
> > > i.e.
> > > > > monitoring cluster availability. Users who want to monitor
> > reassignment
> > > > > progress probably should use the "kafka-reassign-partitions.sh
> > > --verify".
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Dong
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Aug 3, 2018 at 2:57 PM, Jason Gustafson <
> ja...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > @Dong
> > > > > >
> > > > > > Ugh, you are right. This is indeed trickier than I imagined. You
> > > could
> > > > > > argue that the problem here is that there is no single source of
> > > truth
> > > > > for
> > > > > > the expected replication factor. While a reassignment is in
> > progress,
> > > > the
> > > > > > reassignment itself has the expected replica set. Otherwise, it
> is
> > > > stored
> > > > > > partition metadata itself. This is why manually deleting the
> > > > reassignment
> > > > > > is problematic. We lose the expected replica set and we depend on
> > > users
> > > > > to
> > > > > > reinstall it. I guess I'm starting to think that the way we track
> > > > > > reassignment state in the controller is problematic. In addition
> to
> > > the
> > > > > > problems caused by deletion, we cannot easily change an existing
> > > > > > reassignment.
> > > > > >
> > > > > > High level what I'm thinking is that we need to move the pending
> > > > > > reassignment state out of the single znode and into the
> individual
> > > > > metadata
> > > > > > of the reassigned partitions so that there is a single source of
> > > truth
> > > > > for
> > > > > > the expected replicas (and hence the replication factor). This
> > would
> > > > also
> > > > > > give us an easier mechanism to manage the batching of multiple
> > > > > > reassignments. Let me think on it a bit and see if I can come up
> > > with a
> > > > > > proposal.
> > > > > >
> > > > > > @Gwen, @Ismael
> > > > > >
> > > > > > That is fair. I also prefer to redefine URP if we think the
> > > > compatibility
> > > > > > impact is a lower concern than continued misuse.
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Aug 3, 2018 at 12:25 PM, Ismael Juma 
> > > > wrote:
> > > > > >
> > > > > > > RIght, that was my thinking too.
> > > > > > >
> > > > > > > Ismael
> > > > > > >
> > > > > > > On Fri, Aug 3, 2018 at 12:04 PM Gwen Sh

Re: KIP-352: Distinguish URPs caused by reassignment

2019-08-08 Thread Stanislav Kozlovski
Agreed on not totally spitting the replication incoming/outgoing metric -
that could cause confusion. Just adding a new metric sounds good to me!

The throttle follow-up is mentioned as part of future work in KIP-455 and I
agree that it is way out of scope for this one.


On Thu, Aug 8, 2019 at 8:03 PM Jason Gustafson  wrote:

> Hi Stan,
>
> That's an interesting thought. I'm wondering if it would be better to leave
> the current replication metrics counting for the total replication traffic
> and add a new metric for reassignment traffic?
>
> By the way, a further KIP-455 follow-up that I won't attempt here would be
> to have a separate throttle for reassignment traffic.
>
> -Jason
>
> On Thu, Aug 8, 2019 at 11:34 AM Stanislav Kozlovski <
> stanis...@confluent.io>
> wrote:
>
> > Hi Jason,
> >
> > I like the new ReassigningPartitions metric. Would it be easy to expand
> the
> > scope of the KIP to split the ReplicationIncoming/Outgoing metric to
> > distringuish between reassigning/non-reassigning traffic, or do you
> prefer
> > to keep this KIP nice and small?
> >
> > On Thu, Aug 8, 2019 at 12:08 AM Jason Gustafson 
> > wrote:
> >
> > > Hi All,
> > >
> > > Since KIP-455 is passed, I would like to revive this proposal. I have
> > > reduced the scope so that it is just changing the way we compute URP
> and
> > > adding a new metric for the number of reassigning partitions. Please
> > take a
> > > look:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> > > .
> > >
> > > Thanks,
> > > Jason
> > >
> > > On Sun, Aug 5, 2018 at 3:41 PM Dong Lin  wrote:
> > >
> > > > Hey Jason,
> > > >
> > > > Yes, I also think the right solution is probably to include the
> > > > expected_replication_factor in the per-topic znode and include this
> > > > information in the UpdateMetadataRequest.
> > > >
> > > > And I agree with Gwen and Ismael that it is reasonable to redefine
> URP
> > as
> > > > those partitions whose ISR set size < expected replication factor.
> > Given
> > > > that URP is being used for monitoring cluster availability and and
> > > > reassignment progress, it seems that one of them will break depending
> > on
> > > > the URP definition. It seems better to keep the legitimate use-case,
> > i.e.
> > > > monitoring cluster availability. Users who want to monitor
> reassignment
> > > > progress probably should use the "kafka-reassign-partitions.sh
> > --verify".
> > > >
> > > >
> > > > Thanks,
> > > > Dong
> > > >
> > > >
> > > >
> > > > On Fri, Aug 3, 2018 at 2:57 PM, Jason Gustafson 
> > > > wrote:
> > > >
> > > > > @Dong
> > > > >
> > > > > Ugh, you are right. This is indeed trickier than I imagined. You
> > could
> > > > > argue that the problem here is that there is no single source of
> > truth
> > > > for
> > > > > the expected replication factor. While a reassignment is in
> progress,
> > > the
> > > > > reassignment itself has the expected replica set. Otherwise, it is
> > > stored
> > > > > partition metadata itself. This is why manually deleting the
> > > reassignment
> > > > > is problematic. We lose the expected replica set and we depend on
> > users
> > > > to
> > > > > reinstall it. I guess I'm starting to think that the way we track
> > > > > reassignment state in the controller is problematic. In addition to
> > the
> > > > > problems caused by deletion, we cannot easily change an existing
> > > > > reassignment.
> > > > >
> > > > > High level what I'm thinking is that we need to move the pending
> > > > > reassignment state out of the single znode and into the individual
> > > > metadata
> > > > > of the reassigned partitions so that there is a single source of
> > truth
> > > > for
> > > > > the expected replicas (and hence the replication factor). This
> would
> > > also
> > > > > give us an easier mechanism to manage the batching of multiple
> > > > > reassignments. Let me think on it a bit and see if I can come up
> > with a
> > > > > proposal.
> > > > >
> > > > > @Gwen, @Ismael
> > > > >
> > > > > That is fair. I also prefer to redefine URP if we think the
> > > compatibility
> > > > > impact is a lower concern than continued misuse.
> > > > >
> > > > > -Jason
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Aug 3, 2018 at 12:25 PM, Ismael Juma 
> > > wrote:
> > > > >
> > > > > > RIght, that was my thinking too.
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Fri, Aug 3, 2018 at 12:04 PM Gwen Shapira 
> > > > wrote:
> > > > > >
> > > > > > > On Fri, Aug 3, 2018 at 11:23 AM, Jason Gustafson <
> > > ja...@confluent.io
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hey Ismael,
> > > > > > > >
> > > > > > > > Yeah, my initial inclination was to redefine URP as well. My
> > only
> > > > > doubt
> > > > > > > was
> > > > > > > > how it would affect existing tools which might depend on URPs
> > to
> > > > > track
> > > > > > > the
> > > > > > > > progress of a reassignment. I decided to be conservative i

Re: KIP-352: Distinguish URPs caused by reassignment

2019-08-08 Thread Jason Gustafson
Hi Stan,

That's an interesting thought. I'm wondering if it would be better to leave
the current replication metrics counting for the total replication traffic
and add a new metric for reassignment traffic?

By the way, a further KIP-455 follow-up that I won't attempt here would be
to have a separate throttle for reassignment traffic.

-Jason

On Thu, Aug 8, 2019 at 11:34 AM Stanislav Kozlovski 
wrote:

> Hi Jason,
>
> I like the new ReassigningPartitions metric. Would it be easy to expand the
> scope of the KIP to split the ReplicationIncoming/Outgoing metric to
> distringuish between reassigning/non-reassigning traffic, or do you prefer
> to keep this KIP nice and small?
>
> On Thu, Aug 8, 2019 at 12:08 AM Jason Gustafson 
> wrote:
>
> > Hi All,
> >
> > Since KIP-455 is passed, I would like to revive this proposal. I have
> > reduced the scope so that it is just changing the way we compute URP and
> > adding a new metric for the number of reassigning partitions. Please
> take a
> > look:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> > .
> >
> > Thanks,
> > Jason
> >
> > On Sun, Aug 5, 2018 at 3:41 PM Dong Lin  wrote:
> >
> > > Hey Jason,
> > >
> > > Yes, I also think the right solution is probably to include the
> > > expected_replication_factor in the per-topic znode and include this
> > > information in the UpdateMetadataRequest.
> > >
> > > And I agree with Gwen and Ismael that it is reasonable to redefine URP
> as
> > > those partitions whose ISR set size < expected replication factor.
> Given
> > > that URP is being used for monitoring cluster availability and and
> > > reassignment progress, it seems that one of them will break depending
> on
> > > the URP definition. It seems better to keep the legitimate use-case,
> i.e.
> > > monitoring cluster availability. Users who want to monitor reassignment
> > > progress probably should use the "kafka-reassign-partitions.sh
> --verify".
> > >
> > >
> > > Thanks,
> > > Dong
> > >
> > >
> > >
> > > On Fri, Aug 3, 2018 at 2:57 PM, Jason Gustafson 
> > > wrote:
> > >
> > > > @Dong
> > > >
> > > > Ugh, you are right. This is indeed trickier than I imagined. You
> could
> > > > argue that the problem here is that there is no single source of
> truth
> > > for
> > > > the expected replication factor. While a reassignment is in progress,
> > the
> > > > reassignment itself has the expected replica set. Otherwise, it is
> > stored
> > > > partition metadata itself. This is why manually deleting the
> > reassignment
> > > > is problematic. We lose the expected replica set and we depend on
> users
> > > to
> > > > reinstall it. I guess I'm starting to think that the way we track
> > > > reassignment state in the controller is problematic. In addition to
> the
> > > > problems caused by deletion, we cannot easily change an existing
> > > > reassignment.
> > > >
> > > > High level what I'm thinking is that we need to move the pending
> > > > reassignment state out of the single znode and into the individual
> > > metadata
> > > > of the reassigned partitions so that there is a single source of
> truth
> > > for
> > > > the expected replicas (and hence the replication factor). This would
> > also
> > > > give us an easier mechanism to manage the batching of multiple
> > > > reassignments. Let me think on it a bit and see if I can come up
> with a
> > > > proposal.
> > > >
> > > > @Gwen, @Ismael
> > > >
> > > > That is fair. I also prefer to redefine URP if we think the
> > compatibility
> > > > impact is a lower concern than continued misuse.
> > > >
> > > > -Jason
> > > >
> > > >
> > > >
> > > > On Fri, Aug 3, 2018 at 12:25 PM, Ismael Juma 
> > wrote:
> > > >
> > > > > RIght, that was my thinking too.
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Fri, Aug 3, 2018 at 12:04 PM Gwen Shapira 
> > > wrote:
> > > > >
> > > > > > On Fri, Aug 3, 2018 at 11:23 AM, Jason Gustafson <
> > ja...@confluent.io
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Ismael,
> > > > > > >
> > > > > > > Yeah, my initial inclination was to redefine URP as well. My
> only
> > > > doubt
> > > > > > was
> > > > > > > how it would affect existing tools which might depend on URPs
> to
> > > > track
> > > > > > the
> > > > > > > progress of a reassignment. I decided to be conservative in the
> > > end,
> > > > > but
> > > > > > > I'd reconsider if we think it is not a major concern. It is
> > > annoying
> > > > to
> > > > > > > need a new category.
> > > > > > >
> > > > > >
> > > > > > There are existing tools that use URP to track reassignment, but
> > > there
> > > > > are
> > > > > > many more tools that use URP for monitoring and alerting. If I
> > > > understand
> > > > > > Ismael's suggestion correctly, a re-definition will improve the
> > > > > reliability
> > > > > > of the monitoring tools (since there won't be false alerts in
> case
> > of
> > > > > > re-assignment) without having to switch to a new metric.
> >

Re: KIP-352: Distinguish URPs caused by reassignment

2019-08-08 Thread Stanislav Kozlovski
Hi Jason,

I like the new ReassigningPartitions metric. Would it be easy to expand the
scope of the KIP to split the ReplicationIncoming/Outgoing metric to
distringuish between reassigning/non-reassigning traffic, or do you prefer
to keep this KIP nice and small?

On Thu, Aug 8, 2019 at 12:08 AM Jason Gustafson  wrote:

> Hi All,
>
> Since KIP-455 is passed, I would like to revive this proposal. I have
> reduced the scope so that it is just changing the way we compute URP and
> adding a new metric for the number of reassigning partitions. Please take a
> look:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> .
>
> Thanks,
> Jason
>
> On Sun, Aug 5, 2018 at 3:41 PM Dong Lin  wrote:
>
> > Hey Jason,
> >
> > Yes, I also think the right solution is probably to include the
> > expected_replication_factor in the per-topic znode and include this
> > information in the UpdateMetadataRequest.
> >
> > And I agree with Gwen and Ismael that it is reasonable to redefine URP as
> > those partitions whose ISR set size < expected replication factor. Given
> > that URP is being used for monitoring cluster availability and and
> > reassignment progress, it seems that one of them will break depending on
> > the URP definition. It seems better to keep the legitimate use-case, i.e.
> > monitoring cluster availability. Users who want to monitor reassignment
> > progress probably should use the "kafka-reassign-partitions.sh --verify".
> >
> >
> > Thanks,
> > Dong
> >
> >
> >
> > On Fri, Aug 3, 2018 at 2:57 PM, Jason Gustafson 
> > wrote:
> >
> > > @Dong
> > >
> > > Ugh, you are right. This is indeed trickier than I imagined. You could
> > > argue that the problem here is that there is no single source of truth
> > for
> > > the expected replication factor. While a reassignment is in progress,
> the
> > > reassignment itself has the expected replica set. Otherwise, it is
> stored
> > > partition metadata itself. This is why manually deleting the
> reassignment
> > > is problematic. We lose the expected replica set and we depend on users
> > to
> > > reinstall it. I guess I'm starting to think that the way we track
> > > reassignment state in the controller is problematic. In addition to the
> > > problems caused by deletion, we cannot easily change an existing
> > > reassignment.
> > >
> > > High level what I'm thinking is that we need to move the pending
> > > reassignment state out of the single znode and into the individual
> > metadata
> > > of the reassigned partitions so that there is a single source of truth
> > for
> > > the expected replicas (and hence the replication factor). This would
> also
> > > give us an easier mechanism to manage the batching of multiple
> > > reassignments. Let me think on it a bit and see if I can come up with a
> > > proposal.
> > >
> > > @Gwen, @Ismael
> > >
> > > That is fair. I also prefer to redefine URP if we think the
> compatibility
> > > impact is a lower concern than continued misuse.
> > >
> > > -Jason
> > >
> > >
> > >
> > > On Fri, Aug 3, 2018 at 12:25 PM, Ismael Juma 
> wrote:
> > >
> > > > RIght, that was my thinking too.
> > > >
> > > > Ismael
> > > >
> > > > On Fri, Aug 3, 2018 at 12:04 PM Gwen Shapira 
> > wrote:
> > > >
> > > > > On Fri, Aug 3, 2018 at 11:23 AM, Jason Gustafson <
> ja...@confluent.io
> > >
> > > > > wrote:
> > > > >
> > > > > > Hey Ismael,
> > > > > >
> > > > > > Yeah, my initial inclination was to redefine URP as well. My only
> > > doubt
> > > > > was
> > > > > > how it would affect existing tools which might depend on URPs to
> > > track
> > > > > the
> > > > > > progress of a reassignment. I decided to be conservative in the
> > end,
> > > > but
> > > > > > I'd reconsider if we think it is not a major concern. It is
> > annoying
> > > to
> > > > > > need a new category.
> > > > > >
> > > > >
> > > > > There are existing tools that use URP to track reassignment, but
> > there
> > > > are
> > > > > many more tools that use URP for monitoring and alerting. If I
> > > understand
> > > > > Ismael's suggestion correctly, a re-definition will improve the
> > > > reliability
> > > > > of the monitoring tools (since there won't be false alerts in case
> of
> > > > > re-assignment) without having to switch to a new metric.
> > > > >
> > > > > I think we should choose the proposal that improves the more common
> > > usage
> > > > > of the metric, in this case, failure monitoring rather than
> > > reassignment.
> > > > >
> > > > >
> > > > > >
> > > > > > About your question about storage in ZK, I can't think of
> anything
> > > > > > additional that we need. Probably the main difficulty is getting
> > > access
> > > > > to
> > > > > > the replication factor in the topic utility. My basic thought was
> > > just
> > > > to
> > > > > > collect the URPs (as we know them today) and use the config API
> to
> > > > > > partition them based on the replication factor. Do you see any
> > > problems
> > > > > > with this

Re: KIP-352: Distinguish URPs caused by reassignment

2019-08-07 Thread Jason Gustafson
Hi All,

Since KIP-455 is passed, I would like to revive this proposal. I have
reduced the scope so that it is just changing the way we compute URP and
adding a new metric for the number of reassigning partitions. Please take a
look:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
.

Thanks,
Jason

On Sun, Aug 5, 2018 at 3:41 PM Dong Lin  wrote:

> Hey Jason,
>
> Yes, I also think the right solution is probably to include the
> expected_replication_factor in the per-topic znode and include this
> information in the UpdateMetadataRequest.
>
> And I agree with Gwen and Ismael that it is reasonable to redefine URP as
> those partitions whose ISR set size < expected replication factor. Given
> that URP is being used for monitoring cluster availability and and
> reassignment progress, it seems that one of them will break depending on
> the URP definition. It seems better to keep the legitimate use-case, i.e.
> monitoring cluster availability. Users who want to monitor reassignment
> progress probably should use the "kafka-reassign-partitions.sh --verify".
>
>
> Thanks,
> Dong
>
>
>
> On Fri, Aug 3, 2018 at 2:57 PM, Jason Gustafson 
> wrote:
>
> > @Dong
> >
> > Ugh, you are right. This is indeed trickier than I imagined. You could
> > argue that the problem here is that there is no single source of truth
> for
> > the expected replication factor. While a reassignment is in progress, the
> > reassignment itself has the expected replica set. Otherwise, it is stored
> > partition metadata itself. This is why manually deleting the reassignment
> > is problematic. We lose the expected replica set and we depend on users
> to
> > reinstall it. I guess I'm starting to think that the way we track
> > reassignment state in the controller is problematic. In addition to the
> > problems caused by deletion, we cannot easily change an existing
> > reassignment.
> >
> > High level what I'm thinking is that we need to move the pending
> > reassignment state out of the single znode and into the individual
> metadata
> > of the reassigned partitions so that there is a single source of truth
> for
> > the expected replicas (and hence the replication factor). This would also
> > give us an easier mechanism to manage the batching of multiple
> > reassignments. Let me think on it a bit and see if I can come up with a
> > proposal.
> >
> > @Gwen, @Ismael
> >
> > That is fair. I also prefer to redefine URP if we think the compatibility
> > impact is a lower concern than continued misuse.
> >
> > -Jason
> >
> >
> >
> > On Fri, Aug 3, 2018 at 12:25 PM, Ismael Juma  wrote:
> >
> > > RIght, that was my thinking too.
> > >
> > > Ismael
> > >
> > > On Fri, Aug 3, 2018 at 12:04 PM Gwen Shapira 
> wrote:
> > >
> > > > On Fri, Aug 3, 2018 at 11:23 AM, Jason Gustafson  >
> > > > wrote:
> > > >
> > > > > Hey Ismael,
> > > > >
> > > > > Yeah, my initial inclination was to redefine URP as well. My only
> > doubt
> > > > was
> > > > > how it would affect existing tools which might depend on URPs to
> > track
> > > > the
> > > > > progress of a reassignment. I decided to be conservative in the
> end,
> > > but
> > > > > I'd reconsider if we think it is not a major concern. It is
> annoying
> > to
> > > > > need a new category.
> > > > >
> > > >
> > > > There are existing tools that use URP to track reassignment, but
> there
> > > are
> > > > many more tools that use URP for monitoring and alerting. If I
> > understand
> > > > Ismael's suggestion correctly, a re-definition will improve the
> > > reliability
> > > > of the monitoring tools (since there won't be false alerts in case of
> > > > re-assignment) without having to switch to a new metric.
> > > >
> > > > I think we should choose the proposal that improves the more common
> > usage
> > > > of the metric, in this case, failure monitoring rather than
> > reassignment.
> > > >
> > > >
> > > > >
> > > > > About your question about storage in ZK, I can't think of anything
> > > > > additional that we need. Probably the main difficulty is getting
> > access
> > > > to
> > > > > the replication factor in the topic utility. My basic thought was
> > just
> > > to
> > > > > collect the URPs (as we know them today) and use the config API to
> > > > > partition them based on the replication factor. Do you see any
> > problems
> > > > > with this?
> > > > >
> > > > > -Jason
> > > > >
> > > > >
> > > > > On Thu, Aug 2, 2018 at 12:14 PM, Ismael Juma 
> > > wrote:
> > > > >
> > > > > > Thanks Jason. This is definitely a pain point. I actually prefer
> > the
> > > > > option
> > > > > > to redefine what under-replicated means (currently under rejected
> > > > > > alternatives). Also, do we need to make changes to what we store
> in
> > > ZK?
> > > > > If
> > > > > > so, that should be in the KIP too.
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Thu, Aug 2, 2018 at 11:45 AM Jason Gustafson <
> > ja...@confluent.io>
> > > > > > wrote:
> > > > > >

Re: KIP-352: Distinguish URPs caused by reassignment

2018-08-05 Thread Dong Lin
Hey Jason,

Yes, I also think the right solution is probably to include the
expected_replication_factor in the per-topic znode and include this
information in the UpdateMetadataRequest.

And I agree with Gwen and Ismael that it is reasonable to redefine URP as
those partitions whose ISR set size < expected replication factor. Given
that URP is being used for monitoring cluster availability and and
reassignment progress, it seems that one of them will break depending on
the URP definition. It seems better to keep the legitimate use-case, i.e.
monitoring cluster availability. Users who want to monitor reassignment
progress probably should use the "kafka-reassign-partitions.sh --verify".


Thanks,
Dong



On Fri, Aug 3, 2018 at 2:57 PM, Jason Gustafson  wrote:

> @Dong
>
> Ugh, you are right. This is indeed trickier than I imagined. You could
> argue that the problem here is that there is no single source of truth for
> the expected replication factor. While a reassignment is in progress, the
> reassignment itself has the expected replica set. Otherwise, it is stored
> partition metadata itself. This is why manually deleting the reassignment
> is problematic. We lose the expected replica set and we depend on users to
> reinstall it. I guess I'm starting to think that the way we track
> reassignment state in the controller is problematic. In addition to the
> problems caused by deletion, we cannot easily change an existing
> reassignment.
>
> High level what I'm thinking is that we need to move the pending
> reassignment state out of the single znode and into the individual metadata
> of the reassigned partitions so that there is a single source of truth for
> the expected replicas (and hence the replication factor). This would also
> give us an easier mechanism to manage the batching of multiple
> reassignments. Let me think on it a bit and see if I can come up with a
> proposal.
>
> @Gwen, @Ismael
>
> That is fair. I also prefer to redefine URP if we think the compatibility
> impact is a lower concern than continued misuse.
>
> -Jason
>
>
>
> On Fri, Aug 3, 2018 at 12:25 PM, Ismael Juma  wrote:
>
> > RIght, that was my thinking too.
> >
> > Ismael
> >
> > On Fri, Aug 3, 2018 at 12:04 PM Gwen Shapira  wrote:
> >
> > > On Fri, Aug 3, 2018 at 11:23 AM, Jason Gustafson 
> > > wrote:
> > >
> > > > Hey Ismael,
> > > >
> > > > Yeah, my initial inclination was to redefine URP as well. My only
> doubt
> > > was
> > > > how it would affect existing tools which might depend on URPs to
> track
> > > the
> > > > progress of a reassignment. I decided to be conservative in the end,
> > but
> > > > I'd reconsider if we think it is not a major concern. It is annoying
> to
> > > > need a new category.
> > > >
> > >
> > > There are existing tools that use URP to track reassignment, but there
> > are
> > > many more tools that use URP for monitoring and alerting. If I
> understand
> > > Ismael's suggestion correctly, a re-definition will improve the
> > reliability
> > > of the monitoring tools (since there won't be false alerts in case of
> > > re-assignment) without having to switch to a new metric.
> > >
> > > I think we should choose the proposal that improves the more common
> usage
> > > of the metric, in this case, failure monitoring rather than
> reassignment.
> > >
> > >
> > > >
> > > > About your question about storage in ZK, I can't think of anything
> > > > additional that we need. Probably the main difficulty is getting
> access
> > > to
> > > > the replication factor in the topic utility. My basic thought was
> just
> > to
> > > > collect the URPs (as we know them today) and use the config API to
> > > > partition them based on the replication factor. Do you see any
> problems
> > > > with this?
> > > >
> > > > -Jason
> > > >
> > > >
> > > > On Thu, Aug 2, 2018 at 12:14 PM, Ismael Juma 
> > wrote:
> > > >
> > > > > Thanks Jason. This is definitely a pain point. I actually prefer
> the
> > > > option
> > > > > to redefine what under-replicated means (currently under rejected
> > > > > alternatives). Also, do we need to make changes to what we store in
> > ZK?
> > > > If
> > > > > so, that should be in the KIP too.
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Thu, Aug 2, 2018 at 11:45 AM Jason Gustafson <
> ja...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hey All,
> > > > > >
> > > > > > Another day, another KIP. This one is hopefully straightforward:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%
> > > > > 3A+Distinguish+URPs+caused+by+reassignment
> > > > > > .
> > > > > > Have a look and let me know what you think!
> > > > > >
> > > > > > Thanks,
> > > > > > Jason
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > *Gwen Shapira*
> > > Product Manager | Confluent
> > > 650.450.2760 | @gwenshap
> > > Follow us: Twitter  | blog
> > > 
> > >
> >
>


Re: KIP-352: Distinguish URPs caused by reassignment

2018-08-03 Thread Jason Gustafson
@Dong

Ugh, you are right. This is indeed trickier than I imagined. You could
argue that the problem here is that there is no single source of truth for
the expected replication factor. While a reassignment is in progress, the
reassignment itself has the expected replica set. Otherwise, it is stored
partition metadata itself. This is why manually deleting the reassignment
is problematic. We lose the expected replica set and we depend on users to
reinstall it. I guess I'm starting to think that the way we track
reassignment state in the controller is problematic. In addition to the
problems caused by deletion, we cannot easily change an existing
reassignment.

High level what I'm thinking is that we need to move the pending
reassignment state out of the single znode and into the individual metadata
of the reassigned partitions so that there is a single source of truth for
the expected replicas (and hence the replication factor). This would also
give us an easier mechanism to manage the batching of multiple
reassignments. Let me think on it a bit and see if I can come up with a
proposal.

@Gwen, @Ismael

That is fair. I also prefer to redefine URP if we think the compatibility
impact is a lower concern than continued misuse.

-Jason



On Fri, Aug 3, 2018 at 12:25 PM, Ismael Juma  wrote:

> RIght, that was my thinking too.
>
> Ismael
>
> On Fri, Aug 3, 2018 at 12:04 PM Gwen Shapira  wrote:
>
> > On Fri, Aug 3, 2018 at 11:23 AM, Jason Gustafson 
> > wrote:
> >
> > > Hey Ismael,
> > >
> > > Yeah, my initial inclination was to redefine URP as well. My only doubt
> > was
> > > how it would affect existing tools which might depend on URPs to track
> > the
> > > progress of a reassignment. I decided to be conservative in the end,
> but
> > > I'd reconsider if we think it is not a major concern. It is annoying to
> > > need a new category.
> > >
> >
> > There are existing tools that use URP to track reassignment, but there
> are
> > many more tools that use URP for monitoring and alerting. If I understand
> > Ismael's suggestion correctly, a re-definition will improve the
> reliability
> > of the monitoring tools (since there won't be false alerts in case of
> > re-assignment) without having to switch to a new metric.
> >
> > I think we should choose the proposal that improves the more common usage
> > of the metric, in this case, failure monitoring rather than reassignment.
> >
> >
> > >
> > > About your question about storage in ZK, I can't think of anything
> > > additional that we need. Probably the main difficulty is getting access
> > to
> > > the replication factor in the topic utility. My basic thought was just
> to
> > > collect the URPs (as we know them today) and use the config API to
> > > partition them based on the replication factor. Do you see any problems
> > > with this?
> > >
> > > -Jason
> > >
> > >
> > > On Thu, Aug 2, 2018 at 12:14 PM, Ismael Juma 
> wrote:
> > >
> > > > Thanks Jason. This is definitely a pain point. I actually prefer the
> > > option
> > > > to redefine what under-replicated means (currently under rejected
> > > > alternatives). Also, do we need to make changes to what we store in
> ZK?
> > > If
> > > > so, that should be in the KIP too.
> > > >
> > > > Ismael
> > > >
> > > > On Thu, Aug 2, 2018 at 11:45 AM Jason Gustafson 
> > > > wrote:
> > > >
> > > > > Hey All,
> > > > >
> > > > > Another day, another KIP. This one is hopefully straightforward:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%
> > > > 3A+Distinguish+URPs+caused+by+reassignment
> > > > > .
> > > > > Have a look and let me know what you think!
> > > > >
> > > > > Thanks,
> > > > > Jason
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > *Gwen Shapira*
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter  | blog
> > 
> >
>


Re: KIP-352: Distinguish URPs caused by reassignment

2018-08-03 Thread Ismael Juma
RIght, that was my thinking too.

Ismael

On Fri, Aug 3, 2018 at 12:04 PM Gwen Shapira  wrote:

> On Fri, Aug 3, 2018 at 11:23 AM, Jason Gustafson 
> wrote:
>
> > Hey Ismael,
> >
> > Yeah, my initial inclination was to redefine URP as well. My only doubt
> was
> > how it would affect existing tools which might depend on URPs to track
> the
> > progress of a reassignment. I decided to be conservative in the end, but
> > I'd reconsider if we think it is not a major concern. It is annoying to
> > need a new category.
> >
>
> There are existing tools that use URP to track reassignment, but there are
> many more tools that use URP for monitoring and alerting. If I understand
> Ismael's suggestion correctly, a re-definition will improve the reliability
> of the monitoring tools (since there won't be false alerts in case of
> re-assignment) without having to switch to a new metric.
>
> I think we should choose the proposal that improves the more common usage
> of the metric, in this case, failure monitoring rather than reassignment.
>
>
> >
> > About your question about storage in ZK, I can't think of anything
> > additional that we need. Probably the main difficulty is getting access
> to
> > the replication factor in the topic utility. My basic thought was just to
> > collect the URPs (as we know them today) and use the config API to
> > partition them based on the replication factor. Do you see any problems
> > with this?
> >
> > -Jason
> >
> >
> > On Thu, Aug 2, 2018 at 12:14 PM, Ismael Juma  wrote:
> >
> > > Thanks Jason. This is definitely a pain point. I actually prefer the
> > option
> > > to redefine what under-replicated means (currently under rejected
> > > alternatives). Also, do we need to make changes to what we store in ZK?
> > If
> > > so, that should be in the KIP too.
> > >
> > > Ismael
> > >
> > > On Thu, Aug 2, 2018 at 11:45 AM Jason Gustafson 
> > > wrote:
> > >
> > > > Hey All,
> > > >
> > > > Another day, another KIP. This one is hopefully straightforward:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%
> > > 3A+Distinguish+URPs+caused+by+reassignment
> > > > .
> > > > Have a look and let me know what you think!
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > >
> >
>
>
>
> --
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter  | blog
> 
>


Re: KIP-352: Distinguish URPs caused by reassignment

2018-08-03 Thread Gwen Shapira
On Fri, Aug 3, 2018 at 11:23 AM, Jason Gustafson  wrote:

> Hey Ismael,
>
> Yeah, my initial inclination was to redefine URP as well. My only doubt was
> how it would affect existing tools which might depend on URPs to track the
> progress of a reassignment. I decided to be conservative in the end, but
> I'd reconsider if we think it is not a major concern. It is annoying to
> need a new category.
>

There are existing tools that use URP to track reassignment, but there are
many more tools that use URP for monitoring and alerting. If I understand
Ismael's suggestion correctly, a re-definition will improve the reliability
of the monitoring tools (since there won't be false alerts in case of
re-assignment) without having to switch to a new metric.

I think we should choose the proposal that improves the more common usage
of the metric, in this case, failure monitoring rather than reassignment.


>
> About your question about storage in ZK, I can't think of anything
> additional that we need. Probably the main difficulty is getting access to
> the replication factor in the topic utility. My basic thought was just to
> collect the URPs (as we know them today) and use the config API to
> partition them based on the replication factor. Do you see any problems
> with this?
>
> -Jason
>
>
> On Thu, Aug 2, 2018 at 12:14 PM, Ismael Juma  wrote:
>
> > Thanks Jason. This is definitely a pain point. I actually prefer the
> option
> > to redefine what under-replicated means (currently under rejected
> > alternatives). Also, do we need to make changes to what we store in ZK?
> If
> > so, that should be in the KIP too.
> >
> > Ismael
> >
> > On Thu, Aug 2, 2018 at 11:45 AM Jason Gustafson 
> > wrote:
> >
> > > Hey All,
> > >
> > > Another day, another KIP. This one is hopefully straightforward:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%
> > 3A+Distinguish+URPs+caused+by+reassignment
> > > .
> > > Have a look and let me know what you think!
> > >
> > > Thanks,
> > > Jason
> > >
> >
>



-- 
*Gwen Shapira*
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter  | blog



Re: KIP-352: Distinguish URPs caused by reassignment

2018-08-03 Thread Dong Lin
Hey Jason,

Thanks for the KIP! I have some questions below.

The KIP defines "UnderSynchronized" as partitions which have an in-sync
replica set that is smaller than the topic's replication factor. Can you
clarify how we decide the topic's replication factor? Currently the
possible sources include the topic config znode, replica list in the topic
znode or the broker default config. The topic config znode will have
replication factor only if the replication factor differs from the default
broker config when the topic is created, right? Replica list in the topic
znode will expand during reassignment which is not useful for the purpose
of this KIP. The default replication factor in the broker may be changed
after the topic is created. If user increases default replication factor
from 3 to 4, would all topics be considered as `UnderSynchronized`?

The KIP proposes to use over-replicated partitions to track the progress of
a reassignment. Currently ReassignPartitionsCommand tracks the progress of
reassignment by looking at the reassignment znode. Is there concern with
still using this approach to track the progress of reassignment?

Thanks,
Dong


On Fri, Aug 3, 2018 at 11:23 AM, Jason Gustafson  wrote:

> Hey Ismael,
>
> Yeah, my initial inclination was to redefine URP as well. My only doubt was
> how it would affect existing tools which might depend on URPs to track the
> progress of a reassignment. I decided to be conservative in the end, but
> I'd reconsider if we think it is not a major concern. It is annoying to
> need a new category.
>
> About your question about storage in ZK, I can't think of anything
> additional that we need. Probably the main difficulty is getting access to
> the replication factor in the topic utility. My basic thought was just to
> collect the URPs (as we know them today) and use the config API to
> partition them based on the replication factor. Do you see any problems
> with this?
>
> -Jason
>
>
> On Thu, Aug 2, 2018 at 12:14 PM, Ismael Juma  wrote:
>
> > Thanks Jason. This is definitely a pain point. I actually prefer the
> option
> > to redefine what under-replicated means (currently under rejected
> > alternatives). Also, do we need to make changes to what we store in ZK?
> If
> > so, that should be in the KIP too.
> >
> > Ismael
> >
> > On Thu, Aug 2, 2018 at 11:45 AM Jason Gustafson 
> > wrote:
> >
> > > Hey All,
> > >
> > > Another day, another KIP. This one is hopefully straightforward:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%
> > 3A+Distinguish+URPs+caused+by+reassignment
> > > .
> > > Have a look and let me know what you think!
> > >
> > > Thanks,
> > > Jason
> > >
> >
>


Re: KIP-352: Distinguish URPs caused by reassignment

2018-08-03 Thread Jason Gustafson
Hey Ismael,

Yeah, my initial inclination was to redefine URP as well. My only doubt was
how it would affect existing tools which might depend on URPs to track the
progress of a reassignment. I decided to be conservative in the end, but
I'd reconsider if we think it is not a major concern. It is annoying to
need a new category.

About your question about storage in ZK, I can't think of anything
additional that we need. Probably the main difficulty is getting access to
the replication factor in the topic utility. My basic thought was just to
collect the URPs (as we know them today) and use the config API to
partition them based on the replication factor. Do you see any problems
with this?

-Jason


On Thu, Aug 2, 2018 at 12:14 PM, Ismael Juma  wrote:

> Thanks Jason. This is definitely a pain point. I actually prefer the option
> to redefine what under-replicated means (currently under rejected
> alternatives). Also, do we need to make changes to what we store in ZK? If
> so, that should be in the KIP too.
>
> Ismael
>
> On Thu, Aug 2, 2018 at 11:45 AM Jason Gustafson 
> wrote:
>
> > Hey All,
> >
> > Another day, another KIP. This one is hopefully straightforward:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%
> 3A+Distinguish+URPs+caused+by+reassignment
> > .
> > Have a look and let me know what you think!
> >
> > Thanks,
> > Jason
> >
>


Re: KIP-352: Distinguish URPs caused by reassignment

2018-08-02 Thread Ismael Juma
Thanks Jason. This is definitely a pain point. I actually prefer the option
to redefine what under-replicated means (currently under rejected
alternatives). Also, do we need to make changes to what we store in ZK? If
so, that should be in the KIP too.

Ismael

On Thu, Aug 2, 2018 at 11:45 AM Jason Gustafson  wrote:

> Hey All,
>
> Another day, another KIP. This one is hopefully straightforward:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> .
> Have a look and let me know what you think!
>
> Thanks,
> Jason
>


KIP-352: Distinguish URPs caused by reassignment

2018-08-02 Thread Jason Gustafson
Hey All,

Another day, another KIP. This one is hopefully straightforward:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment.
Have a look and let me know what you think!

Thanks,
Jason