I guess the question, which I think is what Gwen was getting at, is
whether, rather than making this configurable, it might be possible to
make this just work reliably and with the lowest possible latency in some
automatic fashion. I raised group commit because that is a way to
automatically batch under load. If that doesn't work, perhaps there is
another way? The challenge, as we've seen with obscure configs, is that
99% of people can't figure out how to set them, so this won't really help
them.
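The "group commit" pattern Jay mentions lets batch size adapt to load by
itself: while one batch is in flight, new items pile up, and the next
flush takes whatever has accumulated. A rough sketch (class and method
names are illustrative, not Kafka's actual code):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal group-commit sketch: a single flusher thread drains
// whatever has accumulated since its last flush, so batches grow
// under load and shrink when traffic is light -- no tuning knob.
class GroupCommitter<T> {
    private final List<T> pending = new ArrayList<>();

    // Called by any thread producing changes.
    synchronized void submit(T item) {
        pending.add(item);
        notifyAll();
    }

    // Called in a loop by the single flusher thread; blocks until
    // at least one item is available, then drains everything.
    synchronized List<T> nextBatch() throws InterruptedException {
        while (pending.isEmpty()) {
            wait();
        }
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }
}
```

Under this scheme latency is minimal when idle (a lone item is flushed
immediately) and batching kicks in automatically when the flusher falls
behind.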

-Jay

On Mon, Aug 10, 2015 at 9:12 AM, Ashish Singh <asi...@cloudera.com> wrote:

> Hey Guys,
>
> Looks like Jun and Jiangjie have covered the motivation behind the config.
> To me, one would primarily want to set the ISR propagation delay to a
> high value during a rolling upgrade. I think the delay one would want in
> ISR propagation is proportional to the cluster size. However, during
> normal operations a faster ISR propagation is desired. Having a config
> that exposes the delay gives admins a way to control it, and it will
> come with a default value, so anyone who does not want to play with it
> can choose not to.
>
> @Gwen, does that answer your question?
>
> On Sun, Aug 9, 2015 at 3:26 PM, Jiangjie Qin <j...@linkedin.com.invalid>
> wrote:
>
> > Jun,
> >
> > Thanks for the detailed explanation. Now I understand. Yes, we might
> > not be able to leverage the group commit in this case.
> >
> > When I was testing the patch, I also found a potential use case for
> > the config (not sure if it is a strong use case, though). When we
> > rolling upgrade a cluster, if the controller is still running on an
> > old version, the brokers that got bounced will create a bunch of ZK
> > paths that will not be picked up by the controller until the
> > controller is upgraded to the new version. That might be fine for
> > small clusters, but for larger clusters many ISR changes could
> > accumulate before the controller runs on the new version. So if we
> > have the config, when people do a rolling upgrade, they can first set
> > the propagation interval to a very large value so ISR changes will not
> > be propagated. After all the brokers are running on the new version,
> > we can set the propagation interval back to a short value and bounce
> > the brokers again.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Sun, Aug 9, 2015 at 12:22 PM, Jun Rao <j...@confluent.io> wrote:
> >
> > > Jiangjie,
> > >
> > > Related to group commit, I think what Jay says is the following.
> > > Suppose the propagation of ISR is slow and also creates back
> > > pressure; you can just propagate the ISR one batch at a time as fast
> > > as you can. After one batch is sent, you collect all ISR changes
> > > that have accumulated since the last propagation and propagate them
> > > as a batch. This way, the amount of batching adapts automatically to
> > > the propagation rate, and the delay of propagation is minimized.
> > > However, this doesn't quite work if we propagate the ISR changes by
> > > writing a notification path in ZK. There is no back pressure:
> > > notifications can be written faster than the listener can process
> > > them. So the explicit batching you did is essentially a way to get
> > > around this problem.
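Since the ZK notification path provides no back pressure, the batching
has to be driven by time instead: changes accumulate and are emitted at
most once per interval. A minimal sketch of that idea (the class, method
names, and API here are illustrative assumptions, not Kafka's actual
implementation):

```java
import java.util.HashSet;
import java.util.Set;

// Interval-based batching for a channel with no back pressure
// (e.g. writing notification znodes in ZK). ISR changes accumulate
// in a set; maybePropagate() is polled periodically and emits the
// accumulated batch at most once per intervalMs.
class IsrChangeBatcher {
    private final long intervalMs;
    private long lastPropagateMs = 0;
    private final Set<Integer> changedPartitions = new HashSet<>();

    IsrChangeBatcher(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    // Called whenever a partition's ISR changes.
    synchronized void recordChange(int partition) {
        changedPartitions.add(partition);
    }

    // Returns the batch to propagate, or null if there is nothing
    // to send or the interval has not elapsed yet.
    synchronized Set<Integer> maybePropagate(long nowMs) {
        if (changedPartitions.isEmpty()
                || nowMs - lastPropagateMs < intervalMs) {
            return null;
        }
        Set<Integer> batch = new HashSet<>(changedPartitions);
        changedPartitions.clear();
        lastPropagateMs = nowMs;
        return batch;
    }
}
```

The interval is exactly the knob under discussion: larger values mean
fewer ZK writes but slower propagation, which is why a hard-coded default
versus a config is being debated.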
> > >
> > > As for the config, I can see a couple of use cases: (1) We have
> > > unit tests that verify the propagation of ISR. To prevent those
> > > tests from taking too long, we can configure a smaller propagation
> > > interval. (2) When there are many partitions and brokers, someone
> > > may want to increase the propagation interval to reduce the ZK
> > > overhead. I agree that most people likely don't need to change this
> > > setting since the default should be good enough.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > >
> > > On Fri, Aug 7, 2015 at 4:06 PM, Gwen Shapira <g...@confluent.io>
> > > wrote:
> > >
> > > > Maybe Ashish can supply the use-case and tuning advice then :)
> > > > I'm a -1 on adding new configurations that we can't quite explain.
> > > >
> > > > On Fri, Aug 7, 2015 at 3:57 PM, Jiangjie Qin
> > > > <j...@linkedin.com.invalid> wrote:
> > > >
> > > > > Hi Gwen,
> > > > >
> > > > > Completely agree with you. I originally just hard coded it to
> > > > > be 10 seconds. Ashish raised this requirement in KAFKA-2406
> > > > > because people might want ISR changes to get propagated more
> > > > > quickly. I don't have a good use case myself. Personally I
> > > > > think hard coding it is fine, although I don't object to making
> > > > > it configurable.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > >
> > >
> >
>
>
>
> --
>
> Regards,
> Ashish
>
