Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-09 Thread Jun Rao
Hi, Rajini,

Thanks for the updated KIP. A few more comments.

30. Should we just account for the time in network threads in this KIP too?
The issue with doing this later is that existing quotas may be too small
and everyone will have to adjust them before upgrading, which is
inconvenient. If we just do the delaying in the io threads, there probably
isn't too much additional work to include the network thread time?

31. It would be useful for the new metrics to capture the utilization of
all those requests exempt from request throttling (under something like
"exempt"). It's useful for an admin to know how much time is spent there
too.
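
For illustration only (the sensor and metric names below are made up, not a
proposal for the final names), something along these lines using the existing
metrics API would do:

    import java.util.concurrent.TimeUnit
    import org.apache.kafka.common.metrics.Metrics
    import org.apache.kafka.common.metrics.stats.Rate

    val metrics = new Metrics() // in the broker this would be the shared Metrics instance
    // Hypothetical sensor tracking time spent on throttling-exempt requests.
    val exemptSensor = metrics.sensor("exempt-request-time")
    exemptSensor.add(
      metrics.metricName("exempt-request-time", "Request",
        "Processing time of throttling-exempt requests, as a percentage of one thread"),
      new Rate(TimeUnit.SECONDS))

    // Recorded when an exempt request (e.g. an inter-broker request) completes;
    // seconds * 100 combined with a per-second Rate yields a percentage of one thread.
    def recordExempt(totalThreadTimeNanos: Long): Unit =
      exemptSensor.record(totalThreadTimeNanos / 1e9 * 100)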

32. "The maximum throttle time for any single request will be the quota
window size (one second by default)." We probably should cap the delay at
quota.window.size.seconds * quota.window.num?
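
Roughly, as a sketch with made-up names (the current defaults of
quota.window.size.seconds=1 and quota.window.num=11 would then cap the delay
at 11 seconds instead of 1 second):

    // Cap the computed delay at the full span of quota samples, not one window.
    def capThrottleTime(computedDelayMs: Long,
                        quotaWindowSizeSeconds: Long = 1,
                        quotaWindowNum: Int = 11): Long =
      math.min(computedDelayMs, quotaWindowSizeSeconds * 1000L * quotaWindowNum)

    // e.g. capThrottleTime(30000) == 11000 with the defaults above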

33. It's unfortunate that we use . in configs and _ in ZK data structures.
However, for consistency, request.percentage in ZK probably should be
request_percentage?
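
i.e. the znode under /config/users (or /config/clients) would then look
something like this, matching the underscore style of the existing
producer_byte_rate and consumer_byte_rate entries (values made up):

    {
      "version": 1,
      "config": {
        "producer_byte_rate": "1048576",
        "consumer_byte_rate": "2097152",
        "request_percentage": "50"
      }
    }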

Thanks,

Jun

On Thu, Mar 9, 2017 at 7:55 AM, Rajini Sivaram 
wrote:

> I have updated the KIP to use "request.percentage" quotas where the
> percentage is out of a total of (num.io.threads * 100). I have added the
> other options considered so far under "Rejected Alternatives".
>
> To address Todd's concern about per-thread quotas: even though the quotas
> are expressed out of a total of (num.io.threads * 100), clients are not
> locked to particular threads. Utilization is measured as the total across
> all the I/O threads, so a 10% quota can be consumed as 1% on each of 10
> threads. Individual quotas can also be greater than 100% if required.
>
> Please let me know if there are any other concerns or suggestions.
>
> Thank you,
>
> Rajini
>
> On Wed, Mar 8, 2017 at 10:20 PM, Todd Palino  wrote:
>
> > Rajini -
> >
> > I understand what you’re saying, but the point I’m making is that I don’t
> > believe we need to take it into account directly. The CPU utilization of
> > the network threads is directly proportional to the number of bytes being
> > sent. The more bytes, the more CPU that is required for SSL (or other
> > tasks). This is opposed to the request handler threads, where there are a
> > number of factors that affect CPU utilization. This means that it’s not
> > necessary to separately quota network thread byte usage and CPU - if we
> > quota byte usage (which we already do), we have fixed the CPU usage at a
> > proportional amount.
> >
> > Jun -
> >
> > Thanks for the clarification there. I was thinking of the utilization
> > percentage as being fixed, not what the percentage reflects. I’m not tied
> > to either way of doing it, provided that we do not lock clients to a
> single
> > thread. For example, if I specify that a given client can use 10% of a
> > single thread, that should also mean they can use 1% on 10 threads.
> >
> > -Todd
> >
> >
> >
> > On Wed, Mar 8, 2017 at 8:57 AM, Jun Rao  wrote:
> >
> > > Hi, Todd,
> > >
> > > Thanks for the feedback.
> > >
> > > I just want to clarify your second point. If the limit percentage is
> per
> > > thread and the thread counts are changed, the absolute processing limit
> > for
> > > existing users haven't changed and there is no need to adjust them. On
> > the
> > > other hand, if the limit percentage is of total thread pool capacity
> and
> > > the thread counts are changed, the effective processing limit for a
> user
> > > will change. So, to preserve the current processing limit, existing
> user
> > > limits have to be adjusted. If there is a hardware change, the
> effective
> > > processing limit for a user will change in either approach and the
> > existing
> > > limit may need to be adjusted. However, hardware changes are less
> common
> > > than thread pool configuration changes.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Mar 7, 2017 at 4:45 PM, Todd Palino  wrote:
> > >
> > > > I’ve been following this one on and off, and overall it sounds good
> to
> > > me.
> > > >
> > > > - The SSL question is a good one. However, that type of overhead
> should
> > > be
> > > > proportional to the bytes rate, so I think that a bytes rate quota
> > would
> > > > still be a suitable way to address it.
> > > >
> > > > - I think it’s better to make the quota percentage of total thread
> pool
> > > > capacity, and not percentage of an individual thread. That way you
> > don’t
> > > > have to adjust it when you adjust thread counts (tuning, hardware
> > > changes,
> > > > etc.)
> > > >
> > > >
> > > > -Todd
> > > >
> > > >
> > > >
> > > > On Tue, Mar 7, 2017 at 2:38 PM, Becket Qin 
> > wrote:
> > > >
> > > > > I see. Good point about SSL.
> > > > >
> > > > > I just asked Todd to take a look.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Tue, Mar 7, 2017 at 2:17 PM, Jun Rao  wrote:
> > > > >
> > > > > > Hi, Jiangjie,
> > > > > >
> > > > > > Yes, I agree 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-09 Thread Rajini Sivaram
I have updated the KIP to use "request.percentage" quotas where the
percentage is out of a total of (num.io.threads * 100). I have added the
other options considered so far under "Rejected Alternatives".

To address Todd's concern about per-thread quotas: even though the quotas
are expressed out of a total of (num.io.threads * 100), clients are not
locked to particular threads. Utilization is measured as the total across
all the I/O threads, so a 10% quota can be consumed as 1% on each of 10
threads. Individual quotas can also be greater than 100% if required.
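
As a worked example (the numbers are purely illustrative):

    num.io.threads = 10        =>  total capacity = 10 * 100 = 1000%
    request.percentage = 100   =>  up to 1 thread-second of I/O time per second,
                                   e.g. 100% of one thread or 10% on each of 10
    request.percentage = 150   =>  also valid: 1.5 thread-seconds per second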

Please let me know if there are any other concerns or suggestions.

Thank you,

Rajini

On Wed, Mar 8, 2017 at 10:20 PM, Todd Palino  wrote:

> Rajini -
>
> I understand what you’re saying, but the point I’m making is that I don’t
> believe we need to take it into account directly. The CPU utilization of
> the network threads is directly proportional to the number of bytes being
> sent. The more bytes, the more CPU that is required for SSL (or other
> tasks). This is opposed to the request handler threads, where there are a
> number of factors that affect CPU utilization. This means that it’s not
> necessary to separately quota network thread byte usage and CPU - if we
> quota byte usage (which we already do), we have fixed the CPU usage at a
> proportional amount.
>
> Jun -
>
> Thanks for the clarification there. I was thinking of the utilization
> percentage as being fixed, not what the percentage reflects. I’m not tied
> to either way of doing it, provided that we do not lock clients to a single
> thread. For example, if I specify that a given client can use 10% of a
> single thread, that should also mean they can use 1% on 10 threads.
>
> -Todd
>
>
>
> On Wed, Mar 8, 2017 at 8:57 AM, Jun Rao  wrote:
>
> > Hi, Todd,
> >
> > Thanks for the feedback.
> >
> > I just want to clarify your second point. If the limit percentage is per
> > thread and the thread counts are changed, the absolute processing limit
> for
> > existing users haven't changed and there is no need to adjust them. On
> the
> > other hand, if the limit percentage is of total thread pool capacity and
> > the thread counts are changed, the effective processing limit for a user
> > will change. So, to preserve the current processing limit, existing user
> > limits have to be adjusted. If there is a hardware change, the effective
> > processing limit for a user will change in either approach and the
> existing
> > limit may need to be adjusted. However, hardware changes are less common
> > than thread pool configuration changes.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Mar 7, 2017 at 4:45 PM, Todd Palino  wrote:
> >
> > > I’ve been following this one on and off, and overall it sounds good to
> > me.
> > >
> > > - The SSL question is a good one. However, that type of overhead should
> > be
> > > proportional to the bytes rate, so I think that a bytes rate quota
> would
> > > still be a suitable way to address it.
> > >
> > > - I think it’s better to make the quota percentage of total thread pool
> > > capacity, and not percentage of an individual thread. That way you
> don’t
> > > have to adjust it when you adjust thread counts (tuning, hardware
> > changes,
> > > etc.)
> > >
> > >
> > > -Todd
> > >
> > >
> > >
> > > On Tue, Mar 7, 2017 at 2:38 PM, Becket Qin 
> wrote:
> > >
> > > > I see. Good point about SSL.
> > > >
> > > > I just asked Todd to take a look.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Tue, Mar 7, 2017 at 2:17 PM, Jun Rao  wrote:
> > > >
> > > > > Hi, Jiangjie,
> > > > >
> > > > > Yes, I agree that byte rate already protects the network threads
> > > > > indirectly. I am not sure if byte rate fully captures the CPU
> > overhead
> > > in
> > > > > network due to SSL. So, at the high level, we can use request time
> > > limit
> > > > to
> > > > > protect CPU and use byte rate to protect storage and network.
> > > > >
> > > > > Also, do you think you can get Todd to comment on this KIP?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Tue, Mar 7, 2017 at 11:21 AM, Becket Qin 
> > > > wrote:
> > > > >
> > > > > > Hi Rajini/Jun,
> > > > > >
> > > > > > The percentage based reasoning sounds good.
> > > > > > One thing I am wondering is that if we assume the network thread
> > are
> > > > just
> > > > > > doing the network IO, can we say bytes rate quota is already sort
> > of
> > > > > > network threads quota?
> > > > > > If we take network threads into the consideration here, would
> that
> > be
> > > > > > somewhat overlapping with the bytes rate quota?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > On Tue, Mar 7, 2017 at 11:04 AM, Rajini Sivaram <
> > > > rajinisiva...@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Jun,
> > > > > > >
> > > > > > > Thank you for the 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-08 Thread Todd Palino
Rajini -

I understand what you’re saying, but the point I’m making is that I don’t
believe we need to take it into account directly. The CPU utilization of
the network threads is directly proportional to the number of bytes being
sent. The more bytes, the more CPU that is required for SSL (or other
tasks). This is opposed to the request handler threads, where there are a
number of factors that affect CPU utilization. This means that it’s not
necessary to separately quota network thread byte usage and CPU - if we
quota byte usage (which we already do), we have fixed the CPU usage at a
proportional amount.

Jun -

Thanks for the clarification there. I was thinking of the utilization
percentage as being fixed, not what the percentage reflects. I’m not tied
to either way of doing it, provided that we do not lock clients to a single
thread. For example, if I specify that a given client can use 10% of a
single thread, that should also mean they can use 1% on 10 threads.

-Todd



On Wed, Mar 8, 2017 at 8:57 AM, Jun Rao  wrote:

> Hi, Todd,
>
> Thanks for the feedback.
>
> I just want to clarify your second point. If the limit percentage is per
> thread and the thread counts are changed, the absolute processing limit for
> existing users hasn't changed and there is no need to adjust them. On the
> other hand, if the limit percentage is of total thread pool capacity and
> the thread counts are changed, the effective processing limit for a user
> will change. So, to preserve the current processing limit, existing user
> limits have to be adjusted. If there is a hardware change, the effective
> processing limit for a user will change in either approach and the existing
> limit may need to be adjusted. However, hardware changes are less common
> than thread pool configuration changes.
>
> Thanks,
>
> Jun
>
> On Tue, Mar 7, 2017 at 4:45 PM, Todd Palino  wrote:
>
> > I’ve been following this one on and off, and overall it sounds good to
> me.
> >
> > - The SSL question is a good one. However, that type of overhead should
> be
> > proportional to the bytes rate, so I think that a bytes rate quota would
> > still be a suitable way to address it.
> >
> > - I think it’s better to make the quota percentage of total thread pool
> > capacity, and not percentage of an individual thread. That way you don’t
> > have to adjust it when you adjust thread counts (tuning, hardware
> changes,
> > etc.)
> >
> >
> > -Todd
> >
> >
> >
> > On Tue, Mar 7, 2017 at 2:38 PM, Becket Qin  wrote:
> >
> > > I see. Good point about SSL.
> > >
> > > I just asked Todd to take a look.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Tue, Mar 7, 2017 at 2:17 PM, Jun Rao  wrote:
> > >
> > > > Hi, Jiangjie,
> > > >
> > > > Yes, I agree that byte rate already protects the network threads
> > > > indirectly. I am not sure if byte rate fully captures the CPU
> overhead
> > in
> > > > network due to SSL. So, at the high level, we can use request time
> > limit
> > > to
> > > > protect CPU and use byte rate to protect storage and network.
> > > >
> > > > Also, do you think you can get Todd to comment on this KIP?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Tue, Mar 7, 2017 at 11:21 AM, Becket Qin 
> > > wrote:
> > > >
> > > > > Hi Rajini/Jun,
> > > > >
> > > > > The percentage based reasoning sounds good.
> > > > > One thing I am wondering is that if we assume the network thread
> are
> > > just
> > > > > doing the network IO, can we say bytes rate quota is already sort
> of
> > > > > network threads quota?
> > > > > If we take network threads into the consideration here, would that
> be
> > > > > somewhat overlapping with the bytes rate quota?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Tue, Mar 7, 2017 at 11:04 AM, Rajini Sivaram <
> > > rajinisiva...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Jun,
> > > > > >
> > > > > > Thank you for the explanation, I hadn't realized you meant
> > percentage
> > > > of
> > > > > > the total thread pool. If everyone is OK with Jun's suggestion, I
> > > will
> > > > > > update the KIP.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Rajini
> > > > > >
> > > > > > On Tue, Mar 7, 2017 at 5:08 PM, Jun Rao 
> wrote:
> > > > > >
> > > > > > > Hi, Rajini,
> > > > > > >
> > > > > > > Let's take your example. Let's say a user sets the limit to
> 50%.
> > I
> > > am
> > > > > not
> > > > > > > sure if it's better to apply the same percentage separately to
> > > > network
> > > > > > and
> > > > > > > io thread pool. For example, for produce requests, most of the
> > time
> > > > > will
> > > > > > be
> > > > > > > spent in the io threads whereas for fetch requests, most of the
> > > time
> > > > > will
> > > > > > > be in the network threads. So, using the same percentage in
> both
> > > > thread
> > 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-08 Thread Jun Rao
Hi, Todd,

Thanks for the feedback.

I just want to clarify your second point. If the limit percentage is per
thread and the thread counts are changed, the absolute processing limit for
existing users hasn't changed and there is no need to adjust them. On the
other hand, if the limit percentage is of total thread pool capacity and
the thread counts are changed, the effective processing limit for a user
will change. So, to preserve the current processing limit, existing user
limits have to be adjusted. If there is a hardware change, the effective
processing limit for a user will change in either approach and the existing
limit may need to be adjusted. However, hardware changes are less common
than thread pool configuration changes.
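
To put numbers on it (made up for illustration, a pool growing from 10 to
12 io threads):

    per-thread model: a quota of 500% is 5 thread-seconds/sec both before and
                      after the change, so no adjustment is needed.
    total-pool model: a quota of 50% is 5 thread-seconds/sec before but
                      6 thread-seconds/sec after; keeping the old absolute
                      limit means rewriting the quota to roughly 41.7%.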

Thanks,

Jun

On Tue, Mar 7, 2017 at 4:45 PM, Todd Palino  wrote:

> I’ve been following this one on and off, and overall it sounds good to me.
>
> - The SSL question is a good one. However, that type of overhead should be
> proportional to the bytes rate, so I think that a bytes rate quota would
> still be a suitable way to address it.
>
> - I think it’s better to make the quota percentage of total thread pool
> capacity, and not percentage of an individual thread. That way you don’t
> have to adjust it when you adjust thread counts (tuning, hardware changes,
> etc.)
>
>
> -Todd
>
>
>
> On Tue, Mar 7, 2017 at 2:38 PM, Becket Qin  wrote:
>
> > I see. Good point about SSL.
> >
> > I just asked Todd to take a look.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Tue, Mar 7, 2017 at 2:17 PM, Jun Rao  wrote:
> >
> > > Hi, Jiangjie,
> > >
> > > Yes, I agree that byte rate already protects the network threads
> > > indirectly. I am not sure if byte rate fully captures the CPU overhead
> in
> > > network due to SSL. So, at the high level, we can use request time
> limit
> > to
> > > protect CPU and use byte rate to protect storage and network.
> > >
> > > Also, do you think you can get Todd to comment on this KIP?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Mar 7, 2017 at 11:21 AM, Becket Qin 
> > wrote:
> > >
> > > > Hi Rajini/Jun,
> > > >
> > > > The percentage based reasoning sounds good.
> > > > One thing I am wondering is that if we assume the network thread are
> > just
> > > > doing the network IO, can we say bytes rate quota is already sort of
> > > > network threads quota?
> > > > If we take network threads into the consideration here, would that be
> > > > somewhat overlapping with the bytes rate quota?
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Tue, Mar 7, 2017 at 11:04 AM, Rajini Sivaram <
> > rajinisiva...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Jun,
> > > > >
> > > > > Thank you for the explanation, I hadn't realized you meant
> percentage
> > > of
> > > > > the total thread pool. If everyone is OK with Jun's suggestion, I
> > will
> > > > > update the KIP.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Rajini
> > > > >
> > > > > On Tue, Mar 7, 2017 at 5:08 PM, Jun Rao  wrote:
> > > > >
> > > > > > Hi, Rajini,
> > > > > >
> > > > > > Let's take your example. Let's say a user sets the limit to 50%.
> I
> > am
> > > > not
> > > > > > sure if it's better to apply the same percentage separately to
> > > network
> > > > > and
> > > > > > io thread pool. For example, for produce requests, most of the
> time
> > > > will
> > > > > be
> > > > > > spent in the io threads whereas for fetch requests, most of the
> > time
> > > > will
> > > > > > be in the network threads. So, using the same percentage in both
> > > thread
> > > > > > pools means one of the pools' resource will be over allocated.
> > > > > >
> > > > > > An alternative way is to simply model network and io thread pool
> > > > > together.
> > > > > > If you get 10 io threads and 5 network threads, you get 1500%
> > request
> > > > > > processing power. A 50% limit means a total of 750% processing
> > power.
> > > > We
> > > > > > just add up the time a user request spent in either network or io
> > > > thread.
> > > > > > If that total exceeds 750% (doesn't matter whether it's spent
> more
> > in
> > > > > > network or io thread), the request will be throttled. This seems
> > more
> > > > > > general and is not sensitive to the current implementation detail
> > of
> > > > > having
> > > > > > a separate network and io thread pool. In the future, if the
> > > threading
> > > > > > model changes, the same concept of quota can still be applied.
> For
> > > now,
> > > > > > since it's a bit tricky to add the delay logic in the network
> > thread
> > > > > pool,
> > > > > > we could probably just do the delaying only in the io threads as
> > you
> > > > > > suggested earlier.
> > > > > >
> > > > > > There is still the orthogonal question of whether a quota of 50%
> is
> > > out
> > > > > of
> > > > > > 100% or 100% * #total processing threads. My 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-08 Thread Rajini Sivaram
Hi Todd,

Thank you for the review.

For SSL, the case that is not covered is Scenario 6 in the KIP that Ismael
pointed out. For clusters with only SSL or PLAINTEXT, byte rate quotas work
well, but for clusters with both SSL and PLAINTEXT, network thread
utilization also needs to be taken into account.

For the percentage used in quota configuration, it looks like opinion is
still split between an overall percentage and a per-thread percentage. I
will wait for Jun to respond before updating the KIP either way.

Regards,

Rajini

On Wed, Mar 8, 2017 at 12:45 AM, Todd Palino  wrote:

> I’ve been following this one on and off, and overall it sounds good to me.
>
> - The SSL question is a good one. However, that type of overhead should be
> proportional to the bytes rate, so I think that a bytes rate quota would
> still be a suitable way to address it.
>
> - I think it’s better to make the quota percentage of total thread pool
> capacity, and not percentage of an individual thread. That way you don’t
> have to adjust it when you adjust thread counts (tuning, hardware changes,
> etc.)
>
>
> -Todd
>
>
>
> On Tue, Mar 7, 2017 at 2:38 PM, Becket Qin  wrote:
>
> > I see. Good point about SSL.
> >
> > I just asked Todd to take a look.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Tue, Mar 7, 2017 at 2:17 PM, Jun Rao  wrote:
> >
> > > Hi, Jiangjie,
> > >
> > > Yes, I agree that byte rate already protects the network threads
> > > indirectly. I am not sure if byte rate fully captures the CPU overhead
> in
> > > network due to SSL. So, at the high level, we can use request time
> limit
> > to
> > > protect CPU and use byte rate to protect storage and network.
> > >
> > > Also, do you think you can get Todd to comment on this KIP?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Mar 7, 2017 at 11:21 AM, Becket Qin 
> > wrote:
> > >
> > > > Hi Rajini/Jun,
> > > >
> > > > The percentage based reasoning sounds good.
> > > > One thing I am wondering is that if we assume the network thread are
> > just
> > > > doing the network IO, can we say bytes rate quota is already sort of
> > > > network threads quota?
> > > > If we take network threads into the consideration here, would that be
> > > > somewhat overlapping with the bytes rate quota?
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Tue, Mar 7, 2017 at 11:04 AM, Rajini Sivaram <
> > rajinisiva...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Jun,
> > > > >
> > > > > Thank you for the explanation, I hadn't realized you meant
> percentage
> > > of
> > > > > the total thread pool. If everyone is OK with Jun's suggestion, I
> > will
> > > > > update the KIP.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Rajini
> > > > >
> > > > > On Tue, Mar 7, 2017 at 5:08 PM, Jun Rao  wrote:
> > > > >
> > > > > > Hi, Rajini,
> > > > > >
> > > > > > Let's take your example. Let's say a user sets the limit to 50%.
> I
> > am
> > > > not
> > > > > > sure if it's better to apply the same percentage separately to
> > > network
> > > > > and
> > > > > > io thread pool. For example, for produce requests, most of the
> time
> > > > will
> > > > > be
> > > > > > spent in the io threads whereas for fetch requests, most of the
> > time
> > > > will
> > > > > > be in the network threads. So, using the same percentage in both
> > > thread
> > > > > > pools means one of the pools' resource will be over allocated.
> > > > > >
> > > > > > An alternative way is to simply model network and io thread pool
> > > > > together.
> > > > > > If you get 10 io threads and 5 network threads, you get 1500%
> > request
> > > > > > processing power. A 50% limit means a total of 750% processing
> > power.
> > > > We
> > > > > > just add up the time a user request spent in either network or io
> > > > thread.
> > > > > > If that total exceeds 750% (doesn't matter whether it's spent
> more
> > in
> > > > > > network or io thread), the request will be throttled. This seems
> > more
> > > > > > general and is not sensitive to the current implementation detail
> > of
> > > > > having
> > > > > > a separate network and io thread pool. In the future, if the
> > > threading
> > > > > > model changes, the same concept of quota can still be applied.
> For
> > > now,
> > > > > > since it's a bit tricky to add the delay logic in the network
> > thread
> > > > > pool,
> > > > > > we could probably just do the delaying only in the io threads as
> > you
> > > > > > suggested earlier.
> > > > > >
> > > > > > There is still the orthogonal question of whether a quota of 50%
> is
> > > out
> > > > > of
> > > > > > 100% or 100% * #total processing threads. My feeling is that the
> > > latter
> > > > > is
> > > > > > slightly better based on my explanation earlier. The way to
> > describe
> > > > this
> > > > > > quota to the users can be "share of elapsed request processing
> time
> > > on
> > 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-07 Thread Todd Palino
I’ve been following this one on and off, and overall it sounds good to me.

- The SSL question is a good one. However, that type of overhead should be
proportional to the bytes rate, so I think that a bytes rate quota would
still be a suitable way to address it.

- I think it’s better to make the quota percentage of total thread pool
capacity, and not percentage of an individual thread. That way you don’t
have to adjust it when you adjust thread counts (tuning, hardware changes,
etc.)


-Todd



On Tue, Mar 7, 2017 at 2:38 PM, Becket Qin  wrote:

> I see. Good point about SSL.
>
> I just asked Todd to take a look.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Mar 7, 2017 at 2:17 PM, Jun Rao  wrote:
>
> > Hi, Jiangjie,
> >
> > Yes, I agree that byte rate already protects the network threads
> > indirectly. I am not sure if byte rate fully captures the CPU overhead in
> > network due to SSL. So, at the high level, we can use request time limit
> to
> > protect CPU and use byte rate to protect storage and network.
> >
> > Also, do you think you can get Todd to comment on this KIP?
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Mar 7, 2017 at 11:21 AM, Becket Qin 
> wrote:
> >
> > > Hi Rajini/Jun,
> > >
> > > The percentage based reasoning sounds good.
> > > One thing I am wondering is that if we assume the network thread are
> just
> > > doing the network IO, can we say bytes rate quota is already sort of
> > > network threads quota?
> > > If we take network threads into the consideration here, would that be
> > > somewhat overlapping with the bytes rate quota?
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Tue, Mar 7, 2017 at 11:04 AM, Rajini Sivaram <
> rajinisiva...@gmail.com
> > >
> > > wrote:
> > >
> > > > Jun,
> > > >
> > > > Thank you for the explanation, I hadn't realized you meant percentage
> > of
> > > > the total thread pool. If everyone is OK with Jun's suggestion, I
> will
> > > > update the KIP.
> > > >
> > > > Thanks,
> > > >
> > > > Rajini
> > > >
> > > > On Tue, Mar 7, 2017 at 5:08 PM, Jun Rao  wrote:
> > > >
> > > > > Hi, Rajini,
> > > > >
> > > > > Let's take your example. Let's say a user sets the limit to 50%. I
> am
> > > not
> > > > > sure if it's better to apply the same percentage separately to
> > network
> > > > and
> > > > > io thread pool. For example, for produce requests, most of the time
> > > will
> > > > be
> > > > > spent in the io threads whereas for fetch requests, most of the
> time
> > > will
> > > > > be in the network threads. So, using the same percentage in both
> > thread
> > > > > pools means one of the pools' resource will be over allocated.
> > > > >
> > > > > An alternative way is to simply model network and io thread pool
> > > > together.
> > > > > If you get 10 io threads and 5 network threads, you get 1500%
> request
> > > > > processing power. A 50% limit means a total of 750% processing
> power.
> > > We
> > > > > just add up the time a user request spent in either network or io
> > > thread.
> > > > > If that total exceeds 750% (doesn't matter whether it's spent more
> in
> > > > > network or io thread), the request will be throttled. This seems
> more
> > > > > general and is not sensitive to the current implementation detail
> of
> > > > having
> > > > > a separate network and io thread pool. In the future, if the
> > threading
> > > > > model changes, the same concept of quota can still be applied. For
> > now,
> > > > > since it's a bit tricky to add the delay logic in the network
> thread
> > > > pool,
> > > > > we could probably just do the delaying only in the io threads as
> you
> > > > > suggested earlier.
> > > > >
> > > > > There is still the orthogonal question of whether a quota of 50% is
> > out
> > > > of
> > > > > 100% or 100% * #total processing threads. My feeling is that the
> > latter
> > > > is
> > > > > slightly better based on my explanation earlier. The way to
> describe
> > > this
> > > > > quota to the users can be "share of elapsed request processing time
> > on
> > > a
> > > > > single CPU" (similar to top).
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Fri, Mar 3, 2017 at 4:22 AM, Rajini Sivaram <
> > > rajinisiva...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Jun,
> > > > > >
> > > > > > Agree about the two scenarios.
> > > > > >
> > > > > > But still not sure about a single quota covering both network
> > threads
> > > > and
> > > > > > I/O threads with per-thread quota. If there are 10 I/O threads
> and
> > 5
> > > > > > network threads and I want to assign half the quota to userA, the
> > > quota
> > > > > > would be 750%. I imagine, internally, we would convert this to
> 500%
> > > for
> > > > > I/O
> > > > > > and 250% for network threads to allocate 50% of each pool.
> > > > > >
> > > > > > A couple of scenarios:
> > > > > >
> > > > > > 1. Admin adds 1 extra network 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-07 Thread Becket Qin
I see. Good point about SSL.

I just asked Todd to take a look.

Thanks,

Jiangjie (Becket) Qin

On Tue, Mar 7, 2017 at 2:17 PM, Jun Rao  wrote:

> Hi, Jiangjie,
>
> Yes, I agree that byte rate already protects the network threads
> indirectly. I am not sure if byte rate fully captures the CPU overhead in
> network due to SSL. So, at the high level, we can use request time limit to
> protect CPU and use byte rate to protect storage and network.
>
> Also, do you think you can get Todd to comment on this KIP?
>
> Thanks,
>
> Jun
>
> On Tue, Mar 7, 2017 at 11:21 AM, Becket Qin  wrote:
>
> > Hi Rajini/Jun,
> >
> > The percentage based reasoning sounds good.
> > One thing I am wondering is that if we assume the network thread are just
> > doing the network IO, can we say bytes rate quota is already sort of
> > network threads quota?
> > If we take network threads into the consideration here, would that be
> > somewhat overlapping with the bytes rate quota?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Tue, Mar 7, 2017 at 11:04 AM, Rajini Sivaram  >
> > wrote:
> >
> > > Jun,
> > >
> > > Thank you for the explanation, I hadn't realized you meant percentage
> of
> > > the total thread pool. If everyone is OK with Jun's suggestion, I will
> > > update the KIP.
> > >
> > > Thanks,
> > >
> > > Rajini
> > >
> > > On Tue, Mar 7, 2017 at 5:08 PM, Jun Rao  wrote:
> > >
> > > > Hi, Rajini,
> > > >
> > > > Let's take your example. Let's say a user sets the limit to 50%. I am
> > not
> > > > sure if it's better to apply the same percentage separately to
> network
> > > and
> > > > io thread pool. For example, for produce requests, most of the time
> > will
> > > be
> > > > spent in the io threads whereas for fetch requests, most of the time
> > will
> > > > be in the network threads. So, using the same percentage in both
> thread
> > > > pools means one of the pools' resource will be over allocated.
> > > >
> > > > An alternative way is to simply model network and io thread pool
> > > together.
> > > > If you get 10 io threads and 5 network threads, you get 1500% request
> > > > processing power. A 50% limit means a total of 750% processing power.
> > We
> > > > just add up the time a user request spent in either network or io
> > thread.
> > > > If that total exceeds 750% (doesn't matter whether it's spent more in
> > > > network or io thread), the request will be throttled. This seems more
> > > > general and is not sensitive to the current implementation detail of
> > > having
> > > > a separate network and io thread pool. In the future, if the
> threading
> > > > model changes, the same concept of quota can still be applied. For
> now,
> > > > since it's a bit tricky to add the delay logic in the network thread
> > > pool,
> > > > we could probably just do the delaying only in the io threads as you
> > > > suggested earlier.
> > > >
> > > > There is still the orthogonal question of whether a quota of 50% is
> out
> > > of
> > > > 100% or 100% * #total processing threads. My feeling is that the
> latter
> > > is
> > > > slightly better based on my explanation earlier. The way to describe
> > this
> > > > quota to the users can be "share of elapsed request processing time
> on
> > a
> > > > single CPU" (similar to top).
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Fri, Mar 3, 2017 at 4:22 AM, Rajini Sivaram <
> > rajinisiva...@gmail.com>
> > > > wrote:
> > > >
> > > > > Jun,
> > > > >
> > > > > Agree about the two scenarios.
> > > > >
> > > > > But still not sure about a single quota covering both network
> threads
> > > and
> > > > > I/O threads with per-thread quota. If there are 10 I/O threads and
> 5
> > > > > network threads and I want to assign half the quota to userA, the
> > quota
> > > > > would be 750%. I imagine, internally, we would convert this to 500%
> > for
> > > > I/O
> > > > > and 250% for network threads to allocate 50% of each pool.
> > > > >
> > > > > A couple of scenarios:
> > > > >
> > > > > 1. Admin adds 1 extra network thread. To retain 50%, admin needs to
> > now
> > > > > allocate 800% for each user. Or increase the quota for a few users.
> > To
> > > > me,
> > > > > it feels like admin needs to convert 50% to 800% and Kafka
> internally
> > > > needs
> > > > > to convert 800% to (500%, 300%). Everyone using just 50% feels a
> lot
> > > > > simpler.
> > > > >
> > > > > 2. We decide to add some other thread to this list. Admin needs to
> > know
> > > > > exactly how many threads form the maximum quota. And we can be
> > changing
> > > > > this between broker versions as we add more to the list. Again a
> > single
> > > > > overall percent would be a lot simpler.
> > > > >
> > > > > There were others who were unconvinced by a single percent from the
> > > > initial
> > > > > proposal and were happier with thread units similar to CPU units,
> so
> > I
> > > am
> > > > > ok 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-07 Thread Jun Rao
Hi, Jiangjie,

Yes, I agree that byte rate already protects the network threads
indirectly. I am not sure if byte rate fully captures the CPU overhead in
network due to SSL. So, at the high level, we can use request time limit to
protect CPU and use byte rate to protect storage and network.

Also, do you think you can get Todd to comment on this KIP?

Thanks,

Jun

On Tue, Mar 7, 2017 at 11:21 AM, Becket Qin  wrote:

> Hi Rajini/Jun,
>
> The percentage based reasoning sounds good.
> One thing I am wondering: if we assume the network threads are just doing
> network IO, can we say the bytes rate quota is already, in effect, a quota
> on the network threads?
> If we take the network threads into consideration here, would that overlap
> somewhat with the bytes rate quota?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Mar 7, 2017 at 11:04 AM, Rajini Sivaram 
> wrote:
>
> > Jun,
> >
> > Thank you for the explanation, I hadn't realized you meant percentage of
> > the total thread pool. If everyone is OK with Jun's suggestion, I will
> > update the KIP.
> >
> > Thanks,
> >
> > Rajini
> >
> > On Tue, Mar 7, 2017 at 5:08 PM, Jun Rao  wrote:
> >
> > > Hi, Rajini,
> > >
> > > Let's take your example. Let's say a user sets the limit to 50%. I am
> not
> > > sure if it's better to apply the same percentage separately to network
> > and
> > > io thread pool. For example, for produce requests, most of the time
> will
> > be
> > > spent in the io threads whereas for fetch requests, most of the time
> will
> > > be in the network threads. So, using the same percentage in both thread
> > > pools means one of the pools' resource will be over allocated.
> > >
> > > An alternative way is to simply model network and io thread pool
> > together.
> > > If you get 10 io threads and 5 network threads, you get 1500% request
> > > processing power. A 50% limit means a total of 750% processing power.
> We
> > > just add up the time a user request spent in either network or io
> thread.
> > > If that total exceeds 750% (doesn't matter whether it's spent more in
> > > network or io thread), the request will be throttled. This seems more
> > > general and is not sensitive to the current implementation detail of
> > having
> > > a separate network and io thread pool. In the future, if the threading
> > > model changes, the same concept of quota can still be applied. For now,
> > > since it's a bit tricky to add the delay logic in the network thread
> > pool,
> > > we could probably just do the delaying only in the io threads as you
> > > suggested earlier.
> > >
> > > There is still the orthogonal question of whether a quota of 50% is out
> > of
> > > 100% or 100% * #total processing threads. My feeling is that the latter
> > is
> > > slightly better based on my explanation earlier. The way to describe
> this
> > > quota to the users can be "share of elapsed request processing time on
> a
> > > single CPU" (similar to top).
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Fri, Mar 3, 2017 at 4:22 AM, Rajini Sivaram <
> rajinisiva...@gmail.com>
> > > wrote:
> > >
> > > > Jun,
> > > >
> > > > Agree about the two scenarios.
> > > >
> > > > But still not sure about a single quota covering both network threads
> > and
> > > > I/O threads with per-thread quota. If there are 10 I/O threads and 5
> > > > network threads and I want to assign half the quota to userA, the
> quota
> > > > would be 750%. I imagine, internally, we would convert this to 500%
> for
> > > I/O
> > > > and 250% for network threads to allocate 50% of each pool.
> > > >
> > > > A couple of scenarios:
> > > >
> > > > 1. Admin adds 1 extra network thread. To retain 50%, admin needs to
> now
> > > > allocate 800% for each user. Or increase the quota for a few users.
> To
> > > me,
> > > > it feels like admin needs to convert 50% to 800% and Kafka internally
> > > needs
> > > > to convert 800% to (500%, 300%). Everyone using just 50% feels a lot
> > > > simpler.
> > > >
> > > > 2. We decide to add some other thread to this list. Admin needs to
> know
> > > > exactly how many threads form the maximum quota. And we can be
> changing
> > > > this between broker versions as we add more to the list. Again a
> single
> > > > overall percent would be a lot simpler.
> > > >
> > > > There were others who were unconvinced by a single percent from the
> > > initial
> > > > proposal and were happier with thread units similar to CPU units, so
> I
> > am
> > > > ok with going with per-thread quotas (as units or percent). Just not
> > sure
> > > > it makes it easier for admin in all cases.
> > > >
> > > > Regards,
> > > >
> > > > Rajini
> > > >
> > > >
> > > > On Fri, Mar 3, 2017 at 6:03 AM, Jun Rao  wrote:
> > > >
> > > > > Hi, Rajini,
> > > > >
> > > > > Consider modeling as n * 100% unit. For 2), the question is what's
> > > > causing
> > > > > the I/O threads to be saturated. It's unlikely that 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-07 Thread Becket Qin
Hi Rajini/Jun,

The percentage-based reasoning sounds good.
One thing I am wondering: if we assume the network threads are just doing
network IO, can we say the bytes rate quota is already, in effect, a quota
on the network threads?
If we take the network threads into consideration here, would that overlap
somewhat with the bytes rate quota?

Thanks,

Jiangjie (Becket) Qin

On Tue, Mar 7, 2017 at 11:04 AM, Rajini Sivaram 
wrote:

> Jun,
>
> Thank you for the explanation, I hadn't realized you meant percentage of
> the total thread pool. If everyone is OK with Jun's suggestion, I will
> update the KIP.
>
> Thanks,
>
> Rajini
>
> On Tue, Mar 7, 2017 at 5:08 PM, Jun Rao  wrote:
>
> > Hi, Rajini,
> >
> > Let's take your example. Let's say a user sets the limit to 50%. I am not
> > sure if it's better to apply the same percentage separately to network
> and
> > io thread pool. For example, for produce requests, most of the time will
> be
> > spent in the io threads whereas for fetch requests, most of the time will
> > be in the network threads. So, using the same percentage in both thread
> > pools means one of the pools' resource will be over allocated.
> >
> > An alternative way is to simply model network and io thread pool
> together.
> > If you get 10 io threads and 5 network threads, you get 1500% request
> > processing power. A 50% limit means a total of 750% processing power. We
> > just add up the time a user request spent in either network or io thread.
> > If that total exceeds 750% (doesn't matter whether it's spent more in
> > network or io thread), the request will be throttled. This seems more
> > general and is not sensitive to the current implementation detail of
> having
> > a separate network and io thread pool. In the future, if the threading
> > model changes, the same concept of quota can still be applied. For now,
> > since it's a bit tricky to add the delay logic in the network thread
> pool,
> > we could probably just do the delaying only in the io threads as you
> > suggested earlier.
> >
> > There is still the orthogonal question of whether a quota of 50% is out
> of
> > 100% or 100% * #total processing threads. My feeling is that the latter
> is
> > slightly better based on my explanation earlier. The way to describe this
> > quota to the users can be "share of elapsed request processing time on a
> > single CPU" (similar to top).
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Fri, Mar 3, 2017 at 4:22 AM, Rajini Sivaram 
> > wrote:
> >
> > > Jun,
> > >
> > > Agree about the two scenarios.
> > >
> > > But still not sure about a single quota covering both network threads
> and
> > > I/O threads with per-thread quota. If there are 10 I/O threads and 5
> > > network threads and I want to assign half the quota to userA, the quota
> > > would be 750%. I imagine, internally, we would convert this to 500% for
> > I/O
> > > and 250% for network threads to allocate 50% of each pool.
> > >
> > > A couple of scenarios:
> > >
> > > 1. Admin adds 1 extra network thread. To retain 50%, admin needs to now
> > > allocate 800% for each user. Or increase the quota for a few users. To
> > me,
> > > it feels like admin needs to convert 50% to 800% and Kafka internally
> > needs
> > > to convert 800% to (500%, 300%). Everyone using just 50% feels a lot
> > > simpler.
> > >
> > > 2. We decide to add some other thread to this list. Admin needs to know
> > > exactly how many threads form the maximum quota. And we can be changing
> > > this between broker versions as we add more to the list. Again a single
> > > overall percent would be a lot simpler.
> > >
> > > There were others who were unconvinced by a single percent from the
> > initial
> > > proposal and were happier with thread units similar to CPU units, so I
> am
> > > ok with going with per-thread quotas (as units or percent). Just not
> sure
> > > it makes it easier for admin in all cases.
> > >
> > > Regards,
> > >
> > > Rajini
> > >
> > >
> > > On Fri, Mar 3, 2017 at 6:03 AM, Jun Rao  wrote:
> > >
> > > > Hi, Rajini,
> > > >
> > > > Consider modeling as n * 100% unit. For 2), the question is what's
> > > causing
> > > > the I/O threads to be saturated. It's unlikely that all users'
> > > utilization
> > > > have increased at the same. A more likely case is that a few isolated
> > > > users' utilization have increased. If so, after increasing the number
> > of
> > > > threads, the admin just needs to adjust the quota for a few isolated
> > > users,
> > > > which is expected and is less work.
> > > >
> > > > Consider modeling as 1 * 100% unit. For 1), all users' quota need to
> be
> > > > adjusted, which is unexpected and is more work.
> > > >
> > > > So, to me, the n * 100% model seems more convenient.
> > > >
> > > > As for future extension to cover network thread utilization, I was
> > > thinking
> > > > that one way is to simply model the capacity as (n + m) * 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-07 Thread Rajini Sivaram
Jun,

Thank you for the explanation, I hadn't realized you meant percentage of
the total thread pool. If everyone is OK with Jun's suggestion, I will
update the KIP.

Thanks,

Rajini

On Tue, Mar 7, 2017 at 5:08 PM, Jun Rao  wrote:

> Hi, Rajini,
>
> Let's take your example. Let's say a user sets the limit to 50%. I am not
> sure if it's better to apply the same percentage separately to network and
> io thread pool. For example, for produce requests, most of the time will be
> spent in the io threads whereas for fetch requests, most of the time will
> be in the network threads. So, using the same percentage in both thread
> pools means one of the pools' resource will be over allocated.
>
> An alternative way is to simply model network and io thread pool together.
> If you get 10 io threads and 5 network threads, you get 1500% request
> processing power. A 50% limit means a total of 750% processing power. We
> just add up the time a user request spent in either network or io thread.
> If that total exceeds 750% (doesn't matter whether it's spent more in
> network or io thread), the request will be throttled. This seems more
> general and is not sensitive to the current implementation detail of having
> a separate network and io thread pool. In the future, if the threading
> model changes, the same concept of quota can still be applied. For now,
> since it's a bit tricky to add the delay logic in the network thread pool,
> we could probably just do the delaying only in the io threads as you
> suggested earlier.
>
> There is still the orthogonal question of whether a quota of 50% is out of
> 100% or 100% * #total processing threads. My feeling is that the latter is
> slightly better based on my explanation earlier. The way to describe this
> quota to the users can be "share of elapsed request processing time on a
> single CPU" (similar to top).
>
> Thanks,
>
> Jun
>
>
> On Fri, Mar 3, 2017 at 4:22 AM, Rajini Sivaram 
> wrote:
>
> > Jun,
> >
> > Agree about the two scenarios.
> >
> > But still not sure about a single quota covering both network threads and
> > I/O threads with per-thread quota. If there are 10 I/O threads and 5
> > network threads and I want to assign half the quota to userA, the quota
> > would be 750%. I imagine, internally, we would convert this to 500% for
> I/O
> > and 250% for network threads to allocate 50% of each pool.
> >
> > A couple of scenarios:
> >
> > 1. Admin adds 1 extra network thread. To retain 50%, admin needs to now
> > allocate 800% for each user. Or increase the quota for a few users. To
> me,
> > it feels like admin needs to convert 50% to 800% and Kafka internally
> needs
> > to convert 800% to (500%, 300%). Everyone using just 50% feels a lot
> > simpler.
> >
> > 2. We decide to add some other thread to this list. Admin needs to know
> > exactly how many threads form the maximum quota. And we can be changing
> > this between broker versions as we add more to the list. Again a single
> > overall percent would be a lot simpler.
> >
> > There were others who were unconvinced by a single percent from the
> initial
> > proposal and were happier with thread units similar to CPU units, so I am
> > ok with going with per-thread quotas (as units or percent). Just not sure
> > it makes it easier for admin in all cases.
> >
> > Regards,
> >
> > Rajini
> >
> >
> > On Fri, Mar 3, 2017 at 6:03 AM, Jun Rao  wrote:
> >
> > > Hi, Rajini,
> > >
> > > Consider modeling as n * 100% unit. For 2), the question is what's
> > causing
> > > the I/O threads to be saturated. It's unlikely that all users'
> > utilization
> > > have increased at the same. A more likely case is that a few isolated
> > > users' utilization have increased. If so, after increasing the number
> of
> > > threads, the admin just needs to adjust the quota for a few isolated
> > users,
> > > which is expected and is less work.
> > >
> > > Consider modeling as 1 * 100% unit. For 1), all users' quota need to be
> > > adjusted, which is unexpected and is more work.
> > >
> > > So, to me, the n * 100% model seems more convenient.
> > >
> > > As for future extension to cover network thread utilization, I was
> > thinking
> > > that one way is to simply model the capacity as (n + m) * 100% unit,
> > where
> > > n and m are the number of network and i/o threads, respectively. Then,
> > for
> > > each user, we can just add up the utilization in the network and the
> i/o
> > > thread. If we do this, we don't need a new type of quota.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Thu, Mar 2, 2017 at 12:27 PM, Rajini Sivaram <
> rajinisiva...@gmail.com
> > >
> > > wrote:
> > >
> > > > Jun,
> > > >
> > > > If we use request.percentage as the percentage used in a single I/O
> > > thread,
> > > > the total percentage being allocated will be num.io.threads * 100 for
> > I/O
> > > > threads and num.network.threads * 100 for network threads. A single
> > quota
> > 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-07 Thread Jun Rao
Hi, Rajini,

Let's take your example. Let's say a user sets the limit to 50%. I am not
sure if it's better to apply the same percentage separately to network and
io thread pool. For example, for produce requests, most of the time will be
spent in the io threads whereas for fetch requests, most of the time will
be in the network threads. So, using the same percentage in both thread
pools means one of the pools' resource will be over allocated.

An alternative way is to simply model network and io thread pool together.
If you get 10 io threads and 5 network threads, you get 1500% request
processing power. A 50% limit means a total of 750% processing power. We
just add up the time a user request spent in either network or io thread.
If that total exceeds 750% (doesn't matter whether it's spent more in
network or io thread), the request will be throttled. This seems more
general and is not sensitive to the current implementation detail of having
a separate network and io thread pool. In the future, if the threading
model changes, the same concept of quota can still be applied. For now,
since it's a bit tricky to add the delay logic in the network thread pool,
we could probably just do the delaying only in the io threads as you
suggested earlier.
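
In pseudo-code, the accounting I have in mind is roughly the following (the
names and structure are mine, just to make the idea concrete):

    // Time spent on behalf of a user in either pool, as a percentage of one
    // thread over the quota window ("share of a single CPU", as in top).
    def shouldThrottle(ioThreadTimeMs: Double, networkThreadTimeMs: Double,
                       userLimitPercent: Double, windowSizeMs: Double): Boolean = {
      val usedPercent = (ioThreadTimeMs + networkThreadTimeMs) * 100.0 / windowSizeMs
      usedPercent > userLimitPercent
    }

    // 10 io + 5 network threads gives a capacity of 1500%, so a 50% share is a
    // limit of 750. A user that consumed 8 thread-seconds in a 1-second window:
    // shouldThrottle(5000, 3000, 750, 1000) == true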

There is still the orthogonal question of whether a quota of 50% is out of
100% or 100% * #total processing threads. My feeling is that the latter is
slightly better based on my explanation earlier. The way to describe this
quota to the users can be "share of elapsed request processing time on a
single CPU" (similar to top).

Thanks,

Jun


On Fri, Mar 3, 2017 at 4:22 AM, Rajini Sivaram 
wrote:

> Jun,
>
> Agree about the two scenarios.
>
> But still not sure about a single quota covering both network threads and
> I/O threads with per-thread quota. If there are 10 I/O threads and 5
> network threads and I want to assign half the quota to userA, the quota
> would be 750%. I imagine, internally, we would convert this to 500% for I/O
> and 250% for network threads to allocate 50% of each pool.
>
> A couple of scenarios:
>
> 1. Admin adds 1 extra network thread. To retain 50%, admin needs to now
> allocate 800% for each user. Or increase the quota for a few users. To me,
> it feels like admin needs to convert 50% to 800% and Kafka internally needs
> to convert 800% to (500%, 300%). Everyone using just 50% feels a lot
> simpler.
>
> 2. We decide to add some other thread to this list. Admin needs to know
> exactly how many threads form the maximum quota. And we can be changing
> this between broker versions as we add more to the list. Again a single
> overall percent would be a lot simpler.
>
> There were others who were unconvinced by a single percent from the initial
> proposal and were happier with thread units similar to CPU units, so I am
> ok with going with per-thread quotas (as units or percent). Just not sure
> it makes it easier for admin in all cases.
>
> Regards,
>
> Rajini
>
>
> On Fri, Mar 3, 2017 at 6:03 AM, Jun Rao  wrote:
>
> > Hi, Rajini,
> >
> > Consider modeling as n * 100% unit. For 2), the question is what's
> causing
> > the I/O threads to be saturated. It's unlikely that all users'
> utilization
> > have increased at the same. A more likely case is that a few isolated
> > users' utilization have increased. If so, after increasing the number of
> > threads, the admin just needs to adjust the quota for a few isolated
> users,
> > which is expected and is less work.
> >
> > Consider modeling as 1 * 100% unit. For 1), all users' quota need to be
> > adjusted, which is unexpected and is more work.
> >
> > So, to me, the n * 100% model seems more convenient.
> >
> > As for future extension to cover network thread utilization, I was
> thinking
> > that one way is to simply model the capacity as (n + m) * 100% unit,
> where
> > n and m are the number of network and i/o threads, respectively. Then,
> for
> > each user, we can just add up the utilization in the network and the i/o
> > thread. If we do this, we don't need a new type of quota.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Thu, Mar 2, 2017 at 12:27 PM, Rajini Sivaram  >
> > wrote:
> >
> > > Jun,
> > >
> > > If we use request.percentage as the percentage used in a single I/O
> > thread,
> > > the total percentage being allocated will be num.io.threads * 100 for
> I/O
> > > threads and num.network.threads * 100 for network threads. A single
> quota
> > > covering the two as a percentage wouldn't quite work if you want to
> > > allocate the same proportion in both cases. If we want to treat threads
> > as
> > > separate units, won't we need two quota configurations regardless of
> > > whether we use units or percentage? Perhaps I misunderstood your
> > > suggestion.
> > >
> > > I think there are two cases:
> > >
> > >1. The use case that you mentioned where an admin is adding more
> users
> > >and decides to add more I/O 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-03 Thread Rajini Sivaram
Jun,

Agree about the two scenarios.

But still not sure about a single quota covering both network threads and
I/O threads with per-thread quota. If there are 10 I/O threads and 5
network threads and I want to assign half the quota to userA, the quota
would be 750%. I imagine, internally, we would convert this to 500% for I/O
and 250% for network threads to allocate 50% of each pool.

A couple of scenarios:

1. Admin adds 1 extra network thread. To retain 50%, admin needs to now
allocate 800% for each user. Or increase the quota for a few users. To me,
it feels like admin needs to convert 50% to 800% and Kafka internally needs
to convert 800% to (500%, 300%). Everyone using just 50% feels a lot
simpler.

2. We decide to add some other thread to this list. Admin needs to know
exactly how many threads form the maximum quota. And we can be changing
this between broker versions as we add more to the list. Again a single
overall percent would be a lot simpler.

There were others who were unconvinced by a single percent from the initial
proposal and were happier with thread units similar to CPU units, so I am
ok with going with per-thread quotas (as units or percent). Just not sure
it makes it easier for admin in all cases.

Regards,

Rajini


On Fri, Mar 3, 2017 at 6:03 AM, Jun Rao  wrote:

> Hi, Rajini,
>
> Consider modeling as n * 100% unit. For 2), the question is what's causing
> the I/O threads to be saturated. It's unlikely that all users' utilization
> has increased at the same time. A more likely case is that a few isolated
> users' utilization has increased. If so, after increasing the number of
> threads, the admin just needs to adjust the quota for a few isolated users,
> which is expected and is less work.
>
> Consider modeling as 1 * 100% unit. For 1), all users' quotas need to be
> adjusted, which is unexpected and is more work.
>
> So, to me, the n * 100% model seems more convenient.
>
> As for future extension to cover network thread utilization, I was thinking
> that one way is to simply model the capacity as (n + m) * 100% unit, where
> n and m are the number of network and i/o threads, respectively. Then, for
> each user, we can just add up the utilization in the network and the i/o
> thread. If we do this, we don't need a new type of quota.
>
> Thanks,
>
> Jun
>
>
> On Thu, Mar 2, 2017 at 12:27 PM, Rajini Sivaram 
> wrote:
>
> > Jun,
> >
> > If we use request.percentage as the percentage used in a single I/O
> thread,
> > the total percentage being allocated will be num.io.threads * 100 for I/O
> > threads and num.network.threads * 100 for network threads. A single quota
> > covering the two as a percentage wouldn't quite work if you want to
> > allocate the same proportion in both cases. If we want to treat threads
> as
> > separate units, won't we need two quota configurations regardless of
> > whether we use units or percentage? Perhaps I misunderstood your
> > suggestion.
> >
> > I think there are two cases:
> >
> >1. The use case that you mentioned where an admin is adding more users
> >and decides to add more I/O threads and expects to find free quota to
> >allocate for new users.
> >2. Admin adds more I/O threads because the I/O threads are saturated
> and
> >there are cores available to allocate, even though the number of
> >users/clients hasn't changed.
> >
> > If we treated I/O threads as a single unit of 100%, all user
> > quotas need to be reallocated for 1). If we allocated I/O threads as n
> > units with n*100%, all user quotas need to be reallocated for 2),
> otherwise
> > some of the new threads may just not be used. Either way it should be
> easy
> > to write a script to decrease/increase quotas by a multiple for all
> users.
> >
> > So it really boils down to which quota unit is most intuitive in terms of
> > configuration. And from the discussion so far, it feels like opinion is
> > divided on whether quotas should be carved out of an absolute 100% (or 1
> > unit) or be relative to the number of threads (n*100% or n units).
> >
> >
> >
> > On Thu, Mar 2, 2017 at 7:31 PM, Jun Rao  wrote:
> >
> > > Another way to express an absolute limit is to use request.percentage,
> > but
> > > treat it as the percentage used in a single request handling thread.
> For
> > > now, the request handling threads can be just the io threads. In the
> > > future, they can cover the network threads as well. This is similar to
> > how
> > > top reports CPU usage and may be a bit easier for people to understand.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Fri, Feb 24, 2017 at 10:31 AM, Jun Rao  wrote:
> > >
> > > > Hi, Jay,
> > > >
> > > > 2. Regarding request.unit vs request.percentage. I started with
> > > > request.percentage too. The reasoning for request.unit is the
> > following.
> > > > Suppose that the capacity has been reached on a broker and the admin
> > > needs
> > > 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-02 Thread Jun Rao
Hi, Rajini,

Consider modeling as n * 100% unit. For 2), the question is what's causing
the I/O threads to be saturated. It's unlikely that all users' utilization
has increased at the same time. A more likely case is that a few isolated
users' utilization has increased. If so, after increasing the number of
threads, the admin just needs to adjust the quota for a few isolated users,
which is expected and is less work.

Consider modeling as 1 * 100% unit. For 1), all users' quotas need to be
adjusted, which is unexpected and is more work.

So, to me, the n * 100% model seems more convenient.
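
As a concrete illustration of that trade-off, with assumed numbers (nothing
below is from the KIP itself), suppose a broker grows from 8 to 10 I/O threads:

    // Hypothetical sketch: a user's absolute capacity under the two models
    // when the broker grows from 8 to 10 io threads (assumed numbers).
    public class QuotaModelSketch {
        public static void main(String[] args) {
            int threadsBefore = 8;
            int threadsAfter = 10;

            // 1 * 100% model: the quota is a share of the whole pool.
            double shareOfPool = 0.10;                        // user configured at 10%
            double before1 = shareOfPool * threadsBefore;     // 0.8 thread-equivalents
            double after1 = shareOfPool * threadsAfter;       // 1.0: the effective limit of
            // every existing user changed, so all configs need rewriting to preserve it.

            // n * 100% model: the quota is a percentage of a single thread.
            double percentOfThread = 80.0;                    // user configured at 80%
            double beforeN = percentOfThread / 100.0;         // 0.8 thread-equivalents
            double afterN = percentOfThread / 100.0;          // still 0.8: unchanged, and only
            // the few users who actually need more capacity are adjusted.

            System.out.printf("1*100%% model: %.1f -> %.1f; n*100%% model: %.1f -> %.1f%n",
                    before1, after1, beforeN, afterN);
        }
    }

Under the n * 100% model an existing user's absolute limit stays pinned to a
thread, so only the users who actually need more capacity are touched when
threads are added.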

As for future extension to cover network thread utilization, I was thinking
that one way is to simply model the capacity as (n + m) * 100% unit, where
n and m are the number of network and i/o threads, respectively. Then, for
each user, we can just add up the utilization in the network and the i/o
thread. If we do this, we don't need a new type of quota.

Thanks,

Jun


On Thu, Mar 2, 2017 at 12:27 PM, Rajini Sivaram 
wrote:

> Jun,
>
> If we use request.percentage as the percentage used in a single I/O thread,
> the total percentage being allocated will be num.io.threads * 100 for I/O
> threads and num.network.threads * 100 for network threads. A single quota
> covering the two as a percentage wouldn't quite work if you want to
> allocate the same proportion in both cases. If we want to treat threads as
> separate units, won't we need two quota configurations regardless of
> whether we use units or percentage? Perhaps I misunderstood your
> suggestion.
>
> I think there are two cases:
>
>1. The use case that you mentioned where an admin is adding more users
>and decides to add more I/O threads and expects to find free quota to
>allocate for new users.
>2. Admin adds more I/O threads because the I/O threads are saturated and
>    there are cores available to allocate, even though the number of
>users/clients hasn't changed.
>
> If we treated I/O threads as a single unit of 100%, all user
> quotas need to be reallocated for 1). If we allocated I/O threads as n
> units with n*100%, all user quotas need to be reallocated for 2), otherwise
> some of the new threads may just not be used. Either way it should be easy
> to write a script to decrease/increase quotas by a multiple for all users.
>
> So it really boils down to which quota unit is most intuitive in terms of
> configuration. And from the discussion so far, it feels like opinion is
> divided on whether quotas should be carved out of an absolute 100% (or 1
> unit) or be relative to the number of threads (n*100% or n units).
>
>
>
> On Thu, Mar 2, 2017 at 7:31 PM, Jun Rao  wrote:
>
> > Another way to express an absolute limit is to use request.percentage,
> but
> > treat it as the percentage used in a single request handling thread. For
> > now, the request handling threads can be just the io threads. In the
> > future, they can cover the network threads as well. This is similar to
> how
> > top reports CPU usage and may be a bit easier for people to understand.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Feb 24, 2017 at 10:31 AM, Jun Rao  wrote:
> >
> > > Hi, Jay,
> > >
> > > 2. Regarding request.unit vs request.percentage. I started with
> > > request.percentage too. The reasoning for request.unit is the
> following.
> > > Suppose that the capacity has been reached on a broker and the admin
> > needs
> > > to add a new user. A simple way to increase the capacity is to increase
> > the
> > > number of io threads, assuming there are still enough cores. If the
> limit
> > > is based on percentage, the additional capacity automatically gets
> > > distributed to existing users and we haven't really carved out any
> > > additional resource for the new user. Now, is it easy for a user to
> > reason
> > > about 0.1 unit vs 10%. My feeling is that both are hard and have to be
> > > configured empirically. Not sure if percentage is obviously easier to
> > > reason about.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Fri, Feb 24, 2017 at 8:10 AM, Jay Kreps  wrote:
> > >
> > >> A couple of quick points:
> > >>
> > >> 1. Even though the implementation of this quota is only using io
> thread
> > >> time, i think we should call it something like "request-time". This
> will
> > >> give us flexibility to improve the implementation to cover network
> > threads
> > >> in the future and will avoid exposing internal details like our thread
> > >> pools on the server.
> > >>
> > >> 2. Jun/Roger, I get what you are trying to fix but the idea of
> > >> thread/units
> > >> is super unintuitive as a user-facing knob. I had to read the KIP like
> > >> eight times to understand this. I'm not sure that your point that
> > >> increasing the number of threads is a problem with a percentage-based
> > >> value, it really depends on whether the user thinks about the
> > "percentage
> > 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-02 Thread Rajini Sivaram
Jun,

If we use request.percentage as the percentage used in a single I/O thread,
the total percentage being allocated will be num.io.threads * 100 for I/O
threads and num.network.threads * 100 for network threads. A single quota
covering the two as a percentage wouldn't quite work if you want to
allocate the same proportion in both cases. If we want to treat threads as
separate units, won't we need two quota configurations regardless of
whether we use units or percentage? Perhaps I misunderstood your suggestion.

I think there are two cases:

   1. The use case that you mentioned where an admin is adding more users
   and decides to add more I/O threads and expects to find free quota to
   allocate for new users.
   2. Admin adds more I/O threads because the I/O threads are saturated and
   there are cores available to allocate, even though the number of
   users/clients hasn't changed.

If we treated I/O threads as a single unit of 100%, all user
quotas need to be reallocated for 1). If we allocated I/O threads as n
units with n*100%, all user quotas need to be reallocated for 2), otherwise
some of the new threads may just not be used. Either way it should be easy
to write a script to decrease/increase quotas by a multiple for all users.

So it really boils down to which quota unit is most intuitive in terms of
configuration. And from the discussion so far, it feels like opinion is
divided on whether quotas should be carved out of an absolute 100% (or 1
unit) or be relative to the number of threads (n*100% or n units).



On Thu, Mar 2, 2017 at 7:31 PM, Jun Rao  wrote:

> Another way to express an absolute limit is to use request.percentage, but
> treat it as the percentage used in a single request handling thread. For
> now, the request handling threads can be just the io threads. In the
> future, they can cover the network threads as well. This is similar to how
> top reports CPU usage and may be a bit easier for people to understand.
>
> Thanks,
>
> Jun
>
> On Fri, Feb 24, 2017 at 10:31 AM, Jun Rao  wrote:
>
> > Hi, Jay,
> >
> > 2. Regarding request.unit vs request.percentage. I started with
> > request.percentage too. The reasoning for request.unit is the following.
> > Suppose that the capacity has been reached on a broker and the admin
> needs
> > to add a new user. A simple way to increase the capacity is to increase
> the
> > number of io threads, assuming there are still enough cores. If the limit
> > is based on percentage, the additional capacity automatically gets
> > distributed to existing users and we haven't really carved out any
> > additional resource for the new user. Now, is it easy for a user to
> reason
> > about 0.1 unit vs 10%. My feeling is that both are hard and have to be
> > configured empirically. Not sure if percentage is obviously easier to
> > reason about.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Feb 24, 2017 at 8:10 AM, Jay Kreps  wrote:
> >
> >> A couple of quick points:
> >>
> >> 1. Even though the implementation of this quota is only using io thread
> >> time, i think we should call it something like "request-time". This will
> >> give us flexibility to improve the implementation to cover network
> threads
> >> in the future and will avoid exposing internal details like our thread
> >> pools on the server.
> >>
> >> 2. Jun/Roger, I get what you are trying to fix but the idea of
> >> thread/units
> >> is super unintuitive as a user-facing knob. I had to read the KIP like
> >> eight times to understand this. I'm not sure that your point that
> >> increasing the number of threads is a problem with a percentage-based
> >> value, it really depends on whether the user thinks about the
> "percentage
> >> of request processing time" or "thread units". If they think "I have
> >> allocated 10% of my request processing time to user x" then it is a bug
> >> that increasing the thread count decreases that percent as it does in
> the
> >> current proposal. As a practical matter I think the only way to actually
> >> reason about this is as a percent---I just don't believe people are
> going
> >> to think, "ah, 4.3 thread units, that is the right amount!". Instead I
> >> think they have to understand this thread unit concept, figure out what
> >> they have set in number of threads, compute a percent and then come up
> >> with
> >> the number of thread units, and these will all be wrong if that thread
> >> count changes. I also think this ties us to throttling the I/O thread
> >> pool,
> >> which may not be where we want to end up.
> >>
> >> 3. For what it's worth I do think having a single throttle_ms field in
> all
> >> the responses that combines all throttling from all quotas is probably
> the
> >> simplest. There could be a use case for having separate fields for each,
> >> but I think that is actually harder to use/monitor in the common case so
> >> unless someone has a use case I think just one should be 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-02 Thread Jun Rao
Another way to express an absolute limit is to use request.percentage, but
treat it as the percentage used in a single request handling thread. For
now, the request handling threads can be just the io threads. In the
future, they can cover the network threads as well. This is similar to how
top reports CPU usage and may be a bit easier for people to understand.

Thanks,

Jun

On Fri, Feb 24, 2017 at 10:31 AM, Jun Rao  wrote:

> Hi, Jay,
>
> 2. Regarding request.unit vs request.percentage. I started with
> request.percentage too. The reasoning for request.unit is the following.
> Suppose that the capacity has been reached on a broker and the admin needs
> to add a new user. A simple way to increase the capacity is to increase the
> number of io threads, assuming there are still enough cores. If the limit
> is based on percentage, the additional capacity automatically gets
> distributed to existing users and we haven't really carved out any
> additional resource for the new user. Now, is it easy for a user to reason
> about 0.1 unit vs 10%. My feeling is that both are hard and have to be
> configured empirically. Not sure if percentage is obviously easier to
> reason about.
>
> Thanks,
>
> Jun
>
> On Fri, Feb 24, 2017 at 8:10 AM, Jay Kreps  wrote:
>
>> A couple of quick points:
>>
>> 1. Even though the implementation of this quota is only using io thread
>> time, i think we should call it something like "request-time". This will
>> give us flexibility to improve the implementation to cover network threads
>> in the future and will avoid exposing internal details like our thread
>> pools on the server.
>>
>> 2. Jun/Roger, I get what you are trying to fix but the idea of
>> thread/units
>> is super unintuitive as a user-facing knob. I had to read the KIP like
>> eight times to understand this. I'm not sure that your point that
>> increasing the number of threads is a problem with a percentage-based
>> value, it really depends on whether the user thinks about the "percentage
>> of request processing time" or "thread units". If they think "I have
>> allocated 10% of my request processing time to user x" then it is a bug
>> that increasing the thread count decreases that percent as it does in the
>> current proposal. As a practical matter I think the only way to actually
>> reason about this is as a percent---I just don't believe people are going
>> to think, "ah, 4.3 thread units, that is the right amount!". Instead I
>> think they have to understand this thread unit concept, figure out what
>> they have set in number of threads, compute a percent and then come up
>> with
>> the number of thread units, and these will all be wrong if that thread
>> count changes. I also think this ties us to throttling the I/O thread
>> pool,
>> which may not be where we want to end up.
>>
>> 3. For what it's worth I do think having a single throttle_ms field in all
>> the responses that combines all throttling from all quotas is probably the
>> simplest. There could be a use case for having separate fields for each,
>> but I think that is actually harder to use/monitor in the common case so
>> unless someone has a use case I think just one should be fine.
>>
>> -Jay
>>
>> On Fri, Feb 24, 2017 at 4:21 AM, Rajini Sivaram 
>> wrote:
>>
>> > I have updated the KIP based on the discussions so far.
>> >
>> >
>> > Regards,
>> >
>> > Rajini
>> >
>> > On Thu, Feb 23, 2017 at 11:29 PM, Rajini Sivaram <
>> rajinisiva...@gmail.com>
>> > wrote:
>> >
>> > > Thank you all for the feedback.
>> > >
>> > > Ismael #1. It makes sense not to throttle inter-broker requests like
>> > > LeaderAndIsr etc. The simplest way to ensure that clients cannot use
>> > these
>> > > requests to bypass quotas for DoS attacks is to ensure that ACLs
>> prevent
>> > > clients from using these requests and unauthorized requests are
>> included
>> > > towards quotas.
>> > >
>> > > Ismael #2, Jay #1 : I was thinking that these quotas can return a
>> > separate
>> > > throttle time, and all utilization based quotas could use the same
>> field
>> > > (we won't add another one for network thread utilization for
>> instance).
>> > But
>> > > perhaps it makes sense to keep byte rate quotas separate in
>> produce/fetch
>> > > responses to provide separate metrics? Agree with Ismael that the
>> name of
>> > > the existing field should be changed if we have two. Happy to switch
>> to a
>> > > single combined throttle time if that is sufficient.
>> > >
>> > > Ismael #4, #5, #6: Will update KIP. Will use dot separated name for
>> new
>> > > property. Replication quotas use dot separated, so it will be
>> consistent
>> > > with all properties except byte rate quotas.
>> > >
>> > > Radai: #1 Request processing time rather than request rate was chosen
>> > > because the time per request can vary significantly between requests
>> as
>> > > mentioned in the discussion and KIP.
>> > > #2 Two separate quotas for 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-01 Thread Colin McCabe
That makes sense.  I didn't see that this field already existed in some
of the replies-- good clarification.

best,


On Wed, Mar 1, 2017, at 05:41, Rajini Sivaram wrote:
> Colin,
> 
> Thank you for the feedback. Since we are reusing the existing
> throttle_time_ms field for produce/fetch responses, changing this to
> microseconds would be a breaking change. Since we don't currently plan to
> throttle at sub-millisecond intervals, perhaps it makes sense to keep the
> value consistent with the existing responses (and metrics which expose
> this
> value) and change them all together in future if required?
> 
> Regards,
> 
> Rajini
> 
> On Tue, Feb 28, 2017 at 5:58 PM, Colin McCabe  wrote:
> 
> > I noticed that the throttle_time_ms added to all the message responses
> > is in milliseconds.  Does it make sense to express this in microseconds
> > in case we start doing more fine-grained CPU throttling later on?  An
> > int32 should still be more than enough if using microseconds.
> >
> > best,
> > Colin
> >
> >
> > On Fri, Feb 24, 2017, at 10:31, Jun Rao wrote:
> > > Hi, Jay,
> > >
> > > 2. Regarding request.unit vs request.percentage. I started with
> > > request.percentage too. The reasoning for request.unit is the following.
> > > Suppose that the capacity has been reached on a broker and the admin
> > > needs
> > > to add a new user. A simple way to increase the capacity is to increase
> > > the
> > > number of io threads, assuming there are still enough cores. If the limit
> > > is based on percentage, the additional capacity automatically gets
> > > distributed to existing users and we haven't really carved out any
> > > additional resource for the new user. Now, is it easy for a user to
> > > reason
> > > about 0.1 unit vs 10%. My feeling is that both are hard and have to be
> > > configured empirically. Not sure if percentage is obviously easier to
> > > reason about.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Fri, Feb 24, 2017 at 8:10 AM, Jay Kreps  wrote:
> > >
> > > > A couple of quick points:
> > > >
> > > > 1. Even though the implementation of this quota is only using io thread
> > > > time, i think we should call it something like "request-time". This
> > will
> > > > give us flexibility to improve the implementation to cover network
> > threads
> > > > in the future and will avoid exposing internal details like our thread
> > > > pools on the server.
> > > >
> > > > 2. Jun/Roger, I get what you are trying to fix but the idea of
> > thread/units
> > > > is super unintuitive as a user-facing knob. I had to read the KIP like
> > > > eight times to understand this. I'm not sure that your point that
> > > > increasing the number of threads is a problem with a percentage-based
> > > > value, it really depends on whether the user thinks about the
> > "percentage
> > > > of request processing time" or "thread units". If they think "I have
> > > > allocated 10% of my request processing time to user x" then it is a bug
> > > > that increasing the thread count decreases that percent as it does in
> > the
> > > > current proposal. As a practical matter I think the only way to
> > actually
> > > > reason about this is as a percent---I just don't believe people are
> > going
> > > > to think, "ah, 4.3 thread units, that is the right amount!". Instead I
> > > > think they have to understand this thread unit concept, figure out what
> > > > they have set in number of threads, compute a percent and then come up
> > with
> > > > the number of thread units, and these will all be wrong if that thread
> > > > count changes. I also think this ties us to throttling the I/O thread
> > pool,
> > > > which may not be where we want to end up.
> > > >
> > > > 3. For what it's worth I do think having a single throttle_ms field in
> > all
> > > > the responses that combines all throttling from all quotas is probably
> > the
> > > > simplest. There could be a use case for having separate fields for
> > each,
> > > > but I think that is actually harder to use/monitor in the common case
> > so
> > > > unless someone has a use case I think just one should be fine.
> > > >
> > > > -Jay
> > > >
> > > > On Fri, Feb 24, 2017 at 4:21 AM, Rajini Sivaram <
> > rajinisiva...@gmail.com>
> > > > wrote:
> > > >
> > > > > I have updated the KIP based on the discussions so far.
> > > > >
> > > > >
> > > > > Regards,
> > > > >
> > > > > Rajini
> > > > >
> > > > > On Thu, Feb 23, 2017 at 11:29 PM, Rajini Sivaram <
> > > > rajinisiva...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thank you all for the feedback.
> > > > > >
> > > > > > Ismael #1. It makes sense not to throttle inter-broker requests
> > like
> > > > > > LeaderAndIsr etc. The simplest way to ensure that clients cannot
> > use
> > > > > these
> > > > > > requests to bypass quotas for DoS attacks is to ensure that ACLs
> > > > prevent
> > > > > > clients from using these requests and unauthorized requests are
> > > > included
> > 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-01 Thread Rajini Sivaram
Colin,

Thank you for the feedback. Since we are reusing the existing
throttle_time_ms field for produce/fetch responses, changing this to
microseconds would be a breaking change. Since we don't currently plan to
throttle at sub-millisecond intervals, perhaps it makes sense to keep the
value consistent with the existing responses (and metrics which expose this
value) and change them all together in future if required?

Regards,

Rajini

On Tue, Feb 28, 2017 at 5:58 PM, Colin McCabe  wrote:

> I noticed that the throttle_time_ms added to all the message responses
> is in milliseconds.  Does it make sense to express this in microseconds
> in case we start doing more fine-grained CPU throttling later on?  An
> int32 should still be more than enough if using microseconds.
>
> best,
> Colin
>
>
> On Fri, Feb 24, 2017, at 10:31, Jun Rao wrote:
> > Hi, Jay,
> >
> > 2. Regarding request.unit vs request.percentage. I started with
> > request.percentage too. The reasoning for request.unit is the following.
> > Suppose that the capacity has been reached on a broker and the admin
> > needs
> > to add a new user. A simple way to increase the capacity is to increase
> > the
> > number of io threads, assuming there are still enough cores. If the limit
> > is based on percentage, the additional capacity automatically gets
> > distributed to existing users and we haven't really carved out any
> > additional resource for the new user. Now, is it easy for a user to
> > reason
> > about 0.1 unit vs 10%. My feeling is that both are hard and have to be
> > configured empirically. Not sure if percentage is obviously easier to
> > reason about.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Feb 24, 2017 at 8:10 AM, Jay Kreps  wrote:
> >
> > > A couple of quick points:
> > >
> > > 1. Even though the implementation of this quota is only using io thread
> > > time, i think we should call it something like "request-time". This
> will
> > > give us flexibility to improve the implementation to cover network
> threads
> > > in the future and will avoid exposing internal details like our thread
> > > pools on the server.
> > >
> > > 2. Jun/Roger, I get what you are trying to fix but the idea of
> thread/units
> > > is super unintuitive as a user-facing knob. I had to read the KIP like
> > > eight times to understand this. I'm not sure that your point that
> > > increasing the number of threads is a problem with a percentage-based
> > > value, it really depends on whether the user thinks about the
> "percentage
> > > of request processing time" or "thread units". If they think "I have
> > > allocated 10% of my request processing time to user x" then it is a bug
> > > that increasing the thread count decreases that percent as it does in
> the
> > > current proposal. As a practical matter I think the only way to
> actually
> > > reason about this is as a percent---I just don't believe people are
> going
> > > to think, "ah, 4.3 thread units, that is the right amount!". Instead I
> > > think they have to understand this thread unit concept, figure out what
> > > they have set in number of threads, compute a percent and then come up
> with
> > > the number of thread units, and these will all be wrong if that thread
> > > count changes. I also think this ties us to throttling the I/O thread
> pool,
> > > which may not be where we want to end up.
> > >
> > > 3. For what it's worth I do think having a single throttle_ms field in
> all
> > > the responses that combines all throttling from all quotas is probably
> the
> > > simplest. There could be a use case for having separate fields for
> each,
> > > but I think that is actually harder to use/monitor in the common case
> so
> > > unless someone has a use case I think just one should be fine.
> > >
> > > -Jay
> > >
> > > On Fri, Feb 24, 2017 at 4:21 AM, Rajini Sivaram <
> rajinisiva...@gmail.com>
> > > wrote:
> > >
> > > > I have updated the KIP based on the discussions so far.
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Rajini
> > > >
> > > > On Thu, Feb 23, 2017 at 11:29 PM, Rajini Sivaram <
> > > rajinisiva...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thank you all for the feedback.
> > > > >
> > > > > Ismael #1. It makes sense not to throttle inter-broker requests
> like
> > > > > LeaderAndIsr etc. The simplest way to ensure that clients cannot
> use
> > > > these
> > > > > requests to bypass quotas for DoS attacks is to ensure that ACLs
> > > prevent
> > > > > clients from using these requests and unauthorized requests are
> > > included
> > > > > towards quotas.
> > > > >
> > > > > Ismael #2, Jay #1 : I was thinking that these quotas can return a
> > > > separate
> > > > > throttle time, and all utilization based quotas could use the same
> > > field
> > > > > (we won't add another one for network thread utilization for
> instance).
> > > > But
> > > > > perhaps it makes sense to keep byte rate quotas separate in
> > > produce/fetch
> > > > > 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-28 Thread Colin McCabe
I noticed that the throttle_time_ms added to all the message responses
is in milliseconds.  Does it make sense to express this in microseconds
in case we start doing more fine-grained CPU throttling later on?  An
int32 should still be more than enough if using microseconds.
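
For what it's worth, a quick back-of-the-envelope check of that headroom
(plain arithmetic, not anything defined in the KIP):

    // Rough check: how much delay an int32 of microseconds can represent.
    public class ThrottleHeadroomCheck {
        public static void main(String[] args) {
            long maxMicros = Integer.MAX_VALUE;                  // 2,147,483,647 us
            double maxMinutes = maxMicros / 1_000_000.0 / 60.0;  // ~35.8 minutes
            System.out.printf("max representable throttle time: ~%.1f minutes%n", maxMinutes);
        }
    }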

best,
Colin


On Fri, Feb 24, 2017, at 10:31, Jun Rao wrote:
> Hi, Jay,
> 
> 2. Regarding request.unit vs request.percentage. I started with
> request.percentage too. The reasoning for request.unit is the following.
> Suppose that the capacity has been reached on a broker and the admin
> needs
> to add a new user. A simple way to increase the capacity is to increase
> the
> number of io threads, assuming there are still enough cores. If the limit
> is based on percentage, the additional capacity automatically gets
> distributed to existing users and we haven't really carved out any
> additional resource for the new user. Now, is it easy for a user to
> reason
> about 0.1 unit vs 10%. My feeling is that both are hard and have to be
> configured empirically. Not sure if percentage is obviously easier to
> reason about.
> 
> Thanks,
> 
> Jun
> 
> On Fri, Feb 24, 2017 at 8:10 AM, Jay Kreps  wrote:
> 
> > A couple of quick points:
> >
> > 1. Even though the implementation of this quota is only using io thread
> > time, i think we should call it something like "request-time". This will
> > give us flexibility to improve the implementation to cover network threads
> > in the future and will avoid exposing internal details like our thread
> > pools on the server.
> >
> > 2. Jun/Roger, I get what you are trying to fix but the idea of thread/units
> > is super unintuitive as a user-facing knob. I had to read the KIP like
> > eight times to understand this. I'm not sure that your point that
> > increasing the number of threads is a problem with a percentage-based
> > value, it really depends on whether the user thinks about the "percentage
> > of request processing time" or "thread units". If they think "I have
> > allocated 10% of my request processing time to user x" then it is a bug
> > that increasing the thread count decreases that percent as it does in the
> > current proposal. As a practical matter I think the only way to actually
> > reason about this is as a percent---I just don't believe people are going
> > to think, "ah, 4.3 thread units, that is the right amount!". Instead I
> > think they have to understand this thread unit concept, figure out what
> > they have set in number of threads, compute a percent and then come up with
> > the number of thread units, and these will all be wrong if that thread
> > count changes. I also think this ties us to throttling the I/O thread pool,
> > which may not be where we want to end up.
> >
> > 3. For what it's worth I do think having a single throttle_ms field in all
> > the responses that combines all throttling from all quotas is probably the
> > simplest. There could be a use case for having separate fields for each,
> > but I think that is actually harder to use/monitor in the common case so
> > unless someone has a use case I think just one should be fine.
> >
> > -Jay
> >
> > On Fri, Feb 24, 2017 at 4:21 AM, Rajini Sivaram 
> > wrote:
> >
> > > I have updated the KIP based on the discussions so far.
> > >
> > >
> > > Regards,
> > >
> > > Rajini
> > >
> > > On Thu, Feb 23, 2017 at 11:29 PM, Rajini Sivaram <
> > rajinisiva...@gmail.com>
> > > wrote:
> > >
> > > > Thank you all for the feedback.
> > > >
> > > > Ismael #1. It makes sense not to throttle inter-broker requests like
> > > > LeaderAndIsr etc. The simplest way to ensure that clients cannot use
> > > these
> > > > requests to bypass quotas for DoS attacks is to ensure that ACLs
> > prevent
> > > > clients from using these requests and unauthorized requests are
> > included
> > > > towards quotas.
> > > >
> > > > Ismael #2, Jay #1 : I was thinking that these quotas can return a
> > > separate
> > > > throttle time, and all utilization based quotas could use the same
> > field
> > > > (we won't add another one for network thread utilization for instance).
> > > But
> > > > perhaps it makes sense to keep byte rate quotas separate in
> > produce/fetch
> > > > responses to provide separate metrics? Agree with Ismael that the name
> > of
> > > > the existing field should be changed if we have two. Happy to switch
> > to a
> > > > single combined throttle time if that is sufficient.
> > > >
> > > > Ismael #4, #5, #6: Will update KIP. Will use dot separated name for new
> > > > property. Replication quotas use dot separated, so it will be
> > consistent
> > > > with all properties except byte rate quotas.
> > > >
> > > > Radai: #1 Request processing time rather than request rate was chosen
> > > > because the time per request can vary significantly between requests as
> > > > mentioned in the discussion and KIP.
> > > > #2 Two separate quotas for heartbeats/regular requests feel like more
> > > 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-24 Thread Jun Rao
Hi, Jay,

2. Regarding request.unit vs request.percentage. I started with
request.percentage too. The reasoning for request.unit is the following.
Suppose that the capacity has been reached on a broker and the admin needs
to add a new user. A simple way to increase the capacity is to increase the
number of io threads, assuming there are still enough cores. If the limit
is based on percentage, the additional capacity automatically gets
distributed to existing users and we haven't really carved out any
additional resource for the new user. Now, is it easy for a user to reason
about 0.1 unit vs 10%? My feeling is that both are hard and have to be
configured empirically. Not sure if percentage is obviously easier to
reason about.

Thanks,

Jun

On Fri, Feb 24, 2017 at 8:10 AM, Jay Kreps  wrote:

> A couple of quick points:
>
> 1. Even though the implementation of this quota is only using io thread
> time, i think we should call it something like "request-time". This will
> give us flexibility to improve the implementation to cover network threads
> in the future and will avoid exposing internal details like our thread
> pools on the server.
>
> 2. Jun/Roger, I get what you are trying to fix but the idea of thread/units
> is super unintuitive as a user-facing knob. I had to read the KIP like
> eight times to understand this. I'm not sure that your point that
> increasing the number of threads is a problem with a percentage-based
> value, it really depends on whether the user thinks about the "percentage
> of request processing time" or "thread units". If they think "I have
> allocated 10% of my request processing time to user x" then it is a bug
> that increasing the thread count decreases that percent as it does in the
> current proposal. As a practical matter I think the only way to actually
> reason about this is as a percent---I just don't believe people are going
> to think, "ah, 4.3 thread units, that is the right amount!". Instead I
> think they have to understand this thread unit concept, figure out what
> they have set in number of threads, compute a percent and then come up with
> the number of thread units, and these will all be wrong if that thread
> count changes. I also think this ties us to throttling the I/O thread pool,
> which may not be where we want to end up.
>
> 3. For what it's worth I do think having a single throttle_ms field in all
> the responses that combines all throttling from all quotas is probably the
> simplest. There could be a use case for having separate fields for each,
> but I think that is actually harder to use/monitor in the common case so
> unless someone has a use case I think just one should be fine.
>
> -Jay
>
> On Fri, Feb 24, 2017 at 4:21 AM, Rajini Sivaram 
> wrote:
>
> > I have updated the KIP based on the discussions so far.
> >
> >
> > Regards,
> >
> > Rajini
> >
> > On Thu, Feb 23, 2017 at 11:29 PM, Rajini Sivaram <
> rajinisiva...@gmail.com>
> > wrote:
> >
> > > Thank you all for the feedback.
> > >
> > > Ismael #1. It makes sense not to throttle inter-broker requests like
> > > LeaderAndIsr etc. The simplest way to ensure that clients cannot use
> > these
> > > requests to bypass quotas for DoS attacks is to ensure that ACLs
> prevent
> > > clients from using these requests and unauthorized requests are
> included
> > > towards quotas.
> > >
> > > Ismael #2, Jay #1 : I was thinking that these quotas can return a
> > separate
> > > throttle time, and all utilization based quotas could use the same
> field
> > > (we won't add another one for network thread utilization for instance).
> > But
> > > perhaps it makes sense to keep byte rate quotas separate in
> produce/fetch
> > > responses to provide separate metrics? Agree with Ismael that the name
> of
> > > the existing field should be changed if we have two. Happy to switch
> to a
> > > single combined throttle time if that is sufficient.
> > >
> > > Ismael #4, #5, #6: Will update KIP. Will use dot separated name for new
> > > property. Replication quotas use dot separated, so it will be
> consistent
> > > with all properties except byte rate quotas.
> > >
> > > Radai: #1 Request processing time rather than request rate was chosen
> > > because the time per request can vary significantly between requests as
> > > mentioned in the discussion and KIP.
> > > #2 Two separate quotas for heartbeats/regular requests feel like more
> > > configuration and more metrics. Since most users would set quotas
> higher
> > > than the expected usage and quotas are more of a safety net, a single
> > quota
> > > should work in most cases.
> > >  #3 The number of requests in purgatory is limited by the number of
> > active
> > > connections since only one request per connection will be throttled at
> a
> > > time.
> > > #4 As with byte rate quotas, to use the full allocated quotas,
> > > clients/users would need to use partitions that are distributed across
> > the
> > > cluster. The 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-24 Thread Rajini Sivaram
Thanks, Jay.

*(1)* The rename from *request.time.percent* to *io.thread.units* for the
quota configuration was based on the change from percent to thread-units,
since we will need different quota configuration for I/O threads and
network threads if we use units. If we agree that *(2)* percent (or ratio)
is a better configuration, then the name can be request.time.percent, with
the same config applying to both request thread utilization and network
thread utilization. Metrics and sensors on the broker side will probably
need to be separate for I/O and network threads so that these can be
accounted separately (5% request.time.percent would mean maximum 5% of
request thread utilization and maximum 5% of network thread utilization
with either violation leading to throttling).

*(3)* Agree - KIP reflects combined throttling time in a single field in
the response.
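
To make point (2) above concrete, here is a small sketch with hypothetical
class and method names (this is not broker code and not the KIP's proposed
API): the same configured percentage is tracked by separate per-pool sensors,
and throttling kicks in when either pool is over its share.

    import java.util.EnumMap;
    import java.util.Map;

    // Hypothetical sketch: one request.time.percent value checked independently
    // against request-thread and network-thread utilization.
    public class DualPoolQuotaSketch {
        enum Pool { IO, NETWORK }

        private final double quotaPercent;                               // e.g. 5.0
        private final Map<Pool, Double> usedPercent = new EnumMap<>(Pool.class);

        DualPoolQuotaSketch(double quotaPercent) {
            this.quotaPercent = quotaPercent;
            for (Pool p : Pool.values()) {
                usedPercent.put(p, 0.0);
            }
        }

        // Record thread time, as a percentage of the quota window, used on one pool.
        void record(Pool pool, double percentOfWindow) {
            usedPercent.merge(pool, percentOfWindow, Double::sum);
        }

        // Throttle if either pool's measured utilization exceeds the shared quota.
        boolean shouldThrottle() {
            return usedPercent.values().stream().anyMatch(used -> used > quotaPercent);
        }

        public static void main(String[] args) {
            DualPoolQuotaSketch quota = new DualPoolQuotaSketch(5.0);
            quota.record(Pool.IO, 3.0);       // 3% of request-thread time this window
            quota.record(Pool.NETWORK, 6.0);  // 6% of network-thread time this window
            System.out.println("throttle? " + quota.shouldThrottle());  // true
        }
    }

Whether the real implementation keeps one sensor per pool or folds both into a
single combined measurement is exactly the open question in this thread.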


On Fri, Feb 24, 2017 at 4:10 PM, Jay Kreps  wrote:

> A couple of quick points:
>
> 1. Even though the implementation of this quota is only using io thread
> time, i think we should call it something like "request-time". This will
> give us flexibility to improve the implementation to cover network threads
> in the future and will avoid exposing internal details like our thread
> pools on the server.
>
> 2. Jun/Roger, I get what you are trying to fix but the idea of thread/units
> is super unintuitive as a user-facing knob. I had to read the KIP like
> eight times to understand this. I'm not sure that your point that
> increasing the number of threads is a problem with a percentage-based
> value, it really depends on whether the user thinks about the "percentage
> of request processing time" or "thread units". If they think "I have
> allocated 10% of my request processing time to user x" then it is a bug
> that increasing the thread count decreases that percent as it does in the
> current proposal. As a practical matter I think the only way to actually
> reason about this is as a percent---I just don't believe people are going
> to think, "ah, 4.3 thread units, that is the right amount!". Instead I
> think they have to understand this thread unit concept, figure out what
> they have set in number of threads, compute a percent and then come up with
> the number of thread units, and these will all be wrong if that thread
> count changes. I also think this ties us to throttling the I/O thread pool,
> which may not be where we want to end up.
>
> 3. For what it's worth I do think having a single throttle_ms field in all
> the responses that combines all throttling from all quotas is probably the
> simplest. There could be a use case for having separate fields for each,
> but I think that is actually harder to use/monitor in the common case so
> unless someone has a use case I think just one should be fine.
>
> -Jay
>
> On Fri, Feb 24, 2017 at 4:21 AM, Rajini Sivaram 
> wrote:
>
> > I have updated the KIP based on the discussions so far.
> >
> >
> > Regards,
> >
> > Rajini
> >
> > On Thu, Feb 23, 2017 at 11:29 PM, Rajini Sivaram <
> rajinisiva...@gmail.com>
> > wrote:
> >
> > > Thank you all for the feedback.
> > >
> > > Ismael #1. It makes sense not to throttle inter-broker requests like
> > > LeaderAndIsr etc. The simplest way to ensure that clients cannot use
> > these
> > > requests to bypass quotas for DoS attacks is to ensure that ACLs
> prevent
> > > clients from using these requests and unauthorized requests are
> included
> > > towards quotas.
> > >
> > > Ismael #2, Jay #1 : I was thinking that these quotas can return a
> > separate
> > > throttle time, and all utilization based quotas could use the same
> field
> > > (we won't add another one for network thread utilization for instance).
> > But
> > > perhaps it makes sense to keep byte rate quotas separate in
> produce/fetch
> > > responses to provide separate metrics? Agree with Ismael that the name
> of
> > > the existing field should be changed if we have two. Happy to switch
> to a
> > > single combined throttle time if that is sufficient.
> > >
> > > Ismael #4, #5, #6: Will update KIP. Will use dot separated name for new
> > > property. Replication quotas use dot separated, so it will be
> consistent
> > > with all properties except byte rate quotas.
> > >
> > > Radai: #1 Request processing time rather than request rate was chosen
> > > because the time per request can vary significantly between requests as
> > > mentioned in the discussion and KIP.
> > > #2 Two separate quotas for heartbeats/regular requests feel like more
> > > configuration and more metrics. Since most users would set quotas
> higher
> > > than the expected usage and quotas are more of a safety net, a single
> > quota
> > > should work in most cases.
> > >  #3 The number of requests in purgatory is limited by the number of
> > active
> > > connections since only one request per connection will be throttled at
> a
> > > time.
> > > #4 As with byte rate quotas, to use the 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-24 Thread Jay Kreps
A couple of quick points:

1. Even though the implementation of this quota is only using io thread
time, I think we should call it something like "request-time". This will
give us flexibility to improve the implementation to cover network threads
in the future and will avoid exposing internal details like our thread
pools on the server.

2. Jun/Roger, I get what you are trying to fix but the idea of thread/units
is super unintuitive as a user-facing knob. I had to read the KIP like
eight times to understand this. I'm not sure about your point that
increasing the number of threads is a problem with a percentage-based
value; it really depends on whether the user thinks about the "percentage
of request processing time" or "thread units". If they think "I have
allocated 10% of my request processing time to user x" then it is a bug
that increasing the thread count decreases that percent as it does in the
current proposal. As a practical matter I think the only way to actually
reason about this is as a percent---I just don't believe people are going
to think, "ah, 4.3 thread units, that is the right amount!". Instead I
think they have to understand this thread unit concept, figure out what
they have set in number of threads, compute a percent and then come up with
the number of thread units, and these will all be wrong if that thread
count changes. I also think this ties us to throttling the I/O thread pool,
which may not be where we want to end up.

3. For what it's worth I do think having a single throttle_ms field in all
the responses that combines all throttling from all quotas is probably the
simplest. There could be a use case for having separate fields for each,
but I think that is actually harder to use/monitor in the common case so
unless someone has a use case I think just one should be fine.

-Jay

On Fri, Feb 24, 2017 at 4:21 AM, Rajini Sivaram 
wrote:

> I have updated the KIP based on the discussions so far.
>
>
> Regards,
>
> Rajini
>
> On Thu, Feb 23, 2017 at 11:29 PM, Rajini Sivaram 
> wrote:
>
> > Thank you all for the feedback.
> >
> > Ismael #1. It makes sense not to throttle inter-broker requests like
> > LeaderAndIsr etc. The simplest way to ensure that clients cannot use
> these
> > requests to bypass quotas for DoS attacks is to ensure that ACLs prevent
> > clients from using these requests and unauthorized requests are included
> > towards quotas.
> >
> > Ismael #2, Jay #1 : I was thinking that these quotas can return a
> separate
> > throttle time, and all utilization based quotas could use the same field
> > (we won't add another one for network thread utilization for instance).
> But
> > perhaps it makes sense to keep byte rate quotas separate in produce/fetch
> > responses to provide separate metrics? Agree with Ismael that the name of
> > the existing field should be changed if we have two. Happy to switch to a
> > single combined throttle time if that is sufficient.
> >
> > Ismael #4, #5, #6: Will update KIP. Will use dot separated name for new
> > property. Replication quotas use dot separated, so it will be consistent
> > with all properties except byte rate quotas.
> >
> > Radai: #1 Request processing time rather than request rate was chosen
> > because the time per request can vary significantly between requests as
> > mentioned in the discussion and KIP.
> > #2 Two separate quotas for heartbeats/regular requests feel like more
> > configuration and more metrics. Since most users would set quotas higher
> > than the expected usage and quotas are more of a safety net, a single
> quota
> > should work in most cases.
> >  #3 The number of requests in purgatory is limited by the number of
> active
> > connections since only one request per connection will be throttled at a
> > time.
> > #4 As with byte rate quotas, to use the full allocated quotas,
> > clients/users would need to use partitions that are distributed across
> the
> > cluster. The alternative of using cluster-wide quotas instead of
> per-broker
> > quotas would be far too complex to implement.
> >
> > Dong : We currently have two ClientQuotaManagers for quota types Fetch
> and
> > Produce. A new one will be added for IOThread, which manages quotas for
> I/O
> > thread utilization. This will not update the Fetch or Produce queue-size,
> > but will have a separate metric for the queue-size.  I wasn't planning to
> > add any additional metrics apart from the equivalent ones for existing
> > quotas as part of this KIP. Ratio of byte-rate to I/O thread utilization
> > could be slightly misleading since it depends on the sequence of
> requests.
> > But we can look into more metrics after the KIP is implemented if
> required.
> >
> > I think we need to limit the maximum delay since all requests are
> > throttled. If a client has a quota of 0.001 units and a single request
> used
> > 50ms, we don't want to delay all requests from the client by 50 seconds,
> > throwing the 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-24 Thread Rajini Sivaram
I have updated the KIP based on the discussions so far.


Regards,

Rajini

On Thu, Feb 23, 2017 at 11:29 PM, Rajini Sivaram 
wrote:

> Thank you all for the feedback.
>
> Ismael #1. It makes sense not to throttle inter-broker requests like
> LeaderAndIsr etc. The simplest way to ensure that clients cannot use these
> requests to bypass quotas for DoS attacks is to ensure that ACLs prevent
> clients from using these requests and unauthorized requests are included
> towards quotas.
>
> Ismael #2, Jay #1 : I was thinking that these quotas can return a separate
> throttle time, and all utilization based quotas could use the same field
> (we won't add another one for network thread utilization for instance). But
> perhaps it makes sense to keep byte rate quotas separate in produce/fetch
> responses to provide separate metrics? Agree with Ismael that the name of
> the existing field should be changed if we have two. Happy to switch to a
> single combined throttle time if that is sufficient.
>
> Ismael #4, #5, #6: Will update KIP. Will use dot separated name for new
> property. Replication quotas use dot separated, so it will be consistent
> with all properties except byte rate quotas.
>
> Radai: #1 Request processing time rather than request rate was chosen
> because the time per request can vary significantly between requests as
> mentioned in the discussion and KIP.
> #2 Two separate quotas for heartbeats/regular requests feel like more
> configuration and more metrics. Since most users would set quotas higher
> than the expected usage and quotas are more of a safety net, a single quota
> should work in most cases.
>  #3 The number of requests in purgatory is limited by the number of active
> connections since only one request per connection will be throttled at a
> time.
> #4 As with byte rate quotas, to use the full allocated quotas,
> clients/users would need to use partitions that are distributed across the
> cluster. The alternative of using cluster-wide quotas instead of per-broker
> quotas would be far too complex to implement.
>
> Dong : We currently have two ClientQuotaManagers for quota types Fetch and
> Produce. A new one will be added for IOThread, which manages quotas for I/O
> thread utilization. This will not update the Fetch or Produce queue-size,
> but will have a separate metric for the queue-size.  I wasn't planning to
> add any additional metrics apart from the equivalent ones for existing
> quotas as part of this KIP. Ratio of byte-rate to I/O thread utilization
> could be slightly misleading since it depends on the sequence of requests.
> But we can look into more metrics after the KIP is implemented if required.
>
> I think we need to limit the maximum delay since all requests are
> throttled. If a client has a quota of 0.001 units and a single request used
> 50ms, we don't want to delay all requests from the client by 50 seconds,
> throwing the client out of all its consumer groups. The issue is only if a
> user is allocated a quota that is insufficient to process one large
> request. The expectation is that the units allocated per user will be much
> higher than the time taken to process one request and the limit should
> seldom be applied. Agree this needs proper documentation.
>
> Regards,
>
> Rajini
>
>
> On Thu, Feb 23, 2017 at 8:04 PM, radai  wrote:
>
>> @jun: I wasn't concerned about tying up a request processing thread, but
>> IIUC the code does still read the entire request out, which might add up
>> to
>> a non-negligible amount of memory.
>>
>> On Thu, Feb 23, 2017 at 11:55 AM, Dong Lin  wrote:
>>
>> > Hey Rajini,
>> >
>> > The current KIP says that the maximum delay will be reduced to window
>> size
>> > if it is larger than the window size. I have a concern with this:
>> >
>> > 1) This essentially means that the user is allowed to exceed their quota
>> > over a long period of time. Can you provide an upper bound on this
>> > deviation?
>> >
>> > 2) What is the motivation for capping the maximum delay by the window size?
>> I
>> > am wondering if there is a better alternative to address the problem.
>> >
>> > 3) It means that the existing metric-related config will have a more
>> > direct impact on the mechanism of this io-thread-unit-based quota. This
>> > may be an important change depending on the answer to 1) above. We
>> probably
>> > need to document this more explicitly.
>> >
>> > Dong
>> >
>> >
>> > On Thu, Feb 23, 2017 at 10:56 AM, Dong Lin  wrote:
>> >
>> > > Hey Jun,
>> > >
>> > > Yeah you are right. I thought it wasn't because at LinkedIn it will be
>> > too
>> > > much pressure on inGraph to expose those per-clientId metrics so we
>> ended
>> > > up printing them periodically to local log. Never mind if it is not a
>> > > general problem.
>> > >
>> > > Hey Rajini,
>> > >
>> > > - I agree with Jay that we probably don't want to add a new field for
>> > > every quota 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-23 Thread Rajini Sivaram
Thank you all for the feedback.

Ismael #1. It makes sense not to throttle inter-broker requests like
LeaderAndIsr etc. The simplest way to ensure that clients cannot use these
requests to bypass quotas for DoS attacks is to ensure that ACLs prevent
clients from using these requests and unauthorized requests are included
towards quotas.

Ismael #2, Jay #1 : I was thinking that these quotas can return a separate
throttle time, and all utilization based quotas could use the same field
(we won't add another one for network thread utilization for instance). But
perhaps it makes sense to keep byte rate quotas separate in produce/fetch
responses to provide separate metrics? Agree with Ismael that the name of
the existing field should be changed if we have two. Happy to switch to a
single combined throttle time if that is sufficient.

Ismael #4, #5, #6: Will update KIP. Will use dot separated name for new
property. Replication quotas use dot separated, so it will be consistent
with all properties except byte rate quotas.

Radai: #1 Request processing time rather than request rate was chosen
because the time per request can vary significantly between requests as
mentioned in the discussion and KIP.
#2 Two separate quotas for heartbeats/regular requests feel like more
configuration and more metrics. Since most users would set quotas higher
than the expected usage and quotas are more of a safety net, a single quota
should work in most cases.
 #3 The number of requests in purgatory is limited by the number of active
connections since only one request per connection will be throttled at a
time.
#4 As with byte rate quotas, to use the full allocated quotas,
clients/users would need to use partitions that are distributed across the
cluster. The alternative of using cluster-wide quotas instead of per-broker
quotas would be far too complex to implement.

Dong : We currently have two ClientQuotaManagers for quota types Fetch and
Produce. A new one will be added for IOThread, which manages quotas for I/O
thread utilization. This will not update the Fetch or Produce queue-size,
but will have a separate metric for the queue-size.  I wasn't planning to
add any additional metrics apart from the equivalent ones for existing
quotas as part of this KIP. Ratio of byte-rate to I/O thread utilization
could be slightly misleading since it depends on the sequence of requests.
But we can look into more metrics after the KIP is implemented if required.
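
A tiny sketch of that bookkeeping, with hypothetical names (the real
ClientQuotaManager internals are not shown here): one manager per quota type,
each owning its own delay-queue-size counter, so throttling on the new quota
type never moves the Fetch or Produce queue-size metric.

    import java.util.EnumMap;
    import java.util.Map;
    import java.util.concurrent.atomic.AtomicInteger;

    // Hypothetical sketch: one quota manager per quota type, each with its own
    // delay-queue-size counter (names are illustrative, not broker internals).
    public class QuotaManagersSketch {
        enum QuotaType { FETCH, PRODUCE, IO_THREAD }   // IO_THREAD = the proposed new type

        static final class Manager {
            final AtomicInteger delayQueueSize = new AtomicInteger();  // the "queue-size" metric
            void throttle() { delayQueueSize.incrementAndGet(); }      // a response is parked
            void release()  { delayQueueSize.decrementAndGet(); }      // its delay has elapsed
        }

        private final Map<QuotaType, Manager> managers = new EnumMap<>(QuotaType.class);

        QuotaManagersSketch() {
            for (QuotaType t : QuotaType.values()) {
                managers.put(t, new Manager());
            }
        }

        public static void main(String[] args) {
            QuotaManagersSketch sketch = new QuotaManagersSketch();
            sketch.managers.get(QuotaType.IO_THREAD).throttle();   // only the new metric moves
            System.out.println("io-thread queue-size = "
                    + sketch.managers.get(QuotaType.IO_THREAD).delayQueueSize.get()
                    + ", produce queue-size = "
                    + sketch.managers.get(QuotaType.PRODUCE).delayQueueSize.get());
        }
    }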

I think we need to limit the maximum delay since all requests are
throttled. If a client has a quota of 0.001 units and a single request used
50ms, we don't want to delay all requests from the client by 50 seconds,
throwing the client out of all its consumer groups. The issue is only if a
user is allocated a quota that is insufficient to process one large
request. The expectation is that the units allocated per user will be much
higher than the time taken to process one request and the limit should
seldom be applied. Agree this needs proper documentation.
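
To put rough numbers on the 0.001-unit example (this is one simple way to
write the delay, not necessarily the exact formula the implementation uses,
and the window size is assumed):

    // Hypothetical sketch: the delay needed to bring a user back under quota,
    // capped at the quota window so one expensive request cannot stall the
    // client long enough to drop out of its consumer groups.
    public class ThrottleDelaySketch {
        public static void main(String[] args) {
            double quotaUnits = 0.001;   // 0.001 of one thread, i.e. 1 ms of thread time per second
            double windowMs = 1000.0;    // assumed 1s quota window

            double usedMs = 50.0;        // a single request consumed 50 ms of thread time
            double allowedMs = quotaUnits * windowMs;                     // 1 ms per window

            double uncappedDelayMs = usedMs / allowedMs * windowMs;       // 50,000 ms, i.e. 50 s
            double appliedDelayMs = Math.min(uncappedDelayMs, windowMs);  // capped at 1,000 ms

            System.out.printf("uncapped = %.0f ms, applied = %.0f ms%n",
                    uncappedDelayMs, appliedDelayMs);
        }
    }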

Regards,

Rajini


On Thu, Feb 23, 2017 at 8:04 PM, radai  wrote:

> @jun: I wasn't concerned about tying up a request processing thread, but
> IIUC the code does still read the entire request out, which might add up to
> a non-negligible amount of memory.
>
> On Thu, Feb 23, 2017 at 11:55 AM, Dong Lin  wrote:
>
> > Hey Rajini,
> >
> > The current KIP says that the maximum delay will be reduced to window
> size
> > if it is larger than the window size. I have a concern with this:
> >
> > 1) This essentially means that the user is allowed to exceed their quota
> > over a long period of time. Can you provide an upper bound on this
> > deviation?
> >
> > 2) What is the motivation for capping the maximum delay by the window size? I
> > am wondering if there is a better alternative to address the problem.
> >
> > 3) It means that the existing metric-related config will have a more
> > directly impact on the mechanism of this io-thread-unit-based quota. The
> > may be an important change depending on the answer to 1) above. We
> probably
> > need to document this more explicitly.
> >
> > Dong
> >
> >
> > On Thu, Feb 23, 2017 at 10:56 AM, Dong Lin  wrote:
> >
> > > Hey Jun,
> > >
> > > Yeah you are right. I thought it wasn't because at LinkedIn it will be
> > too
> > > much pressure on inGraph to expose those per-clientId metrics so we
> ended
> > > up printing them periodically to local log. Never mind if it is not a
> > > general problem.
> > >
> > > Hey Rajini,
> > >
> > > - I agree with Jay that we probably don't want to add a new field for
> > > every quota ProduceResponse or FetchResponse. Is there any use-case for
> > > having separate throttle-time fields for byte-rate-quota and
> > > io-thread-unit-quota? You probably need to document this as interface
> > > change if you plan to add new field in any request.
> > >
> > > - I don't think IOThread belongs to quotaType. The 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-23 Thread radai
@jun: I wasn't concerned about tying up a request processing thread, but
IIUC the code does still read the entire request out, which might add up to
a non-negligible amount of memory.

On Thu, Feb 23, 2017 at 11:55 AM, Dong Lin  wrote:

> Hey Rajini,
>
> The current KIP says that the maximum delay will be reduced to window size
> if it is larger than the window size. I have a concern with this:
>
> 1) This essentially means that the user is allowed to exceed their quota
> over a long period of time. Can you provide an upper bound on this
> deviation?
>
> 2) What is the motivation for cap the maximum delay by the window size? I
> am wondering if there is better alternative to address the problem.
>
> 3) It means that the existing metric-related config will have a more
> directly impact on the mechanism of this io-thread-unit-based quota. The
> may be an important change depending on the answer to 1) above. We probably
> need to document this more explicitly.
>
> Dong
>
>
> On Thu, Feb 23, 2017 at 10:56 AM, Dong Lin  wrote:
>
> > Hey Jun,
> >
> > Yeah you are right. I thought it wasn't because at LinkedIn it will be
> too
> > much pressure on inGraph to expose those per-clientId metrics so we ended
> > up printing them periodically to local log. Never mind if it is not a
> > general problem.
> >
> > Hey Rajini,
> >
> > - I agree with Jay that we probably don't want to add a new field for
> > every quota ProduceResponse or FetchResponse. Is there any use-case for
> > having separate throttle-time fields for byte-rate-quota and
> > io-thread-unit-quota? You probably need to document this as interface
> > change if you plan to add new field in any request.
> >
> > - I don't think IOThread belongs to quotaType. The existing quota types
> > (i.e. Produce/Fetch/LeaderReplication/FollowerReplication) identify the
> > type of request that are throttled, not the quota mechanism that is
> applied.
> >
> > - If a request is throttled due to this io-thread-unit-based quota, is
> the
> > existing queue-size metric in ClientQuotaManager incremented?
> >
> > - In the interest of providing guide line for admin to decide
> > io-thread-unit-based quota and for user to understand its impact on their
> > traffic, would it be useful to have a metric that shows the overall
> > byte-rate per io-thread-unit? Can we also show this a per-clientId
> metric?
> >
> > Thanks,
> > Dong
> >
> >
> > On Thu, Feb 23, 2017 at 9:25 AM, Jun Rao  wrote:
> >
> >> Hi, Ismael,
> >>
> >> For #3, typically, an admin won't configure more io threads than CPU
> >> cores,
> >> but it's possible for an admin to start with fewer io threads than cores
> >> and grow that later on.
> >>
> >> Hi, Dong,
> >>
> >> I think the throttleTime sensor on the broker tells the admin whether a
> >> user/clentId is throttled or not.
> >>
> >> Hi, Radi,
> >>
> >> The reasoning for delaying the throttled requests on the broker instead
> of
> >> returning an error immediately is that the latter has no way to prevent
> >> the
> >> client from retrying immediately, which will make things worse. The
> >> delaying logic is based off a delay queue. A separate expiration thread
> >> just waits on the next to be expired request. So, it doesn't tie up a
> >> request handler thread.
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >> On Thu, Feb 23, 2017 at 9:07 AM, Ismael Juma  wrote:
> >>
> >> > Hi Jay,
> >> >
> >> > Regarding 1, I definitely like the simplicity of keeping a single
> >> throttle
> >> > time field in the response. The downside is that the client metrics
> >> will be
> >> > more coarse grained.
> >> >
> >> > Regarding 3, we have `leader.imbalance.per.broker.percentage` and
> >> > `log.cleaner.min.cleanable.ratio`.
> >> >
> >> > Ismael
> >> >
> >> > On Thu, Feb 23, 2017 at 4:43 PM, Jay Kreps  wrote:
> >> >
> >> > > A few minor comments:
> >> > >
> >> > >1. Isn't it the case that the throttling time response field
> should
> >> > have
> >> > >the total time your request was throttled irrespective of the
> >> quotas
> >> > > that
> >> > >caused that. Limiting it to byte rate quota doesn't make sense,
> >> but I
> >> > > also
> >> > >I don't think we want to end up adding new fields in the response
> >> for
> >> > > every
> >> > >single thing we quota, right?
> >> > >2. I don't think we should make this quota specifically about io
> >> > >threads. Once we introduce these quotas people set them and
> expect
> >> > them
> >> > > to
> >> > >be enforced (and if they aren't it may cause an outage). As a
> >> result
> >> > > they
> >> > >are a bit more sensitive than normal configs, I think. The
> current
> >> > > thread
> >> > >pools seem like something of an implementation detail and not the
> >> > level
> >> > > the
> >> > >user-facing quotas should be involved with. I think it might be
> >> better
> >> > > to
> >> > >make this a general 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-23 Thread Dong Lin
Hey Rajini,

The current KIP says that the maximum delay will be reduced to window size
if it is larger than the window size. I have a concern with this:

1) This essentially means that the user is allowed to exceed their quota
over a long period of time. Can you provide an upper bound on this
deviation?

2) What is the motivation for capping the maximum delay at the window
size? I am wondering if there is a better alternative to address the
problem.

3) It means that the existing metric-related config will have a more direct
impact on the mechanism of this io-thread-unit-based quota. This may be an
important change depending on the answer to 1) above. We probably need to
document this more explicitly.

Dong


On Thu, Feb 23, 2017 at 10:56 AM, Dong Lin  wrote:

> Hey Jun,
>
> Yeah you are right. I thought it wasn't because at LinkedIn it will be too
> much pressure on inGraph to expose those per-clientId metrics so we ended
> up printing them periodically to local log. Never mind if it is not a
> general problem.
>
> Hey Rajini,
>
> - I agree with Jay that we probably don't want to add a new field for
> every quota ProduceResponse or FetchResponse. Is there any use-case for
> having separate throttle-time fields for byte-rate-quota and
> io-thread-unit-quota? You probably need to document this as interface
> change if you plan to add new field in any request.
>
> - I don't think IOThread belongs to quotaType. The existing quota types
> (i.e. Produce/Fetch/LeaderReplication/FollowerReplication) identify the
> type of request that are throttled, not the quota mechanism that is applied.
>
> - If a request is throttled due to this io-thread-unit-based quota, is the
> existing queue-size metric in ClientQuotaManager incremented?
>
> - In the interest of providing guide line for admin to decide
> io-thread-unit-based quota and for user to understand its impact on their
> traffic, would it be useful to have a metric that shows the overall
> byte-rate per io-thread-unit? Can we also show this a per-clientId metric?
>
> Thanks,
> Dong
>
>
> On Thu, Feb 23, 2017 at 9:25 AM, Jun Rao  wrote:
>
>> Hi, Ismael,
>>
>> For #3, typically, an admin won't configure more io threads than CPU
>> cores,
>> but it's possible for an admin to start with fewer io threads than cores
>> and grow that later on.
>>
>> Hi, Dong,
>>
>> I think the throttleTime sensor on the broker tells the admin whether a
>> user/clentId is throttled or not.
>>
>> Hi, Radi,
>>
>> The reasoning for delaying the throttled requests on the broker instead of
>> returning an error immediately is that the latter has no way to prevent
>> the
>> client from retrying immediately, which will make things worse. The
>> delaying logic is based off a delay queue. A separate expiration thread
>> just waits on the next to be expired request. So, it doesn't tie up a
>> request handler thread.
>>
>> Thanks,
>>
>> Jun
>>
>> On Thu, Feb 23, 2017 at 9:07 AM, Ismael Juma  wrote:
>>
>> > Hi Jay,
>> >
>> > Regarding 1, I definitely like the simplicity of keeping a single
>> throttle
>> > time field in the response. The downside is that the client metrics
>> will be
>> > more coarse grained.
>> >
>> > Regarding 3, we have `leader.imbalance.per.broker.percentage` and
>> > `log.cleaner.min.cleanable.ratio`.
>> >
>> > Ismael
>> >
>> > On Thu, Feb 23, 2017 at 4:43 PM, Jay Kreps  wrote:
>> >
>> > > A few minor comments:
>> > >
>> > >1. Isn't it the case that the throttling time response field should
>> > have
>> > >the total time your request was throttled irrespective of the
>> quotas
>> > > that
>> > >caused that. Limiting it to byte rate quota doesn't make sense,
>> but I
>> > > also
>> > >I don't think we want to end up adding new fields in the response
>> for
>> > > every
>> > >single thing we quota, right?
>> > >2. I don't think we should make this quota specifically about io
>> > >threads. Once we introduce these quotas people set them and expect
>> > them
>> > > to
>> > >be enforced (and if they aren't it may cause an outage). As a
>> result
>> > > they
>> > >are a bit more sensitive than normal configs, I think. The current
>> > > thread
>> > >pools seem like something of an implementation detail and not the
>> > level
>> > > the
>> > >user-facing quotas should be involved with. I think it might be
>> better
>> > > to
>> > >make this a general request-time throttle with no mention in the
>> > naming
>> > >about I/O threads and simply acknowledge the current limitation
>> (which
>> > > we
>> > >may someday fix) in the docs that this covers only the time after
>> the
>> > >thread is read off the network.
>> > >3. As such I think the right interface to the user would be
>> something
>> > >like percent_request_time and be in {0,...100} or
>> request_time_ratio
>> > > and be
>> > >in {0.0,...,1.0} (I think "ratio" is the terminology we 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-23 Thread Dong Lin
Hey Jun,

Yeah, you are right. I thought it wasn't, because at LinkedIn it would be
too much pressure on inGraph to expose those per-clientId metrics, so we
ended up printing them periodically to a local log. Never mind if it is not
a general problem.

Hey Rajini,

- I agree with Jay that we probably don't want to add a new field for
every quota in ProduceResponse or FetchResponse. Is there any use-case for
having separate throttle-time fields for the byte-rate quota and the
io-thread-unit quota? You probably need to document this as an interface
change if you plan to add a new field to any request.

- I don't think IOThread belongs in quotaType. The existing quota types
(i.e. Produce/Fetch/LeaderReplication/FollowerReplication) identify the
type of request that is throttled, not the quota mechanism that is applied.

- If a request is throttled due to this io-thread-unit-based quota, is the
existing queue-size metric in ClientQuotaManager incremented?

- In the interest of providing a guideline for admins to decide on the
io-thread-unit-based quota and for users to understand its impact on their
traffic, would it be useful to have a metric that shows the overall
byte-rate per io-thread-unit? Can we also show this as a per-clientId
metric?

Thanks,
Dong


On Thu, Feb 23, 2017 at 9:25 AM, Jun Rao  wrote:

> Hi, Ismael,
>
> For #3, typically, an admin won't configure more io threads than CPU cores,
> but it's possible for an admin to start with fewer io threads than cores
> and grow that later on.
>
> Hi, Dong,
>
> I think the throttleTime sensor on the broker tells the admin whether a
> user/clentId is throttled or not.
>
> Hi, Radi,
>
> The reasoning for delaying the throttled requests on the broker instead of
> returning an error immediately is that the latter has no way to prevent the
> client from retrying immediately, which will make things worse. The
> delaying logic is based off a delay queue. A separate expiration thread
> just waits on the next to be expired request. So, it doesn't tie up a
> request handler thread.
>
> Thanks,
>
> Jun
>
> On Thu, Feb 23, 2017 at 9:07 AM, Ismael Juma  wrote:
>
> > Hi Jay,
> >
> > Regarding 1, I definitely like the simplicity of keeping a single
> throttle
> > time field in the response. The downside is that the client metrics will
> be
> > more coarse grained.
> >
> > Regarding 3, we have `leader.imbalance.per.broker.percentage` and
> > `log.cleaner.min.cleanable.ratio`.
> >
> > Ismael
> >
> > On Thu, Feb 23, 2017 at 4:43 PM, Jay Kreps  wrote:
> >
> > > A few minor comments:
> > >
> > >1. Isn't it the case that the throttling time response field should
> > have
> > >the total time your request was throttled irrespective of the quotas
> > > that
> > >caused that. Limiting it to byte rate quota doesn't make sense, but
> I
> > > also
> > >I don't think we want to end up adding new fields in the response
> for
> > > every
> > >single thing we quota, right?
> > >2. I don't think we should make this quota specifically about io
> > >threads. Once we introduce these quotas people set them and expect
> > them
> > > to
> > >be enforced (and if they aren't it may cause an outage). As a result
> > > they
> > >are a bit more sensitive than normal configs, I think. The current
> > > thread
> > >pools seem like something of an implementation detail and not the
> > level
> > > the
> > >user-facing quotas should be involved with. I think it might be
> better
> > > to
> > >make this a general request-time throttle with no mention in the
> > naming
> > >about I/O threads and simply acknowledge the current limitation
> (which
> > > we
> > >may someday fix) in the docs that this covers only the time after
> the
> > >thread is read off the network.
> > >3. As such I think the right interface to the user would be
> something
> > >like percent_request_time and be in {0,...100} or request_time_ratio
> > > and be
> > >in {0.0,...,1.0} (I think "ratio" is the terminology we used if the
> > > scale
> > >is between 0 and 1 in the other metrics, right?)
> > >
> > > -Jay
> > >
> > > On Thu, Feb 23, 2017 at 3:45 AM, Rajini Sivaram <
> rajinisiva...@gmail.com
> > >
> > > wrote:
> > >
> > > > Guozhang/Dong,
> > > >
> > > > Thank you for the feedback.
> > > >
> > > > Guozhang : I have updated the section on co-existence of byte rate
> and
> > > > request time quotas.
> > > >
> > > > Dong: I hadn't added much detail to the metrics and sensors since
> they
> > > are
> > > > going to be very similar to the existing metrics and sensors. To
> avoid
> > > > confusion, I have now added more detail. All metrics are in the group
> > > > "quotaType" and all sensors have names starting with "quotaType"
> (where
> > > > quotaType is Produce/Fetch/LeaderReplication/
> > > > FollowerReplication/*IOThread*).
> > > > So there will be no reuse of existing metrics/sensors. The new ones
> for
> > > > request processing 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-23 Thread Jun Rao
Hi, Ismael,

For #3, typically, an admin won't configure more io threads than CPU cores,
but it's possible for an admin to start with fewer io threads than cores
and grow that later on.

Hi, Dong,

I think the throttleTime sensor on the broker tells the admin whether a
user/clientId is throttled or not.

Hi, Radi,

The reasoning for delaying throttled requests on the broker instead of
returning an error immediately is that the latter has no way to prevent the
client from retrying immediately, which would make things worse. The
delaying logic is based on a delay queue. A separate expiration thread just
waits for the next request to expire, so it doesn't tie up a request
handler thread.
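
As an illustration of that pattern (plain java.util.concurrent, not the
broker's actual purgatory code), throttled responses can sit on a
DelayQueue while a single expiration thread blocks on take():

    import java.util.concurrent.DelayQueue;
    import java.util.concurrent.Delayed;
    import java.util.concurrent.TimeUnit;

    // Sketch only: the delay-queue plus expiration-thread pattern, not Kafka's code.
    public class ThrottleQueueSketch {

        static final class ThrottledResponse implements Delayed {
            final long sendAtNanos;
            final Runnable sendResponse;   // callback that actually sends the response

            ThrottledResponse(long delayMs, Runnable sendResponse) {
                this.sendAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
                this.sendResponse = sendResponse;
            }

            public long getDelay(TimeUnit unit) {
                return unit.convert(sendAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
            }

            public int compareTo(Delayed other) {
                return Long.compare(getDelay(TimeUnit.NANOSECONDS),
                                    other.getDelay(TimeUnit.NANOSECONDS));
            }
        }

        private final DelayQueue<ThrottledResponse> queue = new DelayQueue<>();

        // Request handler threads only enqueue and return immediately.
        void throttle(long delayMs, Runnable sendResponse) {
            queue.add(new ThrottledResponse(delayMs, sendResponse));
        }

        // One expiration thread blocks until the next delayed response is due.
        void startExpirationThread() {
            Thread t = new Thread(() -> {
                try {
                    while (true)
                        queue.take().sendResponse.run();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "throttled-response-expiration");
            t.setDaemon(true);
            t.start();
        }
    }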

Thanks,

Jun

On Thu, Feb 23, 2017 at 9:07 AM, Ismael Juma  wrote:

> Hi Jay,
>
> Regarding 1, I definitely like the simplicity of keeping a single throttle
> time field in the response. The downside is that the client metrics will be
> more coarse grained.
>
> Regarding 3, we have `leader.imbalance.per.broker.percentage` and
> `log.cleaner.min.cleanable.ratio`.
>
> Ismael
>
> On Thu, Feb 23, 2017 at 4:43 PM, Jay Kreps  wrote:
>
> > A few minor comments:
> >
> >1. Isn't it the case that the throttling time response field should
> have
> >the total time your request was throttled irrespective of the quotas
> > that
> >caused that. Limiting it to byte rate quota doesn't make sense, but I
> > also
> >I don't think we want to end up adding new fields in the response for
> > every
> >single thing we quota, right?
> >2. I don't think we should make this quota specifically about io
> >threads. Once we introduce these quotas people set them and expect
> them
> > to
> >be enforced (and if they aren't it may cause an outage). As a result
> > they
> >are a bit more sensitive than normal configs, I think. The current
> > thread
> >pools seem like something of an implementation detail and not the
> level
> > the
> >user-facing quotas should be involved with. I think it might be better
> > to
> >make this a general request-time throttle with no mention in the
> naming
> >about I/O threads and simply acknowledge the current limitation (which
> > we
> >may someday fix) in the docs that this covers only the time after the
> >thread is read off the network.
> >3. As such I think the right interface to the user would be something
> >like percent_request_time and be in {0,...100} or request_time_ratio
> > and be
> >in {0.0,...,1.0} (I think "ratio" is the terminology we used if the
> > scale
> >is between 0 and 1 in the other metrics, right?)
> >
> > -Jay
> >
> > On Thu, Feb 23, 2017 at 3:45 AM, Rajini Sivaram  >
> > wrote:
> >
> > > Guozhang/Dong,
> > >
> > > Thank you for the feedback.
> > >
> > > Guozhang : I have updated the section on co-existence of byte rate and
> > > request time quotas.
> > >
> > > Dong: I hadn't added much detail to the metrics and sensors since they
> > are
> > > going to be very similar to the existing metrics and sensors. To avoid
> > > confusion, I have now added more detail. All metrics are in the group
> > > "quotaType" and all sensors have names starting with "quotaType" (where
> > > quotaType is Produce/Fetch/LeaderReplication/
> > > FollowerReplication/*IOThread*).
> > > So there will be no reuse of existing metrics/sensors. The new ones for
> > > request processing time based throttling will be completely independent
> > of
> > > existing metrics/sensors, but will be consistent in format.
> > >
> > > The existing throttle_time_ms field in produce/fetch responses will not
> > be
> > > impacted by this KIP. That will continue to return byte-rate based
> > > throttling times. In addition, a new field request_throttle_time_ms
> will
> > be
> > > added to return request quota based throttling times. These will be
> > exposed
> > > as new metrics on the client-side.
> > >
> > > Since all metrics and sensors are different for each type of quota, I
> > > believe there is already sufficient metrics to monitor throttling on
> both
> > > client and broker side for each type of throttling.
> > >
> > > Regards,
> > >
> > > Rajini
> > >
> > >
> > > On Thu, Feb 23, 2017 at 4:32 AM, Dong Lin  wrote:
> > >
> > > > Hey Rajini,
> > > >
> > > > I think it makes a lot of sense to use io_thread_units as metric to
> > quota
> > > > user's traffic here. LGTM overall. I have some questions regarding
> > > sensors.
> > > >
> > > > - Can you be more specific in the KIP what sensors will be added? For
> > > > example, it will be useful to specify the name and attributes of
> these
> > > new
> > > > sensors.
> > > >
> > > > - We currently have throttle-time and queue-size for byte-rate based
> > > quota.
> > > > Are you going to have separate throttle-time and queue-size for
> > requests
> > > > throttled by io_thread_unit-based quota, or will they share the same
> > > > 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-23 Thread Ismael Juma
Hi Jay,

Regarding 1, I definitely like the simplicity of keeping a single throttle
time field in the response. The downside is that the client metrics will be
more coarse grained.

Regarding 3, we have `leader.imbalance.per.broker.percentage` and
`log.cleaner.min.cleanable.ratio`.

Ismael

On Thu, Feb 23, 2017 at 4:43 PM, Jay Kreps  wrote:

> A few minor comments:
>
>1. Isn't it the case that the throttling time response field should have
>the total time your request was throttled irrespective of the quotas
> that
>caused that. Limiting it to byte rate quota doesn't make sense, but I
> also
>I don't think we want to end up adding new fields in the response for
> every
>single thing we quota, right?
>2. I don't think we should make this quota specifically about io
>threads. Once we introduce these quotas people set them and expect them
> to
>be enforced (and if they aren't it may cause an outage). As a result
> they
>are a bit more sensitive than normal configs, I think. The current
> thread
>pools seem like something of an implementation detail and not the level
> the
>user-facing quotas should be involved with. I think it might be better
> to
>make this a general request-time throttle with no mention in the naming
>about I/O threads and simply acknowledge the current limitation (which
> we
>may someday fix) in the docs that this covers only the time after the
>thread is read off the network.
>3. As such I think the right interface to the user would be something
>like percent_request_time and be in {0,...100} or request_time_ratio
> and be
>in {0.0,...,1.0} (I think "ratio" is the terminology we used if the
> scale
>is between 0 and 1 in the other metrics, right?)
>
> -Jay
>
> On Thu, Feb 23, 2017 at 3:45 AM, Rajini Sivaram 
> wrote:
>
> > Guozhang/Dong,
> >
> > Thank you for the feedback.
> >
> > Guozhang : I have updated the section on co-existence of byte rate and
> > request time quotas.
> >
> > Dong: I hadn't added much detail to the metrics and sensors since they
> are
> > going to be very similar to the existing metrics and sensors. To avoid
> > confusion, I have now added more detail. All metrics are in the group
> > "quotaType" and all sensors have names starting with "quotaType" (where
> > quotaType is Produce/Fetch/LeaderReplication/
> > FollowerReplication/*IOThread*).
> > So there will be no reuse of existing metrics/sensors. The new ones for
> > request processing time based throttling will be completely independent
> of
> > existing metrics/sensors, but will be consistent in format.
> >
> > The existing throttle_time_ms field in produce/fetch responses will not
> be
> > impacted by this KIP. That will continue to return byte-rate based
> > throttling times. In addition, a new field request_throttle_time_ms will
> be
> > added to return request quota based throttling times. These will be
> exposed
> > as new metrics on the client-side.
> >
> > Since all metrics and sensors are different for each type of quota, I
> > believe there is already sufficient metrics to monitor throttling on both
> > client and broker side for each type of throttling.
> >
> > Regards,
> >
> > Rajini
> >
> >
> > On Thu, Feb 23, 2017 at 4:32 AM, Dong Lin  wrote:
> >
> > > Hey Rajini,
> > >
> > > I think it makes a lot of sense to use io_thread_units as metric to
> quota
> > > user's traffic here. LGTM overall. I have some questions regarding
> > sensors.
> > >
> > > - Can you be more specific in the KIP what sensors will be added? For
> > > example, it will be useful to specify the name and attributes of these
> > new
> > > sensors.
> > >
> > > - We currently have throttle-time and queue-size for byte-rate based
> > quota.
> > > Are you going to have separate throttle-time and queue-size for
> requests
> > > throttled by io_thread_unit-based quota, or will they share the same
> > > sensor?
> > >
> > > - Does the throttle-time in the ProduceResponse and FetchResponse
> > contains
> > > time due to io_thread_unit-based quota?
> > >
> > > - Currently kafka server doesn't not provide any log or metrics that
> > tells
> > > whether any given clientId (or user) is throttled. This is not too bad
> > > because we can still check the client-side byte-rate metric to validate
> > > whether a given client is throttled. But with this io_thread_unit,
> there
> > > will be no way to validate whether a given client is slow because it
> has
> > > exceeded its io_thread_unit limit. It is necessary for user to be able
> to
> > > know this information to figure how whether they have reached there
> quota
> > > limit. How about we add log4j log on the server side to periodically
> > print
> > > the (client_id, byte-rate-throttle-time, io-thread-unit-throttle-time)
> so
> > > that kafka administrator can figure those users that have reached their
> > > limit and act accordingly?
> > >
> > > Thanks,
> 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-23 Thread Jay Kreps
A few minor comments:

   1. Shouldn't the throttling time response field hold the total time your
   request was throttled, irrespective of which quotas caused it? Limiting
   it to the byte rate quota doesn't make sense, but I also don't think we
   want to end up adding new fields in the response for every single thing
   we quota, right?
   2. I don't think we should make this quota specifically about io
   threads. Once we introduce these quotas people set them and expect them
   to be enforced (and if they aren't it may cause an outage). As a result
   they are a bit more sensitive than normal configs, I think. The current
   thread pools seem like something of an implementation detail and not the
   level the user-facing quotas should be involved with. I think it might be
   better to make this a general request-time throttle with no mention of
   I/O threads in the naming, and simply acknowledge in the docs the current
   limitation (which we may someday fix) that this covers only the time
   after the request is read off the network.
   3. As such, I think the right interface to the user would be something
   like percent_request_time in {0,...,100} or request_time_ratio in
   {0.0,...,1.0} (I think "ratio" is the terminology we used if the scale
   is between 0 and 1 in the other metrics, right?)

-Jay

On Thu, Feb 23, 2017 at 3:45 AM, Rajini Sivaram 
wrote:

> Guozhang/Dong,
>
> Thank you for the feedback.
>
> Guozhang : I have updated the section on co-existence of byte rate and
> request time quotas.
>
> Dong: I hadn't added much detail to the metrics and sensors since they are
> going to be very similar to the existing metrics and sensors. To avoid
> confusion, I have now added more detail. All metrics are in the group
> "quotaType" and all sensors have names starting with "quotaType" (where
> quotaType is Produce/Fetch/LeaderReplication/
> FollowerReplication/*IOThread*).
> So there will be no reuse of existing metrics/sensors. The new ones for
> request processing time based throttling will be completely independent of
> existing metrics/sensors, but will be consistent in format.
>
> The existing throttle_time_ms field in produce/fetch responses will not be
> impacted by this KIP. That will continue to return byte-rate based
> throttling times. In addition, a new field request_throttle_time_ms will be
> added to return request quota based throttling times. These will be exposed
> as new metrics on the client-side.
>
> Since all metrics and sensors are different for each type of quota, I
> believe there is already sufficient metrics to monitor throttling on both
> client and broker side for each type of throttling.
>
> Regards,
>
> Rajini
>
>
> On Thu, Feb 23, 2017 at 4:32 AM, Dong Lin  wrote:
>
> > Hey Rajini,
> >
> > I think it makes a lot of sense to use io_thread_units as metric to quota
> > user's traffic here. LGTM overall. I have some questions regarding
> sensors.
> >
> > - Can you be more specific in the KIP what sensors will be added? For
> > example, it will be useful to specify the name and attributes of these
> new
> > sensors.
> >
> > - We currently have throttle-time and queue-size for byte-rate based
> quota.
> > Are you going to have separate throttle-time and queue-size for requests
> > throttled by io_thread_unit-based quota, or will they share the same
> > sensor?
> >
> > - Does the throttle-time in the ProduceResponse and FetchResponse
> contains
> > time due to io_thread_unit-based quota?
> >
> > - Currently kafka server doesn't not provide any log or metrics that
> tells
> > whether any given clientId (or user) is throttled. This is not too bad
> > because we can still check the client-side byte-rate metric to validate
> > whether a given client is throttled. But with this io_thread_unit, there
> > will be no way to validate whether a given client is slow because it has
> > exceeded its io_thread_unit limit. It is necessary for user to be able to
> > know this information to figure how whether they have reached there quota
> > limit. How about we add log4j log on the server side to periodically
> print
> > the (client_id, byte-rate-throttle-time, io-thread-unit-throttle-time) so
> > that kafka administrator can figure those users that have reached their
> > limit and act accordingly?
> >
> > Thanks,
> > Dong
> >
> >
> >
> >
> >
> > On Wed, Feb 22, 2017 at 4:46 PM, Guozhang Wang 
> wrote:
> >
> > > Made a pass over the doc, overall LGTM except a minor comment on the
> > > throttling implementation:
> > >
> > > Stated as "Request processing time throttling will be applied on top if
> > > necessary." I thought that it meant the request processing time
> > throttling
> > > is applied first, but continue reading I found it actually meant to
> apply
> > > produce / fetch byte rate throttling first.
> > >
> > > Also the last sentence "The remaining delay if any is applied to 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-23 Thread radai
I don't think time/CPU% is easy to reason about. Most user-facing quota
systems I know (especially the commercial ones) focus on things users
understand better - IOPS and bytes.

As for quotas and "overhead" requests like heartbeats - on the one hand,
subjecting them to the quota may cause clients to die out; on the other,
not subjecting them to the quota opens the broker up to DoS attacks. How
about giving overhead requests their own quota, separate from "real"
(user-initiated?) requests? Slightly more complicated, but I think it
solves the issue?

How long are requests held in purgatory? Wouldn't this, at some point,
still tie up resources? Wouldn't it be better (for high enough delay
values) to just return an error to the client (quota exceeded, try again in
3 seconds)?

How would these work across an entire cluster? If these are enforced
independently on every single broker, you'd be hitting "monotonous" clients
(who interact with fewer partitions) much harder than clients who operate
across a lot of partitions.

On Thu, Feb 23, 2017 at 8:02 AM, Ismael Juma  wrote:

> Thanks for the KIP, Rajini. This is a welcome improvement and the KIP page
> covers it well. A few comments:
>
> 1. Can you expand a bit on the motivation for throttling requests that fail
> authorization for ClusterAction? Under what scenarios would this help?
>
> 2. I think we should rename `throttle_time_ms` in the new version of
> produce/fetch response to make it clear that it refers to the byte rate
> throttling. Also, it would be good to include the updated schema for the
> responses (we typically try to do that whenever we update protocol APIs).
>
> 3. I think I am OK with using absolute units, but I am not sure about the
> argument why it's better than a percentage. We are comparing request
> threads to CPUs, but they're not the same as increasing the number of
> request threads doesn't necessarily mean that the server can cope with more
> requests. In the example where we double the number of threads, all the
> existing users would still have the same capacity proportionally speaking
> so it seems intuitive to me. One thing that would be helpful, I think, is
> to describe a few scenarios where the setting needs to be adjusted and how
> users would go about doing it.
>
> 4. I think it's worth mentioning that TLS increases the load on the network
> thread significantly and for cases where there is mixed plaintext and TLS
> traffic, the existing byte rate throttling may not do a great job. I think
> it's OK to tackle this in a separate KIP, but worth mentioning the
> limitation.
>
> 5. We mention DoS attacks in the document. It may be worth mentioning that
> this mostly helps with clients that are not malicious. A malicious client
> could generate a large number of connections to counteract the delays that
> this KIP introduces. Kafka has connection limits per IP today, but not per
> user, so a distributed DoS could bypass those. This is not easy to solve at
> the Kafka level since the authentication step required to get the user may
> be costly enough that the brokers will eventually be overwhelmed.
>
> 6. It's unfortunate that the existing byte rate quota configs use
> underscores instead of dots (like every other config) as separators. It's
> reasonable for `io_thread_units` to use the same convention as the byte
> rate configs, but it's not great that we are adding to the inconsistency. I
> don't have any great solutions apart from perhaps accepting the dot
> notation for all these configs as well.
>
> Ismael
>
> On Fri, Feb 17, 2017 at 5:05 PM, Rajini Sivaram 
> wrote:
>
> > Hi all,
> >
> > I have just created KIP-124 to introduce request rate quotas to Kafka:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+
> > Request+rate+quotas
> >
> > The proposal is for a simple percentage request handling time quota that
> > can be allocated to **, ** or **. There
> > are a few other suggestions also under "Rejected alternatives". Feedback
> > and suggestions are welcome.
> >
> > Thank you...
> >
> > Regards,
> >
> > Rajini
> >
>


Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-23 Thread Ismael Juma
Thanks for the KIP, Rajini. This is a welcome improvement and the KIP page
covers it well. A few comments:

1. Can you expand a bit on the motivation for throttling requests that fail
authorization for ClusterAction? Under what scenarios would this help?

2. I think we should rename `throttle_time_ms` in the new version of
produce/fetch response to make it clear that it refers to the byte rate
throttling. Also, it would be good to include the updated schema for the
responses (we typically try to do that whenever we update protocol APIs).

3. I think I am OK with using absolute units, but I am not sure about the
argument for why that is better than a percentage. We are comparing request
threads to CPUs, but they're not the same, as increasing the number of
request threads doesn't necessarily mean that the server can cope with more
requests. In the example where we double the number of threads, all the
existing users would still have the same capacity, proportionally speaking,
so it seems intuitive to me. One thing that would be helpful, I think, is
to describe a few scenarios where the setting needs to be adjusted and how
users would go about doing it.

4. I think it's worth mentioning that TLS increases the load on the network
thread significantly and for cases where there is mixed plaintext and TLS
traffic, the existing byte rate throttling may not do a great job. I think
it's OK to tackle this in a separate KIP, but worth mentioning the
limitation.

5. We mention DoS attacks in the document. It may be worth mentioning that
this mostly helps with clients that are not malicious. A malicious client
could generate a large number of connections to counteract the delays that
this KIP introduces. Kafka has connection limits per IP today, but not per
user, so a distributed DoS could bypass those. This is not easy to solve at
the Kafka level since the authentication step required to get the user may
be costly enough that the brokers will eventually be overwhelmed.

6. It's unfortunate that the existing byte rate quota configs use
underscores instead of dots (like every other config) as separators. It's
reasonable for `io_thread_units` to use the same convention as the byte
rate configs, but it's not great that we are adding to the inconsistency. I
don't have any great solutions apart from perhaps accepting the dot
notation for all these configs as well.

Ismael

On Fri, Feb 17, 2017 at 5:05 PM, Rajini Sivaram 
wrote:

> Hi all,
>
> I have just created KIP-124 to introduce request rate quotas to Kafka:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+
> Request+rate+quotas
>
> The proposal is for a simple percentage request handling time quota that
> can be allocated to **, ** or **. There
> are a few other suggestions also under "Rejected alternatives". Feedback
> and suggestions are welcome.
>
> Thank you...
>
> Regards,
>
> Rajini
>


Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-23 Thread Rajini Sivaram
Guozhang/Dong,

Thank you for the feedback.

Guozhang : I have updated the section on co-existence of byte rate and
request time quotas.

Dong: I hadn't added much detail to the metrics and sensors since they are
going to be very similar to the existing metrics and sensors. To avoid
confusion, I have now added more detail. All metrics are in the group
"quotaType" and all sensors have names starting with "quotaType" (where
quotaType is Produce/Fetch/LeaderReplication/FollowerReplication/*IOThread*).
So there will be no reuse of existing metrics/sensors. The new ones for
request processing time based throttling will be completely independent of
existing metrics/sensors, but will be consistent in format.
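
For illustration only, a minimal sketch using the
org.apache.kafka.common.metrics API of how a throttle-time sensor for the
new quota type might be registered; the group, sensor and metric names
below are placeholders rather than the ones the KIP defines:

    import org.apache.kafka.common.metrics.Metrics;
    import org.apache.kafka.common.metrics.Sensor;
    import org.apache.kafka.common.metrics.stats.Avg;
    import org.apache.kafka.common.metrics.stats.Max;

    // Sketch only: registers an IOThread-style throttle-time sensor; names are illustrative.
    public class RequestQuotaMetricsSketch {
        public static void main(String[] args) {
            Metrics metrics = new Metrics();

            Sensor throttleTime = metrics.sensor("IOThread-throttle-time");
            throttleTime.add(metrics.metricName("throttle-time-avg", "IOThread",
                    "Average throttle time in ms for request time quotas"), new Avg());
            throttleTime.add(metrics.metricName("throttle-time-max", "IOThread",
                    "Maximum throttle time in ms for request time quotas"), new Max());

            throttleTime.record(42.0);   // record a computed throttle delay in ms
            metrics.close();
        }
    }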

The existing throttle_time_ms field in produce/fetch responses will not be
impacted by this KIP. That will continue to return byte-rate based
throttling times. In addition, a new field request_throttle_time_ms will be
added to return request quota based throttling times. These will be exposed
as new metrics on the client-side.

Since all metrics and sensors are different for each type of quota, I
believe there are already sufficient metrics to monitor throttling on both
the client and broker side for each type of throttling.

Regards,

Rajini


On Thu, Feb 23, 2017 at 4:32 AM, Dong Lin  wrote:

> Hey Rajini,
>
> I think it makes a lot of sense to use io_thread_units as metric to quota
> user's traffic here. LGTM overall. I have some questions regarding sensors.
>
> - Can you be more specific in the KIP what sensors will be added? For
> example, it will be useful to specify the name and attributes of these new
> sensors.
>
> - We currently have throttle-time and queue-size for byte-rate based quota.
> Are you going to have separate throttle-time and queue-size for requests
> throttled by io_thread_unit-based quota, or will they share the same
> sensor?
>
> - Does the throttle-time in the ProduceResponse and FetchResponse contains
> time due to io_thread_unit-based quota?
>
> - Currently kafka server doesn't not provide any log or metrics that tells
> whether any given clientId (or user) is throttled. This is not too bad
> because we can still check the client-side byte-rate metric to validate
> whether a given client is throttled. But with this io_thread_unit, there
> will be no way to validate whether a given client is slow because it has
> exceeded its io_thread_unit limit. It is necessary for user to be able to
> know this information to figure how whether they have reached there quota
> limit. How about we add log4j log on the server side to periodically print
> the (client_id, byte-rate-throttle-time, io-thread-unit-throttle-time) so
> that kafka administrator can figure those users that have reached their
> limit and act accordingly?
>
> Thanks,
> Dong
>
>
>
>
>
> On Wed, Feb 22, 2017 at 4:46 PM, Guozhang Wang  wrote:
>
> > Made a pass over the doc, overall LGTM except a minor comment on the
> > throttling implementation:
> >
> > Stated as "Request processing time throttling will be applied on top if
> > necessary." I thought that it meant the request processing time
> throttling
> > is applied first, but continue reading I found it actually meant to apply
> > produce / fetch byte rate throttling first.
> >
> > Also the last sentence "The remaining delay if any is applied to the
> > response." is a bit confusing to me. Maybe rewording it a bit?
> >
> >
> > Guozhang
> >
> >
> > On Wed, Feb 22, 2017 at 3:24 PM, Jun Rao  wrote:
> >
> > > Hi, Rajini,
> > >
> > > Thanks for the updated KIP. The latest proposal looks good to me.
> > >
> > > Jun
> > >
> > > On Wed, Feb 22, 2017 at 2:19 PM, Rajini Sivaram <
> rajinisiva...@gmail.com
> > >
> > > wrote:
> > >
> > > > Jun/Roger,
> > > >
> > > > Thank you for the feedback.
> > > >
> > > > 1. I have updated the KIP to use absolute units instead of
> percentage.
> > > The
> > > > property is called* io_thread_units* to align with the thread count
> > > > property *num.io.threads*. When we implement network thread
> utilization
> > > > quotas, we can add another property *network_thread_units.*
> > > >
> > > > 2. ControlledShutdown is already listed under the exempt requests.
> Jun,
> > > did
> > > > you mean a different request that needs to be added? The four
> requests
> > > > currently exempt in the KIP are StopReplica, ControlledShutdown,
> > > > LeaderAndIsr and UpdateMetadata. These are controlled using
> > ClusterAction
> > > > ACL, so it is easy to exclude and only throttle if unauthorized. I
> > wasn't
> > > > sure if there are other requests used only for inter-broker that
> needed
> > > to
> > > > be excluded.
> > > >
> > > > 3. I was thinking the smallest change would be to replace all
> > references
> > > to
> > > > *requestChannel.sendResponse()* with a local method
> > > > *sendResponseMaybeThrottle()* that does the throttling if any plus
> send
> > > > response. If we throttle first in 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-22 Thread Dong Lin
Hey Rajini,

I think it makes a lot of sense to use io_thread_units as the metric to
quota users' traffic here. LGTM overall. I have some questions regarding
sensors.

- Can you be more specific in the KIP about what sensors will be added? For
example, it will be useful to specify the name and attributes of these new
sensors.

- We currently have throttle-time and queue-size for the byte-rate based
quota. Are you going to have separate throttle-time and queue-size for
requests throttled by the io_thread_unit-based quota, or will they share
the same sensor?

- Does the throttle-time in the ProduceResponse and FetchResponse contain
time due to the io_thread_unit-based quota?

- Currently the Kafka server doesn't provide any log or metrics that tell
whether any given clientId (or user) is throttled. This is not too bad
because we can still check the client-side byte-rate metric to validate
whether a given client is throttled. But with this io_thread_unit, there
will be no way to validate whether a given client is slow because it has
exceeded its io_thread_unit limit. It is necessary for users to be able to
know this so they can figure out whether they have reached their quota
limit. How about we add a log4j log on the server side to periodically
print (client_id, byte-rate-throttle-time, io-thread-unit-throttle-time),
so that a Kafka administrator can identify users that have reached their
limit and act accordingly?
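
For example, a rough sketch (hypothetical snapshot source and names, not
existing broker code) of the kind of periodic logging I have in mind:

    import java.util.Map;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // Sketch only: periodically logs per-client throttle times; the snapshot source is hypothetical.
    public class ThrottleTimeLogger {
        private static final Logger log = LoggerFactory.getLogger(ThrottleTimeLogger.class);

        public interface ThrottleSnapshotSource {
            // clientId -> {byte-rate-throttle-time-ms, io-thread-unit-throttle-time-ms}
            Map<String, double[]> snapshot();
        }

        public static void start(ThrottleSnapshotSource source, long periodSeconds) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(() ->
                    source.snapshot().forEach((clientId, t) ->
                            log.info("client_id={} byte-rate-throttle-time={} io-thread-unit-throttle-time={}",
                                    clientId, t[0], t[1])),
                    periodSeconds, periodSeconds, TimeUnit.SECONDS);
        }
    }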

Thanks,
Dong





On Wed, Feb 22, 2017 at 4:46 PM, Guozhang Wang  wrote:

> Made a pass over the doc, overall LGTM except a minor comment on the
> throttling implementation:
>
> Stated as "Request processing time throttling will be applied on top if
> necessary." I thought that it meant the request processing time throttling
> is applied first, but continue reading I found it actually meant to apply
> produce / fetch byte rate throttling first.
>
> Also the last sentence "The remaining delay if any is applied to the
> response." is a bit confusing to me. Maybe rewording it a bit?
>
>
> Guozhang
>
>
> On Wed, Feb 22, 2017 at 3:24 PM, Jun Rao  wrote:
>
> > Hi, Rajini,
> >
> > Thanks for the updated KIP. The latest proposal looks good to me.
> >
> > Jun
> >
> > On Wed, Feb 22, 2017 at 2:19 PM, Rajini Sivaram  >
> > wrote:
> >
> > > Jun/Roger,
> > >
> > > Thank you for the feedback.
> > >
> > > 1. I have updated the KIP to use absolute units instead of percentage.
> > The
> > > property is called* io_thread_units* to align with the thread count
> > > property *num.io.threads*. When we implement network thread utilization
> > > quotas, we can add another property *network_thread_units.*
> > >
> > > 2. ControlledShutdown is already listed under the exempt requests. Jun,
> > did
> > > you mean a different request that needs to be added? The four requests
> > > currently exempt in the KIP are StopReplica, ControlledShutdown,
> > > LeaderAndIsr and UpdateMetadata. These are controlled using
> ClusterAction
> > > ACL, so it is easy to exclude and only throttle if unauthorized. I
> wasn't
> > > sure if there are other requests used only for inter-broker that needed
> > to
> > > be excluded.
> > >
> > > 3. I was thinking the smallest change would be to replace all
> references
> > to
> > > *requestChannel.sendResponse()* with a local method
> > > *sendResponseMaybeThrottle()* that does the throttling if any plus send
> > > response. If we throttle first in *KafkaApis.handle()*, the time spent
> > > within the method handling the request will not be recorded or used in
> > > throttling. We can look into this again when the PR is ready for
> review.
> > >
> > > Regards,
> > >
> > > Rajini
> > >
> > >
> > >
> > > On Wed, Feb 22, 2017 at 5:55 PM, Roger Hoover 
> > > wrote:
> > >
> > > > Great to see this KIP and the excellent discussion.
> > > >
> > > > To me, Jun's suggestion makes sense.  If my application is allocated
> 1
> > > > request handler unit, then it's as if I have a Kafka broker with a
> > single
> > > > request handler thread dedicated to me.  That's the most I can use,
> at
> > > > least.  That allocation doesn't change even if an admin later
> increases
> > > the
> > > > size of the request thread pool on the broker.  It's similar to the
> CPU
> > > > abstraction that VMs and containers get from hypervisors or OS
> > > schedulers.
> > > > While different client access patterns can use wildly different
> amounts
> > > of
> > > > request thread resources per request, a given application will
> > generally
> > > > have a stable access pattern and can figure out empirically how many
> > > > "request thread units" it needs to meet it's throughput/latency
> goals.
> > > >
> > > > Cheers,
> > > >
> > > > Roger
> > > >
> > > > On Wed, Feb 22, 2017 at 8:53 AM, Jun Rao  wrote:
> > > >
> > > > > Hi, Rajini,
> > > > >
> > > > > Thanks for the updated KIP. A few more comments.
> > > > >
> > > > > 1. A concern of 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-22 Thread Guozhang Wang
Made a pass over the doc, overall LGTM except a minor comment on the
throttling implementation:

Stated as "Request processing time throttling will be applied on top if
necessary." I thought that it meant the request processing time throttling
is applied first, but continue reading I found it actually meant to apply
produce / fetch byte rate throttling first.

Also the last sentence "The remaining delay if any is applied to the
response." is a bit confusing to me. Maybe rewording it a bit?


Guozhang


On Wed, Feb 22, 2017 at 3:24 PM, Jun Rao  wrote:

> Hi, Rajini,
>
> Thanks for the updated KIP. The latest proposal looks good to me.
>
> Jun
>
> On Wed, Feb 22, 2017 at 2:19 PM, Rajini Sivaram 
> wrote:
>
> > Jun/Roger,
> >
> > Thank you for the feedback.
> >
> > 1. I have updated the KIP to use absolute units instead of percentage.
> The
> > property is called* io_thread_units* to align with the thread count
> > property *num.io.threads*. When we implement network thread utilization
> > quotas, we can add another property *network_thread_units.*
> >
> > 2. ControlledShutdown is already listed under the exempt requests. Jun,
> did
> > you mean a different request that needs to be added? The four requests
> > currently exempt in the KIP are StopReplica, ControlledShutdown,
> > LeaderAndIsr and UpdateMetadata. These are controlled using ClusterAction
> > ACL, so it is easy to exclude and only throttle if unauthorized. I wasn't
> > sure if there are other requests used only for inter-broker that needed
> to
> > be excluded.
> >
> > 3. I was thinking the smallest change would be to replace all references
> to
> > *requestChannel.sendResponse()* with a local method
> > *sendResponseMaybeThrottle()* that does the throttling if any plus send
> > response. If we throttle first in *KafkaApis.handle()*, the time spent
> > within the method handling the request will not be recorded or used in
> > throttling. We can look into this again when the PR is ready for review.
> >
> > Regards,
> >
> > Rajini
> >
> >
> >
> > On Wed, Feb 22, 2017 at 5:55 PM, Roger Hoover 
> > wrote:
> >
> > > Great to see this KIP and the excellent discussion.
> > >
> > > To me, Jun's suggestion makes sense.  If my application is allocated 1
> > > request handler unit, then it's as if I have a Kafka broker with a
> single
> > > request handler thread dedicated to me.  That's the most I can use, at
> > > least.  That allocation doesn't change even if an admin later increases
> > the
> > > size of the request thread pool on the broker.  It's similar to the CPU
> > > abstraction that VMs and containers get from hypervisors or OS
> > schedulers.
> > > While different client access patterns can use wildly different amounts
> > of
> > > request thread resources per request, a given application will
> generally
> > > have a stable access pattern and can figure out empirically how many
> > > "request thread units" it needs to meet it's throughput/latency goals.
> > >
> > > Cheers,
> > >
> > > Roger
> > >
> > > On Wed, Feb 22, 2017 at 8:53 AM, Jun Rao  wrote:
> > >
> > > > Hi, Rajini,
> > > >
> > > > Thanks for the updated KIP. A few more comments.
> > > >
> > > > 1. A concern of request_time_percent is that it's not an absolute
> > value.
> > > > Let's say you give a user a 10% limit. If the admin doubles the
> number
> > of
> > > > request handler threads, that user now actually has twice the
> absolute
> > > > capacity. This may confuse people a bit. So, perhaps setting the
> quota
> > > > based on an absolute request thread unit is better.
> > > >
> > > > 2. ControlledShutdownRequest is also an inter-broker request and
> needs
> > to
> > > > be excluded from throttling.
> > > >
> > > > 3. Implementation wise, I am wondering if it's simpler to apply the
> > > request
> > > > time throttling first in KafkaApis.handle(). Otherwise, we will need
> to
> > > add
> > > > the throttling logic in each type of request.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Wed, Feb 22, 2017 at 5:58 AM, Rajini Sivaram <
> > rajinisiva...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Jun,
> > > > >
> > > > > Thank you for the review.
> > > > >
> > > > > I have reverted to the original KIP that throttles based on request
> > > > handler
> > > > > utilization. At the moment, it uses percentage, but I am happy to
> > > change
> > > > to
> > > > > a fraction (out of 1 instead of 100) if required. I have added the
> > > > examples
> > > > > from this discussion to the KIP. Also added a "Future Work" section
> > to
> > > > > address network thread utilization. The configuration is named
> > > > > "request_time_percent" with the expectation that it can also be
> used
> > as
> > > > the
> > > > > limit for network thread utilization when that is implemented, so
> > that
> > > > > users have to set only one config for the two and not have to worry
> > > about
> > > > > 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-22 Thread Jun Rao
Hi, Rajini,

Thanks for the updated KIP. The latest proposal looks good to me.

Jun

On Wed, Feb 22, 2017 at 2:19 PM, Rajini Sivaram 
wrote:

> Jun/Roger,
>
> Thank you for the feedback.
>
> 1. I have updated the KIP to use absolute units instead of percentage. The
> property is called* io_thread_units* to align with the thread count
> property *num.io.threads*. When we implement network thread utilization
> quotas, we can add another property *network_thread_units.*
>
> 2. ControlledShutdown is already listed under the exempt requests. Jun, did
> you mean a different request that needs to be added? The four requests
> currently exempt in the KIP are StopReplica, ControlledShutdown,
> LeaderAndIsr and UpdateMetadata. These are controlled using ClusterAction
> ACL, so it is easy to exclude and only throttle if unauthorized. I wasn't
> sure if there are other requests used only for inter-broker that needed to
> be excluded.
>
> 3. I was thinking the smallest change would be to replace all references to
> *requestChannel.sendResponse()* with a local method
> *sendResponseMaybeThrottle()* that does the throttling if any plus send
> response. If we throttle first in *KafkaApis.handle()*, the time spent
> within the method handling the request will not be recorded or used in
> throttling. We can look into this again when the PR is ready for review.
>
> Regards,
>
> Rajini
>
>
>
> On Wed, Feb 22, 2017 at 5:55 PM, Roger Hoover 
> wrote:
>
> > Great to see this KIP and the excellent discussion.
> >
> > To me, Jun's suggestion makes sense.  If my application is allocated 1
> > request handler unit, then it's as if I have a Kafka broker with a single
> > request handler thread dedicated to me.  That's the most I can use, at
> > least.  That allocation doesn't change even if an admin later increases
> the
> > size of the request thread pool on the broker.  It's similar to the CPU
> > abstraction that VMs and containers get from hypervisors or OS
> schedulers.
> > While different client access patterns can use wildly different amounts
> of
> > request thread resources per request, a given application will generally
> > have a stable access pattern and can figure out empirically how many
> > "request thread units" it needs to meet it's throughput/latency goals.
> >
> > Cheers,
> >
> > Roger
> >
> > On Wed, Feb 22, 2017 at 8:53 AM, Jun Rao  wrote:
> >
> > > Hi, Rajini,
> > >
> > > Thanks for the updated KIP. A few more comments.
> > >
> > > 1. A concern of request_time_percent is that it's not an absolute
> value.
> > > Let's say you give a user a 10% limit. If the admin doubles the number
> of
> > > request handler threads, that user now actually has twice the absolute
> > > capacity. This may confuse people a bit. So, perhaps setting the quota
> > > based on an absolute request thread unit is better.
> > >
> > > 2. ControlledShutdownRequest is also an inter-broker request and needs
> to
> > > be excluded from throttling.
> > >
> > > 3. Implementation wise, I am wondering if it's simpler to apply the
> > request
> > > time throttling first in KafkaApis.handle(). Otherwise, we will need to
> > add
> > > the throttling logic in each type of request.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Wed, Feb 22, 2017 at 5:58 AM, Rajini Sivaram <
> rajinisiva...@gmail.com
> > >
> > > wrote:
> > >
> > > > Jun,
> > > >
> > > > Thank you for the review.
> > > >
> > > > I have reverted to the original KIP that throttles based on request
> > > handler
> > > > utilization. At the moment, it uses percentage, but I am happy to
> > change
> > > to
> > > > a fraction (out of 1 instead of 100) if required. I have added the
> > > examples
> > > > from this discussion to the KIP. Also added a "Future Work" section
> to
> > > > address network thread utilization. The configuration is named
> > > > "request_time_percent" with the expectation that it can also be used
> as
> > > the
> > > > limit for network thread utilization when that is implemented, so
> that
> > > > users have to set only one config for the two and not have to worry
> > about
> > > > the internal distribution of the work between the two thread pools in
> > > > Kafka.
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Rajini
> > > >
> > > >
> > > > On Wed, Feb 22, 2017 at 12:23 AM, Jun Rao  wrote:
> > > >
> > > > > Hi, Rajini,
> > > > >
> > > > > Thanks for the proposal.
> > > > >
> > > > > The benefit of using the request processing time over the request
> > rate
> > > is
> > > > > exactly what people have said. I will just expand that a bit.
> > Consider
> > > > the
> > > > > following case. The producer sends a produce request with a 10MB
> > > message
> > > > > but compressed to 100KB with gzip. The decompression of the message
> > on
> > > > the
> > > > > broker could take 10-15 seconds, during which time, a request
> handler
> > > > > thread is completely blocked. In this case, 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-22 Thread Rajini Sivaram
Jun/Roger,

Thank you for the feedback.

1. I have updated the KIP to use absolute units instead of percentage. The
property is called *io_thread_units* to align with the thread count
property *num.io.threads*. When we implement network thread utilization
quotas, we can add another property *network_thread_units*.

2. ControlledShutdown is already listed under the exempt requests. Jun, did
you mean a different request that needs to be added? The four requests
currently exempt in the KIP are StopReplica, ControlledShutdown,
LeaderAndIsr and UpdateMetadata. These are controlled using the ClusterAction
ACL, so it is easy to exclude them and throttle only if authorization fails. I
wasn't sure if there are other requests used only for inter-broker
communication that need to be excluded.

3. I was thinking the smallest change would be to replace all references to
*requestChannel.sendResponse()* with a local method
*sendResponseMaybeThrottle()* that applies any throttling and then sends the
response. If we throttle first in *KafkaApis.handle()*, the time spent
within the method handling the request will not be recorded or used in
throttling. We can look into this again when the PR is ready for review.
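
To make the idea concrete, here is a minimal standalone sketch (not broker
code; all names are illustrative assumptions) of wrapping the response send so
that quota recording and throttling happen in one place:

object SendResponseMaybeThrottleSketch {
  // Record the handler time for this client and return a delay in ms (0 = no throttling).
  // The real quota bookkeeping is elided; a broker would use its quota manager here.
  def recordAndGetThrottleTimeMs(clientId: String, handlerTimeNanos: Long): Long = 0L

  // Wrap "send the response" so every call site records usage and applies any delay.
  def sendResponseMaybeThrottle(clientId: String, handlerTimeNanos: Long)(sendResponse: () => Unit): Unit = {
    val throttleMs = recordAndGetThrottleTimeMs(clientId, handlerTimeNanos)
    if (throttleMs > 0) Thread.sleep(throttleMs) // a broker would delay via purgatory, not sleep
    sendResponse()
  }

  def main(args: Array[String]): Unit = {
    val start = System.nanoTime()
    // ... handle the request ...
    sendResponseMaybeThrottle("client-1", System.nanoTime() - start)(() => println("response sent"))
  }
}

Recording at response-send time is what lets the handler's own processing
time count toward the quota, which is the point being made above.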

Regards,

Rajini



On Wed, Feb 22, 2017 at 5:55 PM, Roger Hoover 
wrote:

> Great to see this KIP and the excellent discussion.
>
> To me, Jun's suggestion makes sense.  If my application is allocated 1
> request handler unit, then it's as if I have a Kafka broker with a single
> request handler thread dedicated to me.  That's the most I can use, at
> least.  That allocation doesn't change even if an admin later increases the
> size of the request thread pool on the broker.  It's similar to the CPU
> abstraction that VMs and containers get from hypervisors or OS schedulers.
> While different client access patterns can use wildly different amounts of
> request thread resources per request, a given application will generally
> have a stable access pattern and can figure out empirically how many
> "request thread units" it needs to meet it's throughput/latency goals.
>
> Cheers,
>
> Roger
>
> On Wed, Feb 22, 2017 at 8:53 AM, Jun Rao  wrote:
>
> > Hi, Rajini,
> >
> > Thanks for the updated KIP. A few more comments.
> >
> > 1. A concern of request_time_percent is that it's not an absolute value.
> > Let's say you give a user a 10% limit. If the admin doubles the number of
> > request handler threads, that user now actually has twice the absolute
> > capacity. This may confuse people a bit. So, perhaps setting the quota
> > based on an absolute request thread unit is better.
> >
> > 2. ControlledShutdownRequest is also an inter-broker request and needs to
> > be excluded from throttling.
> >
> > 3. Implementation wise, I am wondering if it's simpler to apply the
> request
> > time throttling first in KafkaApis.handle(). Otherwise, we will need to
> add
> > the throttling logic in each type of request.
> >
> > Thanks,
> >
> > Jun
> >
> > On Wed, Feb 22, 2017 at 5:58 AM, Rajini Sivaram  >
> > wrote:
> >
> > > Jun,
> > >
> > > Thank you for the review.
> > >
> > > I have reverted to the original KIP that throttles based on request
> > handler
> > > utilization. At the moment, it uses percentage, but I am happy to
> change
> > to
> > > a fraction (out of 1 instead of 100) if required. I have added the
> > examples
> > > from this discussion to the KIP. Also added a "Future Work" section to
> > > address network thread utilization. The configuration is named
> > > "request_time_percent" with the expectation that it can also be used as
> > the
> > > limit for network thread utilization when that is implemented, so that
> > > users have to set only one config for the two and not have to worry
> about
> > > the internal distribution of the work between the two thread pools in
> > > Kafka.
> > >
> > >
> > > Regards,
> > >
> > > Rajini
> > >
> > >
> > > On Wed, Feb 22, 2017 at 12:23 AM, Jun Rao  wrote:
> > >
> > > > Hi, Rajini,
> > > >
> > > > Thanks for the proposal.
> > > >
> > > > The benefit of using the request processing time over the request
> rate
> > is
> > > > exactly what people have said. I will just expand that a bit.
> Consider
> > > the
> > > > following case. The producer sends a produce request with a 10MB
> > message
> > > > but compressed to 100KB with gzip. The decompression of the message
> on
> > > the
> > > > broker could take 10-15 seconds, during which time, a request handler
> > > > thread is completely blocked. In this case, neither the byte-in quota
> > nor
> > > > the request rate quota may be effective in protecting the broker.
> > > Consider
> > > > another case. A consumer group starts with 10 instances and later on
> > > > switches to 20 instances. The request rate will likely double, but
> the
> > > > actually load on the broker may not double since each fetch request
> > only
> > > > contains half of the partitions. Request rate quota may 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-22 Thread Roger Hoover
Great to see this KIP and the excellent discussion.

To me, Jun's suggestion makes sense.  If my application is allocated 1
request handler unit, then it's as if I have a Kafka broker with a single
request handler thread dedicated to me.  That's the most I can use, at
least.  That allocation doesn't change even if an admin later increases the
size of the request thread pool on the broker.  It's similar to the CPU
abstraction that VMs and containers get from hypervisors or OS schedulers.
While different client access patterns can use wildly different amounts of
request thread resources per request, a given application will generally
have a stable access pattern and can figure out empirically how many
"request thread units" it needs to meet it's throughput/latency goals.

Cheers,

Roger

On Wed, Feb 22, 2017 at 8:53 AM, Jun Rao  wrote:

> Hi, Rajini,
>
> Thanks for the updated KIP. A few more comments.
>
> 1. A concern of request_time_percent is that it's not an absolute value.
> Let's say you give a user a 10% limit. If the admin doubles the number of
> request handler threads, that user now actually has twice the absolute
> capacity. This may confuse people a bit. So, perhaps setting the quota
> based on an absolute request thread unit is better.
>
> 2. ControlledShutdownRequest is also an inter-broker request and needs to
> be excluded from throttling.
>
> 3. Implementation wise, I am wondering if it's simpler to apply the request
> time throttling first in KafkaApis.handle(). Otherwise, we will need to add
> the throttling logic in each type of request.
>
> Thanks,
>
> Jun
>
> On Wed, Feb 22, 2017 at 5:58 AM, Rajini Sivaram 
> wrote:
>
> > Jun,
> >
> > Thank you for the review.
> >
> > I have reverted to the original KIP that throttles based on request
> handler
> > utilization. At the moment, it uses percentage, but I am happy to change
> to
> > a fraction (out of 1 instead of 100) if required. I have added the
> examples
> > from this discussion to the KIP. Also added a "Future Work" section to
> > address network thread utilization. The configuration is named
> > "request_time_percent" with the expectation that it can also be used as
> the
> > limit for network thread utilization when that is implemented, so that
> > users have to set only one config for the two and not have to worry about
> > the internal distribution of the work between the two thread pools in
> > Kafka.
> >
> >
> > Regards,
> >
> > Rajini
> >
> >
> > On Wed, Feb 22, 2017 at 12:23 AM, Jun Rao  wrote:
> >
> > > Hi, Rajini,
> > >
> > > Thanks for the proposal.
> > >
> > > The benefit of using the request processing time over the request rate
> is
> > > exactly what people have said. I will just expand that a bit. Consider
> > the
> > > following case. The producer sends a produce request with a 10MB
> message
> > > but compressed to 100KB with gzip. The decompression of the message on
> > the
> > > broker could take 10-15 seconds, during which time, a request handler
> > > thread is completely blocked. In this case, neither the byte-in quota
> nor
> > > the request rate quota may be effective in protecting the broker.
> > Consider
> > > another case. A consumer group starts with 10 instances and later on
> > > switches to 20 instances. The request rate will likely double, but the
> > > actually load on the broker may not double since each fetch request
> only
> > > contains half of the partitions. Request rate quota may not be easy to
> > > configure in this case.
> > >
> > > What we really want is to be able to prevent a client from using too
> much
> > > of the server side resources. In this particular KIP, this resource is
> > the
> > > capacity of the request handler threads. I agree that it may not be
> > > intuitive for the users to determine how to set the right limit.
> However,
> > > this is not completely new and has been done in the container world
> > > already. For example, Linux cgroup (https://access.redhat.com/
> > > documentation/en-US/Red_Hat_Enterprise_Linux/6/html/
> > > Resource_Management_Guide/sec-cpu.html) has the concept of
> > > cpu.cfs_quota_us,
> > > which specifies the total amount of time in microseconds for which all
> > > tasks in a cgroup can run during a one second period. We can
> potentially
> > > model the request handler threads in a similar way. For example, each
> > > request handler thread can be 1 request handler unit and the admin can
> > > configure a limit on how many units (say 0.01) a client can have.
> > >
> > > Regarding not throttling the internal broker to broker requests. We
> could
> > > do that. Alternatively, we could just let the admin configure a high
> > limit
> > > for the kafka user (it may not be able to do that easily based on
> > clientId
> > > though).
> > >
> > > Ideally we want to be able to protect the utilization of the network
> > thread
> > > pool too. The difficult is mostly what Rajini said: (1) The 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-22 Thread Jun Rao
Hi, Rajini,

Thanks for the updated KIP. A few more comments.

1. A concern of request_time_percent is that it's not an absolute value.
Let's say you give a user a 10% limit. If the admin doubles the number of
request handler threads, that user now actually has twice the absolute
capacity. This may confuse people a bit. So, perhaps setting the quota
based on an absolute request thread unit is better.

2. ControlledShutdownRequest is also an inter-broker request and needs to
be excluded from throttling.

3. Implementation wise, I am wondering if it's simpler to apply the request
time throttling first in KafkaApis.handle(). Otherwise, we will need to add
the throttling logic in each type of request.

Thanks,

Jun

On Wed, Feb 22, 2017 at 5:58 AM, Rajini Sivaram 
wrote:

> Jun,
>
> Thank you for the review.
>
> I have reverted to the original KIP that throttles based on request handler
> utilization. At the moment, it uses percentage, but I am happy to change to
> a fraction (out of 1 instead of 100) if required. I have added the examples
> from this discussion to the KIP. Also added a "Future Work" section to
> address network thread utilization. The configuration is named
> "request_time_percent" with the expectation that it can also be used as the
> limit for network thread utilization when that is implemented, so that
> users have to set only one config for the two and not have to worry about
> the internal distribution of the work between the two thread pools in
> Kafka.
>
>
> Regards,
>
> Rajini
>
>
> On Wed, Feb 22, 2017 at 12:23 AM, Jun Rao  wrote:
>
> > Hi, Rajini,
> >
> > Thanks for the proposal.
> >
> > The benefit of using the request processing time over the request rate is
> > exactly what people have said. I will just expand that a bit. Consider
> the
> > following case. The producer sends a produce request with a 10MB message
> > but compressed to 100KB with gzip. The decompression of the message on
> the
> > broker could take 10-15 seconds, during which time, a request handler
> > thread is completely blocked. In this case, neither the byte-in quota nor
> > the request rate quota may be effective in protecting the broker.
> Consider
> > another case. A consumer group starts with 10 instances and later on
> > switches to 20 instances. The request rate will likely double, but the
> > actually load on the broker may not double since each fetch request only
> > contains half of the partitions. Request rate quota may not be easy to
> > configure in this case.
> >
> > What we really want is to be able to prevent a client from using too much
> > of the server side resources. In this particular KIP, this resource is
> the
> > capacity of the request handler threads. I agree that it may not be
> > intuitive for the users to determine how to set the right limit. However,
> > this is not completely new and has been done in the container world
> > already. For example, Linux cgroup (https://access.redhat.com/
> > documentation/en-US/Red_Hat_Enterprise_Linux/6/html/
> > Resource_Management_Guide/sec-cpu.html) has the concept of
> > cpu.cfs_quota_us,
> > which specifies the total amount of time in microseconds for which all
> > tasks in a cgroup can run during a one second period. We can potentially
> > model the request handler threads in a similar way. For example, each
> > request handler thread can be 1 request handler unit and the admin can
> > configure a limit on how many units (say 0.01) a client can have.
> >
> > Regarding not throttling the internal broker to broker requests. We could
> > do that. Alternatively, we could just let the admin configure a high
> limit
> > for the kafka user (it may not be able to do that easily based on
> clientId
> > though).
> >
> > Ideally we want to be able to protect the utilization of the network
> thread
> > pool too. The difficult is mostly what Rajini said: (1) The mechanism for
> > throttling the requests is through Purgatory and we will have to think
> > through how to integrate that into the network layer.  (2) In the network
> > layer, currently we know the user, but not the clientId of the request.
> So,
> > it's a bit tricky to throttle based on clientId there. Plus, the byteOut
> > quota can already protect the network thread utilization for fetch
> > requests. So, if we can't figure out this part right now, just focusing
> on
> > the request handling threads for this KIP is still a useful feature.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Tue, Feb 21, 2017 at 4:27 AM, Rajini Sivaram  >
> > wrote:
> >
> > > Thank you all for the feedback.
> > >
> > > Jay: I have removed exemption for consumer heartbeat etc. Agree that
> > > protecting the cluster is more important than protecting individual
> apps.
> > > Have retained the exemption for StopReplicat/LeaderAndIsr etc, these
> are
> > > throttled only if authorization fails (so can't be used for DoS attacks
> > in
> > > a secure 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-22 Thread Rajini Sivaram
Jun,

Thank you for the review.

I have reverted to the original KIP that throttles based on request handler
utilization. At the moment, it uses percentage, but I am happy to change to
a fraction (out of 1 instead of 100) if required. I have added the examples
from this discussion to the KIP. Also added a "Future Work" section to
address network thread utilization. The configuration is named
"request_time_percent" with the expectation that it can also be used as the
limit for network thread utilization when that is implemented, so that
users have to set only one config for the two and not have to worry about
the internal distribution of the work between the two thread pools in Kafka.


Regards,

Rajini


On Wed, Feb 22, 2017 at 12:23 AM, Jun Rao  wrote:

> Hi, Rajini,
>
> Thanks for the proposal.
>
> The benefit of using the request processing time over the request rate is
> exactly what people have said. I will just expand that a bit. Consider the
> following case. The producer sends a produce request with a 10MB message
> but compressed to 100KB with gzip. The decompression of the message on the
> broker could take 10-15 seconds, during which time, a request handler
> thread is completely blocked. In this case, neither the byte-in quota nor
> the request rate quota may be effective in protecting the broker. Consider
> another case. A consumer group starts with 10 instances and later on
> switches to 20 instances. The request rate will likely double, but the
> actually load on the broker may not double since each fetch request only
> contains half of the partitions. Request rate quota may not be easy to
> configure in this case.
>
> What we really want is to be able to prevent a client from using too much
> of the server side resources. In this particular KIP, this resource is the
> capacity of the request handler threads. I agree that it may not be
> intuitive for the users to determine how to set the right limit. However,
> this is not completely new and has been done in the container world
> already. For example, Linux cgroup (https://access.redhat.com/
> documentation/en-US/Red_Hat_Enterprise_Linux/6/html/
> Resource_Management_Guide/sec-cpu.html) has the concept of
> cpu.cfs_quota_us,
> which specifies the total amount of time in microseconds for which all
> tasks in a cgroup can run during a one second period. We can potentially
> model the request handler threads in a similar way. For example, each
> request handler thread can be 1 request handler unit and the admin can
> configure a limit on how many units (say 0.01) a client can have.
>
> Regarding not throttling the internal broker to broker requests. We could
> do that. Alternatively, we could just let the admin configure a high limit
> for the kafka user (it may not be able to do that easily based on clientId
> though).
>
> Ideally we want to be able to protect the utilization of the network thread
> pool too. The difficult is mostly what Rajini said: (1) The mechanism for
> throttling the requests is through Purgatory and we will have to think
> through how to integrate that into the network layer.  (2) In the network
> layer, currently we know the user, but not the clientId of the request. So,
> it's a bit tricky to throttle based on clientId there. Plus, the byteOut
> quota can already protect the network thread utilization for fetch
> requests. So, if we can't figure out this part right now, just focusing on
> the request handling threads for this KIP is still a useful feature.
>
> Thanks,
>
> Jun
>
>
> On Tue, Feb 21, 2017 at 4:27 AM, Rajini Sivaram 
> wrote:
>
> > Thank you all for the feedback.
> >
> > Jay: I have removed exemption for consumer heartbeat etc. Agree that
> > protecting the cluster is more important than protecting individual apps.
> > Have retained the exemption for StopReplicat/LeaderAndIsr etc, these are
> > throttled only if authorization fails (so can't be used for DoS attacks
> in
> > a secure cluster, but allows inter-broker requests to complete without
> > delays).
> >
> > I will wait another day to see if these is any objection to quotas based
> on
> > request processing time (as opposed to request rate) and if there are no
> > objections, I will revert to the original proposal with some changes.
> >
> > The original proposal was only including the time used by the request
> > handler threads (that made calculation easy). I think the suggestion is
> to
> > include the time spent in the network threads as well since that may be
> > significant. As Jay pointed out, it is more complicated to calculate the
> > total available CPU time and convert to a ratio when there *m* I/O
> threads
> > and *n* network threads. ThreadMXBean#getThreadCPUTime() may give us
> what
> > we want, but it can be very expensive on some platforms. As Becket and
> > Guozhang have pointed out, we do have several time measurements already
> for
> > generating metrics that we could use, though we 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-21 Thread Jun Rao
Hi, Rajini,

Thanks for the proposal.

The benefit of using the request processing time over the request rate is
exactly what people have said. I will just expand that a bit. Consider the
following case. The producer sends a produce request with a 10MB message
but compressed to 100KB with gzip. The decompression of the message on the
broker could take 10-15 seconds, during which time, a request handler
thread is completely blocked. In this case, neither the byte-in quota nor
the request rate quota may be effective in protecting the broker. Consider
another case. A consumer group starts with 10 instances and later on
switches to 20 instances. The request rate will likely double, but the
actual load on the broker may not double since each fetch request only
contains half of the partitions. Request rate quota may not be easy to
configure in this case.

What we really want is to be able to prevent a client from using too much
of the server side resources. In this particular KIP, this resource is the
capacity of the request handler threads. I agree that it may not be
intuitive for the users to determine how to set the right limit. However,
this is not completely new and has been done in the container world
already. For example, Linux cgroup (https://access.redhat.com/
documentation/en-US/Red_Hat_Enterprise_Linux/6/html/
Resource_Management_Guide/sec-cpu.html) has the concept of cpu.cfs_quota_us,
which specifies the total amount of time in microseconds for which all
tasks in a cgroup can run during a one second period. We can potentially
model the request handler threads in a similar way. For example, each
request handler thread can be 1 request handler unit and the admin can
configure a limit on how many units (say 0.01) a client can have.
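
As a rough, self-contained illustration of the unit idea (the window length
and the delay formula here are assumptions for the sketch, not the actual
quota algorithm):

object HandlerUnitQuotaSketch {
  val windowMs = 1000.0 // one quota window, analogous to cpu.cfs_period_us

  // 1 unit == one request handler thread for the full window
  def allowedTimeMs(units: Double): Double = units * windowMs

  // how long to delay the client once its usage in the window exceeds its allowance
  def throttleTimeMs(usedTimeMs: Double, units: Double): Long =
    math.max(0L, math.ceil(usedTimeMs - allowedTimeMs(units)).toLong)

  def main(args: Array[String]): Unit = {
    // a client with 0.01 units may use 10 ms of handler time per second
    println(throttleTimeMs(usedTimeMs = 25.0, units = 0.01)) // 15
  }
}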

Regarding not throttling the internal broker to broker requests. We could
do that. Alternatively, we could just let the admin configure a high limit
for the kafka user (it may not be able to do that easily based on clientId
though).

Ideally we want to be able to protect the utilization of the network thread
pool too. The difficulty is mostly what Rajini said: (1) The mechanism for
throttling the requests is through Purgatory and we will have to think
through how to integrate that into the network layer.  (2) In the network
layer, currently we know the user, but not the clientId of the request. So,
it's a bit tricky to throttle based on clientId there. Plus, the byteOut
quota can already protect the network thread utilization for fetch
requests. So, if we can't figure out this part right now, just focusing on
the request handling threads for this KIP is still a useful feature.

Thanks,

Jun


On Tue, Feb 21, 2017 at 4:27 AM, Rajini Sivaram 
wrote:

> Thank you all for the feedback.
>
> Jay: I have removed exemption for consumer heartbeat etc. Agree that
> protecting the cluster is more important than protecting individual apps.
> Have retained the exemption for StopReplicat/LeaderAndIsr etc, these are
> throttled only if authorization fails (so can't be used for DoS attacks in
> a secure cluster, but allows inter-broker requests to complete without
> delays).
>
> I will wait another day to see if these is any objection to quotas based on
> request processing time (as opposed to request rate) and if there are no
> objections, I will revert to the original proposal with some changes.
>
> The original proposal was only including the time used by the request
> handler threads (that made calculation easy). I think the suggestion is to
> include the time spent in the network threads as well since that may be
> significant. As Jay pointed out, it is more complicated to calculate the
> total available CPU time and convert to a ratio when there *m* I/O threads
> and *n* network threads. ThreadMXBean#getThreadCPUTime() may give us what
> we want, but it can be very expensive on some platforms. As Becket and
> Guozhang have pointed out, we do have several time measurements already for
> generating metrics that we could use, though we might want to switch to
> nanoTime() instead of currentTimeMillis() since some of the values for
> small requests may be < 1ms. But rather than add up the time spent in I/O
> thread and network thread, wouldn't it be better to convert the time spent
> on each thread into a separate ratio? UserA has a request quota of 5%. Can
> we take that to mean that UserA can use 5% of the time on network threads
> and 5% of the time on I/O threads? If either is exceeded, the response is
> throttled - it would mean maintaining two sets of metrics for the two
> durations, but would result in more meaningful ratios. We could define two
> quota limits (UserA has 5% of request threads and 10% of network threads),
> but that seems unnecessary and harder to explain to users.
>
> Back to why and how quotas are applied to network thread utilization:
> a) In the case of fetch,  the time spent in the network thread may be
> significant and I can see the 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-21 Thread Rajini Sivaram
Thank you all for the feedback.

Jay: I have removed exemption for consumer heartbeat etc. Agree that
protecting the cluster is more important than protecting individual apps.
Have retained the exemption for StopReplica/LeaderAndIsr etc.; these are
throttled only if authorization fails (so can't be used for DoS attacks in
a secure cluster, but allows inter-broker requests to complete without
delays).

I will wait another day to see if there is any objection to quotas based on
request processing time (as opposed to request rate) and if there are no
objections, I will revert to the original proposal with some changes.

The original proposal was only including the time used by the request
handler threads (that made calculation easy). I think the suggestion is to
include the time spent in the network threads as well since that may be
significant. As Jay pointed out, it is more complicated to calculate the
total available CPU time and convert to a ratio when there are *m* I/O threads
and *n* network threads. ThreadMXBean#getThreadCPUTime() may give us what
we want, but it can be very expensive on some platforms. As Becket and
Guozhang have pointed out, we do have several time measurements already for
generating metrics that we could use, though we might want to switch to
nanoTime() instead of currentTimeMillis() since some of the values for
small requests may be < 1ms. But rather than add up the time spent in I/O
thread and network thread, wouldn't it be better to convert the time spent
on each thread into a separate ratio? UserA has a request quota of 5%. Can
we take that to mean that UserA can use 5% of the time on network threads
and 5% of the time on I/O threads? If either is exceeded, the response is
throttled - it would mean maintaining two sets of metrics for the two
durations, but would result in more meaningful ratios. We could define two
quota limits (UserA has 5% of request threads and 10% of network threads),
but that seems unnecessary and harder to explain to users.
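
A tiny sketch of what the single-percentage, two-pool check could look like
(thread counts, the window size and all names here are only for illustration):

case class PoolUsage(threadCount: Int, timeUsedMsInWindow: Double)

object TwoPoolQuotaSketch {
  val windowMs = 1000.0

  // this client's share of one pool's total capacity within the window
  def utilizationPercent(pool: PoolUsage): Double =
    100.0 * pool.timeUsedMsInWindow / (pool.threadCount * windowMs)

  // the same quota percentage is checked against each pool independently
  def violated(quotaPercent: Double, ioPool: PoolUsage, networkPool: PoolUsage): Boolean =
    utilizationPercent(ioPool) > quotaPercent || utilizationPercent(networkPool) > quotaPercent

  def main(args: Array[String]): Unit = {
    val io = PoolUsage(threadCount = 8, timeUsedMsInWindow = 600)  // 7.5% of the I/O pool
    val net = PoolUsage(threadCount = 3, timeUsedMsInWindow = 120) // 4% of the network pool
    println(violated(quotaPercent = 5.0, io, net))                 // true: the I/O pool is over
  }
}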

Back to why and how quotas are applied to network thread utilization:
a) In the case of fetch, the time spent in the network thread may be
significant and I can see the need to include this. Are there other
requests where the network thread utilization is significant? In the case
of fetch, request handler thread utilization would throttle clients with a
high request rate and low data volume, and the fetch byte rate quota will throttle
clients with high data volume. Network thread utilization is perhaps
proportional to the data volume. I am wondering if we even need to throttle
based on network thread utilization or whether the data volume quota covers
this case.

b) At the moment, we record and check for quota violation at the same time.
If a quota is violated, the response is delayed. Using Jay's example of
disk reads for fetches happening in the network thread, we can't record and
delay a response after the disk reads. We could record the time spent on
the network thread when the response is complete and introduce a delay for
handling a subsequent request (separate out recording and quota violation
handling in the case of network thread overload). Does that make sense?
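
Something along these lines is what I have in mind, purely as a sketch
(structure and names are illustrative, and a real implementation would need
to be thread-safe and integrate with channel muting):

import scala.collection.mutable

object DeferredThrottleSketch {
  private val owedDelayMs = mutable.Map.empty[String, Long].withDefaultValue(0L)

  // network thread, after the response has been fully written: if recording this
  // request's time violates the quota, remember the delay for this client
  def onResponseComplete(clientId: String, throttleTimeMs: Long): Unit =
    if (throttleTimeMs > 0) owedDelayMs(clientId) = owedDelayMs(clientId) + throttleTimeMs

  // before the client's next request is handled: how long to keep it waiting
  def nextRequestDelayMs(clientId: String): Long = {
    val delay = owedDelayMs(clientId)
    owedDelayMs(clientId) = 0L
    delay
  }
}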


Regards,

Rajini


On Tue, Feb 21, 2017 at 2:58 AM, Becket Qin  wrote:

> Hey Jay,
>
> Yeah, I agree that enforcing the CPU time is a little tricky. I am thinking
> that maybe we can use the existing request statistics. They are already
> very detailed so we can probably see the approximate CPU time from it, e.g.
> something like (total_time - request/response_queue_time - remote_time).
>
> I agree with Guozhang that when a user is throttled it is likely that we
> need to see if anything has went wrong first, and if the users are well
> behaving and just need more resources, we will have to bump up the quota
> for them. It is true that pre-allocating CPU time quota precisely for the
> users is difficult. So in practice it would probably be more like first set
> a relative high protective CPU time quota for everyone and increase that
> for some individual clients on demand.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Mon, Feb 20, 2017 at 5:48 PM, Guozhang Wang  wrote:
>
> > This is a great proposal, glad to see it happening.
> >
> > I am inclined to the CPU throttling, or more specifically processing time
> > ratio instead of the request rate throttling as well. Becket has very
> well
> > summed my rationales above, and one thing to add here is that the former
> > has a good support for both "protecting against rogue clients" as well as
> > "utilizing a cluster for multi-tenancy usage": when thinking about how to
> > explain this to the end users, I find it actually more natural than the
> > request rate since as mentioned above, different requests will have quite
> > different "cost", and Kafka today already have various request types
> > (produce, fetch, admin, metadata, etc), because of that the request rate
> > throttling may not 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-20 Thread Becket Qin
Hey Jay,

Yeah, I agree that enforcing the CPU time is a little tricky. I am thinking
that maybe we can use the existing request statistics. They are already
very detailed so we can probably derive the approximate CPU time from them, e.g.
something like (total_time - request/response_queue_time - remote_time).
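
For example, something like the following (the field names are illustrative;
the real request metrics expose these breakdowns under their own names):

object ApproxCpuTimeSketch {
  // the per-request time breakdown the broker already tracks
  case class RequestTimes(totalMs: Double,
                          requestQueueMs: Double,
                          responseQueueMs: Double,
                          remoteMs: Double) // e.g. time parked in purgatory

  // treat whatever is left after queueing and remote waits as "CPU-ish" time
  def approxCpuTimeMs(t: RequestTimes): Double =
    math.max(0.0, t.totalMs - t.requestQueueMs - t.responseQueueMs - t.remoteMs)

  def main(args: Array[String]): Unit = {
    // a fetch that spent most of its life waiting in purgatory
    println(approxCpuTimeMs(RequestTimes(510, 2, 3, 500))) // 5.0
  }
}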

I agree with Guozhang that when a user is throttled it is likely that we
need to see if anything has gone wrong first, and if the users are well
behaving and just need more resources, we will have to bump up the quota
for them. It is true that pre-allocating CPU time quota precisely for the
users is difficult. So in practice it would probably be more like first setting
a relatively high protective CPU time quota for everyone and increasing that
for some individual clients on demand.

Thanks,

Jiangjie (Becket) Qin


On Mon, Feb 20, 2017 at 5:48 PM, Guozhang Wang  wrote:

> This is a great proposal, glad to see it happening.
>
> I am inclined to the CPU throttling, or more specifically processing time
> ratio instead of the request rate throttling as well. Becket has very well
> summed my rationales above, and one thing to add here is that the former
> has a good support for both "protecting against rogue clients" as well as
> "utilizing a cluster for multi-tenancy usage": when thinking about how to
> explain this to the end users, I find it actually more natural than the
> request rate since as mentioned above, different requests will have quite
> different "cost", and Kafka today already have various request types
> (produce, fetch, admin, metadata, etc), because of that the request rate
> throttling may not be as effective unless it is set very conservatively.
>
> Regarding to user reactions when they are throttled, I think it may differ
> case-by-case, and need to be discovered / guided by looking at relative
> metrics. So in other words users would not expect to get additional
> information by simply being told "hey, you are throttled", which is all
> what throttling does; they need to take a follow-up step and see "hmm, I'm
> throttled probably because of ..", which is by looking at other metric
> values: e.g. whether I'm bombarding the brokers with metadata request,
> which are usually cheap to handle but I'm sending thousands per second; or
> is it because I'm catching up and hence sending very heavy fetching request
> with large min.bytes, etc.
>
> Regarding to the implementation, as once discussed with Jun, this seems not
> very difficult since today we are already collecting the "thread pool
> utilization" metrics, which is a single percentage "aggregateIdleMeter"
> value; but we are already effectively aggregating it for each requests in
> KafkaRequestHandler, and we can just extend it by recording the source
> client id when handling them and aggregating by clientId as well as the
> total aggregate.
>
>
> Guozhang
>
>
>
>
> On Mon, Feb 20, 2017 at 4:27 PM, Jay Kreps  wrote:
>
> > Hey Becket/Rajini,
> >
> > When I thought about it more deeply I came around to the "percent of
> > processing time" metric too. It seems a lot closer to the thing we
> actually
> > care about and need to protect. I also think this would be a very useful
> > metric even in the absence of throttling just to debug whose using
> > capacity.
> >
> > Two problems to consider:
> >
> >1. I agree that for the user it is understandable what lead to their
> >being throttled, but it is a bit hard to figure out the safe range for
> >them. i.e. if I have a new app that will send 200 messages/sec I can
> >probably reason that I'll be under the throttling limit of 300
> req/sec.
> >However if I need to be under a 10% CPU resources limit it may be a
> bit
> >harder for me to know a priori if i will or won't.
> >2. Calculating the available CPU time is a bit difficult since there
> are
> >actually two thread pools--the I/O threads and the network threads. I
> > think
> >it might be workable to count just the I/O thread time as in the
> > proposal,
> >but the network thread work is actually non-trivial (e.g. all the disk
> >reads for fetches happen in that thread). If you count both the
> network
> > and
> >I/O threads it can skew things a bit. E.g. say you have 50 network
> > threads,
> >10 I/O threads, and 8 cores, what is the available cpu time available
> > in a
> >second? I suppose this is a problem whenever you have a bottleneck
> > between
> >I/O and network threads or if you end up significantly
> over-provisioning
> >one pool (both of which are hard to avoid).
> >
> > An alternative for CPU throttling would be to use this api:
> > http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/
> > management/ThreadMXBean.html#getThreadCpuTime(long)
> >
> > That would let you track actual CPU usage across the network, I/O
> threads,
> > and purgatory threads and look at it as a percentage of total cores. I
> > think this fixes many problems 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-20 Thread Guozhang Wang
This is a great proposal, glad to see it happening.

I am inclined to CPU throttling, or more specifically processing time
ratio, instead of request rate throttling as well. Becket has summed up my
rationales very well above, and one thing to add here is that the former
provides good support for both "protecting against rogue clients" as well as
"utilizing a cluster for multi-tenancy usage": when thinking about how to
explain this to the end users, I find it actually more natural than the
request rate since, as mentioned above, different requests will have quite
different "cost", and Kafka today already has various request types
(produce, fetch, admin, metadata, etc.); because of that, request rate
throttling may not be as effective unless it is set very conservatively.

Regarding user reactions when they are throttled, I think it may differ
case-by-case, and needs to be discovered / guided by looking at the relevant
metrics. So in other words users should not expect to get additional
information by simply being told "hey, you are throttled", which is all
that throttling does; they need to take a follow-up step and see "hmm, I'm
throttled probably because of ..", which is done by looking at other metric
values: e.g. whether I'm bombarding the brokers with metadata requests,
which are usually cheap to handle but I'm sending thousands per second; or
is it because I'm catching up and hence sending very heavy fetch requests
with large min.bytes, etc.

Regarding the implementation, as once discussed with Jun, this does not seem
very difficult since today we are already collecting the "thread pool
utilization" metrics, which is a single percentage "aggregateIdleMeter"
value; we are already effectively aggregating it for each request in
KafkaRequestHandler, and we can just extend it by recording the source
client id when handling them and aggregating by clientId as well as the
total aggregate.
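
A sketch of the per-clientId aggregation using the existing Kafka metrics
library (the sensor and metric names are made up for illustration, and a real
version would need to handle concurrency and sensor expiry):

import java.util.Collections
import java.util.concurrent.TimeUnit
import org.apache.kafka.common.metrics.Metrics
import org.apache.kafka.common.metrics.stats.Rate

class PerClientTimeRecorder(metrics: Metrics) {
  // overall request handler time rate across all clients
  private val total = metrics.sensor("request-handler-time-total")
  total.add(metrics.metricName("request-time-rate", "request-quota"), new Rate(TimeUnit.SECONDS))

  // record the handler time spent on one request, per client and in total;
  // ms of handler time per second roughly maps to a percentage of one thread
  def record(clientId: String, handlerTimeMs: Double): Unit = {
    val name = s"request-handler-time-$clientId"
    val sensor = Option(metrics.getSensor(name)).getOrElse {
      val s = metrics.sensor(name)
      s.add(metrics.metricName("request-time-rate", "request-quota",
        Collections.singletonMap("client-id", clientId)), new Rate(TimeUnit.SECONDS))
      s
    }
    sensor.record(handlerTimeMs)
    total.record(handlerTimeMs)
  }
}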


Guozhang




On Mon, Feb 20, 2017 at 4:27 PM, Jay Kreps  wrote:

> Hey Becket/Rajini,
>
> When I thought about it more deeply I came around to the "percent of
> processing time" metric too. It seems a lot closer to the thing we actually
> care about and need to protect. I also think this would be a very useful
> metric even in the absence of throttling just to debug whose using
> capacity.
>
> Two problems to consider:
>
>1. I agree that for the user it is understandable what lead to their
>being throttled, but it is a bit hard to figure out the safe range for
>them. i.e. if I have a new app that will send 200 messages/sec I can
>probably reason that I'll be under the throttling limit of 300 req/sec.
>However if I need to be under a 10% CPU resources limit it may be a bit
>harder for me to know a priori if i will or won't.
>2. Calculating the available CPU time is a bit difficult since there are
>actually two thread pools--the I/O threads and the network threads. I
> think
>it might be workable to count just the I/O thread time as in the
> proposal,
>but the network thread work is actually non-trivial (e.g. all the disk
>reads for fetches happen in that thread). If you count both the network
> and
>I/O threads it can skew things a bit. E.g. say you have 50 network
> threads,
>10 I/O threads, and 8 cores, what is the available cpu time available
> in a
>second? I suppose this is a problem whenever you have a bottleneck
> between
>I/O and network threads or if you end up significantly over-provisioning
>one pool (both of which are hard to avoid).
>
> An alternative for CPU throttling would be to use this api:
> http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/
> management/ThreadMXBean.html#getThreadCpuTime(long)
>
> That would let you track actual CPU usage across the network, I/O threads,
> and purgatory threads and look at it as a percentage of total cores. I
> think this fixes many problems in the reliability of the metric. It's
> meaning is slightly different as it is just CPU (you don't get charged for
> time blocking on I/O) but that may be okay because we already have a
> throttle on I/O. The downside is I think it is possible this api can be
> disabled or isn't always available and it may also be expensive (also I've
> never used it so not sure if it really works the way i think).
>
> -Jay
>
> On Mon, Feb 20, 2017 at 3:17 PM, Becket Qin  wrote:
>
> > If the purpose of the KIP is only to protect the cluster from being
> > overwhelmed by crazy clients and is not intended to address resource
> > allocation problem among the clients, I am wondering if using request
> > handling time quota (CPU time quota) is a better option. Here are the
> > reasons:
> >
> > 1. request handling time quota has better protection. Say we have request
> > rate quota and set that to some value like 100 requests/sec, it is
> possible
> > that some of the requests are very expensive actually take a lot of time
> to
> > handle. In 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-20 Thread Jay Kreps
Hey Becket/Rajini,

When I thought about it more deeply I came around to the "percent of
processing time" metric too. It seems a lot closer to the thing we actually
care about and need to protect. I also think this would be a very useful
metric even in the absence of throttling just to debug who's using capacity.

Two problems to consider:

   1. I agree that for the user it is understandable what led to their
   being throttled, but it is a bit hard to figure out the safe range for
   them. I.e., if I have a new app that will send 200 messages/sec I can
   probably reason that I'll be under the throttling limit of 300 req/sec.
   However if I need to be under a 10% CPU resources limit it may be a bit
   harder for me to know a priori if I will or won't.
   2. Calculating the available CPU time is a bit difficult since there are
   actually two thread pools--the I/O threads and the network threads. I think
   it might be workable to count just the I/O thread time as in the proposal,
   but the network thread work is actually non-trivial (e.g. all the disk
   reads for fetches happen in that thread). If you count both the network and
   I/O threads it can skew things a bit. E.g. say you have 50 network threads,
   10 I/O threads, and 8 cores, what is the CPU time available in a
   second? I suppose this is a problem whenever you have a bottleneck between
   I/O and network threads or if you end up significantly over-provisioning
   one pool (both of which are hard to avoid).

An alternative for CPU throttling would be to use this api:
http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/ThreadMXBean.html#getThreadCpuTime(long)

That would let you track actual CPU usage across the network, I/O threads,
and purgatory threads and look at it as a percentage of total cores. I
think this fixes many problems in the reliability of the metric. Its
meaning is slightly different as it is just CPU (you don't get charged for
time blocking on I/O) but that may be okay because we already have a
throttle on I/O. The downside is I think it is possible this API can be
disabled or isn't always available and it may also be expensive (also I've
never used it so not sure if it really works the way I think).
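
For reference, a minimal sketch of that approach (standalone, not broker code;
the sampling interval and the way the fraction is computed are just
illustrative):

import java.lang.management.ManagementFactory

object ThreadCpuSampleSketch {
  def main(args: Array[String]): Unit = {
    val bean = ManagementFactory.getThreadMXBean
    if (!bean.isThreadCpuTimeSupported) sys.error("thread CPU time not supported on this JVM")
    bean.setThreadCpuTimeEnabled(true)

    val cores = Runtime.getRuntime.availableProcessors()
    val intervalNanos = 1000000000L // sample over one second
    val before = bean.getAllThreadIds.map(id => id -> bean.getThreadCpuTime(id)).toMap

    Thread.sleep(1000)

    val usedNanos = bean.getAllThreadIds.map { id =>
      math.max(0L, bean.getThreadCpuTime(id) - before.getOrElse(id, 0L))
    }.sum

    // fraction of this machine's total CPU capacity used by all JVM threads in the interval
    println(usedNanos.toDouble / (cores * intervalNanos))
  }
}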

-Jay

On Mon, Feb 20, 2017 at 3:17 PM, Becket Qin  wrote:

> If the purpose of the KIP is only to protect the cluster from being
> overwhelmed by crazy clients and is not intended to address resource
> allocation problem among the clients, I am wondering if using request
> handling time quota (CPU time quota) is a better option. Here are the
> reasons:
>
> 1. request handling time quota has better protection. Say we have request
> rate quota and set that to some value like 100 requests/sec, it is possible
> that some of the requests are very expensive actually take a lot of time to
> handle. In that case a few clients may still occupy a lot of CPU time even
> the request rate is low. Arguably we can carefully set request rate quota
> for each request and client id combination, but it could still be tricky to
> get it right for everyone.
>
> If we use the request time handling quota, we can simply say no clients can
> take up to more than 30% of the total request handling capacity (measured
> by time), regardless of the difference among different requests or what is
> the client doing. In this case maybe we can quota all the requests if we
> want to.
>
> 2. The main benefit of using request rate limit is that it seems more
> intuitive. It is true that it is probably easier to explain to the user
> what does that mean. However, in practice it looks the impact of request
> rate quota is not more quantifiable than the request handling time quota.
> Unlike the byte rate quota, it is still difficult to give a number about
> impact of throughput or latency when a request rate quota is hit. So it is
> not better than the request handling time quota. In fact I feel it is
> clearer to tell user that "you are limited because you have taken 30% of
> the CPU time on the broker" than otherwise something like "your request
> rate quota on metadata request has reached".
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Mon, Feb 20, 2017 at 2:23 PM, Jay Kreps  wrote:
>
> > I think this proposal makes a lot of sense (especially now that it is
> > oriented around request rate) and fills the biggest remaining gap in the
> > multi-tenancy story.
> >
> > I think for intra-cluster communication (StopReplica, etc) we could avoid
> > throttling entirely. You can secure or otherwise lock-down the cluster
> > communication to avoid any unauthorized external party from trying to
> > initiate these requests. As a result we are as likely to cause problems
> as
> > solve them by throttling these, right?
> >
> > I'm not so sure that we should exempt the consumer requests such as
> > heartbeat. It's true that if we throttle an app's heartbeat requests it
> may
> > cause it to fall out of its 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-20 Thread Becket Qin
If the purpose of the KIP is only to protect the cluster from being
overwhelmed by crazy clients and is not intended to address resource
allocation problem among the clients, I am wondering if using request
handling time quota (CPU time quota) is a better option. Here are the
reasons:

1. A request handling time quota has better protection. Say we have a request
rate quota and set it to some value like 100 requests/sec; it is possible
that some of the requests are very expensive and actually take a lot of time to
handle. In that case a few clients may still occupy a lot of CPU time even
if the request rate is low. Arguably we can carefully set a request rate quota
for each request and client id combination, but it could still be tricky to
get it right for everyone.

If we use the request handling time quota, we can simply say no client can
take more than 30% of the total request handling capacity (measured
by time), regardless of the differences among requests or what the
client is doing. In this case maybe we can quota all the requests if we
want to.

2. The main benefit of using a request rate limit is that it seems more
intuitive. It is true that it is probably easier to explain to the user
what that means. However, in practice it looks like the impact of a request
rate quota is no more quantifiable than that of a request handling time quota.
Unlike the byte rate quota, it is still difficult to give a number for the
impact on throughput or latency when a request rate quota is hit. So it is
not better than the request handling time quota. In fact I feel it is
clearer to tell the user "you are limited because you have taken 30% of
the CPU time on the broker" than something like "your request
rate quota on metadata requests has been reached".

Thanks,

Jiangjie (Becket) Qin


On Mon, Feb 20, 2017 at 2:23 PM, Jay Kreps  wrote:

> I think this proposal makes a lot of sense (especially now that it is
> oriented around request rate) and fills the biggest remaining gap in the
> multi-tenancy story.
>
> I think for intra-cluster communication (StopReplica, etc) we could avoid
> throttling entirely. You can secure or otherwise lock-down the cluster
> communication to avoid any unauthorized external party from trying to
> initiate these requests. As a result we are as likely to cause problems as
> solve them by throttling these, right?
>
> I'm not so sure that we should exempt the consumer requests such as
> heartbeat. It's true that if we throttle an app's heartbeat requests it may
> cause it to fall out of its consumer group. However if we don't throttle it
> it may DDOS the cluster if the heartbeat interval is set incorrectly or if
> some client in some language has a bug. I think the policy with this kind
> of throttling is to protect the cluster above any individual app, right? I
> think in general this should be okay since for most deployments this
> setting is meant as more of a safety valve---that is rather than set
> something very close to what you expect to need (say 2 req/sec or whatever)
> you would have something quite high (like 100 req/sec) with this meant to
> prevent a client gone crazy. I think when used this way allowing those to
> be throttled would actually provide meaningful protection.
>
> -Jay
>
>
>
> On Fri, Feb 17, 2017 at 9:05 AM, Rajini Sivaram 
> wrote:
>
> > Hi all,
> >
> > I have just created KIP-124 to introduce request rate quotas to Kafka:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 124+-+Request+rate+quotas
> >
> > The proposal is for a simple percentage request handling time quota that
> > can be allocated to **, ** or **. There
> > are a few other suggestions also under "Rejected alternatives". Feedback
> > and suggestions are welcome.
> >
> > Thank you...
> >
> > Regards,
> >
> > Rajini
> >
>


Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-20 Thread Jay Kreps
I think this proposal makes a lot of sense (especially now that it is
oriented around request rate) and fills the biggest remaining gap in the
multi-tenancy story.

I think for intra-cluster communication (StopReplica, etc) we could avoid
throttling entirely. You can secure or otherwise lock down the cluster
communication to prevent any unauthorized external party from trying to
initiate these requests. As a result we are as likely to cause problems as
solve them by throttling these, right?

I'm not so sure that we should exempt the consumer requests such as
heartbeat. It's true that if we throttle an app's heartbeat requests it may
cause it to fall out of its consumer group. However, if we don't throttle it,
it may DDoS the cluster if the heartbeat interval is set incorrectly or if
some client in some language has a bug. I think the policy with this kind
of throttling is to protect the cluster above any individual app, right? I
think in general this should be okay since for most deployments this
setting is meant as more of a safety valve---that is rather than set
something very close to what you expect to need (say 2 req/sec or whatever)
you would have something quite high (like 100 req/sec) with this meant to
prevent a client gone crazy. I think when used this way allowing those to
be throttled would actually provide meaningful protection.

-Jay



On Fri, Feb 17, 2017 at 9:05 AM, Rajini Sivaram 
wrote:

> Hi all,
>
> I have just created KIP-124 to introduce request rate quotas to Kafka:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 124+-+Request+rate+quotas
>
> The proposal is for a simple percentage request handling time quota that
> can be allocated to **, ** or **. There
> are a few other suggestions also under "Rejected alternatives". Feedback
> and suggestions are welcome.
>
> Thank you...
>
> Regards,
>
> Rajini
>


Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-20 Thread Rajini Sivaram
I have updated the KIP to use request rates instead of request processing
time.

I have removed all requests that require ClusterAction permission
(LeaderAndIsr and UpdateMetadata, in addition to stop/shutdown). But
I have left Metadata request in. Quota windows, which limit the maximum
delay, tend to be small (1 second by default) compared to the request timeout or
max.block.ms, and even the existing byte rate quotas can impact the time
taken to fetch metadata if the metadata request is queued behind a produce
request (for instance). So I don't think clients will need any additional
exception handling code for request rate quotas beyond what they already
need for byte rate quotas. Clients can flood the broker with metadata
requests (e.g. a producer with retry.backoff.ms=0 sending a message to a
non-existent topic), so it makes sense to throttle metadata requests.


Thanks,

Rajini

On Mon, Feb 20, 2017 at 11:55 AM, Dong Lin  wrote:

> Hey Rajini,
>
> Thanks for the explanation. I have some follow up questions regarding the
> types of requests that will be covered by this quota. Since this KIP focus
> only on throttling the traffic between client and broker and client never
> sends LeaderAndIsrRequest to broker, should we exclude LeaderAndIsrRequest
> from this KIP?
>
> Besides, I am still not sure we should throttle MetadataUpdateRequeset. The
> benefits of throttling MetadataUpdateRequest seems little since it doesn't
> increase with user traffic. Client only sends MetadataUpdateRequest when
> there is partition leadership change or when client metadata has expired.
> On the other hand, if we throttle MetadataUpdateRequest, there is chance
> that MetadataUpdateRequest doesn't get update in time and user may receive
> exception. This seems like a big interface change because user will have to
> change application code to handle such exception. Note that the current
> rate-based quota will reduce traffic without throwing any exception to
> user.
>
> Anyway, I am looking forward to the updated KIP:)
>
> Thanks,
> Dong
>
> On Mon, Feb 20, 2017 at 2:43 AM, Rajini Sivaram 
> wrote:
>
> > Dong, Onur & Becket,
> >
> > Thank you all for the very useful feedback.
> >
> > The choice of request handling time as opposed to request rate was based
> on
> > the observation in KAFKA-4195
> >  that request rates
> may
> > be less intuitive to configure than percentage utilization. But since the
> > KIP is measuring time rather than request pool utilization as suggested
> in
> > the JIRA, I agree that request rate would probably work better than
> > percentage. So I am inclined to change the KIP to throttle on request
> rates
> > (e.g 100 requests per second) rather than percentage. Average request
> rates
> > are exposed as metrics, so admin can configure quotas based on that. And
> > the values are more meaningful from the client application point of
> view. I
> > am still interested in feedback regarding the second rejected alternative
> > that throttles based on percentage utilization of resource handler pool.
> > That was the suggestion from Jun/Ismael in KAFKA-4195, but I couldn't see
> > how that would help in the case where a small number of connections
> pushed
> > a continuous stream of short requests. Suggestions welcome.
> >
> > Responses to other questions above:
> >
> > - (Dong): The KIP proposes to throttle most requests (and not just
> > Produce/Fetch) since the goal is to control usage of broker resources. So
> > LeaderAndIsrRequest and MetadataRequest will also be throttled. The few
> > requests not being throttled are timing-sensitive.
> >
> > - (Dong): The KIP does not propose to throttle inter-broker traffic based
> > on request rates. The most frequent requests in inter-broker traffic are
> > fetch requests and a well configured broker would use reasonably good
> > values of min.bytes and max.wait that avoids overloading the broker
> > unnecessarily with fetch requests. The existing byte-rate based quotas
> > should be sufficient in this case.
> >
> > - (Onur): Quota window configuration - this is the existing configuration
> > quota.window.size.seconds (also used for byte-rate quotas)
> >
> > - (Becket): The main issue that the KIP is addressing is clients flooding
> > the broker with small requests (eg. fetch with max.wait.ms=0), which can
> > overload the broker and delay requests from other clients/users even
> though
> > the byte rate is quite small. CPU quota reflects the resource usage on
> the
> > broker that the KIP is attempting to limit. Since this is the time on the
> > local broker, it shouldn't vary much depending on acks=-1 etc. but I do
> > agree on the unpredictability of time based quotas. Switching from
> request
> > processing time to request rates will address this. Would you still be
> > concerned that "*Users do not have direct control over the request rate,
> > i.e. users do **not know 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-20 Thread Dong Lin
Hey Rajini,

Thanks for the explanation. I have some follow-up questions regarding the
types of requests that will be covered by this quota. Since this KIP focuses
only on throttling the traffic between client and broker, and clients never
send LeaderAndIsrRequest to the broker, should we exclude LeaderAndIsrRequest
from this KIP?

Besides, I am still not sure we should throttle MetadataUpdateRequest. The
benefit of throttling MetadataUpdateRequest seems small since it doesn't
increase with user traffic. A client only sends a MetadataUpdateRequest when
there is a partition leadership change or when the client's metadata has expired.
On the other hand, if we throttle MetadataUpdateRequest, there is a chance
that metadata doesn't get updated in time and the user may receive an
exception. This seems like a big interface change because users will have to
change application code to handle such an exception. Note that the current
rate-based quota reduces traffic without throwing any exception to the user.
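
For reference, the metadata behaviour described above is governed by standard
client settings; the lines below are only an illustration with example values,
not a recommendation:

# how long a client will use cached metadata before proactively refreshing it
metadata.max.age.ms=300000
# how long a producer send() will block waiting for metadata before failing
# with a TimeoutException
max.block.ms=60000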

Anyway, I am looking forward to the updated KIP:)

Thanks,
Dong

On Mon, Feb 20, 2017 at 2:43 AM, Rajini Sivaram 
wrote:

> Dong, Onur & Becket,
>
> Thank you all for the very useful feedback.
>
> The choice of request handling time as opposed to request rate was based on
> the observation in KAFKA-4195
>  that request rates may
> be less intuitive to configure than percentage utilization. But since the
> KIP is measuring time rather than request pool utilization as suggested in
> the JIRA, I agree that request rate would probably work better than
> percentage. So I am inclined to change the KIP to throttle on request rates
> (e.g 100 requests per second) rather than percentage. Average request rates
> are exposed as metrics, so admin can configure quotas based on that. And
> the values are more meaningful from the client application point of view. I
> am still interested in feedback regarding the second rejected alternative
> that throttles based on percentage utilization of resource handler pool.
> That was the suggestion from Jun/Ismael in KAFKA-4195, but I couldn't see
> how that would help in the case where a small number of connections pushed
> a continuous stream of short requests. Suggestions welcome.
>
> Responses to other questions above:
>
> - (Dong): The KIP proposes to throttle most requests (and not just
> Produce/Fetch) since the goal is to control usage of broker resources. So
> LeaderAndIsrRequest and MetadataRequest will also be throttled. The few
> requests not being throttled are timing-sensitive.
>
> - (Dong): The KIP does not propose to throttle inter-broker traffic based
> on request rates. The most frequent requests in inter-broker traffic are
> fetch requests and a well configured broker would use reasonably good
> values of min.bytes and max.wait that avoids overloading the broker
> unnecessarily with fetch requests. The existing byte-rate based quotas
> should be sufficient in this case.
>
> - (Onur): Quota window configuration - this is the existing configuration
> quota.window.size.seconds (also used for byte-rate quotas)
>
> - (Becket): The main issue that the KIP is addressing is clients flooding
> the broker with small requests (eg. fetch with max.wait.ms=0), which can
> overload the broker and delay requests from other clients/users even though
> the byte rate is quite small. CPU quota reflects the resource usage on the
> broker that the KIP is attempting to limit. Since this is the time on the
> local broker, it shouldn't vary much depending on acks=-1 etc. but I do
> agree on the unpredictability of time based quotas. Switching from request
> processing time to request rates will address this. Would you still be
> concerned that "*Users do not have direct control over the request rate,
> i.e. users do **not know when a request will be sent by the clients*"?
>
> Jun/Ismael,
>
> I am interested in your views on request rate based quotas and whether we
> should still consider utilization of the resource handler pool.
>
>
> Many thanks,
>
> Rajini
>
>
> On Sun, Feb 19, 2017 at 11:54 PM, Becket Qin  wrote:
>
> > Thanks for the KIP, Rajini,
> >
> > If I understand correctly the proposal was essentially trying to quota
> the
> > CPU usage (that is probably why time slice is used instead of request
> rate)
> > while the existing quota we have is for network bandwidth.
> >
> > Given we are trying to throttle both CPU and Network, that implies the
> > following patterns for the clients:
> > 1. High CPU usage, high network usage.
> > 2. High CPU usage, low network usage.
> > 3. Low CPU usage, high network usage.
> > 4. Low CPU usage, low network usage
> >
> > Theoretically the existing quota addresses case 3 & 4. And this KIP seems
> > trying to address case 1 & 2. However, it might be helpful to understand
> > what we want to achieve with CPU and network quotas.
> >
> > People mainly use quota for two 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-20 Thread Rajini Sivaram
Dong, Onur & Becket,

Thank you all for the very useful feedback.

The choice of request handling time as opposed to request rate was based on
the observation in KAFKA-4195 that request rates may
be less intuitive to configure than percentage utilization. But since the
KIP is measuring time rather than request pool utilization as suggested in
the JIRA, I agree that request rate would probably work better than
percentage. So I am inclined to change the KIP to throttle on request rates
(e.g. 100 requests per second) rather than percentage. Average request rates
are exposed as metrics, so an admin can configure quotas based on those. And
the values are more meaningful from the client application point of view. I
am still interested in feedback regarding the second rejected alternative
that throttles based on percentage utilization of the request handler pool.
That was the suggestion from Jun/Ismael in KAFKA-4195, but I couldn't see
how that would help in the case where a small number of connections pushes
a continuous stream of short requests. Suggestions welcome.
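
Purely as an illustrative sketch (request_rate below is a hypothetical quota
property, named by analogy with the existing producer_byte_rate and
consumer_byte_rate settings, and is not an existing config), a per-user quota
of 100 requests per second might then be set with the usual quota tooling:

  bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
    --add-config 'request_rate=100' \
    --entity-type users --entity-name user1

The value itself could be chosen from the per-client request-rate metrics the
broker already exposes.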

Responses to other questions above:

- (Dong): The KIP proposes to throttle most requests (and not just
Produce/Fetch) since the goal is to control usage of broker resources. So
LeaderAndIsrRequest and MetadataRequest will also be throttled. The few
requests not being throttled are timing-sensitive.

- (Dong): The KIP does not propose to throttle inter-broker traffic based
on request rates. The most frequent requests in inter-broker traffic are
fetch requests, and a well-configured broker would use reasonably good
values of min.bytes and max.wait that avoid overloading the broker
unnecessarily with fetch requests. The existing byte-rate based quotas
should be sufficient in this case.

- (Onur): Quota window configuration - this is the existing broker configuration
quota.window.size.seconds (also used for byte-rate quotas); a sample
configuration is sketched after these responses.

- (Becket): The main issue that the KIP is addressing is clients flooding
the broker with small requests (e.g. fetch with max.wait.ms=0, as in the
consumer settings sketched below), which can overload the broker and delay
requests from other clients/users even though the byte rate is quite small.
CPU quota reflects the resource usage on the broker that the KIP is attempting
to limit. Since this is the time on the local broker, it shouldn't vary much
depending on acks=-1 etc., but I do agree on the unpredictability of
time-based quotas. Switching from request processing time to request rates
will address this. Would you still be concerned that "Users do not have direct
control over the request rate, i.e. users do not know when a request will be
sent by the clients"?
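
To keep the two answers above concrete, the snippets below are illustrative
examples only (the values are examples, not recommendations). The first shows
the existing broker-side quota window settings; the second shows the kind of
consumer settings that produce the small-request flood described above:

# broker (server.properties): existing quota window settings, shared with byte-rate quotas
quota.window.size.seconds=1
quota.window.num=11

# consumer: settings that make a client poll in a tight loop of tiny fetch requests
fetch.max.wait.ms=0
fetch.min.bytes=1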

Jun/Ismael,

I am interested in your views on request-rate-based quotas and whether we
should still consider utilization of the request handler pool.


Many thanks,

Rajini


On Sun, Feb 19, 2017 at 11:54 PM, Becket Qin  wrote:

> Thanks for the KIP, Rajini,
>
> If I understand correctly the proposal was essentially trying to quota the
> CPU usage (that is probably why time slice is used instead of request rate)
> while the existing quota we have is for network bandwidth.
>
> Given we are trying to throttle both CPU and Network, that implies the
> following patterns for the clients:
> 1. High CPU usage, high network usage.
> 2. High CPU usage, low network usage.
> 3. Low CPU usage, high network usage.
> 4. Low CPU usage, low network usage
>
> Theoretically the existing quota addresses case 3 & 4. And this KIP seems
> trying to address case 1 & 2. However, it might be helpful to understand
> what we want to achieve with CPU and network quotas.
>
> People mainly use quota for two different purposes:
> a) protecting the broker from misbehaving clients, and
> b) resource distribution for multi-tenancy.
>
> I agree that generally speaking CPU time is a suitable metric to quota on
> for CPU usage and would work for a). However, as Dong and Onur noticed, it
> is not easy to quantify the impact for the end users at application level
> with a throttled CPU time. If the purpose of the CPU quota is only for
> protection, maybe we don't need a user facing CPU quota.
>
> That said, a user facing CPU quota could be useful for virtualization,
> which maybe related to multi-tenancy but is a little different. Imagine
> there are 10 services sharing the same physical Kafka cluster. With CPU
> time quota and network bandwidth quota, each service can provision a
> logical Kafka cluster with some reserved CPU time and network bandwidth.
> And in this case the quota will be on per logic cluster. Not sure if this
> is what the KIP is intended in the future, though. It would be good if the
> KIP can be more clear on what exact scenarios the CPU quota is trying to
> address.
>
> As of the request rate quota, while it seems easy to enforce and intuitive,
> there are some caveats.
> 1. Users do not have direct control over the request rate, i.e. users do
> not known when a 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-19 Thread Becket Qin
Thanks for the KIP, Rajini,

If I understand correctly, the proposal is essentially trying to quota
CPU usage (that is probably why a time slice is used instead of request rate),
while the existing quota we have is for network bandwidth.

Given we are trying to throttle both CPU and Network, that implies the
following patterns for the clients:
1. High CPU usage, high network usage.
2. High CPU usage, low network usage.
3. Low CPU usage, high network usage.
4. Low CPU usage, low network usage

Theoretically the existing quota addresses cases 3 & 4, and this KIP seems to
be trying to address cases 1 & 2. However, it might be helpful to understand
what we want to achieve with CPU and network quotas.

People mainly use quota for two different purposes:
a) protecting the broker from misbehaving clients, and
b) resource distribution for multi-tenancy.

I agree that, generally speaking, CPU time is a suitable metric to quota on
for CPU usage and would work for a). However, as Dong and Onur noticed, it
is not easy to quantify the impact for end users at the application level
with a throttled CPU time. If the purpose of the CPU quota is only for
protection, maybe we don't need a user-facing CPU quota.
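
As a minimal, self-contained sketch of what quota-ing handler time could mean
mechanically (illustrative only, not Kafka code; the class name, the single
fixed window and the per-user map are assumptions made for brevity):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: tracks per-user request handling time in a single fixed
// window and flags users that exceed an allowed percentage of that window.
public class HandlerTimeQuotaSketch {
    private final long windowMs;          // length of the accounting window
    private final double quotaPercent;    // allowed busy time as a % of the window
    private final Map<String, Long> usedNanos = new ConcurrentHashMap<>();
    private volatile long windowStartMs = System.currentTimeMillis();

    public HandlerTimeQuotaSketch(long windowMs, double quotaPercent) {
        this.windowMs = windowMs;
        this.quotaPercent = quotaPercent;
    }

    // Record the handler time spent on one request for the given user.
    public void record(String user, long handlingNanos) {
        maybeRollWindow();
        usedNanos.merge(user, handlingNanos, Long::sum);
    }

    // True if the user has used more than its allowed share of handler time.
    public boolean isThrottled(String user) {
        maybeRollWindow();
        double usedMs = usedNanos.getOrDefault(user, 0L) / 1_000_000.0;
        return usedMs > windowMs * quotaPercent / 100.0;
    }

    // Reset the accounting once the window has elapsed.
    private void maybeRollWindow() {
        long now = System.currentTimeMillis();
        if (now - windowStartMs >= windowMs) {
            usedNanos.clear();
            windowStartMs = now;
        }
    }
}

A real broker would presumably use several overlapping samples
(quota.window.num windows of quota.window.size.seconds each) and delay
responses rather than fail them, but the accounting idea is the same.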

That said, a user-facing CPU quota could be useful for virtualization,
which may be related to multi-tenancy but is a little different. Imagine
there are 10 services sharing the same physical Kafka cluster. With a CPU
time quota and a network bandwidth quota, each service can provision a
logical Kafka cluster with some reserved CPU time and network bandwidth.
And in this case the quota would be per logical cluster. Not sure if this
is what the KIP intends for the future, though. It would be good if the
KIP could be clearer about exactly which scenarios the CPU quota is trying to
address.

As for the request rate quota, while it seems easy to enforce and intuitive,
there are some caveats.
1. Users do not have direct control over the request rate, i.e. users do
not know when a request will be sent by the clients.
2. Each request may require a different amount of CPU resources to handle.
That may depend on many things, e.g. whether acks = 1 or acks = -1,
whether a request is addressing 1000 partitions or 1 partition, whether a
fetch request requires message format down-conversion or not, etc. (a small
illustration follows).
So the result of using a request rate quota could be quite unpredictable.
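
As a small illustration of caveat 2, all of the following look like "one
request" to a rate-based quota, yet the broker-side cost differs; the settings
below are only examples of the knobs mentioned above:

# producer: a produce request with acks=-1 involves the full ISR,
# while acks=1 only waits for the leader
acks=-1

# broker: if consumers are older than this message format, each fetch
# is down-converted on the broker, costing extra CPU per request (example value)
log.message.format.version=0.10.2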

Thanks,

Jiangjie (Becket) Qin

On Sat, Feb 18, 2017 at 9:35 PM, Dong Lin  wrote:

> I realized the main concern with this proposal is how user can interpret
> this CPU-percentage based quota. Since this quota is exposed to user, we
> need to explain to user how this quota is going to impact their application
> performance and convince them that the quota is not too low for their
> application. We are able to do this with the byte-rate based quota. But I am
> not sure how we can do this with CPU-percentage based quota. For example,
> how is user going to understand whether 1% CPU is OK?
>
> On Fri, Feb 17, 2017 at 10:11 AM, Onur Karaman <
> onurkaraman.apa...@gmail.com
> > wrote:
>
> > Overall a big fan of the KIP.
> >
> > I'd have to agree with Dong. I'm not sure about the decision of using the
> > percentage over the window as opposed to request rate. It's pretty hard
> to
> > reason about. I just spoke to one of our SRE's and he agrees.
> >
> > Also I may have missed it, but I couldn't find information in the KIP on
> > where this window would be configured.
> >
> > On Fri, Feb 17, 2017 at 9:45 AM, Dong Lin  wrote:
> >
> > > To correct the typo above: It seems to me that determination of request
> > > rate is not any more difficult than determination of *byte* rate as
> both
> > > metrics are commonly used to measure performance and provide guarantee
> to
> > > user.
> > >
> > > On Fri, Feb 17, 2017 at 9:40 AM, Dong Lin  wrote:
> > >
> > > > Hey Rajini,
> > > >
> > > > Thanks for the KIP. I have some questions:
> > > >
> > > > - I am wondering why throttling based on request rate is listed as a
> > > > rejected alternative. Can you provide more specific reason why it is
> > > > difficult for administrators to decide request rates to allocate? It
> > > seems
> > > > to me that determination of request rate is not any more difficult
> than
> > > > determination of request rate as both metrics are commonly used to
> > > measure
> > > > performance and provide guarantee to user. On the other hand, the
> > > > percentage of processing time provides a vague guarantee to user. For
> > > > example, what performance can user expect if you provide 1%
> processing
> > > time
> > > > quota to this user? How is administrator going to decide this quota?
> > > Should
> > > > Kafka administrator continues to reduce this percentage quota as
> number
> > > of
> > > > users grow?
> > > >
> > > > - The KIP suggests that LeaderAndIsrRequest and MetadataRequest will
> > also
> > > > be throttled by this quota. What is the motivation for 

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-18 Thread Dong Lin
I realized the main concern with this proposal is how users can interpret
this CPU-percentage based quota. Since this quota is exposed to users, we
need to explain to them how this quota is going to impact their application
performance and convince them that the quota is not too low for their
application. We are able to do this with the byte-rate based quota. But I am
not sure how we can do this with a CPU-percentage based quota. For example,
how is a user going to understand whether 1% CPU is OK?
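
As a rough worked example (assuming, purely for illustration, a 1-second
quota window and a quota expressed as a percentage of a single request
handler thread):

  1% of one thread over a 1-second window = 10 ms of handler time per second
  at ~1 ms of handler time per request, that is roughly 10 requests per second
  before throttling starts

Whether 10 requests per second is enough clearly depends on the application,
which is exactly the interpretability problem being raised here.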

On Fri, Feb 17, 2017 at 10:11 AM, Onur Karaman  wrote:

> Overall a big fan of the KIP.
>
> I'd have to agree with Dong. I'm not sure about the decision of using the
> percentage over the window as opposed to request rate. It's pretty hard to
> reason about. I just spoke to one of our SRE's and he agrees.
>
> Also I may have missed it, but I couldn't find information in the KIP on
> where this window would be configured.
>
> On Fri, Feb 17, 2017 at 9:45 AM, Dong Lin  wrote:
>
> > To correct the typo above: It seems to me that determination of request
> > rate is not any more difficult than determination of *byte* rate as both
> > metrics are commonly used to measure performance and provide guarantee to
> > user.
> >
> > On Fri, Feb 17, 2017 at 9:40 AM, Dong Lin  wrote:
> >
> > > Hey Rajini,
> > >
> > > Thanks for the KIP. I have some questions:
> > >
> > > - I am wondering why throttling based on request rate is listed as a
> > > rejected alternative. Can you provide more specific reason why it is
> > > difficult for administrators to decide request rates to allocate? It
> > seems
> > > to me that determination of request rate is not any more difficult than
> > > determination of request rate as both metrics are commonly used to
> > measure
> > > performance and provide guarantee to user. On the other hand, the
> > > percentage of processing time provides a vague guarantee to user. For
> > > example, what performance can user expect if you provide 1% processing
> > time
> > > quota to this user? How is administrator going to decide this quota?
> > Should
> > > Kafka administrator continues to reduce this percentage quota as number
> > of
> > > users grow?
> > >
> > > - The KIP suggests that LeaderAndIsrRequest and MetadataRequest will
> also
> > > be throttled by this quota. What is the motivation for throttling these
> > > requests? It is also inconsistent with rate-based quota which is only
> > > applied to ProduceRequest and FetchRequest. IMO it will be simpler to
> > only
> > > throttle ProduceRequest and FetchRequest.
> > >
> > > - Do you think we should also throttle the inter-broker traffic using
> > this
> > > quota as well similar to KIP-73?
> > >
> > > Thanks,
> > > Dong
> > >
> > >
> > >
> > > On Fri, Feb 17, 2017 at 9:05 AM, Rajini Sivaram <
> rajinisiva...@gmail.com
> > >
> > > wrote:
> > >
> > >> Hi all,
> > >>
> > >> I have just created KIP-124 to introduce request rate quotas to Kafka:
> > >>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+
> > >> Request+rate+quotas
> > >>
> > >> The proposal is for a simple percentage request handling time quota
> that
> > >> can be allocated to **, ** or **.
> > There
> > >> are a few other suggestions also under "Rejected alternatives".
> Feedback
> > >> and suggestions are welcome.
> > >>
> > >> Thank you...
> > >>
> > >> Regards,
> > >>
> > >> Rajini
> > >>
> > >
> > >
> >
>


Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-17 Thread Onur Karaman
Overall a big fan of the KIP.

I'd have to agree with Dong. I'm not sure about the decision to use a
percentage over the window as opposed to a request rate. It's pretty hard to
reason about. I just spoke to one of our SREs, and he agrees.

Also I may have missed it, but I couldn't find information in the KIP on
where this window would be configured.

On Fri, Feb 17, 2017 at 9:45 AM, Dong Lin  wrote:

> To correct the typo above: It seems to me that determination of request
> rate is not any more difficult than determination of *byte* rate as both
> metrics are commonly used to measure performance and provide guarantee to
> user.
>
> On Fri, Feb 17, 2017 at 9:40 AM, Dong Lin  wrote:
>
> > Hey Rajini,
> >
> > Thanks for the KIP. I have some questions:
> >
> > - I am wondering why throttling based on request rate is listed as a
> > rejected alternative. Can you provide more specific reason why it is
> > difficult for administrators to decide request rates to allocate? It
> seems
> > to me that determination of request rate is not any more difficult than
> > determination of request rate as both metrics are commonly used to
> measure
> > performance and provide guarantee to user. On the other hand, the
> > percentage of processing time provides a vague guarantee to user. For
> > example, what performance can user expect if you provide 1% processing
> time
> > quota to this user? How is administrator going to decide this quota?
> Should
> > Kafka administrator continues to reduce this percentage quota as number
> of
> > users grow?
> >
> > - The KIP suggests that LeaderAndIsrRequest and MetadataRequest will also
> > be throttled by this quota. What is the motivation for throttling these
> > requests? It is also inconsistent with rate-based quota which is only
> > applied to ProduceRequest and FetchRequest. IMO it will be simpler to
> only
> > throttle ProduceRequest and FetchRequest.
> >
> > - Do you think we should also throttle the inter-broker traffic using
> this
> > quota as well similar to KIP-73?
> >
> > Thanks,
> > Dong
> >
> >
> >
> > On Fri, Feb 17, 2017 at 9:05 AM, Rajini Sivaram  >
> > wrote:
> >
> >> Hi all,
> >>
> >> I have just created KIP-124 to introduce request rate quotas to Kafka:
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+
> >> Request+rate+quotas
> >>
> >> The proposal is for a simple percentage request handling time quota that
> >> can be allocated to **, ** or **.
> There
> >> are a few other suggestions also under "Rejected alternatives". Feedback
> >> and suggestions are welcome.
> >>
> >> Thank you...
> >>
> >> Regards,
> >>
> >> Rajini
> >>
> >
> >
>


Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-17 Thread Dong Lin
To correct the typo above: It seems to me that determination of request
rate is not any more difficult than determination of *byte* rate as both
metrics are commonly used to measure performance and provide guarantee to
user.

On Fri, Feb 17, 2017 at 9:40 AM, Dong Lin  wrote:

> Hey Rajini,
>
> Thanks for the KIP. I have some questions:
>
> - I am wondering why throttling based on request rate is listed as a
> rejected alternative. Can you provide more specific reason why it is
> difficult for administrators to decide request rates to allocate? It seems
> to me that determination of request rate is not any more difficult than
> determination of request rate as both metrics are commonly used to measure
> performance and provide guarantee to user. On the other hand, the
> percentage of processing time provides a vague guarantee to user. For
> example, what performance can user expect if you provide 1% processing time
> quota to this user? How is administrator going to decide this quota? Should
> Kafka administrator continues to reduce this percentage quota as number of
> users grow?
>
> - The KIP suggests that LeaderAndIsrRequest and MetadataRequest will also
> be throttled by this quota. What is the motivation for throttling these
> requests? It is also inconsistent with rate-based quota which is only
> applied to ProduceRequest and FetchRequest. IMO it will be simpler to only
> throttle ProduceRequest and FetchRequest.
>
> - Do you think we should also throttle the inter-broker traffic using this
> quota as well similar to KIP-73?
>
> Thanks,
> Dong
>
>
>
> On Fri, Feb 17, 2017 at 9:05 AM, Rajini Sivaram 
> wrote:
>
>> Hi all,
>>
>> I have just created KIP-124 to introduce request rate quotas to Kafka:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+
>> Request+rate+quotas
>>
>> The proposal is for a simple percentage request handling time quota that
>> can be allocated to **, ** or **. There
>> are a few other suggestions also under "Rejected alternatives". Feedback
>> and suggestions are welcome.
>>
>> Thank you...
>>
>> Regards,
>>
>> Rajini
>>
>
>


Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-17 Thread Dong Lin
Hey Rajini,

Thanks for the KIP. I have some questions:

- I am wondering why throttling based on request rate is listed as a
rejected alternative. Can you provide a more specific reason why it is
difficult for administrators to decide request rates to allocate? It seems
to me that determination of request rate is not any more difficult than
determination of request rate as both metrics are commonly used to measure
performance and provide a guarantee to the user. On the other hand, the
percentage of processing time provides a vague guarantee to the user. For
example, what performance can a user expect if you provide a 1% processing
time quota to this user? How is the administrator going to decide this quota?
Should the Kafka administrator continue to reduce this percentage quota as the
number of users grows?

- The KIP suggests that LeaderAndIsrRequest and MetadataRequest will also
be throttled by this quota. What is the motivation for throttling these
requests? It is also inconsistent with the rate-based quota, which is only
applied to ProduceRequest and FetchRequest (its configuration is shown below
for reference). IMO it will be simpler to only throttle ProduceRequest and
FetchRequest.

- Do you think we should also throttle the inter-broker traffic using this
quota as well similar to KIP-73?
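
For reference, and only as a sketch (the host, entity name and values are
placeholders), the existing byte-rate quotas referred to above are configured
like this:

  bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
    --add-config 'producer_byte_rate=1048576,consumer_byte_rate=2097152' \
    --entity-type clients --entity-name clientA

A request-based or time-based quota would presumably need an equally concrete
knob for administrators to reason about.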

Thanks,
Dong



On Fri, Feb 17, 2017 at 9:05 AM, Rajini Sivaram 
wrote:

> Hi all,
>
> I have just created KIP-124 to introduce request rate quotas to Kafka:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+
> Request+rate+quotas
>
> The proposal is for a simple percentage request handling time quota that
> can be allocated to **, ** or **. There
> are a few other suggestions also under "Rejected alternatives". Feedback
> and suggestions are welcome.
>
> Thank you...
>
> Regards,
>
> Rajini
>