Hi Claude,

I can clarify my comments.

Just to clarify -- my understanding is that we don't intend to throttle any
new producer IDs at the beginning. I believe this allowance is specified by
`producer_ids_rate`, which you can think of as the number of new producer
IDs allowed per hour.
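
For reference, my working assumption is that this rate is configured like
the other client quotas; if so, setting it for a user would look roughly
like this with the Java Admin client (the quota key and the value here are
illustrative only):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.common.quota.ClientQuotaAlteration;
    import org.apache.kafka.common.quota.ClientQuotaEntity;

    public class SetPidQuota {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                // The principal whose new-PID rate we want to cap.
                ClientQuotaEntity user = new ClientQuotaEntity(
                    Collections.singletonMap(ClientQuotaEntity.USER, "some-user"));
                // "producer_ids_rate" is the quota key discussed above; 100 is
                // a placeholder for the allowed rate of new producer IDs.
                ClientQuotaAlteration alteration = new ClientQuotaAlteration(
                    user,
                    Collections.singletonList(
                        new ClientQuotaAlteration.Op("producer_ids_rate", 100.0)));
                admin.alterClientQuotas(
                    Collections.singletonList(alteration)).all().get();
            }
        }
    }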

So consider a case where there is a storm for a given principal. We could
have a large mass of short-lived producers in addition to some
"well-behaved" ones. My understanding is that if a "well-behaved" producer
doesn't produce very frequently (say, less than once per hour), it will
also get throttled once the storm of short-lived producers pushes the
principal over the rate that triggers throttling. The idea of the filter is
that we don't throttle existing producers, but in this case we will.
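
To make sure we're describing the same path, here is a toy version of the
check as I read it. The names are mine, not the KIP's, and the real
proposal uses the layered Bloom filter with a rolling window rather than a
plain set, but the shape of the decision is the point:

    import java.util.HashSet;
    import java.util.Set;

    public class PidThrottleSketch {
        // Stand-in for the layered Bloom filter: PIDs seen within the window.
        // Entries are dropped once they age out of the window (not shown).
        private final Set<Long> seenPids = new HashSet<>();
        private int newPidsThisWindow = 0;
        private final double producerIdsRate;   // producer_ids_rate

        public PidThrottleSketch(double producerIdsRate) {
            this.producerIdsRate = producerIdsRate;
        }

        // Returns true if this produce request should be throttled.
        public boolean shouldThrottle(long pid) {
            if (seenPids.contains(pid)) {
                return false;               // still "seen": never throttled
            }
            seenPids.add(pid);              // new, or aged out since last seen
            newPidsThisWindow++;
            return newPidsThisWindow > producerIdsRate;
        }
    }

The well-behaved producer that produces less often than the window length
has aged out of the "seen" set by its next produce, so during a storm it
takes the same path as the short-lived producers and is throttled with them.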

Note -- one thing that wasn't totally clear from the KIP was whether we
throttle all new produce requests from the client or just the ones with
unseen IDs. If we throttle them all, perhaps this point isn't a huge deal.

The other concern I brought up is that once we start throttling, we will
likely continue to throttle until the storm stops. This is because we have
to wait a day or so for IDs to expire, and we will likely replace them at a
fairly fast rate. That may be acceptable if we believe it helps get the
behavior to stop, but I just wanted to call out that the user will likely
not be able to start new clients in the meantime.

Justine

On Sun, May 5, 2024 at 6:35 AM Claude Warren <cla...@xenei.com> wrote:

> Justine,
>
> I am new here so please excuse the ignorance.
>
> When you talk about "seen" producers, I assume you mean the PIDs that the
> Bloom filter has seen.
> When you say "a producer produces every 2 hours", do you mean the producer
> writes to a topic every 2 hours and uses the same PID?
> When you say "hitting the limit", which limit is being hit?
>
> Given the default setup, a producer that shows up with a PID every 2 hours,
> regardless of whether or not it is a new PID, will be reported as a newly
> seen PID.  But I would expect the throttling system to accept that as a new
> PID for the producer, look at the overall frequency of new PIDs, and accept
> it without throttling.
>
> If the actual question is "how many PIDs did this Principal produce in the
> last hour?" or "has this Principal produced more than X PIDs in the last
> hour?", there are probably cleaner ways to do this.  If this is the
> question, I would use CPC sketches from Apache DataSketches [1] and keep
> multiple CPCs (say one per 15 minutes, to match the KIP-936 proposal) for
> each Principal.  You could then do a quick check on the current CPC to see
> whether it exceeds hour-limit / 4 and, if so, check the hourly rate (by
> summing the four 15-minute CPCs).  The code could then simply notify when
> to throttle and when to stop throttling.
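>
> Roughly what I have in mind, sketched against the DataSketches Java API
> (the 15-minute rotation and the limit check are only illustrative, not a
> worked design):
>
>     import org.apache.datasketches.cpc.CpcSketch;
>     import org.apache.datasketches.cpc.CpcUnion;
>
>     // One instance per Principal: four 15-minute CPC sketches covering the
>     // last hour of PIDs seen for that Principal.
>     public class PrincipalPidSketch {
>         private static final int LG_K = 10;          // accuracy/size knob
>         private final CpcSketch[] quarters = new CpcSketch[4];
>         private int current = 0;
>
>         public PrincipalPidSketch() {
>             for (int i = 0; i < 4; i++) quarters[i] = new CpcSketch(LG_K);
>         }
>
>         // Called every 15 minutes: drop the oldest bucket, start a new one.
>         public void rotate() {
>             current = (current + 1) % 4;
>             quarters[current] = new CpcSketch(LG_K);
>         }
>
>         // Record a PID observed for this Principal.
>         public void record(long pid) {
>             quarters[current].update(pid);
>         }
>
>         // Cheap check on the current bucket first; only when it exceeds
>         // hourLimit / 4 do we union all four buckets for the hourly estimate.
>         public boolean overLimit(double hourLimit) {
>             if (quarters[current].getEstimate() <= hourLimit / 4) return false;
>             CpcUnion union = new CpcUnion(LG_K);
>             for (CpcSketch s : quarters) union.update(s);
>             return union.getResult().getEstimate() > hourLimit;
>         }
>     }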
>
> Claude
>
>
> [1] https://datasketches.apache.org/docs/CPC/CpcPerformance.html
>
> On Fri, May 3, 2024 at 4:21 PM Justine Olshan <jols...@confluent.io.invalid>
> wrote:
>
> > Hey folks,
> >
> > I shared this with Omnia offline:
> > One concern I have is with the length of time we keep "seen" producer IDs.
> > It seems like the default is 1 hour. If a producer produces every 2 hours
> > or so, and we are hitting the limit, it seems like we will throttle it even
> > though we've seen it before and have state for it on the server. Then, it
> > seems like we will have to wait for the natural expiration of producer ids
> > (via producer.id.expiration.ms) before we allow new or idle producers to
> > join again without throttling. I think this proposal is a step in the right
> > direction when it comes to throttling the "right" clients, but I want to
> > make sure we have reasonable defaults. Keep in mind that idempotent
> > producers are the default, so most folks won't be tuning these values out
> > of the box.
> >
> > As for Igor's questions about InitProducerId -- I think the main reason we
> > have avoided that solution is that there is no state stored for idempotent
> > producers when grabbing an ID. My concern there is either storing too much
> > state to track this or throttling before we need to.
> >
> > Justine
> >
> > On Thu, May 2, 2024 at 2:36 PM Claude Warren, Jr
> > <claude.war...@aiven.io.invalid> wrote:
> >
> > > There is some question about whether or not we need the configuration
> > > options.  My take on them is as follows:
> > >
> > > producer.id.quota.window.num: No opinion.  I don't know what this is used
> > > for, but I suspect that there is a good reason to have it.  It is not
> > > used within the Bloom filter caching mechanism.
> > >
> > > producer.id.quota.window.size.seconds: Leave it, as it is one of the most
> > > effective ways to tune the filter and determines how long a PID is
> > > recognized.
> > >
> > > producer.id.quota.cache.cleanup.scheduler.interval.ms: Remove it unless
> > > there is another use for it.  We can get a better calculation for
> > > internals.
> > >
> > > producer.id.quota.cache.layer.count: Leave it, as it is one of the most
> > > effective ways to tune the filter.
> > >
> > > producer.id.quota.cache.false.positive.rate: Replace it with a constant.
> > > I don't think any other Bloom filter solution provides access to this
> > > knob for end users.
> > >
> > > producer_ids_rate: Leave this one; it is critical for reasonable
> > > operation.
> > >
> >
>
>
> --
> LinkedIn: http://www.linkedin.com/in/claudewarren
>
