Omnia, thanks for the updates!

> Am happy to add section for throttling in this KIP if it is high concern
or open a followup KIP for this once we already have the pagination in
place. Which one do you suggest?

I'm okay leaving throttling for a future KIP. It might be useful to see the
feature in action for a while before deciding if its necessary or the best
way to approach it.

On Mon, Jul 22, 2024 at 9:23 AM Omnia Ibrahim <o.g.h.ibra...@gmail.com>
wrote:

>
> Hi David, thanks for the feedback and sorry for taking long to respond as
> I was off for a week.
> > DA1: In "Public Interfaces" you say "max.request.pagination.size.limit"
> > controls the max items to return by default. It's not clear to me if this
> > is just a default, or if it is a hard limit. In KIP-966, this config
> serves
> > as a hard limit to prevent misconfigured or malicious clients from
> > requesting too many resources. Can you clarify this bit?
>
> `max.request.partition.size.limit` will be used in same way as KIP-966 I
> just meant `max.request.partition.size.limit` will equal
> `max.request.pagination.size.limit` by default unless it is specified
> otherwise. I clarified this in the KIP now
>
> > DA2: Is "ItemsLeftToFetch" accounting for authorization? If not, it could
> > be considered a minor info leak.
>
> This is a good point. Any of the requests still will count to what ACLs
> and resources the authorised user is used by the client, the pagination
> will not effect this.
> In cases where the client is using user with wild ACLs I am assuming this
> is okay and they have the right to see this info.
> However am rethinking this now as it might not be that useful and we can
> just relay on if the there is a next cursor or not to simplify the approach
> similar to KIP-966. I have updated the KIP to reflect this.
>
> > DA3: By splitting up results into pages, we are introducing the
> possibility
> > of inconsistency into these RPCs. For example, today MetadataRequest
> > returns results from the same MetadataImage, so the response is
> consistent.
> > With the paging approach, it is possible (likely even) that different
> > requests will be served from different MetadataImage-s, leading to
> > inconsistencies. This can be even worse if paged requests go to different
> > brokers that may be lagging behind in metadata propagation. BTW this
> issue
> > exists for KIP-966 as well. We don't necessarily need to solve this right
> > away, but I think it's worth mentioning in the KIP.
>
> I added a limitation section to the KIP to mention this. I also mentioned
> it in the top section of public interfaces.
>
> > DA4: Have we considered some generic throttling for paged requests? I
> > expect it might be an issue if clients want to get everything and just
> page
> > through all of the results as quickly as possible.
> I didn’t consider throttling for pagination requests as
>  Right now the only throttling AdminClient knows is throttling
> TopicCreate/Delete which is different than pagination and might need it is
> own conversation and KIP.
> For example in the case of throttling and retries > timeouts, should
> consider send back what we fetched so far and allow the operator to set the
> cursor next time. If this is the case then we need to include cursor to all
> the Option classes to these requests. Also Admin API for
> DescribeTopicPartitionRequest in KIP-966 don’t provide Cursor as part of
> DescribeTopicsOptions.
> Also extending `controllerMutation` or should we separate the paging
> throttling to its own quota
> The only requests I think might actively scraped are `OffsetFetchRequest`,
> `ListGroupsRequest`, `DescribeGroupsRequest` and
> `ConsumerGroupDescribeRequest` to actively provide lag metrics/dashboards
> to consumers. So there might be too many pages.
> The rest of the requests mostly used during maintenance of the cluster or
> incidents (specially the producer/txn requests) and operator of the cluster
> need them to take a decision. The pagination just provides them with a way
> to escape the timeout problem with large clusters. So am not sure adding
> throttling during such time would be wise.
> Am happy to add section for throttling in this KIP if it is high concern
> or open a followup KIP for this once we already have the pagination in
> place. Which one do you suggest?
>
> Thanks
> Omnia
>
> > On 12 Jul 2024, at 14:56, David Arthur <mum...@gmail.com> wrote:
> >
> > Hey Omnia, thanks for the KIP! I think this will be a really nice
> > improvement for operators.
> >
> > DA1: In "Public Interfaces" you say "max.request.pagination.size.limit"
> > controls the max items to return by default. It's not clear to me if this
> > is just a default, or if it is a hard limit. In KIP-966, this config
> serves
> > as a hard limit to prevent misconfigured or malicious clients from
> > requesting too many resources. Can you clarify this bit?
> >
> > DA2: Is "ItemsLeftToFetch" accounting for authorization? If not, it could
> > be considered a minor info leak.
> >
> > DA3: By splitting up results into pages, we are introducing the
> possibility
> > of inconsistency into these RPCs. For example, today MetadataRequest
> > returns results from the same MetadataImage, so the response is
> consistent.
> > With the paging approach, it is possible (likely even) that different
> > requests will be served from different MetadataImage-s, leading to
> > inconsistencies. This can be even worse if paged requests go to different
> > brokers that may be lagging behind in metadata propagation. BTW this
> issue
> > exists for KIP-966 as well. We don't necessarily need to solve this right
> > away, but I think it's worth mentioning in the KIP.
> >
> > DA4: Have we considered some generic throttling for paged requests? I
> > expect it might be an issue if clients want to get everything and just
> page
> > through all of the results as quickly as possible.
> >
> > Cheers,
> > David
> >
> >
> > On Tue, Jul 9, 2024 at 8:20 AM Omnia Ibrahim <o.g.h.ibra...@gmail.com>
> > wrote:
> >
> >> Hey,
> >> I will open a voting thread by end of this week if there is no more
> >> feedbacks.
> >>
> >> Thanks
> >> Omnia
> >>
> >>> On 2 Jul 2024, at 17:25, Omnia Ibrahim <o.g.h.ibra...@gmail.com>
> wrote:
> >>>
> >>> Hi David, Thanks for having a look into this!
> >>>
> >>>> DJ1: Why do we want to migrate only a subset of the APIs vs migrating
> >> all
> >>>> of them? For instance, there are DescribeGroups,
> ConsumerGroupDescribe,
> >>>> etc. Do we have reasons not to migrate them too? I think that it would
> >> be
> >>>> great to have a KIP that establishes the pattern for all the admin
> APIs.
> >>>
> >>> No particular reason I only was trying to focus on the requests that
> >> cause the most pain ( as far as I know with Kafka). I don’t mind expand
> it
> >> to other requests in AdminClient as well.
> >>> The other thing I was wondering about if
> >> max.request.partition.size.limit still make sense or should it be
> renamed
> >> (or have another one called max.request.pagingation.size.limit) if we
> going
> >> to expand to cover all Admin requests.  I am leaning toward
> >> max.request.pagingation.size.limit and max.request.partition.size.limit
> can
> >> be set for now to max.request.partition.size.limit by default.
> >>>
> >>> I have updated the KIP now to cover more requests please have a second
> >> look and let me know if I missed any?
> >>>
> >>>> DJ2: I am not a fan of all the new parameters passed to the tools
> >>>> (e.g. --partition-size-limit-per-response). I wonder if we could
> rather
> >>>> have a config in the admin client to set the default page size for all
> >>>> requests.
> >>>
> >>> If it going to be pattern to all requests I agree it should be an admin
> >> config then.
> >>> Updated the KIP with this now.
> >>>
> >>>> DJ3: I assume that the Admin client will transparently handle the
> >> change.
> >>>> It would be great to call it out in the compatibility section.
> >>>
> >>> Updated, Old clients will still have the old behaviour of no
> pagingation
> >> while all new versions will support this.
> >>>
> >>> Thanks
> >>> Omnia
> >>>
> >>>> On 2 Jul 2024, at 10:43, David Jacot <dja...@confluent.io.INVALID>
> >> wrote:
> >>>>
> >>>> Hi Omnia,
> >>>>
> >>>> Thanks for the KIP. I agree that we should migrate admin APIs to the
> new
> >>>> pattern.
> >>>>
> >>>> DJ1: Why do we want to migrate only a subset of the APIs vs migrating
> >> all
> >>>> of them? For instance, there are DescribeGroups,
> ConsumerGroupDescribe,
> >>>> etc. Do we have reasons not to migrate them too? I think that it would
> >> be
> >>>> great to have a KIP that establishes the pattern for all the admin
> APIs.
> >>>>
> >>>> DJ2: I am not a fan of all the new parameters passed to the tools
> >>>> (e.g. --partition-size-limit-per-response). I wonder if we could
> rather
> >>>> have a config in the admin client to set the default page size for all
> >>>> requests.
> >>>>
> >>>> DJ3: I assume that the Admin client will transparently handle the
> >> change.
> >>>> It would be great to call it out in the compatibility section.
> >>>>
> >>>> Thanks,
> >>>> David
> >>>>
> >>>> On Tue, Jul 2, 2024 at 11:17 AM Andrew Schofield <
> >> andrew_schofi...@live.com>
> >>>> wrote:
> >>>>
> >>>>> Hi,
> >>>>> Thanks for the response. Makes sense to me. Just one additional
> >> comment:
> >>>>>
> >>>>> AS5: The cursor for ListGroupsResponse is called
> `TransactionalCursor`
> >>>>> which
> >>>>> seems like a copy-paste mistake.
> >>>>>
> >>>>> Thanks,
> >>>>> Andrew
> >>>>>
> >>>>>> On 30 Jun 2024, at 22:28, Omnia Ibrahim <o.g.h.ibra...@gmail.com>
> >> wrote:
> >>>>>>
> >>>>>> Hi Andrew thanks for having a look into the KIP
> >>>>>>
> >>>>>>> AS1: Besides topics, the most numerous resources in Kafka clusters
> in
> >>>>> my experience
> >>>>>>> are consumer groups. Would it be possible to extend the KIP to
> cover
> >>>>> ListGroups while
> >>>>>>> you’re in here? I’ve heard of clusters with truly vast numbers of
> >>>>> groups. This is also
> >>>>>>> potentially a sign of a misbehaving or poorly written clients.
> >> Getting
> >>>>> a page of groups
> >>>>>>> with a massive ItemsLeftToFetch would be nice.
> >>>>>> Yes, I also had few experiences with large cluster where to list
> >>>>> consumer groups can take up to 5min. I update the KIP to include this
> >> as
> >>>>> well.
> >>>>>>
> >>>>>>> AS2: A tiny nit: The versions for the added fields are incorrect in
> >>>>> some cases.
> >>>>>> I believe I fixed all of them now
> >>>>>>
> >>>>>>> AS3: I don’t quite understand the cursor for
> >>>>> OffsetFetchRequest/Response.
> >>>>>>> It looks like the cursor is (topic, partition), but not group ID.
> >> Does
> >>>>> the cursor
> >>>>>>> apply to all groups in the request, or is group ID missing?
> >>>>>>
> >>>>>> I was thinking that the last one in the response will be the one
> that
> >>>>> has the cursor while the rest will have null. But if we are moving
> >>>>> NextCursour to the top level of the response then the cursor will
> need
> >>>>> groupID.
> >>>>>>> AS4: For the remaining request/response pairs, the cursor makes
> sense
> >>>>> to me,
> >>>>>>> but I do wonder whether `NextCursor` should be at the top level of
> >> the
> >>>>> responses
> >>>>>>> instead, like DescribeTopicPartitionsResponse.
> >>>>>>
> >>>>>> Updates the KIP to reflect this now.
> >>>>>>
> >>>>>> Let me know if you have any more feedback on this.
> >>>>>>
> >>>>>> Best
> >>>>>> Omnia
> >>>>>>
> >>>>>>> On 27 Jun 2024, at 17:53, Andrew Schofield <
> >> andrew_schofi...@live.com>
> >>>>> wrote:
> >>>>>>>
> >>>>>>> Hi Omnia,
> >>>>>>> Thanks for the KIP. This is a really nice improvement for
> >> administering
> >>>>> large clusters.
> >>>>>>>
> >>>>>>> AS1: Besides topics, the most numerous resources in Kafka clusters
> in
> >>>>> my experience
> >>>>>>> are consumer groups. Would it be possible to extend the KIP to
> cover
> >>>>> ListGroups while
> >>>>>>> you’re in here? I’ve heard of clusters with truly vast numbers of
> >>>>> groups. This is also
> >>>>>>> potentially a sign of a misbehaving or poorly written clients.
> >> Getting
> >>>>> a page of groups
> >>>>>>> with a massive ItemsLeftToFetch would be nice.
> >>>>>>>
> >>>>>>> AS2: A tiny nit: The versions for the added fields are incorrect in
> >>>>> some cases.
> >>>>>>>
> >>>>>>> AS3: I don’t quite understand the cursor for
> >>>>> OffsetFetchRequest/Response.
> >>>>>>> It looks like the cursor is (topic, partition), but not group ID.
> >> Does
> >>>>> the cursor
> >>>>>>> apply to all groups in the request, or is group ID missing?
> >>>>>>>
> >>>>>>> AS4: For the remaining request/response pairs, the cursor makes
> sense
> >>>>> to me,
> >>>>>>> but I do wonder whether `NextCursor` should be at the top level of
> >> the
> >>>>> responses
> >>>>>>> instead, like DescribeTopicPartitionsResponse.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Andrew
> >>>>>>>
> >>>>>>>> On 27 Jun 2024, at 14:05, Omnia Ibrahim <o.g.h.ibra...@gmail.com>
> >>>>> wrote:
> >>>>>>>>
> >>>>>>>> Hi everyone, I would like to start a discussion thread for
> KIP-1062
> >>>>>>>>
> >>>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1062%3A+Introduce+Pagination+for+some+requests+used+by+Admin+API
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> Omnia
> >>>>>
> >>>>>
> >>>>>
> >>>
> >>
> >>
> >
> > --
> > David Arthur
>
>

-- 
David Arthur

Reply via email to