Re: [DISCUSS] KIP-1062: Introduce Pagination for some requests used by Admin API

Omnia Ibrahim Mon, 22 Jul 2024 06:24:13 -0700

Hi David, thanks for the feedback and sorry for taking long to respond as I was 
off for a week. 
> DA1: In "Public Interfaces" you say "max.request.pagination.size.limit"
> controls the max items to return by default. It's not clear to me if this
> is just a default, or if it is a hard limit. In KIP-966, this config serves
> as a hard limit to prevent misconfigured or malicious clients from
> requesting too many resources. Can you clarify this bit?


`max.request.partition.size.limit` will be used in same way as KIP-966 I just 
meant `max.request.partition.size.limit` will equal 
`max.request.pagination.size.limit` by default unless it is specified 
otherwise. I clarified this in the KIP now 

> DA2: Is "ItemsLeftToFetch" accounting for authorization? If not, it could
> be considered a minor info leak.

This is a good point. Any of the requests still will count to what ACLs and 
resources the authorised user is used by the client, the pagination will not 
effect this.
In cases where the client is using user with wild ACLs I am assuming this is 
okay and they have the right to see this info. 
However am rethinking this now as it might not be that useful and we can just 
relay on if the there is a next cursor or not to simplify the approach similar 
to KIP-966. I have updated the KIP to reflect this. 

> DA3: By splitting up results into pages, we are introducing the possibility
> of inconsistency into these RPCs. For example, today MetadataRequest
> returns results from the same MetadataImage, so the response is consistent.
> With the paging approach, it is possible (likely even) that different
> requests will be served from different MetadataImage-s, leading to
> inconsistencies. This can be even worse if paged requests go to different
> brokers that may be lagging behind in metadata propagation. BTW this issue
> exists for KIP-966 as well. We don't necessarily need to solve this right
> away, but I think it's worth mentioning in the KIP.

I added a limitation section to the KIP to mention this. I also mentioned it in 
the top section of public interfaces.

> DA4: Have we considered some generic throttling for paged requests? I
> expect it might be an issue if clients want to get everything and just page
> through all of the results as quickly as possible.
I didn’t consider throttling for pagination requests as 
 Right now the only throttling AdminClient knows is throttling 
TopicCreate/Delete which is different than pagination and might need it is own 
conversation and KIP. 
For example in the case of throttling and retries > timeouts, should consider 
send back what we fetched so far and allow the operator to set the cursor next 
time. If this is the case then we need to include cursor to all the Option 
classes to these requests. Also Admin API for DescribeTopicPartitionRequest in 
KIP-966 don’t provide Cursor as part of DescribeTopicsOptions. 
Also extending `controllerMutation` or should we separate the paging throttling 
to its own quota 
The only requests I think might actively scraped are `OffsetFetchRequest`, 
`ListGroupsRequest`, `DescribeGroupsRequest` and `ConsumerGroupDescribeRequest` 
to actively provide lag metrics/dashboards to consumers. So there might be too 
many pages.
The rest of the requests mostly used during maintenance of the cluster or 
incidents (specially the producer/txn requests) and operator of the cluster 
need them to take a decision. The pagination just provides them with a way to 
escape the timeout problem with large clusters. So am not sure adding 
throttling during such time would be wise.
Am happy to add section for throttling in this KIP if it is high concern or 
open a followup KIP for this once we already have the pagination in place. 
Which one do you suggest?

Thanks 
Omnia

> On 12 Jul 2024, at 14:56, David Arthur <mum...@gmail.com> wrote:
> 
> Hey Omnia, thanks for the KIP! I think this will be a really nice
> improvement for operators.
> 
> DA1: In "Public Interfaces" you say "max.request.pagination.size.limit"
> controls the max items to return by default. It's not clear to me if this
> is just a default, or if it is a hard limit. In KIP-966, this config serves
> as a hard limit to prevent misconfigured or malicious clients from
> requesting too many resources. Can you clarify this bit?
> 
> DA2: Is "ItemsLeftToFetch" accounting for authorization? If not, it could
> be considered a minor info leak.
> 
> DA3: By splitting up results into pages, we are introducing the possibility
> of inconsistency into these RPCs. For example, today MetadataRequest
> returns results from the same MetadataImage, so the response is consistent.
> With the paging approach, it is possible (likely even) that different
> requests will be served from different MetadataImage-s, leading to
> inconsistencies. This can be even worse if paged requests go to different
> brokers that may be lagging behind in metadata propagation. BTW this issue
> exists for KIP-966 as well. We don't necessarily need to solve this right
> away, but I think it's worth mentioning in the KIP.
> 
> DA4: Have we considered some generic throttling for paged requests? I
> expect it might be an issue if clients want to get everything and just page
> through all of the results as quickly as possible.
> 
> Cheers,
> David
> 
> 
> On Tue, Jul 9, 2024 at 8:20 AM Omnia Ibrahim <o.g.h.ibra...@gmail.com>
> wrote:
> 
>> Hey,
>> I will open a voting thread by end of this week if there is no more
>> feedbacks.
>> 
>> Thanks
>> Omnia
>> 
>>> On 2 Jul 2024, at 17:25, Omnia Ibrahim <o.g.h.ibra...@gmail.com> wrote:
>>> 
>>> Hi David, Thanks for having a look into this!
>>> 
>>>> DJ1: Why do we want to migrate only a subset of the APIs vs migrating
>> all
>>>> of them? For instance, there are DescribeGroups, ConsumerGroupDescribe,
>>>> etc. Do we have reasons not to migrate them too? I think that it would
>> be
>>>> great to have a KIP that establishes the pattern for all the admin APIs.
>>> 
>>> No particular reason I only was trying to focus on the requests that
>> cause the most pain ( as far as I know with Kafka). I don’t mind expand it
>> to other requests in AdminClient as well.
>>> The other thing I was wondering about if
>> max.request.partition.size.limit still make sense or should it be renamed
>> (or have another one called max.request.pagingation.size.limit) if we going
>> to expand to cover all Admin requests.  I am leaning toward
>> max.request.pagingation.size.limit and max.request.partition.size.limit can
>> be set for now to max.request.partition.size.limit by default.
>>> 
>>> I have updated the KIP now to cover more requests please have a second
>> look and let me know if I missed any?
>>> 
>>>> DJ2: I am not a fan of all the new parameters passed to the tools
>>>> (e.g. --partition-size-limit-per-response). I wonder if we could rather
>>>> have a config in the admin client to set the default page size for all
>>>> requests.
>>> 
>>> If it going to be pattern to all requests I agree it should be an admin
>> config then.
>>> Updated the KIP with this now.
>>> 
>>>> DJ3: I assume that the Admin client will transparently handle the
>> change.
>>>> It would be great to call it out in the compatibility section.
>>> 
>>> Updated, Old clients will still have the old behaviour of no pagingation
>> while all new versions will support this.
>>> 
>>> Thanks
>>> Omnia
>>> 
>>>> On 2 Jul 2024, at 10:43, David Jacot <dja...@confluent.io.INVALID>
>> wrote:
>>>> 
>>>> Hi Omnia,
>>>> 
>>>> Thanks for the KIP. I agree that we should migrate admin APIs to the new
>>>> pattern.
>>>> 
>>>> DJ1: Why do we want to migrate only a subset of the APIs vs migrating
>> all
>>>> of them? For instance, there are DescribeGroups, ConsumerGroupDescribe,
>>>> etc. Do we have reasons not to migrate them too? I think that it would
>> be
>>>> great to have a KIP that establishes the pattern for all the admin APIs.
>>>> 
>>>> DJ2: I am not a fan of all the new parameters passed to the tools
>>>> (e.g. --partition-size-limit-per-response). I wonder if we could rather
>>>> have a config in the admin client to set the default page size for all
>>>> requests.
>>>> 
>>>> DJ3: I assume that the Admin client will transparently handle the
>> change.
>>>> It would be great to call it out in the compatibility section.
>>>> 
>>>> Thanks,
>>>> David
>>>> 
>>>> On Tue, Jul 2, 2024 at 11:17 AM Andrew Schofield <
>> andrew_schofi...@live.com>
>>>> wrote:
>>>> 
>>>>> Hi,
>>>>> Thanks for the response. Makes sense to me. Just one additional
>> comment:
>>>>> 
>>>>> AS5: The cursor for ListGroupsResponse is called `TransactionalCursor`
>>>>> which
>>>>> seems like a copy-paste mistake.
>>>>> 
>>>>> Thanks,
>>>>> Andrew
>>>>> 
>>>>>> On 30 Jun 2024, at 22:28, Omnia Ibrahim <o.g.h.ibra...@gmail.com>
>> wrote:
>>>>>> 
>>>>>> Hi Andrew thanks for having a look into the KIP
>>>>>> 
>>>>>>> AS1: Besides topics, the most numerous resources in Kafka clusters in
>>>>> my experience
>>>>>>> are consumer groups. Would it be possible to extend the KIP to cover
>>>>> ListGroups while
>>>>>>> you’re in here? I’ve heard of clusters with truly vast numbers of
>>>>> groups. This is also
>>>>>>> potentially a sign of a misbehaving or poorly written clients.
>> Getting
>>>>> a page of groups
>>>>>>> with a massive ItemsLeftToFetch would be nice.
>>>>>> Yes, I also had few experiences with large cluster where to list
>>>>> consumer groups can take up to 5min. I update the KIP to include this
>> as
>>>>> well.
>>>>>> 
>>>>>>> AS2: A tiny nit: The versions for the added fields are incorrect in
>>>>> some cases.
>>>>>> I believe I fixed all of them now
>>>>>> 
>>>>>>> AS3: I don’t quite understand the cursor for
>>>>> OffsetFetchRequest/Response.
>>>>>>> It looks like the cursor is (topic, partition), but not group ID.
>> Does
>>>>> the cursor
>>>>>>> apply to all groups in the request, or is group ID missing?
>>>>>> 
>>>>>> I was thinking that the last one in the response will be the one that
>>>>> has the cursor while the rest will have null. But if we are moving
>>>>> NextCursour to the top level of the response then the cursor will need
>>>>> groupID.
>>>>>>> AS4: For the remaining request/response pairs, the cursor makes sense
>>>>> to me,
>>>>>>> but I do wonder whether `NextCursor` should be at the top level of
>> the
>>>>> responses
>>>>>>> instead, like DescribeTopicPartitionsResponse.
>>>>>> 
>>>>>> Updates the KIP to reflect this now.
>>>>>> 
>>>>>> Let me know if you have any more feedback on this.
>>>>>> 
>>>>>> Best
>>>>>> Omnia
>>>>>> 
>>>>>>> On 27 Jun 2024, at 17:53, Andrew Schofield <
>> andrew_schofi...@live.com>
>>>>> wrote:
>>>>>>> 
>>>>>>> Hi Omnia,
>>>>>>> Thanks for the KIP. This is a really nice improvement for
>> administering
>>>>> large clusters.
>>>>>>> 
>>>>>>> AS1: Besides topics, the most numerous resources in Kafka clusters in
>>>>> my experience
>>>>>>> are consumer groups. Would it be possible to extend the KIP to cover
>>>>> ListGroups while
>>>>>>> you’re in here? I’ve heard of clusters with truly vast numbers of
>>>>> groups. This is also
>>>>>>> potentially a sign of a misbehaving or poorly written clients.
>> Getting
>>>>> a page of groups
>>>>>>> with a massive ItemsLeftToFetch would be nice.
>>>>>>> 
>>>>>>> AS2: A tiny nit: The versions for the added fields are incorrect in
>>>>> some cases.
>>>>>>> 
>>>>>>> AS3: I don’t quite understand the cursor for
>>>>> OffsetFetchRequest/Response.
>>>>>>> It looks like the cursor is (topic, partition), but not group ID.
>> Does
>>>>> the cursor
>>>>>>> apply to all groups in the request, or is group ID missing?
>>>>>>> 
>>>>>>> AS4: For the remaining request/response pairs, the cursor makes sense
>>>>> to me,
>>>>>>> but I do wonder whether `NextCursor` should be at the top level of
>> the
>>>>> responses
>>>>>>> instead, like DescribeTopicPartitionsResponse.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Andrew
>>>>>>> 
>>>>>>>> On 27 Jun 2024, at 14:05, Omnia Ibrahim <o.g.h.ibra...@gmail.com>
>>>>> wrote:
>>>>>>>> 
>>>>>>>> Hi everyone, I would like to start a discussion thread for KIP-1062
>>>>>>>> 
>>>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1062%3A+Introduce+Pagination+for+some+requests+used+by+Admin+API
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> Omnia
>>>>> 
>>>>> 
>>>>> 
>>> 
>> 
>> 
> 
> -- 
> David Arthur

Re: [DISCUSS] KIP-1062: Introduce Pagination for some requests used by Admin API

Reply via email to