I believe this is already merged? https://github.com/apache/iceberg/pull/9782
Best,
Jack Ye

On Sat, May 18, 2024 at 4:06 PM Pucheng Yang <py...@pinterest.com.invalid> wrote:

Hi all, is there an ETA for this? Thanks.

On Wed, Dec 20, 2023 at 6:03 PM Renjie Liu <liurenjie2...@gmail.com> wrote:

> I think if servers provide a meaningful error message on expiration, hopefully this would be a good first step in debugging. I think saying tokens should generally support O(Minutes) at least should cover most use cases?

Sounds reasonable to me. Clients just need to be aware that the token is for transient usage and should not store it for too long.

On Thu, Dec 21, 2023 at 8:43 AM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Overall, I don't think it's a good idea to add parallel listing for things like tables and namespaces as it just adds complexity for an incredibly narrow (and possibly poorly designed) use case.

+1. I think there are likely a few ways parallelization of table and namespace listing can be incorporated into the API in the future if necessary.

I think the one place where parallelization is important immediately is for planning, but that is already a separate thread. Apologies if I forked the conversation too far from that.

On Wed, Dec 20, 2023 at 4:06 PM Daniel Weeks <dwe...@apache.org> wrote:

Overall, I don't think it's a good idea to add parallel listing for things like tables and namespaces as it just adds complexity for an incredibly narrow (and possibly poorly designed) use case.

I feel we should leave it up to the server to define whether it will provide consistency across paginated listing and avoid bleeding time-travel-like concepts (such as 'asOf') into the API. I really just don't see what practical value it provides, as there are no explicit or consistently held guarantees around these operations.

I'd agree with Micah's argument that if the server does provide stronger guarantees, it should manage those via the opaque token and respond with meaningful errors if it cannot satisfy the internal constraints it imposes (like timeouts).

It would help to have articulable use cases to really invest in more complexity in this area, and I feel like we're drifting a little into the speculative at this point.

-Dan

On Wed, Dec 20, 2023 at 3:27 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> I agree that this is not quite useful for clients at this moment. But I'm thinking that maybe exposing this will help debugging or diagnosing; users just need to be aware of this potential expiration.

I think if servers provide a meaningful error message on expiration, hopefully this would be a good first step in debugging. I think saying tokens should generally support O(Minutes) at least should cover most use cases?

On Tue, Dec 19, 2023 at 9:18 PM Renjie Liu <liurenjie2...@gmail.com> wrote:

> If we choose to manage state on the server side, I recommend not revealing the expiration time to the client, at least not for now. We can introduce it when there's a practical need. It wouldn't constitute a breaking change, would it?

I agree that this is not quite useful for clients at this moment. But I'm thinking that maybe exposing this will help debugging or diagnosing; users just need to be aware of this potential expiration.

On Wed, Dec 20, 2023 at 11:09 AM Xuanwo <xua...@apache.org> wrote:

> For the continuation token, I think one missing part is about the expiration time of this token, since this may affect the state cleaning process of the server.

Some storage services use a continuation token as a binary representation of internal state. For example, they serialize a structure into binary and then perform base64 encoding. Such services don't need to maintain state, eliminating the need for state cleaning.

> Do servers need to expose the expiration time to clients?

If we choose to manage state on the server side, I recommend not revealing the expiration time to the client, at least not for now. We can introduce it when there's a practical need. It wouldn't constitute a breaking change, would it?
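For illustration, the stateless continuation token Xuanwo describes could look roughly like the sketch below: the listing position is serialized and base64-encoded, so the server has nothing to store or expire. Class and field names are hypothetical, not from any Iceberg implementation.

import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only: a stateless continuation token that encodes the listing
// position directly, so the server keeps no per-client state.
public class PageToken {
  private final String lastReturnedName; // last namespace/table name returned
  private final long issuedAtMillis;     // lets the server enforce a maximum token age

  public PageToken(String lastReturnedName, long issuedAtMillis) {
    this.lastReturnedName = lastReturnedName;
    this.issuedAtMillis = issuedAtMillis;
  }

  // Serialize the internal state and base64-encode it into an opaque string.
  public String encode() {
    String state = lastReturnedName + "|" + issuedAtMillis;
    return Base64.getUrlEncoder().encodeToString(state.getBytes(StandardCharsets.UTF_8));
  }

  // Decode the opaque string back into the listing position on the next request.
  public static PageToken decode(String token) {
    String state = new String(Base64.getUrlDecoder().decode(token), StandardCharsets.UTF_8);
    int sep = state.lastIndexOf('|');
    return new PageToken(state.substring(0, sep), Long.parseLong(state.substring(sep + 1)));
  }
}

Because the token is opaque to the client, a server is free to switch between this stateless encoding and a server-side lookup key without any client changes.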
On Wed, Dec 20, 2023, at 10:57, Renjie Liu wrote:

For the continuation token, I think one missing part is the expiration time of this token, since this may affect the state-cleaning process of the server. There are several things to discuss:

1. Should we leave it to the server to decide, or allow the client to configure it in the API?

Personally I think it would be enough for the server to determine it for now, since I don't see any use case for allowing clients to set the expiration time in the API.

2. Do servers need to expose the expiration time to clients?

Personally I think it would be enough to expose this through the getConfig API to let users know about it. For now there is no requirement for a per-request expiration time.

On Wed, Dec 20, 2023 at 2:49 AM Micah Kornfield <emkornfi...@gmail.com> wrote:

IMO, parallelization needs to be a first-class entity in the endpoint/service design to allow for flexibility (I scanned through the original proposal for scan planning and it looked like it was on the right track). Using offsets for parallelization is problematic from both a consistency and a scalability perspective if you want to allow for flexibility in implementation.

In particular, I think the server needs APIs like:

DoScan - returns a list of partitions (represented by an opaque entity). The list of partitions should support pagination (in an ideal world, it would be streaming).
GetTasksForPartition - returns scan tasks for a partition (should also be paginated/streaming, but this is up for debate). I think it is an important consideration to allow for empty partitions.

With this implementation you don't necessarily require separate server-side state (objects in GCS should be sufficient). As Ryan suggested, one implementation could be to have each partition correspond to a byte range in a manifest file for returning the tasks.

Thanks,
Micah
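A rough client-side view of the two-endpoint shape Micah describes might look like the following. The names DoScan and GetTasksForPartition come from the email above; the types and pagination fields are illustrative only and not part of the actual REST spec.

import java.util.List;

// Illustrative client interface for the two planning endpoints sketched above.
// A "partition handle" is an opaque string minted by the server; the client
// never inspects it.
public interface ScanPlanningApi {

  // DoScan: returns opaque partition handles for a table scan, one page at a time.
  PartitionPage doScan(String tableIdent, String pageToken);

  // GetTasksForPartition: returns the scan tasks behind one partition handle,
  // also paginated; a partition may legitimately contain zero tasks.
  TaskPage getTasksForPartition(String partitionHandle, String pageToken);

  record PartitionPage(List<String> partitionHandles, String nextPageToken) {}

  record TaskPage(List<String> scanTaskJson, String nextPageToken) {}
}

One server-side option, per the suggestion above, would be to mint each partition handle as a (manifest file, byte range) pair, so no state beyond the table's own metadata is required.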
On Tue, Dec 19, 2023 at 9:55 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:

Not necessarily. That is more of a general statement. The pagination discussion forked from server-side scan planning.

On Tue, Dec 19, 2023 at 9:52 AM Ryan Blue <b...@tabular.io> wrote:

> With start/limit each client can query for its own chunk without coordination.

Okay, I understand now. Would you need to parallelize the client for listing namespaces or tables? That seems odd to me.

On Tue, Dec 19, 2023 at 9:48 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:

> You can parallelize with opaque tokens by sending a starting point for the next request.

I meant we would have to wait for the server to return this starting point from the previous request? With start/limit each client can query for its own chunk without coordination.

On Tue, Dec 19, 2023 at 9:44 AM Ryan Blue <b...@tabular.io> wrote:

> I think start and offset have the advantage of being parallelizable (as compared to continuation tokens).

You can parallelize with opaque tokens by sending a starting point for the next request.

> On the other hand, using "asOf" can be complex to implement and may be too powerful for the pagination use case.

I don't think that we want to add `asOf`. If the service chooses to do this, it would send a continuation token that has the information embedded.

On Tue, Dec 19, 2023 at 9:42 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:

Can we assume it is the responsibility of the server to ensure determinism (e.g., by caching the results along with a query ID)? I think start and offset have the advantage of being parallelizable (as compared to continuation tokens). On the other hand, using "asOf" can be complex to implement and may be too powerful for the pagination use case (because it allows querying the warehouse as of any point in time, not just now).

Thanks,
Walaa.

On Tue, Dec 19, 2023 at 9:40 AM Ryan Blue <b...@tabular.io> wrote:

I think you can solve the atomicity problem with a continuation token and server-side state. In general, I don't think this is a problem we should worry about a lot since pagination commonly has this problem. But since we can build a system that allows you to solve it if you choose to, we should go with that design.
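One way a server could realize "continuation token plus server-side state" is to snapshot the listing on the first request and serve later pages from that snapshot, expiring stale snapshots after the O(minutes) window discussed earlier. A minimal sketch, with purely illustrative names:

import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative server-side state for consistent pagination: the first request
// snapshots the full listing, subsequent pages are served from the snapshot,
// and expired snapshots produce a meaningful error so the client can restart.
public class ListingSessions {
  private record Session(List<String> snapshot, Instant createdAt) {}

  private final ConcurrentHashMap<String, Session> sessions = new ConcurrentHashMap<>();
  private final Duration ttl = Duration.ofMinutes(10); // assumed expiry, not from the spec

  public String start(List<String> fullListing) {
    String token = UUID.randomUUID().toString();
    sessions.put(token, new Session(List.copyOf(fullListing), Instant.now()));
    return token;
  }

  // Returns one page from the snapshot, or fails clearly if the token expired.
  public List<String> page(String token, int pageIndex, int pageSize) {
    Session session = sessions.get(token);
    if (session == null || session.createdAt().plus(ttl).isBefore(Instant.now())) {
      sessions.remove(token);
      throw new IllegalStateException("Continuation token expired; restart the listing");
    }
    int from = Math.min(pageIndex * pageSize, session.snapshot().size());
    int to = Math.min(from + pageSize, session.snapshot().size());
    return session.snapshot().subList(from, to);
  }
}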
On Tue, Dec 19, 2023 at 9:13 AM Micah Kornfield <emkornfi...@gmail.com> wrote:

Hi Jack,
Some answers inline.

> In addition to the start index approach, another potential simple way to implement the continuation token is to use the last item name, when the listing is guaranteed to be in lexicographic order.

I think this is one viable implementation, but the reason the token should be opaque is that it allows several different implementations without client-side changes.

> For example, if an element is added before the continuation token, then all future listing calls with the token would always skip that element.

IMO, I think this is fine. For some of the REST APIs it is likely important to put constraints on atomicity requirements; for others (e.g. list namespaces) I think it is OK to have looser requirements.

> If we want to enforce that level of atomicity, we probably want to introduce another time-travel query parameter (e.g. asOf=1703003028000) to ensure that we are listing results at a specific point in time of the warehouse, so the complete result list is fixed.

Time travel might be useful in some cases, but I think it is orthogonal to services wishing to have guarantees around atomicity/consistency of results. If a server wants to ensure that results are atomic/consistent as of the start of the listing, it can embed the necessary timestamp in the token it returns and parse it out when fetching the next result.

I think this does raise a more general point around service definition evolution. There likely need to be metadata endpoints that expose either:
1. A version of the REST API supported.
2. Features the API supports (e.g. which query parameters are honored for a specific endpoint).

There are pros and cons to both approaches (apologies if I missed this in the spec or if it has already been discussed).

Cheers,
Micah

On Tue, Dec 19, 2023 at 8:25 AM Jack Ye <yezhao...@gmail.com> wrote:

Yes, I agree that it is better not to force the implementation in any particular direction, and a continuation token is probably better than enforcing a numeric start index.

In addition to the start index approach, another potential simple way to implement the continuation token is to use the last item name, when the listing is guaranteed to be in lexicographic order. Compared to the start index approach, it does not need to worry about the start index shifting when something in the list is added or removed.

However, the issue of concurrent modification could still exist even with a continuation token. For example, if an element is added before the continuation token, then all future listing calls with the token would always skip that element. If we want to enforce that level of atomicity, we probably want to introduce another time-travel query parameter (e.g. asOf=1703003028000) to ensure that we are listing results at a specific point in time of the warehouse, so the complete result list is fixed. (This is also the missing piece I forgot to mention in the start index approach to ensure it works in distributed settings.)

-Jack
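The "last item name" variant Jack mentions is straightforward when the backing store can list names in lexicographic order: each page resumes strictly after the token value. A small sketch, with illustrative names and no claim about how any real catalog stores its namespaces:

import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Illustrative "last item name" continuation token over a lexicographically
// ordered store. No server-side session state is needed, but, as noted above,
// an element inserted before the token value will be skipped by later pages.
public class LexicographicListing {
  private final NavigableSet<String> names = new TreeSet<>();

  public void add(String name) {
    names.add(name);
  }

  // token == null means "start from the beginning"; the last element of the
  // returned page becomes the next token.
  public List<String> page(String token, int limit) {
    NavigableSet<String> remaining = (token == null) ? names : names.tailSet(token, false);
    return remaining.stream().limit(limit).toList();
  }
}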
On Tue, Dec 19, 2023, 9:51 AM Micah Kornfield <emkornfi...@gmail.com> wrote:

I tried to cover these in more detail at:
https://docs.google.com/document/d/1bbfoLssY1szCO_Hm3_93ZcN0UAMpf7kjmpwHQngqQJ0/edit

On Sun, Dec 17, 2023 at 6:07 PM Renjie Liu <liurenjie2...@gmail.com> wrote:

+1 for this approach. I agree that the streaming approach requires HTTP clients and servers to have HTTP/2 streaming support, which is not compatible with old clients.

I share Micah's concern that start/limit alone may not be enough in a distributed environment where modifications happen during iteration. For compatibility, we need to consider several cases:

1. Old client <-> New server
2. New client <-> Old server

On Sat, Dec 16, 2023 at 6:51 AM Daniel Weeks <dwe...@apache.org> wrote:

I agree that we want to include this feature, and I raised similar concerns to what Micah already presented in talking with Ryan.

For backward compatibility, just adding a start and limit implies a deterministic order, which is not a current requirement of the REST spec.

Also, we need to consider whether the start/limit would need to be respected by the server. If existing implementations simply return all the results, will that be sufficient? There are a few edge cases that need to be considered here.

For the opaque key approach, I think adding a query param to trigger/continue pagination and introducing a continuation token in the ListNamespacesResponse might allow for more backward compatibility. In that scenario, pagination would only take place for clients that know how to paginate, and the ordering would not need to be deterministic.

-Dan

On Fri, Dec 15, 2023, 10:33 AM Micah Kornfield <emkornfi...@gmail.com> wrote:

Just to clarify and add a small suggestion:

The behavior with no additional parameters requires the operations to behave as they do today for backwards compatibility (i.e. either all results are returned or a failure occurs).

For new parameters, I'd suggest an opaque start token (instead of a specific numeric offset) that can be returned by the service, and a limit (as proposed above). If a start token is provided without a limit, a default limit can be chosen by the server. Servers might return fewer results than the limit (i.e. clients are required to check for a next token to determine whether iteration is complete). This enables server-side state if it is desired, but also makes deterministic listing much more feasible (deterministic responses are essentially impossible in the face of changing data if only a start offset is provided).

In an ideal world, specifying a limit would result in streaming responses being returned, with the last part containing a token if continuation is necessary. Given the conversation on the other thread about streaming, I'd imagine this is quite hard to model in an OpenAPI REST service.

Therefore it seems like using pagination with a token and limit would be preferred. If skipping to someplace in the middle of the namespaces is required, then I would suggest modelling that as a first-class query parameter (e.g. "startAfterNamespace").

Cheers,
Micah
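Putting Daniel's and Micah's suggestions together, client-side iteration with an opaque token and limit might look something like the sketch below. The endpoint path, parameter names, and response fields are assumptions for illustration, not the exact names in the REST spec.

import java.util.ArrayList;
import java.util.List;

// Illustrative client loop for opaque-token pagination: keep requesting pages
// until the server stops returning a next token. A server that does not
// support pagination can simply return everything with no token, and this
// loop still terminates after one round trip.
public class NamespacePager {

  // Hypothetical transport call backing something like
  // GET /v1/namespaces?pageToken=...&pageSize=...
  interface Client {
    ListPage listNamespaces(String pageToken, int pageSize);
  }

  record ListPage(List<String> namespaces, String nextPageToken) {}

  static List<String> listAll(Client client, int pageSize) {
    List<String> all = new ArrayList<>();
    String token = null;
    do {
      ListPage page = client.listNamespaces(token, pageSize);
      all.addAll(page.namespaces());
      token = page.nextPageToken(); // null or empty means iteration is complete
    } while (token != null && !token.isEmpty());
    return all;
  }
}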
"startAfterNamespace") >>>>>>> >>>>>>> Cheers, >>>>>>> Micah >>>>>>> >>>>>>> >>>>>>> On Fri, Dec 15, 2023 at 10:08 AM Ryan Blue <b...@tabular.io> wrote: >>>>>>> >>>>>>> +1 for this approach >>>>>>> >>>>>>> I think it's good to use query params because it can be >>>>>>> backward-compatible with the current behavior. If you get more than the >>>>>>> limit back, then the service probably doesn't support pagination. And >>>>>>> if a >>>>>>> client doesn't support pagination they get the same results that they >>>>>>> would >>>>>>> today. A streaming approach with a continuation link like in the scan >>>>>>> API >>>>>>> discussion wouldn't work because old clients don't know to make a second >>>>>>> request. >>>>>>> >>>>>>> On Thu, Dec 14, 2023 at 10:07 AM Jack Ye <yezhao...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>> Hi everyone, >>>>>>> >>>>>>> During the conversation of the Scan API for REST spec, we touched on >>>>>>> the topic of pagination when REST response is large or takes time to be >>>>>>> produced. >>>>>>> >>>>>>> I just want to discuss this separately, since we also see the issue >>>>>>> for ListNamespaces and ListTables/Views, when integrating with a large >>>>>>> organization that has over 100k namespaces, and also a lot of tables in >>>>>>> some namespaces. >>>>>>> >>>>>>> Pagination requires either keeping state, or the response to be >>>>>>> deterministic such that the client can request a range of the full >>>>>>> response. If we want to avoid keeping state, I think we need to allow >>>>>>> some >>>>>>> query parameters like: >>>>>>> - *start*: the start index of the item in the response >>>>>>> - *limit*: the number of items to be returned in the response >>>>>>> >>>>>>> So we can send a request like: >>>>>>> >>>>>>> *GET /namespaces?start=300&limit=100* >>>>>>> >>>>>>> *GET /namespaces/ns/tables?start=300&limit=100* >>>>>>> >>>>>>> And the REST spec should enforce that the response returned for the >>>>>>> paginated GET should be deterministic. >>>>>>> >>>>>>> Any thoughts on this? >>>>>>> >>>>>>> Best, >>>>>>> Jack Ye >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Ryan Blue >>>>>>> Tabular >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Ryan Blue >>>>>>> Tabular >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Ryan Blue >>>>>>> Tabular >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Ryan Blue >>>>>>> Tabular >>>>>>> >>>>>>> >>>>>>> Xuanwo >>>>>>> >>>>>>>