> does it make a difference if I declare the capability or not? I think that it does in other cases. Multi-table commits, for example, are a building block for multi-statement transactions. If a service doesn't support multi-table commits then we ideally want clients to know that ahead of time so that they don't run a big transaction and then fail because the commit is not supported.
On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov <[email protected]> wrote: > Re: remote signing, I agree that it does not look like a server capability > that a client can / should discover. It is more like something that the > server instructs / configures the client to do. > > Cheers, > Dmitri. > > On Tue, Jul 9, 2024 at 12:05 PM Jack Ye <[email protected]> wrote: > >> I was reconciling the discussion yesterday, one point that was >> interesting to me was that we agreed the purpose of these capabilities is >> to "control client-side fallback behavior", or at least the client should >> behave differently based on these capabilities. However, this seems to be >> only needed so far for views, or more specifically, for loadView API only >> because it impacts the fallback behavior to resolve the identifier as a >> table or not. >> >> For all the other capabilities listed, and even the other endpoints in >> view, because a server can decide to implement it partially anyway and just >> document the behavior, does it make a difference if I declare the >> capability or not? The client will not stop the request, the server will >> just error out if it is not supported. Maybe the error is not in the >> expected code or message, but it is still an error. In that case, why do we >> need all these other capabilities like tables, remote-signing, etc. in the >> first place? >> >> Maybe it is too extreme of a thought, but could anyone help describe how >> the other capabilities could be used beyond potentially returning an error >> earlier? >> >> -Jack >> >> >> >> >> On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov >> <[email protected]> wrote: >> >>> Hi Eduard, >>> >>> > I've also added the 501 error to the response of the respective >>> endpoints but worth mentioning that *HEAD* / *GET *requests must not >>> return a 501 >>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this >>> implies that the server impl would e.g. return a *404* in such a case). >>> >>> My reading on the Mozilla page makes me think that it is phrased too >>> narrowly. Reading RFC 2616 [1] I believe that it does not preclude >>> responding with 501 to GET and HEAD requests. I think it means that GET and >>> HEAD methods must be supported by "general purpose" servers. The Iceberg >>> REST server is not a general purpose server for resources. So, I think it >>> should be fine to respond with 501 to unimplemented endpoints. >>> >>> Cheers, >>> Dmitri. >>> >>> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1 >>> >>> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner < >>> [email protected]> wrote: >>> >>>> Hey everyone, >>>> >>>> I watched the catalog sync recording today and updated the PR >>>> <https://github.com/apache/iceberg/pull/9940> to remove fine-grained >>>> capabilities like *register-table / table-metrics*. >>>> >>>> The current capabilities (with versioning information) in the PR are: >>>> >>>> - tables >>>> - views >>>> - remote-signing >>>> - vended-credentials >>>> - multi-table-commit >>>> >>>> For servers that only *partially* implement endpoints under a >>>> capability the spec requires the server to throw a *501 Not >>>> Implemented*. I've also added the 501 error to the response of the >>>> respective endpoints but worth mentioning that *HEAD* / *GET *requests must >>>> not return a 501 >>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this >>>> implies that the server impl would e.g. return a *404* in such a case). >>>> >>>> >>>> Regards >>>> Eduard >>>> >>>> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré <[email protected]> >>>> wrote: >>>> >>>>> Hi Eduard, >>>>> >>>>> It makes sense to return 501 for servers which don't implement all >>>>> endpoints. It means that the server will at least have to implement >>>>> empty endpoints if needed (that makes sense to me). >>>>> >>>>> I think we should focus on only "identified capabilities". I think >>>>> that I proposed before that the capabilities can be >>>>> overridden/provided by server implementation. Else, I'm afraid we >>>>> won't be flexible enough or always behind the implementation (if an >>>>> implementation wants to add "my-foo-cap"). >>>>> >>>>> Regards >>>>> JB >>>>> >>>>> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner >>>>> <[email protected]> wrote: >>>>> > >>>>> > I have clarified the wording in #9940 around the requirement on >>>>> having to implement all endpoints under a particular capability. >>>>> > >>>>> > For servers that only partially implement endpoints under a >>>>> capability the spec requires the server to throw a 501 Not Implemented. >>>>> This was suggested by Jack and it seems reasonable to do that. >>>>> > >>>>> > Regarding the inclusion of table-spec / view-spec as a capability: I >>>>> think this might make sense for the next iteration of the REST spec but as >>>>> I mentioned earlier I don't see any clear benefit for the current REST >>>>> spec >>>>> as the client wouldn't do anything with that information. >>>>> > If there is a clear benefit of having this, then this can still be >>>>> added later to the current REST spec but I believe we should rather have a >>>>> few well-defined and actionable capabilities rather than too many. >>>>> > >>>>> > Eduard >>>>> > >>>>> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu <[email protected]> >>>>> wrote: >>>>> >>> >>>>> >>> Spec is an interesting topic we did not discuss. Robert, how do >>>>> you envision this to be used? >>>>> >>> In my mind, if a new table format v3 is launched, there are 2 >>>>> approaches we can go with, taking CreateTable as an example: >>>>> >>> (1) increment the related operation version, which means that POST >>>>> /v2/{prefix}/namespaces/{ns}/tables will be created and allow creating >>>>> tables in the v3 version. >>>>> >>> (2) update the existing table metadata model to support both v2 >>>>> and v3 fields, and the server enforces the payload differently based on >>>>> the >>>>> TableMetadata.format-version field. If the server does not support v3, it >>>>> can return unsupported at that time. >>>>> >>> Either way we go, the table-spec version does not need to be a >>>>> capability. (1) seems to be cleaner, but has some overhead in provisioning >>>>> a new endpoint compared to (2). >>>>> >>> Do you see another way to do this leveraging the table-spec >>>>> version? >>>>> >> >>>>> >> >>>>> >> 2 is cleaner but maybe inconsistent with current behavior, since >>>>> /v1/tables operation supports both v1 and v3. We should only go to 2 only >>>>> when we have incompatible fields/break changes according to discussion. >>>>> >> >>>>> >> Generally I agree with adding table-spec into capabilities. For >>>>> example, we can expose this to user in api so that user could choose a >>>>> supported table format version without throwing exception. >>>>> >> >>>>> >> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye <[email protected]> >>>>> wrote: >>>>> >>> >>>>> >>> Spec is an interesting topic we did not discuss. Robert, how do >>>>> you envision this to be used? >>>>> >>> >>>>> >>> In my mind, if a new table format v3 is launched, there are 2 >>>>> approaches we can go with, taking CreateTable as an example: >>>>> >>> >>>>> >>> (1) increment the related operation version, which means that POST >>>>> /v2/{prefix}/namespaces/{ns}/tables will be created and allow creating >>>>> tables in the v3 version. >>>>> >>> >>>>> >>> (2) update the existing table metadata model to support both v2 >>>>> and v3 fields, and the server enforces the payload differently based on >>>>> the >>>>> TableMetadata.format-version field. If the server does not support v3, it >>>>> can return unsupported at that time. >>>>> >>> >>>>> >>> Either way we go, the table-spec version does not need to be a >>>>> capability. (1) seems to be cleaner, but has some overhead in provisioning >>>>> a new endpoint compared to (2). >>>>> >>> >>>>> >>> Do you see another way to do this leveraging the table-spec >>>>> version? >>>>> >>> >>>>> >>> -Jack >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner >>>>> <[email protected]> wrote: >>>>> >>>> >>>>> >>>> >>>>> >>>> I couldn't make it to the catalog sync meeting yesterday but I >>>>> watched the recording today (thanks for providing that). >>>>> >>>> >>>>> >>>>> The missing piece is how (new, capabilities-aware) clients >>>>> handle the case when a service does _not_ return the capabilities field >>>>> (absent). My proposal would be that a client should in this case assume >>>>> that all _currently_ existing capabilities are supported. >>>>> >>>>> >>>>> >>>>> - tables: [1] >>>>> >>>>> - views: [1] >>>>> >>>>> - remote-signing: [1] >>>>> >>>>> - multi-table-commit: [1] >>>>> >>>>> - register-table: [1] >>>>> >>>>> - table-metrics: [1] >>>>> >>>>> - table-spec: [1,2] >>>>> >>>>> - view-spec: [1,2] >>>>> >>>>> >>>>> >>>>> >>>>> >>>> The one thing I would like to add here is that the current PR >>>>> uses the tables capability (as version 1) as the default when a server >>>>> doesn't return capabilities but it might be also ok to include views (as >>>>> version 1) because the current client impl has some code to deal with >>>>> errors in case endpoints don't exist. >>>>> >>>> >>>>> >>>> Unless we agree that the currently existing functionality in the >>>>> REST spec is the default behavior to be assumed for older server, I'm not >>>>> sure about including remote-signing / multi-table-commit / register-table >>>>> / >>>>> table-metrics as it has been indicated in earlier comments on the PR/ML >>>>> that not every REST server supports these. >>>>> >>>> >>>>> >>>> That being said, we should discuss whether we want the default >>>>> behavior (when an older server doesn't send back capabilities) to be >>>>> >>>> a) tables (version 1) only >>>>> >>>> b) the currently existing functionality as defined in the REST >>>>> spec (as version 1) >>>>> >>>> >>>>> >>>> >>>>> >>>> On another note: Including table-spec / view-spec seems to be >>>>> more informative in its nature as I don't think a client would act >>>>> differently right now when seeing these. >>>>> >>>> >>>>> >>>> Thanks >>>>> >>>> Eduard >>>>> >>>> >>>>> >>>> -- Ryan Blue Databricks
