Re: [DISCUSS] Describing REST Server capabilities

Eduard Tudenhöfner Wed, 10 Jul 2024 02:48:08 -0700

Hey everyone,

I've added a few inline comments below.




> Re: remote signing, I agree that it does not look like a server capability
> that a client can / should discover. It is more like something that the
> server instructs / configures the client to do.


While a server can control this behavior and instruct the client to use
remote signing, technically nothing is preventing a client from configuring
s3.remote-signing-enabled=true. In such a case it seems more appropriate to
indicate that this capability isn't supported rather than a generic 501,
because not every server will support remote signing.

The *vended-credentials* capability on the other hand is more informative
in its nature and a server indeed configures a client. I think that was
also one of the reasons I removed this capability but added it later back
due to a comment from Jack.

I'm ok either way in terms of removing / keeping *vended-credentials* as a
capability but given that we'd want to include *actionable* capabilities at
this point, I'd just remove it (nothing is preventing us from adding it
later if necessary).


In that case, why do we need all these other capabilities like tables,
> remote-signing, etc. in the first place?


Given that capabilities also carry versioning information, clients can make
more informed decisions on which endpoints to call. One could argue that
generally throwing a 501 on everything that isn't supported might be
sufficient, but that doesn't necessarily help a client in knowing which
versions of a capability are safe to call/use.

Regarding the control of client-side fallback behavior:
I think the default fallback behavior should be *tables* (with version 1)
with a property in the REST catalog that allows configuring this to e.g.
*rest-default-capabilities=tables,views,abc,xyz* (all of them defaulting to
version 1).


Eduard


On Tue, Jul 9, 2024 at 7:00 PM Jack Ye <[email protected]> wrote:

> Yes I agree that sounds like a valid use case. So the criteria so far is
> that capabilities are used for:
> - controlling client-side fallback behavior
> - failing expensive operations early if we know it will eventually fail
> due to missing capability
>
> Do we agree if this is the criteria we should use? What about the other
> capabilities, namly tables, remote-signing, credential-vending?
>
> -Jack
>
>
> On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue <[email protected]>
> wrote:
>
>> > does it make a difference if I declare the capability or not?
>>
>> I think that it does in other cases. Multi-table commits, for example,
>> are a building block for multi-statement transactions. If a service doesn't
>> support multi-table commits then we ideally want clients to know that ahead
>> of time so that they don't run a big transaction and then fail because the
>> commit is not supported.
>>
>> On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov
>> <[email protected]> wrote:
>>
>>> Re: remote signing, I agree that it does not look like a server
>>> capability that a client can / should discover. It is more like something
>>> that the server instructs / configures the client to do.
>>>
>>> Cheers,
>>> Dmitri.
>>>
>>> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye <[email protected]> wrote:
>>>
>>>> I was reconciling the discussion yesterday, one point that was
>>>> interesting to me was that we agreed the purpose of these capabilities is
>>>> to "control client-side fallback behavior", or at least the client should
>>>> behave differently based on these capabilities. However, this seems to be
>>>> only needed so far for views, or more specifically, for loadView API only
>>>> because it impacts the fallback behavior to resolve the identifier as a
>>>> table or not.
>>>>
>>>> For all the other capabilities listed, and even the other endpoints in
>>>> view, because a server can decide to implement it partially anyway and just
>>>> document the behavior, does it make a difference if I declare the
>>>> capability or not? The client will not stop the request, the server will
>>>> just error out if it is not supported. Maybe the error is not in the
>>>> expected code or message, but it is still an error. In that case, why do we
>>>> need all these other capabilities like tables, remote-signing, etc. in the
>>>> first place?
>>>>
>>>> Maybe it is too extreme of a thought, but could anyone help describe
>>>> how the other capabilities could be used beyond potentially returning an
>>>> error earlier?
>>>>
>>>> -Jack
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi Eduard,
>>>>>
>>>>> > I've also added the 501 error to the response of the respective
>>>>> endpoints but worth mentioning that *HEAD* / *GET *requests must not
>>>>> return a 501
>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this
>>>>> implies that the server impl would e.g. return a *404* in such a
>>>>> case).
>>>>>
>>>>> My reading on the Mozilla page makes me think that it is phrased too
>>>>> narrowly. Reading RFC 2616 [1] I believe that it does not preclude
>>>>> responding with 501 to GET and HEAD requests. I think it means that GET 
>>>>> and
>>>>> HEAD methods must be supported by "general purpose" servers. The Iceberg
>>>>> REST server is not a general purpose server for resources. So, I think it
>>>>> should be fine to respond with 501 to unimplemented endpoints.
>>>>>
>>>>> Cheers,
>>>>> Dmitri.
>>>>>
>>>>> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1
>>>>>
>>>>> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hey everyone,
>>>>>>
>>>>>> I watched the catalog sync recording today and updated the PR
>>>>>> <https://github.com/apache/iceberg/pull/9940> to remove fine-grained
>>>>>> capabilities like *register-table / table-metrics*.
>>>>>>
>>>>>> The current capabilities (with versioning information) in the PR are:
>>>>>>
>>>>>>    - tables
>>>>>>    - views
>>>>>>    - remote-signing
>>>>>>    - vended-credentials
>>>>>>    - multi-table-commit
>>>>>>
>>>>>> For servers that only *partially* implement endpoints under a
>>>>>> capability the spec requires the server to throw a *501 Not
>>>>>> Implemented*. I've also added the 501 error to the response of the
>>>>>> respective endpoints but worth mentioning that *HEAD* / *GET *requests
>>>>>> must not return a 501
>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this
>>>>>> implies that the server impl would e.g. return a *404* in such a
>>>>>> case).
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>> Eduard
>>>>>>
>>>>>> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Eduard,
>>>>>>>
>>>>>>> It makes sense to return 501 for servers which don't implement all
>>>>>>> endpoints. It means that the server will at least have to implement
>>>>>>> empty endpoints if needed (that makes sense to me).
>>>>>>>
>>>>>>> I think we should focus on only "identified capabilities". I think
>>>>>>> that I proposed before that the capabilities can be
>>>>>>> overridden/provided by server implementation. Else, I'm afraid we
>>>>>>> won't be flexible enough or always behind the implementation (if an
>>>>>>> implementation wants to add "my-foo-cap").
>>>>>>>
>>>>>>> Regards
>>>>>>> JB
>>>>>>>
>>>>>>> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner
>>>>>>> <[email protected]> wrote:
>>>>>>> >
>>>>>>> > I have clarified the wording in #9940 around the requirement on
>>>>>>> having to implement all endpoints under a particular capability.
>>>>>>> >
>>>>>>> > For servers that only partially implement endpoints under a
>>>>>>> capability the spec requires the server to throw a 501 Not Implemented.
>>>>>>> This was suggested by Jack and it seems reasonable to do that.
>>>>>>> >
>>>>>>> > Regarding the inclusion of table-spec / view-spec as a capability:
>>>>>>> I think this might make sense for the next iteration of the REST spec 
>>>>>>> but
>>>>>>> as I mentioned earlier I don't see any clear benefit for the current 
>>>>>>> REST
>>>>>>> spec as the client wouldn't do anything with that information.
>>>>>>> > If there is a clear benefit of having this, then this can still be
>>>>>>> added later to the current REST spec but I believe we should rather 
>>>>>>> have a
>>>>>>> few well-defined and actionable capabilities rather than too many.
>>>>>>> >
>>>>>>> > Eduard
>>>>>>> >
>>>>>>> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu <[email protected]>
>>>>>>> wrote:
>>>>>>> >>>
>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert, how do
>>>>>>> you envision this to be used?
>>>>>>> >>> In my mind, if a new table format v3 is launched, there are 2
>>>>>>> approaches we can go with, taking CreateTable as an example:
>>>>>>> >>> (1) increment the related operation version, which means that
>>>>>>> POST /v2/{prefix}/namespaces/{ns}/tables will be created and allow 
>>>>>>> creating
>>>>>>> tables in the v3 version.
>>>>>>> >>> (2) update the existing table metadata model to support both v2
>>>>>>> and v3 fields, and the server enforces the payload differently based on 
>>>>>>> the
>>>>>>> TableMetadata.format-version field. If the server does not support v3, 
>>>>>>> it
>>>>>>> can return unsupported at that time.
>>>>>>> >>> Either way we go, the table-spec version does not need to be a
>>>>>>> capability. (1) seems to be cleaner, but has some overhead in 
>>>>>>> provisioning
>>>>>>> a new endpoint compared to (2).
>>>>>>> >>> Do you see another way to do this leveraging the table-spec
>>>>>>> version?
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> 2 is cleaner but maybe inconsistent with current behavior, since
>>>>>>> /v1/tables operation supports both v1 and v3. We should only go to 2 
>>>>>>> only
>>>>>>> when we have incompatible fields/break changes according to discussion.
>>>>>>> >>
>>>>>>> >> Generally I agree with adding table-spec into capabilities. For
>>>>>>> example, we can expose this to user in api so that user could choose a
>>>>>>> supported table format version without throwing exception.
>>>>>>> >>
>>>>>>> >> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye <[email protected]>
>>>>>>> wrote:
>>>>>>> >>>
>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert, how do
>>>>>>> you envision this to be used?
>>>>>>> >>>
>>>>>>> >>> In my mind, if a new table format v3 is launched, there are 2
>>>>>>> approaches we can go with, taking CreateTable as an example:
>>>>>>> >>>
>>>>>>> >>> (1) increment the related operation version, which means that
>>>>>>> POST /v2/{prefix}/namespaces/{ns}/tables will be created and allow 
>>>>>>> creating
>>>>>>> tables in the v3 version.
>>>>>>> >>>
>>>>>>> >>> (2) update the existing table metadata model to support both v2
>>>>>>> and v3 fields, and the server enforces the payload differently based on 
>>>>>>> the
>>>>>>> TableMetadata.format-version field. If the server does not support v3, 
>>>>>>> it
>>>>>>> can return unsupported at that time.
>>>>>>> >>>
>>>>>>> >>> Either way we go, the table-spec version does not need to be a
>>>>>>> capability. (1) seems to be cleaner, but has some overhead in 
>>>>>>> provisioning
>>>>>>> a new endpoint compared to (2).
>>>>>>> >>>
>>>>>>> >>> Do you see another way to do this leveraging the table-spec
>>>>>>> version?
>>>>>>> >>>
>>>>>>> >>> -Jack
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner
>>>>>>> <[email protected]> wrote:
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> I couldn't make it to the catalog sync meeting yesterday but I
>>>>>>> watched the recording today (thanks for providing that).
>>>>>>> >>>>
>>>>>>> >>>>> The missing piece is how (new, capabilities-aware) clients
>>>>>>> handle the case when a service does _not_ return the capabilities field
>>>>>>> (absent). My proposal would be that a client should in this case assume
>>>>>>> that all _currently_ existing capabilities are supported.
>>>>>>> >>>>>
>>>>>>> >>>>> - tables: [1]
>>>>>>> >>>>> - views: [1]
>>>>>>> >>>>> - remote-signing: [1]
>>>>>>> >>>>> - multi-table-commit: [1]
>>>>>>> >>>>> - register-table: [1]
>>>>>>> >>>>> - table-metrics: [1]
>>>>>>> >>>>> - table-spec: [1,2]
>>>>>>> >>>>> - view-spec: [1,2]
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>> The one thing I would like to add here is that the current PR
>>>>>>> uses the tables capability (as version 1) as the default when a server
>>>>>>> doesn't return capabilities but it might be also ok to include views (as
>>>>>>> version 1) because the current client impl has some code to deal with
>>>>>>> errors in case endpoints don't exist.
>>>>>>> >>>>
>>>>>>> >>>> Unless we agree that the currently existing functionality in
>>>>>>> the REST spec is the default behavior to be assumed for older server, 
>>>>>>> I'm
>>>>>>> not sure about including remote-signing / multi-table-commit /
>>>>>>> register-table / table-metrics as it has been indicated in earlier 
>>>>>>> comments
>>>>>>> on the PR/ML that not every REST server supports these.
>>>>>>> >>>>
>>>>>>> >>>> That being said, we should discuss whether we want the default
>>>>>>> behavior (when an older server doesn't send back capabilities) to be
>>>>>>> >>>> a) tables (version 1) only
>>>>>>> >>>> b) the currently existing functionality as defined in the REST
>>>>>>> spec (as version 1)
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> On another note: Including table-spec / view-spec seems to be
>>>>>>> more informative in its nature as I don't think a client would act
>>>>>>> differently right now when seeing these.
>>>>>>> >>>>
>>>>>>> >>>> Thanks
>>>>>>> >>>> Eduard
>>>>>>> >>>>
>>>>>>>
>>>>>>
>>
>> --
>> Ryan Blue
>> Databricks
>>
>

Re: [DISCUSS] Describing REST Server capabilities

Reply via email to