Re: [DISCUSS] Describing REST Server capabilities

Ryan Blue Tue, 09 Jul 2024 09:27:21 -0700

> does it make a difference if I declare the capability or not?

I think that it does in other cases. Multi-table commits, for example, are
a building block for multi-statement transactions. If a service doesn't
support multi-table commits then we ideally want clients to know that ahead
of time so that they don't run a big transaction and then fail because the
commit is not supported.


On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov
<dmitri.bourlatch...@dremio.com.invalid> wrote:

> Re: remote signing, I agree that it does not look like a server capability
> that a client can / should discover. It is more like something that the
> server instructs / configures the client to do.
>
> Cheers,
> Dmitri.
>
> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye <yezhao...@gmail.com> wrote:
>
>> I was reconciling the discussion yesterday, one point that was
>> interesting to me was that we agreed the purpose of these capabilities is
>> to "control client-side fallback behavior", or at least the client should
>> behave differently based on these capabilities. However, this seems to be
>> only needed so far for views, or more specifically, for loadView API only
>> because it impacts the fallback behavior to resolve the identifier as a
>> table or not.
>>
>> For all the other capabilities listed, and even the other endpoints in
>> view, because a server can decide to implement it partially anyway and just
>> document the behavior, does it make a difference if I declare the
>> capability or not? The client will not stop the request, the server will
>> just error out if it is not supported. Maybe the error is not in the
>> expected code or message, but it is still an error. In that case, why do we
>> need all these other capabilities like tables, remote-signing, etc. in the
>> first place?
>>
>> Maybe it is too extreme of a thought, but could anyone help describe how
>> the other capabilities could be used beyond potentially returning an error
>> earlier?
>>
>> -Jack
>>
>>
>>
>>
>> On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov
>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>
>>> Hi Eduard,
>>>
>>> > I've also added the 501 error to the response of the respective
>>> endpoints but worth mentioning that *HEAD* / *GET *requests must not
>>> return a 501
>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this
>>> implies that the server impl would e.g. return a *404* in such a case).
>>>
>>> My reading on the Mozilla page makes me think that it is phrased too
>>> narrowly. Reading RFC 2616 [1] I believe that it does not preclude
>>> responding with 501 to GET and HEAD requests. I think it means that GET and
>>> HEAD methods must be supported by "general purpose" servers. The Iceberg
>>> REST server is not a general purpose server for resources. So, I think it
>>> should be fine to respond with 501 to unimplemented endpoints.
>>>
>>> Cheers,
>>> Dmitri.
>>>
>>> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1
>>>
>>> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner <
>>> etudenhoef...@apache.org> wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> I watched the catalog sync recording today and updated the PR
>>>> <https://github.com/apache/iceberg/pull/9940> to remove fine-grained
>>>> capabilities like *register-table / table-metrics*.
>>>>
>>>> The current capabilities (with versioning information) in the PR are:
>>>>
>>>>    - tables
>>>>    - views
>>>>    - remote-signing
>>>>    - vended-credentials
>>>>    - multi-table-commit
>>>>
>>>> For servers that only *partially* implement endpoints under a
>>>> capability the spec requires the server to throw a *501 Not
>>>> Implemented*. I've also added the 501 error to the response of the
>>>> respective endpoints but worth mentioning that *HEAD* / *GET *requests must
>>>> not return a 501
>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this
>>>> implies that the server impl would e.g. return a *404* in such a case).
>>>>
>>>>
>>>> Regards
>>>> Eduard
>>>>
>>>> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré <j...@nanthrax.net>
>>>> wrote:
>>>>
>>>>> Hi Eduard,
>>>>>
>>>>> It makes sense to return 501 for servers which don't implement all
>>>>> endpoints. It means that the server will at least have to implement
>>>>> empty endpoints if needed (that makes sense to me).
>>>>>
>>>>> I think we should focus on only "identified capabilities". I think
>>>>> that I proposed before that the capabilities can be
>>>>> overridden/provided by server implementation. Else, I'm afraid we
>>>>> won't be flexible enough or always behind the implementation (if an
>>>>> implementation wants to add "my-foo-cap").
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner
>>>>> <etudenhoef...@apache.org> wrote:
>>>>> >
>>>>> > I have clarified the wording in #9940 around the requirement on
>>>>> having to implement all endpoints under a particular capability.
>>>>> >
>>>>> > For servers that only partially implement endpoints under a
>>>>> capability the spec requires the server to throw a 501 Not Implemented.
>>>>> This was suggested by Jack and it seems reasonable to do that.
>>>>> >
>>>>> > Regarding the inclusion of table-spec / view-spec as a capability: I
>>>>> think this might make sense for the next iteration of the REST spec but as
>>>>> I mentioned earlier I don't see any clear benefit for the current REST 
>>>>> spec
>>>>> as the client wouldn't do anything with that information.
>>>>> > If there is a clear benefit of having this, then this can still be
>>>>> added later to the current REST spec but I believe we should rather have a
>>>>> few well-defined and actionable capabilities rather than too many.
>>>>> >
>>>>> > Eduard
>>>>> >
>>>>> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu <liurenjie2...@gmail.com>
>>>>> wrote:
>>>>> >>>
>>>>> >>> Spec is an interesting topic we did not discuss. Robert, how do
>>>>> you envision this to be used?
>>>>> >>> In my mind, if a new table format v3 is launched, there are 2
>>>>> approaches we can go with, taking CreateTable as an example:
>>>>> >>> (1) increment the related operation version, which means that POST
>>>>> /v2/{prefix}/namespaces/{ns}/tables will be created and allow creating
>>>>> tables in the v3 version.
>>>>> >>> (2) update the existing table metadata model to support both v2
>>>>> and v3 fields, and the server enforces the payload differently based on 
>>>>> the
>>>>> TableMetadata.format-version field. If the server does not support v3, it
>>>>> can return unsupported at that time.
>>>>> >>> Either way we go, the table-spec version does not need to be a
>>>>> capability. (1) seems to be cleaner, but has some overhead in provisioning
>>>>> a new endpoint compared to (2).
>>>>> >>> Do you see another way to do this leveraging the table-spec
>>>>> version?
>>>>> >>
>>>>> >>
>>>>> >> 2 is cleaner but maybe inconsistent with current behavior, since
>>>>> /v1/tables operation supports both v1 and v3. We should only go to 2 only
>>>>> when we have incompatible fields/break changes according to discussion.
>>>>> >>
>>>>> >> Generally I agree with adding table-spec into capabilities. For
>>>>> example, we can expose this to user in api so that user could choose a
>>>>> supported table format version without throwing exception.
>>>>> >>
>>>>> >> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye <yezhao...@gmail.com>
>>>>> wrote:
>>>>> >>>
>>>>> >>> Spec is an interesting topic we did not discuss. Robert, how do
>>>>> you envision this to be used?
>>>>> >>>
>>>>> >>> In my mind, if a new table format v3 is launched, there are 2
>>>>> approaches we can go with, taking CreateTable as an example:
>>>>> >>>
>>>>> >>> (1) increment the related operation version, which means that POST
>>>>> /v2/{prefix}/namespaces/{ns}/tables will be created and allow creating
>>>>> tables in the v3 version.
>>>>> >>>
>>>>> >>> (2) update the existing table metadata model to support both v2
>>>>> and v3 fields, and the server enforces the payload differently based on 
>>>>> the
>>>>> TableMetadata.format-version field. If the server does not support v3, it
>>>>> can return unsupported at that time.
>>>>> >>>
>>>>> >>> Either way we go, the table-spec version does not need to be a
>>>>> capability. (1) seems to be cleaner, but has some overhead in provisioning
>>>>> a new endpoint compared to (2).
>>>>> >>>
>>>>> >>> Do you see another way to do this leveraging the table-spec
>>>>> version?
>>>>> >>>
>>>>> >>> -Jack
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner
>>>>> <eduard.tudenhoef...@databricks.com.invalid> wrote:
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> I couldn't make it to the catalog sync meeting yesterday but I
>>>>> watched the recording today (thanks for providing that).
>>>>> >>>>
>>>>> >>>>> The missing piece is how (new, capabilities-aware) clients
>>>>> handle the case when a service does _not_ return the capabilities field
>>>>> (absent). My proposal would be that a client should in this case assume
>>>>> that all _currently_ existing capabilities are supported.
>>>>> >>>>>
>>>>> >>>>> - tables: [1]
>>>>> >>>>> - views: [1]
>>>>> >>>>> - remote-signing: [1]
>>>>> >>>>> - multi-table-commit: [1]
>>>>> >>>>> - register-table: [1]
>>>>> >>>>> - table-metrics: [1]
>>>>> >>>>> - table-spec: [1,2]
>>>>> >>>>> - view-spec: [1,2]
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>> The one thing I would like to add here is that the current PR
>>>>> uses the tables capability (as version 1) as the default when a server
>>>>> doesn't return capabilities but it might be also ok to include views (as
>>>>> version 1) because the current client impl has some code to deal with
>>>>> errors in case endpoints don't exist.
>>>>> >>>>
>>>>> >>>> Unless we agree that the currently existing functionality in the
>>>>> REST spec is the default behavior to be assumed for older server, I'm not
>>>>> sure about including remote-signing / multi-table-commit / register-table 
>>>>> /
>>>>> table-metrics as it has been indicated in earlier comments on the PR/ML
>>>>> that not every REST server supports these.
>>>>> >>>>
>>>>> >>>> That being said, we should discuss whether we want the default
>>>>> behavior (when an older server doesn't send back capabilities) to be
>>>>> >>>> a) tables (version 1) only
>>>>> >>>> b) the currently existing functionality as defined in the REST
>>>>> spec (as version 1)
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On another note: Including table-spec / view-spec seems to be
>>>>> more informative in its nature as I don't think a client would act
>>>>> differently right now when seeing these.
>>>>> >>>>
>>>>> >>>> Thanks
>>>>> >>>> Eduard
>>>>> >>>>
>>>>>
>>>>

-- 
Ryan Blue
Databricks

Re: [DISCUSS] Describing REST Server capabilities

Reply via email to