I was reconciling the discussion yesterday, one point that was interesting
to me was that we agreed the purpose of these capabilities is to "control
client-side fallback behavior", or at least the client should behave
differently based on these capabilities. However, this seems to be only
needed so far for views, or more specifically, for loadView API only
because it impacts the fallback behavior to resolve the identifier as a
table or not.

For all the other capabilities listed, and even the other endpoints in
view, because a server can decide to implement it partially anyway and just
document the behavior, does it make a difference if I declare the
capability or not? The client will not stop the request, the server will
just error out if it is not supported. Maybe the error is not in the
expected code or message, but it is still an error. In that case, why do we
need all these other capabilities like tables, remote-signing, etc. in the
first place?

Maybe it is too extreme of a thought, but could anyone help describe how
the other capabilities could be used beyond potentially returning an error
earlier?

-Jack




On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov
<dmitri.bourlatch...@dremio.com.invalid> wrote:

> Hi Eduard,
>
> > I've also added the 501 error to the response of the respective
> endpoints but worth mentioning that *HEAD* / *GET *requests must not
> return a 501
> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this
> implies that the server impl would e.g. return a *404* in such a case).
>
> My reading on the Mozilla page makes me think that it is phrased too
> narrowly. Reading RFC 2616 [1] I believe that it does not preclude
> responding with 501 to GET and HEAD requests. I think it means that GET and
> HEAD methods must be supported by "general purpose" servers. The Iceberg
> REST server is not a general purpose server for resources. So, I think it
> should be fine to respond with 501 to unimplemented endpoints.
>
> Cheers,
> Dmitri.
>
> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1
>
> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner <
> etudenhoef...@apache.org> wrote:
>
>> Hey everyone,
>>
>> I watched the catalog sync recording today and updated the PR
>> <https://github.com/apache/iceberg/pull/9940> to remove fine-grained
>> capabilities like *register-table / table-metrics*.
>>
>> The current capabilities (with versioning information) in the PR are:
>>
>>    - tables
>>    - views
>>    - remote-signing
>>    - vended-credentials
>>    - multi-table-commit
>>
>> For servers that only *partially* implement endpoints under a capability
>> the spec requires the server to throw a *501 Not Implemented*. I've also
>> added the 501 error to the response of the respective endpoints but worth
>> mentioning that *HEAD* / *GET *requests must not return a 501
>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this
>> implies that the server impl would e.g. return a *404* in such a case).
>>
>>
>> Regards
>> Eduard
>>
>> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> Hi Eduard,
>>>
>>> It makes sense to return 501 for servers which don't implement all
>>> endpoints. It means that the server will at least have to implement
>>> empty endpoints if needed (that makes sense to me).
>>>
>>> I think we should focus on only "identified capabilities". I think
>>> that I proposed before that the capabilities can be
>>> overridden/provided by server implementation. Else, I'm afraid we
>>> won't be flexible enough or always behind the implementation (if an
>>> implementation wants to add "my-foo-cap").
>>>
>>> Regards
>>> JB
>>>
>>> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner
>>> <etudenhoef...@apache.org> wrote:
>>> >
>>> > I have clarified the wording in #9940 around the requirement on having
>>> to implement all endpoints under a particular capability.
>>> >
>>> > For servers that only partially implement endpoints under a capability
>>> the spec requires the server to throw a 501 Not Implemented. This was
>>> suggested by Jack and it seems reasonable to do that.
>>> >
>>> > Regarding the inclusion of table-spec / view-spec as a capability: I
>>> think this might make sense for the next iteration of the REST spec but as
>>> I mentioned earlier I don't see any clear benefit for the current REST spec
>>> as the client wouldn't do anything with that information.
>>> > If there is a clear benefit of having this, then this can still be
>>> added later to the current REST spec but I believe we should rather have a
>>> few well-defined and actionable capabilities rather than too many.
>>> >
>>> > Eduard
>>> >
>>> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu <liurenjie2...@gmail.com>
>>> wrote:
>>> >>>
>>> >>> Spec is an interesting topic we did not discuss. Robert, how do you
>>> envision this to be used?
>>> >>> In my mind, if a new table format v3 is launched, there are 2
>>> approaches we can go with, taking CreateTable as an example:
>>> >>> (1) increment the related operation version, which means that POST
>>> /v2/{prefix}/namespaces/{ns}/tables will be created and allow creating
>>> tables in the v3 version.
>>> >>> (2) update the existing table metadata model to support both v2 and
>>> v3 fields, and the server enforces the payload differently based on the
>>> TableMetadata.format-version field. If the server does not support v3, it
>>> can return unsupported at that time.
>>> >>> Either way we go, the table-spec version does not need to be a
>>> capability. (1) seems to be cleaner, but has some overhead in provisioning
>>> a new endpoint compared to (2).
>>> >>> Do you see another way to do this leveraging the table-spec version?
>>> >>
>>> >>
>>> >> 2 is cleaner but maybe inconsistent with current behavior, since
>>> /v1/tables operation supports both v1 and v3. We should only go to 2 only
>>> when we have incompatible fields/break changes according to discussion.
>>> >>
>>> >> Generally I agree with adding table-spec into capabilities. For
>>> example, we can expose this to user in api so that user could choose a
>>> supported table format version without throwing exception.
>>> >>
>>> >> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye <yezhao...@gmail.com> wrote:
>>> >>>
>>> >>> Spec is an interesting topic we did not discuss. Robert, how do you
>>> envision this to be used?
>>> >>>
>>> >>> In my mind, if a new table format v3 is launched, there are 2
>>> approaches we can go with, taking CreateTable as an example:
>>> >>>
>>> >>> (1) increment the related operation version, which means that POST
>>> /v2/{prefix}/namespaces/{ns}/tables will be created and allow creating
>>> tables in the v3 version.
>>> >>>
>>> >>> (2) update the existing table metadata model to support both v2 and
>>> v3 fields, and the server enforces the payload differently based on the
>>> TableMetadata.format-version field. If the server does not support v3, it
>>> can return unsupported at that time.
>>> >>>
>>> >>> Either way we go, the table-spec version does not need to be a
>>> capability. (1) seems to be cleaner, but has some overhead in provisioning
>>> a new endpoint compared to (2).
>>> >>>
>>> >>> Do you see another way to do this leveraging the table-spec version?
>>> >>>
>>> >>> -Jack
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner
>>> <eduard.tudenhoef...@databricks.com.invalid> wrote:
>>> >>>>
>>> >>>>
>>> >>>> I couldn't make it to the catalog sync meeting yesterday but I
>>> watched the recording today (thanks for providing that).
>>> >>>>
>>> >>>>> The missing piece is how (new, capabilities-aware) clients handle
>>> the case when a service does _not_ return the capabilities field (absent).
>>> My proposal would be that a client should in this case assume that all
>>> _currently_ existing capabilities are supported.
>>> >>>>>
>>> >>>>> - tables: [1]
>>> >>>>> - views: [1]
>>> >>>>> - remote-signing: [1]
>>> >>>>> - multi-table-commit: [1]
>>> >>>>> - register-table: [1]
>>> >>>>> - table-metrics: [1]
>>> >>>>> - table-spec: [1,2]
>>> >>>>> - view-spec: [1,2]
>>> >>>>>
>>> >>>>>
>>> >>>> The one thing I would like to add here is that the current PR uses
>>> the tables capability (as version 1) as the default when a server doesn't
>>> return capabilities but it might be also ok to include views (as version 1)
>>> because the current client impl has some code to deal with errors in case
>>> endpoints don't exist.
>>> >>>>
>>> >>>> Unless we agree that the currently existing functionality in the
>>> REST spec is the default behavior to be assumed for older server, I'm not
>>> sure about including remote-signing / multi-table-commit / register-table /
>>> table-metrics as it has been indicated in earlier comments on the PR/ML
>>> that not every REST server supports these.
>>> >>>>
>>> >>>> That being said, we should discuss whether we want the default
>>> behavior (when an older server doesn't send back capabilities) to be
>>> >>>> a) tables (version 1) only
>>> >>>> b) the currently existing functionality as defined in the REST spec
>>> (as version 1)
>>> >>>>
>>> >>>>
>>> >>>> On another note: Including table-spec / view-spec seems to be more
>>> informative in its nature as I don't think a client would act differently
>>> right now when seeing these.
>>> >>>>
>>> >>>> Thanks
>>> >>>> Eduard
>>> >>>>
>>>
>>

Reply via email to