Re: [DISCUSS] Describing REST Server capabilities

Jack Ye Tue, 25 Jun 2024 15:15:34 -0700

Hi everyone,

I still don't see a good answer to why we shouldn't simply version each API.
With tags, I have to offer capabilities per tagged group. However, I could,
for example, offer just loadTable and nothing else in a catalog, and that
should still be Iceberg REST compliant. And I think we need a versioning
story anyway; there is no way around it.


Here is the workflow in my mind with versioning:

1. Going forward, every time the REST catalog spec introduces a new API
endpoint or a backwards-incompatible change to an existing API, the
version of that specific API is incremented. So when the PlanTable API is
added, it will be at version v1. If UpdateTable is later changed to add a
new update type, that API will move to v2, while PlanTable remains at v1.

2. A catalog must implement getConfig. This is the only required API.

3. In getConfig, the server returns the following key-value pairs in the
defaults map (it could be some new metadata structure, but since we want a
strong backwards compatibility guarantee, reusing string maps seems to be
the best way):
- key: operation:<operationName>
- value: version number

4. The client assumes that the map is ordered and resolves API versions
sequentially. For example, suppose I have the following map:

{ "operation:planTable": "1", "operation:loadTable": "2" }

This means the server supports planTable at v1 and loadTable at v2. Note
that by "supporting", I mean returning a response in a predictable way that
is compliant with the spec. Returning 406 UnsupportedOperation also counts
as supporting it.

There is also a special version, *, which means any version works.

5. Backwards compatibility: if the client is at a higher version than the
server, the client should always be able to understand the server's full
list of capabilities.

6. Forward compatibility: if the client is at a lower version than the
server, the client should parse whatever operations it understands and use
the highest version it supports to execute each operation. If the client
only supports loadTable v1, it will continue to hit the GET
v1/namespaces/{ns}/tables/{table} route instead of GET
v2/namespaces/{ns}/tables/{table}. The v1 route could continue to serve
the client, or it could return 406 to indicate that the route is deprecated
and the client needs to upgrade.

For initial backwards compatibility, I think returning nothing should mean
that every API the client understands is at version *.
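
To make steps 3 to 6 concrete, here is a minimal client-side sketch in
Python. It assumes the defaults map arrives as a plain string-to-string
dict and that the client keeps its own table of the highest version it
implements per operation; resolve_versions and load_table_route are
hypothetical helper names, not existing Iceberg APIs. It resolves each
operation independently and does not model the ordering from step 4.

```python
from typing import Dict, Optional

OPERATION_PREFIX = "operation:"
ANY_VERSION = "*"


def resolve_versions(
    defaults: Dict[str, str], client_max: Dict[str, int]
) -> Dict[str, Optional[int]]:
    """Choose which version of each known operation the client should call.

    defaults:   string map from getConfig, e.g. {"operation:loadTable": "2"}
    client_max: highest version of each operation this client implements
    Returns operation -> version to call, or None if the server doesn't offer it.
    """
    # Keep only "operation:<name>" entries; other defaults are ignored here.
    advertised = {
        key[len(OPERATION_PREFIX):]: value
        for key, value in defaults.items()
        if key.startswith(OPERATION_PREFIX)
    }
    # A server that advertises nothing: every operation we know is at "*".
    if not advertised:
        advertised = {op: ANY_VERSION for op in client_max}

    resolved: Dict[str, Optional[int]] = {}
    for op, highest in client_max.items():
        server_version = advertised.get(op)
        if server_version is None:
            resolved[op] = None  # not advertised, so not offered
        elif server_version == ANY_VERSION:
            resolved[op] = highest  # any version works; use our newest
        else:
            # Newer client: fall back to the server's version (step 5).
            # Older client: stay on our own highest version (step 6).
            resolved[op] = min(int(server_version), highest)
    return resolved


def load_table_route(version: int, namespace: str, table: str) -> str:
    """Build the versioned loadTable path described in step 6."""
    return f"v{version}/namespaces/{namespace}/tables/{table}"


if __name__ == "__main__":
    server = {"operation:planTable": "1", "operation:loadTable": "2"}
    client = {"loadTable": 1, "planTable": 1, "updateTable": 2}
    versions = resolve_versions(server, client)
    print(versions)  # {'loadTable': 1, 'planTable': 1, 'updateTable': None}
    print(load_table_route(1, "ns", "tbl"))  # v1/namespaces/ns/tables/tbl
```

Taking the minimum of the two versions covers both directions at once: a
newer client falls back to what the server advertises, and an older client
keeps calling the route it already knows (and gets a 406 if that route has
been retired).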

What do people think of it, compared to the tag approach?

Best,
Jack Ye

On Mon, Jun 24, 2024 at 1:42 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> I don't have strong opinions either way here; I just thought it was worth
> raising some concerns about possible evolution. Some responses inline,
> but if capabilities seem to meet the requirement at hand, then it does
> seem potentially the simplest mechanism.
>
>
>> I think we also want to avoid reliance on server-specific published
>> OpenAPI as they may leak other options/parameters/etc.  This may lead to
>> confusion around what the canonical spec is and make clients incompatible
>> if they're generated off of a non-standard spec document.
>
>
> Yeah, I wasn't necessarily proposing using built-in functionality, but a
> pre-scrubbed document. Since there is no reference service implementation
> for REST, it seems like each implementor would need to describe the best
> way of scrubbing their description.
>
>
>
>> @Micah this sounds to me as if the client would then have to parse a
>> bunch of endpoints to figure out whether it's safe to, e.g., load a
>> view or drop a table on the given REST server. Rather than having a
>> dedicated endpoint, we're just using the */config* endpoint to provide
>> information about what a server supports.
>
>
> I was not suggesting multiple endpoints here, simply different contents
> for */config*. I agree that in the short term this does add complexity on
> the clients. But given that the canonical REST API clients are being
> developed in the standard library, I'm not sure how much toil this would
> cause in general. This also does not necessarily need to be called up
> front; it could be called after an error was received, to distinguish a
> missing endpoint from a permission issue.
>
> What round-trips did you have in mind here?
>
>
>> All good points though, but I'm not aware of a standard way to handle this.
>
>
> IIUC, this sounds like a standard service description problem to me; the
> solution with capabilities appears to be one level of abstraction on top
> of this. Service discovery seems to have been reimplemented several
> times, depending on the technology [1][2][3].
>
>
>> I think versioning adds another level of complexity, but might be
>> necessary since I expect these will evolve to some extent and may even
>> require hitting versioned urls.
>
>
> If there is no concrete proposal on versioning, I agree it probably pays
> to sidestep this. The endpoint transitioning from a list of strings to a
> list of objects would be an obvious sign to clients that they are out of
> date. I think serving service descriptions, despite the complexity, is
> likely the most principled way of versioning items appropriately, but
> this definitely requires more in-depth thought/design.
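
A rough sketch of that out-of-date signal, assuming a client that only
understands capabilities as a list of strings. The object shape in the
comment ({"name": ..., "version": ...}) and the ClientOutdatedError name
are hypothetical; nothing like them exists in the spec today.

```python
from typing import Any, Set


class ClientOutdatedError(Exception):
    """The server sent capabilities in a shape this client predates."""


def parse_capabilities(payload: Any) -> Set[str]:
    """Parse capabilities, assuming this client only knows a list of strings."""
    if not isinstance(payload, list):
        raise ClientOutdatedError(
            f"unexpected capabilities payload: {type(payload).__name__}"
        )
    names: Set[str] = set()
    for entry in payload:
        if not isinstance(entry, str):
            # e.g. {"name": "views", "version": 2} from a hypothetical newer
            # format: signal that the client needs an upgrade.
            raise ClientOutdatedError(f"unrecognized capability entry: {entry!r}")
        names.add(entry)
    return names
```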
>
>
> Thanks,
> Micah
>
> [1] https://en.wikipedia.org/wiki/Web_Services_Description_Language
> [2] https://en.wikipedia.org/wiki/Web_Application_Description_Language
> [3] https://developers.google.com/discovery/v1/reference/apis
>
> On Mon, Jun 24, 2024 at 12:42 PM Daniel Weeks <dwe...@apache.org> wrote:
>
>> Hey Micah,
>>
>> I think what we're trying to achieve is strike a balance between client
>> complexity and ability to support multiple server-side capabilities.  One
>> challenge we've run into is if a client performs an operation (e.g.
>> listViews), but receives a 403 code, it's not clear whether the client
>> doesn't have access or the server doesn't support an endpoint but isn't
>> sending a 404 for security reasons.  This is a simple way for the client to
>> understand what it should expect from the server.
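
A minimal sketch of how a client might use the advertised capabilities to
disambiguate that 403. It assumes capability names like *views* from the
proposed list, and that the client knows which capability an endpoint
belongs to (e.g. listViews under *views*); classify_error and its return
labels are hypothetical.

```python
from typing import AbstractSet


def classify_error(status: int, capability: str, advertised: AbstractSet[str]) -> str:
    """Interpret an error status from an endpoint grouped under `capability`."""
    if capability not in advertised:
        # The server never claimed this capability, so the endpoint is most
        # likely missing, whatever status code the server chose to send.
        return "unsupported"
    if status == 403:
        # The capability is advertised, so the endpoint exists: this is access.
        return "forbidden"
    if status == 404:
        # Endpoint exists; the requested namespace/view/table does not.
        return "not-found"
    return "error"


if __name__ == "__main__":
    caps = {"tables"}  # e.g. parsed from the /config response
    print(classify_error(403, "views", caps))  # unsupported
    print(classify_error(403, "views", caps | {"views"}))  # forbidden
```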
>>
>> >  Another option would be just list all endpoints . . . and let clients
>> take appropriate actions
>> > This could be done by vending the OpenAPI spec the server supports at
>> its own endpoint. I think this avoids the future problem of having to
>> classify new endpoints into a specific capability.
>>
>> You're right that this would be the most complete way to handle this, but
>> it's really complicated and may require additional "handshake" calls even
>> for small interactions with the catalog service.  I think this puts a lot
>> of onus on the client, when what we're describing is a set of endpoints
>> that correspond to a capability.
>>
>> I think we also want to avoid reliance on server-specific published
>> OpenAPI as they may leak other options/parameters/etc.  This may lead to
>> confusion around what the canonical spec is and make clients incompatible
>> if they're generated off of a non-standard spec document.
>>
>> All good points though, but I'm not aware of a standard way to handle
>> this.
>>
>> I think versioning adds another level of complexity, but might be
>> necessary since I expect these will evolve to some extent and may even
>> require hitting versioned urls.
>>
>> -Dan
>>
>> On Mon, Jun 24, 2024 at 12:03 AM Eduard Tudenhöfner <
>> etudenhoef...@apache.org> wrote:
>>
>>> We had a separate discussion with Dan on the *oauth2* flag last week
>>> and came to the same conclusion: removing the *oauth2* capability
>>> is probably best for now.
>>> This is mainly because we can't really act on the *oauth2* capability
>>> right now: the */tokens* endpoint is called before we hit the
>>> */config* endpoint.
>>>
>>> > Another option would be just list all endpoints (and maybe even
>>> further which operations are supported) the server actually supports and
>>> let clients take appropriate actions (i.e. grouping could happen on the
>>> client side).  This could be done by vending the OpenAPI spec the server
>>> supports at its own endpoint. I think this avoids the future problem of
>>> having to classify new endpoints into a specific capability.
>>>
>>> @Micah this sounds to me as if the client would then have to parse a
>>> bunch of endpoints to figure out whether it's safe to, e.g., load a
>>> view or drop a table on the given REST server. Rather than having a
>>> dedicated endpoint, we're just using the */config* endpoint to provide
>>> information about what a server supports.
>>>
>>> Thanks
>>> Eduard
>>>
>>> On Fri, Jun 21, 2024 at 8:27 PM Ryan Blue <b...@databricks.com.invalid>
>>> wrote:
>>>
>>>> Let's remove the oauth2 tag for now until we figure out how to move
>>>> forward there. That makes sense to me.
>>>>
>>>> On Fri, Jun 21, 2024 at 9:30 AM Dmitri Bourlatchkov
>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>
>>>>> Hi Eduard,
>>>>>
>>>>> The capabilities PR looks good to me overall. I have a concern with
>>>>> the "oauth2" tag name though.
>>>>>
>>>>> I also commented [1] in GH but the comment appears to be closed by
>>>>> default :)
>>>>>
>>>>> I believe the term "oauth2" is confusing in this context with respect
>>>>> to RFC 6749 [2], as discussed in depth on another thread [3].
>>>>>
>>>>> The functionality behind the /tokens endpoint is quite specific to the
>>>>> Iceberg REST spec and as the other discussion highlights, there are
>>>>> concerns with respect to OAuth2 interoperability with other OAuth2 
>>>>> servers.
>>>>>
>>>>> What do you think about using a different tag name for it, for example
>>>>> "local-tokens" or "auth-tokens"?
>>>>>
>>>>> Thanks,
>>>>> Dmitri.
>>>>>
>>>>> [1]
>>>>> https://github.com/apache/iceberg/pull/9940/files/15c769a52b85ac4deff5659978c7ffa7802612b0#r1649173934
>>>>> [2] https://www.rfc-editor.org/rfc/rfc6749
>>>>> [3] https://lists.apache.org/thread/twk84xx7v0xy5q5tfd9x5torgr82vv50
>>>>>
>>>>> On Thu, Jun 20, 2024 at 7:28 AM Eduard Tudenhoefner <
>>>>> etudenhoef...@apache.org> wrote:
>>>>>
>>>>>> Hey everyone,
>>>>>>
>>>>>> I'd like to bring up the discussion around describing REST server
>>>>>> capabilities via the */config* endpoint.
>>>>>> There is PR #9940 <https://github.com/apache/iceberg/pull/9940> that
>>>>>> describes the OpenAPI spec changes.
>>>>>>
>>>>>> Mainly we'd like to have a *capabilities* field in the
>>>>>> *ConfigResponse* that allows servers to indicate to clients which
>>>>>> capabilities are being supported.
>>>>>>
>>>>>> So far we have the following capabilities:
>>>>>>
>>>>>>    - tables
>>>>>>    - views
>>>>>>    - remote-signing
>>>>>>    - vended-credentials
>>>>>>    - multi-table-commit
>>>>>>    - register-table
>>>>>>    - table-metrics
>>>>>>    - oauth2
>>>>>>
>>>>>>
>>>>>> The general idea behind a capability is that if, e.g., a server
>>>>>> supports *views*, then that server must implement all endpoints
>>>>>> grouped under that capability.
>>>>>> It's worth noting that the */config* endpoint is currently
>>>>>> implicit (meaning that every REST server has to implement it).
>>>>>>
>>>>>> One discussion point that came up during review is how we want to
>>>>>> handle capabilities and backwards compatibility, and what the default
>>>>>> capabilities would be, since older servers don't know anything about
>>>>>> *capabilities* (in that case we could assume that the default
>>>>>> capabilities would be *oauth2* / *tables*).
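
A minimal sketch of reading the proposed *capabilities* field, assuming it
is a JSON list of strings on the ConfigResponse and falling back to
*oauth2* / *tables* for older servers, as suggested above. ENDPOINT_GROUPS
is illustrative only (a couple of operations per capability); the real
grouping of endpoints under each capability would come from the spec.

```python
import json
from typing import Dict, FrozenSet

# Assumed default for older servers whose ConfigResponse has no
# "capabilities" field, following the suggestion in this thread.
DEFAULT_CAPABILITIES: FrozenSet[str] = frozenset({"tables", "oauth2"})

# Illustrative grouping only; the spec would define the real endpoint groups.
ENDPOINT_GROUPS: Dict[str, FrozenSet[str]] = {
    "tables": frozenset({"listTables", "loadTable"}),
    "views": frozenset({"listViews", "loadView"}),
}


def server_capabilities(config_json: str) -> FrozenSet[str]:
    """Read the proposed `capabilities` list from a /config response body."""
    config = json.loads(config_json)
    capabilities = config.get("capabilities")
    if capabilities is None:
        return DEFAULT_CAPABILITIES
    return frozenset(capabilities)


def supports_operation(capabilities: FrozenSet[str], operation: str) -> bool:
    """True if any advertised capability groups this operation."""
    # A server advertising a capability must implement every endpoint in its
    # group, so membership in one advertised group is enough.
    return any(
        operation in ENDPOINT_GROUPS.get(cap, frozenset()) for cap in capabilities
    )


if __name__ == "__main__":
    old_server = server_capabilities('{"defaults": {}, "overrides": {}}')
    new_server = server_capabilities('{"capabilities": ["tables", "views"]}')
    print(supports_operation(old_server, "listViews"))  # False
    print(supports_operation(new_server, "listViews"))  # True
```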
>>>>>>
>>>>>> Are there any other capabilities that we'd like to include in the
>>>>>> list?
>>>>>>
>>>>>> Eduard
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Databricks
>>>>
>>>
