smaheshwar-pltr opened a new issue, #13554:
URL: https://github.com/apache/iceberg/issues/13554
### Feature Request / Improvement
I'd like REST clients that are atomically creating a table to be able to
refresh vended credentials when writing the table's files.
Currently, credential refresh works when the table already exists. When it
does not, a simple Spark workflow such as
```scala
df
  .(...)
  .writeTo(table)
  .create()
```
does not refresh credentials.
IIUC, the atomic creation of a table works by the client first sending a
`stageCreate` request to the REST server, using the returned
`LoadTableResponse` for writing files, and finally commit-creating the table
via an `updateTable` request to the server.
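
To make the sequence above concrete, here is a minimal in-memory sketch of the three steps. Everything in it (`InMemoryRestCatalog`, `stage_create`, `update_table`, `atomic_create`, the `db.events` name and the bucket path) is a hypothetical stand-in for illustration, not the Iceberg client or server API. The point it illustrates is that the table is not visible in the catalog while files are being written.

```python
from dataclasses import dataclass, field


@dataclass
class LoadTableResponse:
    # Simplified stand-in: a real response carries full table metadata.
    metadata_location: str
    config: dict = field(default_factory=dict)


class InMemoryRestCatalog:
    """Toy model of a REST catalog (hypothetical, not the Iceberg implementation)."""

    def __init__(self):
        self.tables = {}  # committed tables only

    def stage_create(self, name):
        # Staging returns metadata and credentials but registers nothing.
        return LoadTableResponse(
            metadata_location=f"s3://bucket/{name}/metadata/v0.json",
            config={"credential": "initial-token"},
        )

    def update_table(self, name, staged):
        # The commit-create step is what makes the table visible.
        self.tables[name] = staged

    def table_exists(self, name):
        return name in self.tables


def atomic_create(catalog, name, write_files):
    staged = catalog.stage_create(name)   # 1. stageCreate
    write_files(staged)                   # 2. write files using the response
    catalog.update_table(name, staged)    # 3. commit-create via updateTable


catalog = InMemoryRestCatalog()
seen_during_write = []
atomic_create(
    catalog,
    "db.events",
    # Record whether the catalog knows the table while files are written.
    lambda staged: seen_during_write.append(catalog.table_exists("db.events")),
)
```

In this toy model, `seen_during_write` records that the table is unknown to the catalog during step 2 and only becomes visible after step 3.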
Suppose the REST catalog's `stageCreate` `LoadTableResponse` includes refresh
properties that point to the `loadCredentials` refresh endpoint **(1)**, and
the catalog abides by the REST specification to [throw a
404](https://github.com/apache/iceberg/blob/bcf9c69c098b54d31cbd803d62a2609d3814c3df/open-api/rest-catalog-open-api.yaml#L1205-L1207)
when the table does not exist. The Spark code above will then fail, because
the table does not yet exist when the refresh is attempted. Specifically, once
the initial credentials within the `LoadTableResponse` have expired, the user
will receive:
```
org.apache.iceberg.exceptions.RESTException: Unable to process: Table does
not exist: ...
```
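
The failure mode can be reproduced with a small simulation. This is only a sketch under stated assumptions: `ToyCatalog`, `load_credentials`, `commit_create`, `write_with_refresh`, the 60-second TTL and the integer clock are all hypothetical, and the real client refreshes credentials through its own file IO machinery rather than a loop like this.

```python
class NoSuchTableError(Exception):
    """Stand-in for the 404 a catalog returns for an unknown table."""


class ToyCatalog:
    """Hypothetical catalog whose refresh endpoint requires a committed table."""

    def __init__(self):
        self.committed = set()

    def stage_create(self, name, now, ttl=60):
        # Vends initial credentials without registering the table.
        return {"token": "initial", "expires_at": now + ttl}

    def load_credentials(self, name, now, ttl=60):
        # Mirrors the behaviour described above: unknown table -> 404.
        if name not in self.committed:
            raise NoSuchTableError(f"Table does not exist: {name}")
        return {"token": "refreshed", "expires_at": now + ttl}

    def commit_create(self, name):
        self.committed.add(name)


def write_with_refresh(catalog, name, creds, write_times):
    """Refresh credentials before any write that happens after expiry."""
    for now in write_times:
        if now >= creds["expires_at"]:
            creds = catalog.load_credentials(name, now)
    return creds


catalog = ToyCatalog()
creds = catalog.stage_create("db.events", now=0)

# Short write: finishes before expiry, so no refresh is attempted.
short_result = write_with_refresh(catalog, "db.events", creds, write_times=[10, 30])

# Long write: the second file is written after expiry, forcing a refresh
# against a table that has not been committed yet.
try:
    write_with_refresh(catalog, "db.events", creds, write_times=[10, 90])
    outcome = "ok"
except NoSuchTableError as err:
    outcome = str(err)
```

Only the long-running write fails here, which matches the report: the problem appears once the initial credentials expire before the commit-create step.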
I suspected that I might have misunderstood the REST spec, but it looks like
[Polaris](https://github.com/apache/polaris/blob/c43c546a227a3eddd0ed0d519f53d4b347c439a7/service/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogAdapter.java#L590-L596)
has the same interpretation, delegating to `loadTable`, which indeed returns a
404 if the table does not exist.
I think we should be able to support this common use case. Apologies if this
is ongoing work, but I didn't find anything when searching online. I believe
this is achievable because `stageCreate` already returns credentials for a
table that, according to the catalog, doesn't exist yet, which is exactly what
refresh requires for this use case.
(1) Importantly, I realise that a REST catalog could provide an endpoint
different to the `loadCredentials` one and implement its own behaviour to
handle the case described. However, my impression was that `loadCredentials`
was designed to be the refresh endpoint, and I think there's value in having a
specification for what a REST catalog should implement.
Curious for the community's thoughts. Happy to propose / contribute. cc
@nastra
### Query engine
None
### Willingness to contribute
- [x] I can contribute this improvement/feature independently
- [ ] I would be willing to contribute this improvement/feature with
guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.