smaheshwar-pltr opened a new issue, #13554:
URL: https://github.com/apache/iceberg/issues/13554

   ### Feature Request / Improvement
   
   I'd like REST clients that are atomically creating a table to be able to 
refresh vended credentials when writing the table's files.
   
   Currently, credential refresh works when the table already exists. When it 
does not, a simple Spark workflow such as
   
   ```scala
   df
     // ... (transformations elided)
     .writeTo(table)
     .create()
   ```
   
   does not refresh credentials.
   
   IIUC, the atomic creation of a table works by the client first sending a 
`stageCreate` request to the REST server, using the returned 
`LoadTableResponse` for writing files, and finally commit-creating the table 
via an `updateTable` request to the server.
   
   Suppose the REST catalog includes refresh properties in the `stageCreate` 
`LoadTableResponse` that point to the `loadCredentials` refresh endpoint 
**(1)**, and that it follows the REST specification by [returning a 
404](https://github.com/apache/iceberg/blob/bcf9c69c098b54d31cbd803d62a2609d3814c3df/open-api/rest-catalog-open-api.yaml#L1205-L1207)
 when the table does not exist. The Spark code above will then fail, because 
the table does not yet exist when the refresh is attempted. Specifically, once 
the initial credentials in the `LoadTableResponse` have expired, the user will 
receive:
   
   ```
   org.apache.iceberg.exceptions.RESTException: Unable to process: Table does 
not exist: ...
   ```
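   
   To make the sequence concrete, here is a minimal in-memory sketch of the flow as I understand it. Everything in it is a hypothetical stand-in rather than an Iceberg class: `ToyRestCatalog`, `Credentials`, `NoSuchTableException` and `atomicCreate` are names made up for illustration, and time is passed in explicitly so the expiry is deterministic.
   
   ```scala
   import scala.collection.mutable
   
   // Hypothetical stand-ins for illustration; none of these are Iceberg classes.
   final case class Credentials(token: String, expiresAtMillis: Long)
   
   final class NoSuchTableException(table: String)
       extends RuntimeException(s"Table does not exist: $table")
   
   // Toy catalog: only committed tables are visible to loadCredentials.
   final class ToyRestCatalog {
     private val committed = mutable.Set.empty[String]
     private var issued = 0
   
     private def vend(table: String, now: Long, ttl: Long): Credentials = {
       issued += 1
       Credentials(s"$table-cred-$issued", now + ttl)
     }
   
     // Models stageCreate: vends credentials without registering the table.
     def stageCreate(table: String, now: Long, ttl: Long): Credentials =
       vend(table, now, ttl)
   
     // Models loadCredentials: the refresh endpoint, a 404 for unknown tables.
     def loadCredentials(table: String, now: Long, ttl: Long): Credentials =
       if (committed.contains(table)) vend(table, now, ttl)
       else throw new NoSuchTableException(table)
   
     // Models the final updateTable commit, after which the table exists.
     def updateTable(table: String): Unit = {
       committed += table
       ()
     }
   }
   
   object StagedCreateSketch {
     // Stages the table, "writes" one file per entry in writeTimes (refreshing
     // credentials first if they have expired), then commits the table.
     def atomicCreate(
         catalog: ToyRestCatalog,
         table: String,
         writeTimes: Seq[Long],
         ttl: Long
     ): Either[String, Int] = {
       var creds = catalog.stageCreate(table, now = 0L, ttl = ttl)
       try {
         writeTimes.foreach { now =>
           if (now >= creds.expiresAtMillis) {
             creds = catalog.loadCredentials(table, now, ttl)
           }
           // A real client would upload a data file using creds here.
         }
         catalog.updateTable(table)
         Right(writeTimes.size)
       } catch {
         case e: NoSuchTableException => Left(e.getMessage)
       }
     }
   }
   ```
   
   With a TTL of 100, `atomicCreate(new ToyRestCatalog, "db.t", Seq(10L, 20L), ttl = 100L)` commits and returns `Right(2)`, because no refresh is needed. With `Seq(10L, 150L)` the second write triggers a refresh before the commit and the call returns `Left("Table does not exist: db.t")`, which mirrors the failure above.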
   
   I suspected that I might have misunderstood the REST spec, but it looks like 
[Polaris](https://github.com/apache/polaris/blob/c43c546a227a3eddd0ed0d519f53d4b347c439a7/service/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogAdapter.java#L590-L596)
 has the same interpretation, delegating to `loadTable`, which indeed returns a 
404 if the table does not exist.
   
   I think we should be able to support this common use case. Apologies if this 
is ongoing work, but I didn't find anything when searching online. I believe 
that this is achievable because `stageCreate` already returns credentials for a 
table that, according to the catalog, doesn't exist yet, and that is exactly 
what refresh requires for this use case.
   
   (1) Importantly, I realise that a REST catalog could provide an endpoint 
different to the `loadCredentials` one and implement its own behaviour to 
handle the case described. However, my impression was that `loadCredentials` 
was designed to be the refresh endpoint, and I think there's value in having a 
specification for what a REST catalog should implement.
   
   Curious for the community's thoughts. Happy to propose / contribute. cc 
@nastra
   
   ### Query engine
   
   None
   
   ### Willingness to contribute
   
   - [x] I can contribute this improvement/feature independently
   - [ ] I would be willing to contribute this improvement/feature with 
guidance from the Iceberg community
   - [ ] I cannot contribute this improvement/feature at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

