vbekiaris opened a new issue, #437:
URL: https://github.com/apache/iceberg-go/issues/437
### Apache Iceberg version
main (development)
### Please describe the bug 🐞
`Scan.PlanFiles` takes a `context` argument, which creates the expectation
that this context is used internally when downloading objects from S3.
That is not the case: the context is not propagated through the `IO`
abstraction, and the implementation instead uses a stored context (captured
earlier, in `Catalog.LoadTable`).
This breaks the case where `Table`s are cached across requests (to avoid
hitting the catalog and downloading/parsing table metadata on each request):
setting a context with a timeout on `LoadTable` results in "context canceled"
errors on every subsequent request.
The reproducer below calls `LoadTable` from a separate function that cancels
its timeout context on return for resource cleanup (as per recommended
practice); `PlanFiles` then fails with "context canceled".
```
import (
	"context"
	"testing"
	"time"

	"github.com/apache/iceberg-go/catalog"
	"github.com/apache/iceberg-go/table"
	"github.com/stretchr/testify/require"
)

// GetCatalog is a helper (not shown) that returns the catalog under test.
func TestLoadTableWithTimeout(t *testing.T) {
	ctx := context.Background()
	cat, err := GetCatalog(ctx)
	require.NoError(t, err)

	tbl, err := loadTableWithTimeout(ctx, cat, "db.test_table")
	require.NoError(t, err)

	// the following fails, because PlanFiles does not really use passed context
	// error: Received unexpected error:
	// could not open manifest file: operation error S3: GetObject, context canceled
	_, err = tbl.Scan().PlanFiles(ctx)
	require.NoError(t, err)
}

// loadTableWithTimeout loads the table under a timeout context that is
// canceled as soon as the function returns.
func loadTableWithTimeout(ctx context.Context, cat catalog.Catalog, tblName string) (*table.Table, error) {
	ctxWithTimeout, cancelFn := context.WithTimeout(ctx, 1*time.Minute)
	defer cancelFn()

	return cat.LoadTable(ctxWithTimeout, catalog.ToIdentifier(tblName), nil)
}
```
We use the Glue catalog, but any implementation that uses `io.LoadFS`
ultimately stores the context in its `IO` implementation.