BlakeOrth opened a new pull request, #18855:
URL: https://github.com/apache/datafusion/pull/18855
## Which issue does this PR close?
POC for:
- https://github.com/apache/datafusion/issues/18827
## Rationale for this change
This is a POC for initial feedback and is not intended for merge at this
time.
## What changes are included in this PR?
- Implements a POC version of a default ListFilesCache
- Refactors the existing ListFilesCache to mirror the MetadataCache by
defining a new trait instead of a fixed type wrapping a trait
- Bounds the size of the cache by the number of entries
- Expires entries in the cache after a default timeout duration
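
To make the last two bullets concrete, here is a minimal sketch of a cache that is bounded by entry count and expires entries after a timeout. It is not the code in this PR: `BoundedTtlCache`, `Entry`, the `String` keys, and the oldest-inserted eviction policy are all assumptions made for illustration, and the real cache key and value types in DataFusion differ. The current time is passed in explicitly so the behavior is easy to test.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a cached listing; not DataFusion's type.
struct Entry {
    files: Vec<String>,
    inserted_at: Instant,
}

/// Illustrative cache: at most `max_entries` keys, each valid for `ttl`.
struct BoundedTtlCache {
    max_entries: usize,
    ttl: Duration,
    entries: HashMap<String, Entry>,
    /// Keys in insertion order, oldest first; used to pick an eviction victim.
    order: VecDeque<String>,
}

impl BoundedTtlCache {
    fn new(max_entries: usize, ttl: Duration) -> Self {
        Self {
            max_entries,
            ttl,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Store a listing for `prefix`, evicting the oldest entry when full.
    fn put(&mut self, prefix: &str, files: Vec<String>, now: Instant) {
        if self.max_entries == 0 {
            return; // a zero-capacity cache stores nothing
        }
        if self.entries.contains_key(prefix) {
            // Re-inserting an existing key refreshes its position and timestamp.
            self.order.retain(|k| k.as_str() != prefix);
        } else if self.entries.len() >= self.max_entries {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.order.push_back(prefix.to_string());
        self.entries.insert(
            prefix.to_string(),
            Entry { files, inserted_at: now },
        );
    }

    /// Return the cached listing, or `None` if it is absent or has expired.
    fn get(&mut self, prefix: &str, now: Instant) -> Option<Vec<String>> {
        let expired = match self.entries.get(prefix) {
            Some(entry) => now.duration_since(entry.inserted_at) >= self.ttl,
            None => return None,
        };
        if expired {
            // Expired entries are dropped lazily, on lookup.
            self.entries.remove(prefix);
            self.order.retain(|k| k.as_str() != prefix);
            return None;
        }
        self.entries.get(prefix).map(|entry| entry.files.clone())
    }
}
```

Expiring lazily on lookup keeps the sketch free of background tasks; a production cache would more likely bound memory rather than entry count and avoid cloning the listing on every hit.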
## Are these changes tested?
The code is functional, and some tests clearly show a reduction in object
store requests. However, existing tests that exercise `INSERT` commands are
currently broken; how to handle inserts is a key point that needs discussion.
## Are there any user-facing changes?
Yes, this work will likely break the existing `ListFilesCache` public API.
## Additional Context
This PR is a basic functional implementation that closely mirrors the
existing `MetadataCache` and its semantics. One key omission that needs to be
discussed is how `INSERT` statements are handled. On the surface there seem
to be two options:
1. Invalidate the cache key(s) associated with the table that corresponds
to the `INSERT` statement
2. Intercept the object(s) that correspond to an `INSERT` statement and add
them to the cache
The first option seems much easier, but the second seems preferable, since a
user is likely to query newly inserted data right away. Any input here, or
other strategies for handling inserts that I haven't thought of, would be
great!
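
To help frame the discussion, the two options could look roughly like the sketch below. Everything in it is hypothetical: `ListingCache`, `invalidate_table`, and `record_insert` are illustrative names, keys are plain `String` prefixes, and object metadata such as size and last-modified time is left out.

```rust
use std::collections::HashMap;

/// Hypothetical cache keyed by listing prefix; values are object paths.
#[derive(Default)]
struct ListingCache {
    listings: HashMap<String, Vec<String>>,
}

impl ListingCache {
    /// Option 1: drop every cached listing under the table's root prefix.
    /// Returns the number of cache keys removed.
    fn invalidate_table(&mut self, table_prefix: &str) -> usize {
        let before = self.listings.len();
        self.listings
            .retain(|key, _| !key.starts_with(table_prefix));
        before - self.listings.len()
    }

    /// Option 2: add a newly written object to every cached listing whose
    /// prefix covers it. Returns the number of listings updated.
    fn record_insert(&mut self, object_path: &str) -> usize {
        let mut updated = 0;
        for (prefix, files) in self.listings.iter_mut() {
            let covers = object_path.starts_with(prefix.as_str());
            let known = files.iter().any(|f| f.as_str() == object_path);
            if covers && !known {
                files.push(object_path.to_string());
                updated += 1;
            }
        }
        updated
    }
}
```

Option 1 only needs the table's location, while option 2 requires the write path to report exactly which objects it created, which is presumably where most of the extra complexity would come from.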
I will also leave some inline comments around some `TODO` items that I think
should be discussed.
cc @alamb @alchemist51
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]