BlakeOrth opened a new pull request, #18855:
URL: https://github.com/apache/datafusion/pull/18855

   
   
   ## Which issue does this PR close?
   
   POC for:
    - https://github.com/apache/datafusion/issues/18827
   
   ## Rationale for this change
   
   This is a POC for initial feedback and is not intended for merge at this time.
   
   ## What changes are included in this PR?
   
    - Implements a POC version of a default `ListFilesCache`
    - Refactors the existing `ListFilesCache` to mirror the `MetadataCache` by defining a new trait instead of a fixed type wrapping a trait
    - Bounds the size of the cache by number of entries
    - Expires entries in the cache after a default timeout duration
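The bounded, expiring cache described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code in this PR: the trait shape, `DefaultListFilesCache`, its `max_entries`/`ttl` parameters, and the FIFO eviction policy are hypothetical, and a real listing would likely hold richer object metadata than plain path strings.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical trait shape for a pluggable list-files cache (not DataFusion's actual API).
/// Keys are table paths; values are the object paths returned by an object store LIST.
pub trait ListFilesCache: Send + Sync {
    /// Returns the cached listing, or `None` if it is absent or expired.
    fn get(&self, table_path: &str) -> Option<Vec<String>>;
    /// Caches a listing, evicting the oldest entry when the cache is full.
    fn put(&self, table_path: &str, files: Vec<String>);
    /// Removes a listing, e.g. to invalidate it.
    fn remove(&self, table_path: &str) -> Option<Vec<String>>;
    /// Number of entries held (expired entries count until they are next read).
    fn len(&self) -> usize;
}

struct Entry {
    files: Vec<String>,
    inserted_at: Instant,
}

#[derive(Default)]
struct Inner {
    entries: HashMap<String, Entry>,
    /// Keys in insertion order, used for FIFO eviction (an assumption of this sketch).
    order: VecDeque<String>,
}

impl Inner {
    fn remove_key(&mut self, key: &str) -> Option<Entry> {
        let removed = self.entries.remove(key);
        if removed.is_some() {
            self.order.retain(|k| k.as_str() != key);
        }
        removed
    }
}

/// Hypothetical default implementation bounded by entry count and a time-to-live.
pub struct DefaultListFilesCache {
    max_entries: usize,
    ttl: Duration,
    inner: Mutex<Inner>,
}

impl DefaultListFilesCache {
    pub fn new(max_entries: usize, ttl: Duration) -> Self {
        Self { max_entries, ttl, inner: Mutex::new(Inner::default()) }
    }
}

impl ListFilesCache for DefaultListFilesCache {
    fn get(&self, table_path: &str) -> Option<Vec<String>> {
        let mut inner = self.inner.lock().unwrap();
        let expired = inner.entries.get(table_path)?.inserted_at.elapsed() >= self.ttl;
        if expired {
            // Expired entries are dropped lazily, on the first read after the TTL passes.
            inner.remove_key(table_path);
            return None;
        }
        inner.entries.get(table_path).map(|e| e.files.clone())
    }

    fn put(&self, table_path: &str, files: Vec<String>) {
        if self.max_entries == 0 {
            return;
        }
        let mut inner = self.inner.lock().unwrap();
        // Replacing an existing key refreshes both its contents and its expiry.
        inner.remove_key(table_path);
        // Evict the oldest inserted entries until there is room for one more.
        while inner.entries.len() >= self.max_entries {
            let Some(oldest) = inner.order.pop_front() else { break };
            inner.entries.remove(&oldest);
        }
        inner.entries.insert(
            table_path.to_string(),
            Entry { files, inserted_at: Instant::now() },
        );
        inner.order.push_back(table_path.to_string());
    }

    fn remove(&self, table_path: &str) -> Option<Vec<String>> {
        self.inner.lock().unwrap().remove_key(table_path).map(|e| e.files)
    }

    fn len(&self) -> usize {
        self.inner.lock().unwrap().entries.len()
    }
}
```

FIFO eviction keeps the sketch short; a production cache would more likely use LRU eviction, which also moves an entry to the back of the queue when it is read.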
   
   ## Are these changes tested?
   
   This code is functional, and some tests clearly show a reduction in Object Store requests! However, existing tests involving `INSERT` commands are broken, and how inserts should be handled is a key open question that still needs to be discussed.
   
   ## Are there any user-facing changes?
   
   Yes, this work will likely break the existing `ListFilesCache` public API.
   
   ## Additional Context
   
   This PR is a basic functional implementation that closely mirrors the existing `MetadataCache` and its semantics. One key omission that needs to be discussed is how `INSERT` statements are handled. On the surface, there seem to be two options:
    1. Invalidate the cache key(s) associated with the table targeted by the `INSERT` statement
    2. Intercept the object(s) written by an `INSERT` statement and add them to the cache
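The two `INSERT` strategies above can be compared against a deliberately simplified, non-expiring cache keyed by table path. This is only an illustrative sketch: `ListingCache`, `invalidate_on_insert`, and `add_on_insert` are hypothetical names, and how the write path would report the objects it created is left open, as it is in this PR.

```rust
use std::collections::HashMap;

/// Deliberately simplified list-files cache: table path -> cached object paths.
/// Hypothetical names throughout; not DataFusion's API and not the code in this PR.
#[derive(Default)]
pub struct ListingCache {
    entries: HashMap<String, Vec<String>>,
}

impl ListingCache {
    pub fn put(&mut self, table: &str, files: Vec<String>) {
        self.entries.insert(table.to_string(), files);
    }

    pub fn get(&self, table: &str) -> Option<&Vec<String>> {
        self.entries.get(table)
    }

    /// Option 1: drop the table's cached listing; the next scan re-lists the object store.
    pub fn invalidate_on_insert(&mut self, table: &str) {
        self.entries.remove(table);
    }

    /// Option 2: append the objects the INSERT wrote to an existing cached listing,
    /// so the next scan sees them without another LIST request.
    pub fn add_on_insert(&mut self, table: &str, written: &[String]) {
        // Only extend a listing that is already cached: caching just the new objects
        // for an uncached table would hide the table's pre-existing files.
        if let Some(files) = self.entries.get_mut(table) {
            for path in written {
                if !files.contains(path) {
                    files.push(path.clone());
                }
            }
        }
    }
}
```

Under these assumptions, option 1 costs one extra LIST on the next query, while option 2 avoids it but requires the write path to know exactly which objects it produced.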
   
   The first option seems much easier, but the second seems preferable: a user is likely to query newly inserted data soon after writing it, and keeping the cached listing up to date would avoid an immediate re-list of the object store. Any input here, or other strategies I haven't thought of for handling inserts, would be great!
   
   I will also leave inline comments on some `TODO` items that I think should be discussed.
   
   cc @alamb @alchemist51 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
