martin-g commented on code in PR #19366:
URL: https://github.com/apache/datafusion/pull/19366#discussion_r2627165217


##########
docs/source/library-user-guide/upgrading.md:
##########
@@ -45,15 +45,23 @@ directly on the `Field`. For example:
 In prior versions, `ListingTableProvider` would issue `LIST` commands to
 the underlying object store each time it needed to list files for a query.
 To improve performance, `ListingTableProvider` now caches the results of
-`LIST` commands for the lifetime of the `ListingTableProvider` instance.
+`LIST` commands for the lifetime of the `ListingTableProvider` instance or
+until a cache entry expires.
 
 Note that by default the cache has no expiration time, so if files are added 
or removed
 from the underlying object store, the `ListingTableProvider` will not see
 those changes until the `ListingTableProvider` instance is dropped and 
recreated.
 
-You will be able to configure the maximum cache size and cache expiration time 
via a configuration option:
+You can configure the maximum cache size and cache entry expiration time via 
configuration options:
 
-See <https://github.com/apache/datafusion/issues/19056> for more details.
+`datafusion.runtime.list_files_cache_limit`
+`datafusion.runtime.list_files_cache_ttl`

Review Comment:
   ```suggestion
   * `datafusion.runtime.list_files_cache_ttl`
   ```



##########
docs/source/library-user-guide/upgrading.md:
##########
@@ -45,15 +45,23 @@ directly on the `Field`. For example:
 In prior versions, `ListingTableProvider` would issue `LIST` commands to
 the underlying object store each time it needed to list files for a query.
 To improve performance, `ListingTableProvider` now caches the results of
-`LIST` commands for the lifetime of the `ListingTableProvider` instance.
+`LIST` commands for the lifetime of the `ListingTableProvider` instance or
+until a cache entry expires.
 
 Note that by default the cache has no expiration time, so if files are added 
or removed
 from the underlying object store, the `ListingTableProvider` will not see
 those changes until the `ListingTableProvider` instance is dropped and 
recreated.
 
-You will be able to configure the maximum cache size and cache expiration time 
via a configuration option:
+You can configure the maximum cache size and cache entry expiration time via 
configuration options:
 
-See <https://github.com/apache/datafusion/issues/19056> for more details.
+`datafusion.runtime.list_files_cache_limit`

Review Comment:
   ```suggestion
   * `datafusion.runtime.list_files_cache_limit`
   ```



##########
docs/source/library-user-guide/upgrading.md:
##########
@@ -45,15 +45,23 @@ directly on the `Field`. For example:
 In prior versions, `ListingTableProvider` would issue `LIST` commands to
 the underlying object store each time it needed to list files for a query.
 To improve performance, `ListingTableProvider` now caches the results of
-`LIST` commands for the lifetime of the `ListingTableProvider` instance.
+`LIST` commands for the lifetime of the `ListingTableProvider` instance or
+until a cache entry expires.
 
 Note that by default the cache has no expiration time, so if files are added 
or removed
 from the underlying object store, the `ListingTableProvider` will not see
 those changes until the `ListingTableProvider` instance is dropped and 
recreated.
 
-You will be able to configure the maximum cache size and cache expiration time 
via a configuration option:
+You can configure the maximum cache size and cache entry expiration time via 
configuration options:
 
-See <https://github.com/apache/datafusion/issues/19056> for more details.
+`datafusion.runtime.list_files_cache_limit`

Review Comment:
   Those do not render well at the moment - 
https://github.com/BlakeOrth/datafusion/blob/73c667216ee2fcb85196694cc5a36633f9928c19/docs/source/library-user-guide/upgrading.md#listingtableprovider-now-caches-list-commands
   Also it would be good to mention the units of their values



##########
datafusion/execution/src/cache/cache_manager.rs:
##########
@@ -178,15 +178,21 @@ impl CacheManager {
         let file_statistic_cache =
             config.table_files_statistics_cache.as_ref().map(Arc::clone);
 
-        let list_files_cache = config
-            .list_files_cache
-            .as_ref()
-            .inspect(|c| {
-                // the cache memory limit or ttl might have changed, ensure 
they are updated
-                c.update_cache_limit(config.list_files_cache_limit);
-                c.update_cache_ttl(config.list_files_cache_ttl);
-            })
-            .map(Arc::clone);
+        let list_files_cache = match &config.list_files_cache {
+            Some(lfc) if config.list_files_cache_limit > 0 => {
+                lfc.update_cache_limit(config.list_files_cache_limit);
+                lfc.update_cache_ttl(config.list_files_cache_ttl);
+                Some(Arc::clone(lfc))
+            }
+            None if config.list_files_cache_limit > 0 => {
+                let lfc: Arc<dyn ListFilesCache> = 
Arc::new(DefaultListFilesCache::new(
+                    config.list_files_cache_limit,
+                    config.list_files_cache_ttl,
+                ));
+                Some(lfc)
+            }
+            _ => None,

Review Comment:
   If the user has disabled caching with `list_files_cache_limit = "0K"` then 
`None` will be returned here, but in this case `get_list_files_cache_limit()` 
will return "1M"
   * 
https://github.com/BlakeOrth/datafusion/blob/73c667216ee2fcb85196694cc5a36633f9928c19/datafusion/execution/src/cache/cache_manager.rs#L229
   * 
https://github.com/BlakeOrth/datafusion/blob/main/datafusion/execution/src/cache/list_files_cache.rs#L144



##########
docs/source/library-user-guide/upgrading.md:
##########
@@ -45,15 +45,23 @@ directly on the `Field`. For example:
 In prior versions, `ListingTableProvider` would issue `LIST` commands to
 the underlying object store each time it needed to list files for a query.
 To improve performance, `ListingTableProvider` now caches the results of
-`LIST` commands for the lifetime of the `ListingTableProvider` instance.
+`LIST` commands for the lifetime of the `ListingTableProvider` instance or
+until a cache entry expires.
 
 Note that by default the cache has no expiration time, so if files are added 
or removed
 from the underlying object store, the `ListingTableProvider` will not see
 those changes until the `ListingTableProvider` instance is dropped and 
recreated.
 
-You will be able to configure the maximum cache size and cache expiration time 
via a configuration option:
+You can configure the maximum cache size and cache entry expiration time via 
configuration options:
 
-See <https://github.com/apache/datafusion/issues/19056> for more details.
+`datafusion.runtime.list_files_cache_limit`
+`datafusion.runtime.list_files_cache_ttl`
+
+Caching can be disable by setting the limit to 0:

Review Comment:
   ```suggestion
   Caching can be disabled by setting the limit to 0:
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to