LantaoJin opened a new pull request, #78:
URL: https://github.com/apache/datafusion-java/pull/78
## Which issue does this PR close?
- Closes #74.
## Rationale for this change
DataFusion's `RuntimeEnv` accepts a `CacheManagerConfig` with three
independent caches: the file-embedded metadata cache (parquet footers / page
metadata), the list-files cache (object-store `LIST` results), and the
file-statistics cache (per-file row counts and column stats used by the
planner). The Rust API is
`RuntimeEnvBuilder::with_cache_manager(CacheManagerConfig)`. The Java binding
currently exposes none of this: every `SessionContext` ends up with the no-op
upstream defaults, so a parquet workload that reads the same footer thousands
of times across queries goes back to the object store every time, and
statistics-driven planners cannot persist their stats across queries.
This PR adds a typed `cacheManager(CacheManagerOptions)` setter on
`SessionContextBuilder` that exposes the three caches independently:
```java
SessionContext ctx = SessionContext.builder()
    .cacheManager(CacheManagerOptions.builder()
        .fileMetadataCache(64L << 20)                     // 64 MiB cap
        .listFilesCache(8L << 20, Duration.ofMinutes(5))  // 8 MiB cap, 5 min TTL
        .fileStatisticsCache(true)
        .build())
    .build();
```
Each setter is independent; calling one doesn't touch the others. Builders
that never call `cacheManager(...)` see no change — the wire-format
`cache_manager` field is absent and the JNI layer skips
`with_cache_manager(...)` entirely, leaving upstream's own `RuntimeEnvBuilder`
defaults in place.
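To make the "independent setters, absent unless configured" behavior concrete, here is a minimal, self-contained sketch. The class `CacheOptionsSketch` and its accessor names are hypothetical and are not the PR's actual `CacheManagerOptions` implementation. The sketch assumes only that each cache setting is held as an optional value that stays empty until its setter is called.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.OptionalLong;

/** Hypothetical stand-in for CacheManagerOptions; NOT the PR's actual class. */
final class CacheOptionsSketch {
    // Each setting stays empty until its setter is called
    // (assumption: "empty" means "leave the upstream default alone").
    private final OptionalLong fileMetadataLimitBytes;
    private final OptionalLong listFilesLimitBytes;
    private final Optional<Duration> listFilesTtl;
    private final Optional<Boolean> fileStatisticsEnabled;

    private CacheOptionsSketch(Builder b) {
        this.fileMetadataLimitBytes = b.fileMetadataLimitBytes;
        this.listFilesLimitBytes = b.listFilesLimitBytes;
        this.listFilesTtl = b.listFilesTtl;
        this.fileStatisticsEnabled = b.fileStatisticsEnabled;
    }

    static Builder builder() {
        return new Builder();
    }

    OptionalLong fileMetadataLimitBytes() { return fileMetadataLimitBytes; }
    OptionalLong listFilesLimitBytes() { return listFilesLimitBytes; }
    Optional<Duration> listFilesTtl() { return listFilesTtl; }
    Optional<Boolean> fileStatisticsEnabled() { return fileStatisticsEnabled; }

    static final class Builder {
        private OptionalLong fileMetadataLimitBytes = OptionalLong.empty();
        private OptionalLong listFilesLimitBytes = OptionalLong.empty();
        private Optional<Duration> listFilesTtl = Optional.empty();
        private Optional<Boolean> fileStatisticsEnabled = Optional.empty();

        // Touches only the file-metadata setting.
        Builder fileMetadataCache(long limitBytes) {
            this.fileMetadataLimitBytes = OptionalLong.of(limitBytes);
            return this;
        }

        // Touches only the two list-files settings.
        Builder listFilesCache(long limitBytes, Duration ttl) {
            this.listFilesLimitBytes = OptionalLong.of(limitBytes);
            this.listFilesTtl = Optional.of(ttl);
            return this;
        }

        // Touches only the file-statistics setting.
        Builder fileStatisticsCache(boolean enabled) {
            this.fileStatisticsEnabled = Optional.of(enabled);
            return this;
        }

        CacheOptionsSketch build() {
            return new CacheOptionsSketch(this);
        }
    }

    public static void main(String[] args) {
        // Only the metadata cache is configured; the other settings stay empty.
        CacheOptionsSketch onlyMetadata = CacheOptionsSketch.builder()
            .fileMetadataCache(64L << 20)
            .build();
        System.out.println(onlyMetadata.fileMetadataLimitBytes().isPresent());
        System.out.println(onlyMetadata.listFilesTtl().isPresent());
        System.out.println(onlyMetadata.fileStatisticsEnabled().isPresent());
    }
}
```

In this sketch, a serializer could emit a field only when the corresponding optional is present, which is one way to get the "absent on the wire" behavior described above. Whether the PR uses this exact representation is not stated in the description.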
## What changes are included in this PR?
- **Proto:** `proto/cache_manager_options.proto`
- **Java API:** `org.apache.datafusion.CacheManagerOptions`
- **Native:** `native/src/cache_manager.rs`
- **Build wiring:** `proto/cache_manager_options.proto`
## Are these changes tested?
Yes, 18 new tests across `CacheManagerOptionsTest` and
`SessionContextCacheManagerTest`.
## Are there any user-facing changes?
Yes, but additive only — no breaking changes:
- New public class `org.apache.datafusion.CacheManagerOptions` with a static
`builder()` and three setters.
- New `SessionContextBuilder.cacheManager(CacheManagerOptions)` setter.
No behavior change for callers that do not invoke the new setter — the
`cache_manager` field is absent on the wire and the native side leaves
upstream's `RuntimeEnvBuilder` defaults in place.