[I] feat(metrics): per-query memory accounting and Tokio runtime metrics [datafusion-java]

via GitHub Wed, 20 May 2026 23:56:46 -0700


LantaoJin opened a new issue, #82:
URL: https://github.com/apache/datafusion-java/issues/82


   ### Is your feature request related to a problem or challenge?
   
   For any embedder running multi-tenant DataFusion workloads, two operational 
signals are missing from the Java binding today:
   
   1. **Per-query memory** — there is no way to ask DataFusion how many bytes a 
specific in-flight query is currently holding, or what its peak was. 
`SessionContextBuilder.memoryLimit(...)` (PR #28) lets you cap the *global* 
pool, but if a tenant blows past their fair-share allocation you can't tell 
whose query it is. Without this you cannot do fair-share scheduling, abuse 
detection, or correctly attribute OOMs.
   
   2. **Tokio runtime stats** — the JNI library drives a single shared 
multi-threaded Tokio runtime in `lib.rs`. There is no Java-side window into how 
busy that runtime is right now: how many worker threads exist, how saturated 
they are, how deep the global queue is. Embedders that report node-level health 
(e.g. an OpenSearch `_nodes/stats` endpoint) need these numbers; today they 
have to hand-roll a parallel native bridge to surface them.
   
   Both problems share the same FFI shape: snapshot a small struct of counters 
across the boundary on demand. They are bundled here so the design conversation 
only needs to happen once.
   
   ### Describe the solution you'd like
   
   Two narrow additions, both opt-in.
   
   ### 1. Per-query memory: `MemoryUsage` snapshot from a `DataFrame`
   
   ```java
   DataFrame df = ctx.sql(...);
   ArrowReader r = df.executeStream(allocator);
   // ... while another thread drains r ...
   MemoryUsage usage = df.memoryUsage();   // peakBytes / currentBytes / 
consumerCount
   ```
   
   The Java surface is a single new accessor on `DataFrame`. The Rust side 
wraps the session's `MemoryPool` with a per-query tracking adapter (modelled on 
`TrackConsumersPool` upstream) so the snapshot reports only the consumers tied 
to *this* query's `MemoryConsumer`s, not the whole pool. The accessor is 
non-consuming, can be polled from any thread, and returns immutable values.
   
   For most users the wrapper is harmless and adds no measurable overhead; it 
can be inserted automatically by `SessionContext` so callers don't have to opt 
in per query.
   
   ### 2. Tokio runtime stats: `SessionContext.runtimeStats()`
   
   ```java
   RuntimeStats stats = ctx.runtimeStats();
   // numWorkers, liveTasksCount, globalQueueDepth, elapsed,
   // totalBusyDuration, totalParkCount, totalPollsCount, ...
   ```
   
   A small immutable POJO mirroring the subset of 
`tokio_metrics::RuntimeMetrics` that operations dashboards typically render. 
Snapshotted under the JNI guard; cheap (one struct copy across FFI).
   
   Because `tokio_metrics` requires the `--cfg tokio_unstable` build flag to 
compile in, **runtime stats are gated behind a default-off `runtime-metrics` 
Cargo feature** on `datafusion-jni`. Default `cargo build` is unaffected (no 
new build flags, no new dep). To enable:
   
   ```sh
   RUSTFLAGS="--cfg tokio_unstable" cargo build --features runtime-metrics
   ```
   
   The Java method always exists; if invoked against a build that compiled the 
feature off it throws a clear `RuntimeException` pointing at the flag. Same 
shape as the just-shipped `substrait` feature (PR #75).
   
   The two pieces are intentionally bundled. Per-query memory tracking and 
runtime stats both need the same JNI plumbing pattern (snapshot a struct of 
counters across FFI), the same lifecycle considerations (polling from another 
thread while a query runs), and the same opt-in gating story for builds that 
don't want the extra dependency. Splitting them would mean two near-identical 
reviews.
   
   ### Describe alternatives you've considered
   
   - **Always-on Tokio metrics, set `--cfg tokio_unstable` globally.** Would 
force every contributor and every CI environment to override `RUSTFLAGS`. 
Pollutes the build env and breaks anyone who already has their own RUSTFLAGS. 
Rejected.
   
   ### Additional context
   
   - `tokio-metrics 0.5.0` is the right version against the Tokio currently 
used in `native/Cargo.toml` (`tokio = "1"`). It exposes `RuntimeMonitor` whose 
`intervals()` iterator yields `RuntimeMetrics` snapshots; the JNI handler grabs 
one snapshot per call.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[I] feat(metrics): per-query memory accounting and Tokio runtime metrics [datafusion-java]

Reply via email to