xudong963 opened a new issue, #19697:
URL: https://github.com/apache/datafusion/issues/19697
## Background
Currently:
1. `SpillMetrics` (per operator) are updated only at the end of a spill, so they show nothing while a spill is still being written.
2. `DiskManager` tracks `used_disk_space` (the current total) but doesn't expose a structured "progress" view.
## Proposed Changes
1. Real-time Metric Updates in `SpillMetrics`: modify `InProgressSpillFile` so that the `spilled_bytes` and `spill_file_count` metrics are updated as soon as data is written to disk.
   - Initial update: in `append_batch`, when the `IPCStreamWriter` is first created, immediately call `update_disk_usage()` on the file and add its size (schema/header) to `spilled_bytes`.
   - Incremental update: after each `writer.write(batch)` call, call `update_disk_usage()` and add the size delta to `spilled_bytes`.
   - Final update: in `finish()`, call `update_disk_usage()` after finishing the writer and add the remaining delta (footer/metadata) to `spilled_bytes`.
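A minimal, std-only sketch of the delta tracking described in item 1, under simplifying assumptions: `SpillMetricsSketch` and `InProgressSpillFileSketch` are hypothetical stand-ins (not DataFusion types), an in-memory `Vec<u8>` plays the role of the spill file, and the `HDR`/`FTR` bytes stand in for the IPC schema header and footer. The point it illustrates is that each `update_disk_usage()` call publishes only the bytes written since the previous call, so `spilled_bytes` grows in real time without double counting.

```rust
use std::io::{self, Write};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the per-operator `SpillMetrics` counters.
#[derive(Default)]
pub struct SpillMetricsSketch {
    pub spilled_bytes: AtomicUsize,
    pub spill_file_count: AtomicUsize,
}

/// Hypothetical stand-in for `InProgressSpillFile`; the `Vec<u8>` models the
/// file on disk, so its length is the current "disk usage".
pub struct InProgressSpillFileSketch {
    file: Option<Vec<u8>>,
    /// Bytes already added to `spilled_bytes`.
    reported_bytes: usize,
    metrics: Arc<SpillMetricsSketch>,
}

impl InProgressSpillFileSketch {
    pub fn new(metrics: Arc<SpillMetricsSketch>) -> Self {
        Self { file: None, reported_bytes: 0, metrics }
    }

    /// Re-reads the current size and adds only the delta since the last call.
    /// The file only grows, so the subtraction cannot underflow.
    fn update_disk_usage(&mut self) {
        let current = self.file.as_ref().map_or(0, |f| f.len());
        let delta = current - self.reported_bytes;
        self.metrics.spilled_bytes.fetch_add(delta, Ordering::Relaxed);
        self.reported_bytes = current;
    }

    pub fn append_batch(&mut self, batch: &[u8]) -> io::Result<()> {
        if self.file.is_none() {
            // Initial update: creating the writer emits the schema/header.
            let mut file: Vec<u8> = Vec::new();
            file.write_all(b"HDR")?;
            self.file = Some(file);
            self.metrics.spill_file_count.fetch_add(1, Ordering::Relaxed);
            self.update_disk_usage();
        }
        // Incremental update: publish the bytes of this batch right away.
        if let Some(file) = self.file.as_mut() {
            file.write_all(batch)?;
        }
        self.update_disk_usage();
        Ok(())
    }

    pub fn finish(&mut self) -> io::Result<()> {
        // Final update: account for the footer/metadata written on finish.
        if let Some(file) = self.file.as_mut() {
            file.write_all(b"FTR")?;
        }
        self.update_disk_usage();
        Ok(())
    }
}
```

Because only deltas are added, the final value of `spilled_bytes` in this sketch equals the total size of the file, while intermediate readings reflect partial progress.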
2. Spilling Progress Interface in `DiskManager`: expose the current global state of the disk manager.
   - New `SpillingProgress` struct:
     ```rust
     pub struct SpillingProgress {
         /// Total bytes currently used on disk for spilling
         pub current_bytes: u64,
         /// Total number of active spill files
         pub active_files_count: usize,
     }
     ```
   - Implement `spilling_progress(&self) -> SpillingProgress` on `DiskManager`.
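A sketch of item 2, assuming the disk manager keeps both totals in atomics: `DiskManagerSketch` is a hypothetical, heavily simplified stand-in for `DiskManager`, and the `on_file_created`, `on_bytes_written`, and `on_file_dropped` hooks are invented here for illustration. `spilling_progress(&self)` only reads the counters, which keeps it cheap enough to poll.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

/// The struct proposed above, with derives added for comparison and printing.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SpillingProgress {
    /// Total bytes currently used on disk for spilling
    pub current_bytes: u64,
    /// Total number of active spill files
    pub active_files_count: usize,
}

/// Hypothetical simplified disk manager tracking global spill state.
#[derive(Default)]
pub struct DiskManagerSketch {
    used_disk_space: AtomicU64,
    active_files_count: AtomicUsize,
}

impl DiskManagerSketch {
    /// Hypothetical hook: a new spill file was created.
    pub fn on_file_created(&self) {
        self.active_files_count.fetch_add(1, Ordering::Relaxed);
    }

    /// Hypothetical hook: `n` more bytes were written to some spill file.
    pub fn on_bytes_written(&self, n: u64) {
        self.used_disk_space.fetch_add(n, Ordering::Relaxed);
    }

    /// Hypothetical hook: a spill file of `file_size` bytes was deleted.
    pub fn on_file_dropped(&self, file_size: u64) {
        self.active_files_count.fetch_sub(1, Ordering::Relaxed);
        self.used_disk_space.fetch_sub(file_size, Ordering::Relaxed);
    }

    /// Read-only view of the current global spilling state.
    pub fn spilling_progress(&self) -> SpillingProgress {
        SpillingProgress {
            current_bytes: self.used_disk_space.load(Ordering::Relaxed),
            active_files_count: self.active_files_count.load(Ordering::Relaxed),
        }
    }
}
```

The two loads are independent, so a reading taken while a file is being created or dropped may briefly pair a new file count with an older byte total. For a progress display that is usually acceptable; an exact snapshot would need both counters behind one lock.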
3. Delegate Interface in `RuntimeEnv`: provide a convenient entry point for users:
   ```rust
   let progress = ctx.runtime_env().spilling_progress();
   ```
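A self-contained sketch of the delegation in item 3, assuming `RuntimeEnv` holds a shared disk manager behind an `Arc` and simply forwards the call. `DiskManagerSketch`, `RuntimeEnvSketch`, `SessionContextSketch`, and the `record_spill_file` hook are hypothetical names used only so the example compiles on its own.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;

/// The struct proposed in item 2, with derives added for comparison.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SpillingProgress {
    pub current_bytes: u64,
    pub active_files_count: usize,
}

/// Hypothetical minimal disk manager: only what the delegation needs.
#[derive(Default)]
pub struct DiskManagerSketch {
    used_disk_space: AtomicU64,
    active_files_count: AtomicUsize,
}

impl DiskManagerSketch {
    /// Hypothetical hook: record a new spill file of `bytes` bytes.
    pub fn record_spill_file(&self, bytes: u64) {
        self.active_files_count.fetch_add(1, Ordering::Relaxed);
        self.used_disk_space.fetch_add(bytes, Ordering::Relaxed);
    }

    pub fn spilling_progress(&self) -> SpillingProgress {
        SpillingProgress {
            current_bytes: self.used_disk_space.load(Ordering::Relaxed),
            active_files_count: self.active_files_count.load(Ordering::Relaxed),
        }
    }
}

/// Hypothetical stand-in for `RuntimeEnv`, which owns the shared disk manager.
pub struct RuntimeEnvSketch {
    pub disk_manager: Arc<DiskManagerSketch>,
}

impl RuntimeEnvSketch {
    /// Convenience entry point: forwards to the disk manager.
    pub fn spilling_progress(&self) -> SpillingProgress {
        self.disk_manager.spilling_progress()
    }
}

/// Hypothetical stand-in for a session context exposing `runtime_env()`.
pub struct SessionContextSketch {
    pub runtime: Arc<RuntimeEnvSketch>,
}

impl SessionContextSketch {
    pub fn runtime_env(&self) -> Arc<RuntimeEnvSketch> {
        Arc::clone(&self.runtime)
    }
}
```

Keeping the method on the runtime environment means callers that already hold a session context never need to reach into the disk manager directly.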
---
Users could then call this API to get real-time spilling progress. In our use case, we want to call it from our SQL UI to give users real-time feedback about their queries.