xudong963 opened a new issue, #19697:
URL: https://github.com/apache/datafusion/issues/19697

   ## Background
   Currently:
   
   1. SpillMetrics (per operator) are updated only at the end of a spill.
   2. DiskManager tracks `used_disk_space` (current total) but doesn't expose a 
structured "progress" view.
   
   ## Proposed Changes
   1. Real-time metric updates in SpillMetrics: modify `InProgressSpillFile` so that the `spilled_bytes` and `spill_file_count` metrics are updated as soon as data is written to disk.
    - Initial update: in `append_batch`, when the `IPCStreamWriter` is first created, immediately call `update_disk_usage()` on the file and add the size (schema/header) to `spilled_bytes`.
    - Incremental update: after each `writer.write(batch)` call, call `update_disk_usage()` and add the delta size to `spilled_bytes`.
    - Final update: in `finish()`, call `update_disk_usage()` after finishing the writer and add the remaining delta size (footer/metadata) to `spilled_bytes`.
   2. Spilling Progress Interface in DiskManager: expose the current global 
state of the disk manager.
    - New SpillingProgress struct
       ```rust
       pub struct SpillingProgress {
           /// Total bytes currently used on disk for spilling
           pub current_bytes: u64,
           /// Total number of active spill files
           pub active_files_count: usize,
       }
       ```
     - Implement `spilling_progress(&self) -> SpillingProgress`
   3. Delegate Interface in RuntimeEnv: provide a convenient entry point for 
users.
       ```rust
       let progress = ctx.runtime_env().spilling_progress();
       ```
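
A minimal sketch of the delta-based accounting described in item 1, using only the standard library. `Metrics` and `InProgressFile` are hypothetical stand-ins, not DataFusion's actual types, and the byte sizes are made-up numbers. The file remembers how many bytes it has already reported, so each call adds only the newly written bytes to `spilled_bytes`.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

/// Hypothetical stand-in for the per-operator spill metrics.
#[derive(Default)]
struct Metrics {
    spilled_bytes: AtomicU64,
    spill_file_count: AtomicUsize,
}

/// Hypothetical stand-in for an in-progress spill file.
struct InProgressFile<'a> {
    metrics: &'a Metrics,
    /// On-disk size that has already been added to `metrics`.
    reported_bytes: u64,
}

impl<'a> InProgressFile<'a> {
    fn new(metrics: &'a Metrics) -> Self {
        Self { metrics, reported_bytes: 0 }
    }

    /// Adds only the bytes written since the previous call.
    /// `size_on_disk` is the current total size of the file.
    fn update_disk_usage(&mut self, size_on_disk: u64) {
        if self.reported_bytes == 0 && size_on_disk > 0 {
            // First bytes reached disk: count the file once.
            self.metrics.spill_file_count.fetch_add(1, Ordering::Relaxed);
        }
        let delta = size_on_disk.saturating_sub(self.reported_bytes);
        self.metrics.spilled_bytes.fetch_add(delta, Ordering::Relaxed);
        // Never move backwards, so a smaller reading cannot cause double counting.
        self.reported_bytes = self.reported_bytes.max(size_on_disk);
    }
}

/// Simulates the three update points: header, two batches, footer.
fn simulate_spill() -> (u64, usize) {
    let metrics = Metrics::default();
    let mut file = InProgressFile::new(&metrics);
    file.update_disk_usage(128); // initial update: schema/header
    file.update_disk_usage(1_152); // incremental update: after batch 1
    file.update_disk_usage(2_176); // incremental update: after batch 2
    file.update_disk_usage(2_200); // final update: footer/metadata
    (
        metrics.spilled_bytes.load(Ordering::Relaxed),
        metrics.spill_file_count.load(Ordering::Relaxed),
    )
}
```

Because every call reports a delta against the last known size, the sum of all updates equals the final file size, and a reader of the metrics sees it grow while the spill is still running.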
   
   ---
   Users could then call this API to get real-time spilling progress. In our 
use case, we want to call it from the SQL UI to give users real-time feedback 
on their queries.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

