EeshanBembi opened a new pull request, #17867:
URL: https://github.com/apache/datafusion/pull/17867
## Summary
Adds a progress bar to the DataFusion CLI with DuckDB-style ETA estimation, providing real-time feedback during query execution.
- ✅ Progress bar with percentage, throughput, and ETA display
- ✅ Kalman-filter-smoothed ETA estimation
- ✅ TTY auto-detection (shows progress on a terminal, disabled when piped)
- ✅ Configurable progress styles and update intervals
- ✅ Support for Parquet, CSV, and JSON data sources
- ✅ Graceful fallback to spinner mode when totals are unknown
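The description does not specify the filter model, so the following is only a minimal sketch of what Kalman-smoothed ETA estimation could look like. It assumes a one-dimensional filter over instantaneous throughput (rows/s) with a constant-rate model; the ETA is then remaining rows divided by the smoothed rate. `KalmanRate`, `eta_secs`, `linear_rate` and the noise parameters `q`/`r` are hypothetical names, and treating the `linear` estimator as the average rate since start is also an assumption.

```rust
/// Hypothetical 1-D Kalman filter that smooths noisy throughput samples.
/// Model assumption: the true rate stays roughly constant between samples.
struct KalmanRate {
    estimate: f64, // smoothed rate, rows/s
    variance: f64, // uncertainty of `estimate`
    q: f64,        // process noise: how fast the true rate may drift
    r: f64,        // measurement noise: how noisy each sample is
    initialized: bool,
}

impl KalmanRate {
    fn new(q: f64, r: f64) -> Self {
        debug_assert!(q > 0.0 && r > 0.0, "noise parameters must be positive");
        Self { estimate: 0.0, variance: 0.0, q, r, initialized: false }
    }

    /// Feed one instantaneous rate sample and return the smoothed rate.
    fn update(&mut self, measured: f64) -> f64 {
        if !self.initialized {
            // First sample: take it as-is, with measurement-level uncertainty.
            self.estimate = measured;
            self.variance = self.r;
            self.initialized = true;
            return self.estimate;
        }
        // Predict: rate unchanged, uncertainty grows by q.
        let predicted_var = self.variance + self.q;
        // Correct: move toward the sample in proportion to the Kalman gain.
        let gain = predicted_var / (predicted_var + self.r);
        self.estimate += gain * (measured - self.estimate);
        self.variance = (1.0 - gain) * predicted_var;
        self.estimate
    }
}

/// Seconds remaining at `rate` rows/s, or None if there is no rate yet.
fn eta_secs(done: u64, total: u64, rate: f64) -> Option<f64> {
    if rate <= 0.0 {
        return None;
    }
    Some(total.saturating_sub(done) as f64 / rate)
}

/// Assumed "linear" estimator: average rate since the query started.
fn linear_rate(done: u64, elapsed_secs: f64) -> f64 {
    if elapsed_secs <= 0.0 { 0.0 } else { done as f64 / elapsed_secs }
}
```

A larger `r` relative to `q` lowers the gain, giving a steadier but slower-reacting ETA; the linear average never jitters but lags when throughput changes mid-query.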
## CLI Usage
New flags added:
```bash
# Progress mode control
--progress {auto|on|off} # Default: auto (TTY detection)
# Visual customization
--progress-style {bar|spinner} # Default: bar
--progress-interval <ms> # Default: 200ms
# ETA algorithm
--progress-estimator {kalman|linear} # Default: kalman
```
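As a rough illustration of how `--progress {auto|on|off}` might resolve against TTY detection, here is a dependency-free sketch. datafusion-cli parses its arguments with clap, but this uses `FromStr` to stay self-contained; `ProgressMode`, `progress_enabled`, `auto_detect_tty`, and checking stdout rather than stderr are assumptions, not the PR's code.

```rust
use std::io::IsTerminal;
use std::str::FromStr;

/// Hypothetical mirror of `--progress {auto|on|off}`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProgressMode {
    Auto,
    On,
    Off,
}

impl FromStr for ProgressMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "auto" => Ok(Self::Auto),
            "on" => Ok(Self::On),
            "off" => Ok(Self::Off),
            other => Err(format!("invalid --progress value: {other}")),
        }
    }
}

/// Pure decision logic; the TTY check is passed in so it can be tested.
fn progress_enabled(mode: ProgressMode, is_tty: bool) -> bool {
    match mode {
        ProgressMode::On => true,
        ProgressMode::Off => false,
        ProgressMode::Auto => is_tty,
    }
}

/// Assumption: "piped" means stdout is redirected, so `auto` checks stdout.
fn auto_detect_tty() -> bool {
    std::io::stdout().is_terminal()
}
```

Keeping the decision pure and injecting `is_tty` makes the `auto` behavior unit-testable without a real terminal.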
## Example Output
Progress bar (when totals are known):
`▉▉▉▉▉▊▏ 63% 12.3M / 19.5M rows • 48.1 MB/s • ETA 00:27`
Spinner (when totals are unknown):
`⠋ rows: 1.2M elapsed: 00:11`
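The text of both sample lines can be reproduced with a small formatter. This is a hypothetical sketch: the K/M/B suffixes, decimal MB (10^6 bytes), the `mm:ss` clock, the braille spinner frames, and the `--:--` placeholder are assumptions, and the bar glyphs themselves are omitted.

```rust
/// Braille spinner frames (assumed; any frame set works).
const SPINNER: [char; 10] = ['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏'];

/// Compact row counts: 950, 4.5K, 12.3M, 1.2B (decimal suffixes assumed).
fn format_count(n: u64) -> String {
    let x = n as f64;
    if x >= 1e9 {
        format!("{:.1}B", x / 1e9)
    } else if x >= 1e6 {
        format!("{:.1}M", x / 1e6)
    } else if x >= 1e3 {
        format!("{:.1}K", x / 1e3)
    } else {
        n.to_string()
    }
}

/// mm:ss, or hh:mm:ss once an hour is reached.
fn format_clock(secs: u64) -> String {
    let (h, m, s) = (secs / 3600, (secs % 3600) / 60, secs % 60);
    if h > 0 {
        format!("{h:02}:{m:02}:{s:02}")
    } else {
        format!("{m:02}:{s:02}")
    }
}

/// One status line: bar text when the total is known, spinner otherwise.
fn render(
    tick: usize,
    done: u64,
    total: Option<u64>,
    bytes_per_sec: f64,
    eta_secs: Option<u64>,
    elapsed_secs: u64,
) -> String {
    match total {
        Some(t) if t > 0 => {
            // u128 avoids overflow in done * 100 for very large row counts.
            let pct = (done.min(t) as u128 * 100 / t as u128) as u64;
            let eta = eta_secs.map_or_else(|| "--:--".to_string(), format_clock);
            format!(
                "{pct}% {} / {} rows • {:.1} MB/s • ETA {eta}",
                format_count(done),
                format_count(t),
                bytes_per_sec / 1e6
            )
        }
        _ => format!(
            "{} rows: {} elapsed: {}",
            SPINNER[tick % SPINNER.len()],
            format_count(done),
            format_clock(elapsed_secs)
        ),
    }
}
```

Falling back to the spinner whenever `total` is `None` (or zero) is what provides the graceful degradation described in the summary.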
## Implementation Details
- Architecture: a new progress module handling plan introspection, metrics polling, ETA estimation, and TTY-aware display
- Integration: hooks into `StatementExecutor::execute()` after the physical plan is created
- Performance: background polling at the configured interval (200 ms by default) with <1% overhead
- APIs used: `ExecutionPlan::metrics()` for live counters, `ExecutionPlan::statistics()` for totals
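The polling side can be sketched with std threads: a poller wakes every `interval`, reads a shared row counter, and converts it to a fraction when a total is available. In the PR the counter comes from `ExecutionPlan::metrics()` and the total from `ExecutionPlan::statistics()`; here both are hypothetical stand-ins (an `AtomicU64`, and a `RowTotal` enum mirroring the Exact/Inexact/Absent shape of DataFusion's `Precision`), and the real code may well use an async task rather than a thread.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a row-count statistic that may be exact,
/// an estimate, or missing entirely.
#[derive(Clone, Copy, Debug)]
enum RowTotal {
    Exact(u64),
    Inexact(u64),
    Absent,
}

impl RowTotal {
    fn value(self) -> Option<u64> {
        match self {
            RowTotal::Exact(n) | RowTotal::Inexact(n) => Some(n),
            RowTotal::Absent => None,
        }
    }
}

/// Poll `rows` every `interval` until `finished` is set, recording
/// (rows_done, fraction) snapshots; fraction is None without a usable total.
fn poll_progress(
    rows: Arc<AtomicU64>,
    finished: Arc<AtomicBool>,
    total: RowTotal,
    interval: Duration,
) -> Vec<(u64, Option<f64>)> {
    let mut snapshots = Vec::new();
    loop {
        // Read the flag first (Acquire): once it is true, the counter load
        // below sees every increment made before the worker set the flag.
        let done_flag = finished.load(Ordering::Acquire);
        let done = rows.load(Ordering::Relaxed);
        let fraction = total
            .value()
            .filter(|&t| t > 0)
            .map(|t| (done as f64 / t as f64).min(1.0));
        snapshots.push((done, fraction));
        if done_flag {
            break;
        }
        thread::sleep(interval);
    }
    snapshots
}
```

Clamping the fraction to 1.0 matters when the total is only an estimate and the query produces more rows than predicted.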
## Test Plan
- Unit tests for all progress components (9 tests passing)
- Manual testing with various file formats and CLI options
- TTY vs non-TTY behavior verification
- Progress bar appears correctly and clears on completion
- No impact on query correctness or final output
## Backwards Compatibility
- ✅ Fully backwards compatible: no existing behavior changed
- ✅ Progress is disabled by default in non-TTY environments
- ✅ All existing CLI flags and functionality preserved
Addresses #17812
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]