Tamar-Posen opened a new pull request, #18885: URL: https://github.com/apache/datafusion/pull/18885
Previously, AggregateExec discarded total_byte_size statistics (reporting Precision::Absent) across aggregation operations, preventing the optimizer from making informed decisions about memory allocation and execution strategies (join side selection -> dynamic filters).

This commit implements proportional byte-size scaling based on row count ratios:

- Added calculate_scaled_byte_size helper with inline optimization
- Scales byte size for Final/FinalPartitioned without GROUP BY
- Scales byte size proportionally for all other aggregation modes
- Always returns Precision::Inexact for estimates (semantically correct)
- Returns Precision::Absent when input statistics are insufficient

Added test coverage for edge cases (absent statistics, zero rows).

## Which issue does this PR close?

https://github.com/apache/datafusion/issues/18850

- Closes #18850

## Rationale for this change

Without byte-size statistics, the optimizer cannot estimate memory requirements for join-side selection, dynamic filter generation, and memory allocation decisions. This change preserves the statistics using proportional scaling (bytes_per_row × output_rows).

## What changes are included in this PR?

1. Modified `statistics_inner` to calculate a proportional byte size instead of returning `Precision::Absent`
2. Added the `calculate_scaled_byte_size` helper (inline optimized, guards against division by zero)
3. Updated test assertions and added edge case coverage

## Are these changes tested?

Yes:

- The modified `check_aggregates` validates that statistics are preserved through the aggregation pipeline
- The new `test_aggregate_statistics_edge_cases` covers edge-case scenarios

## Are there any user-facing changes?

No breaking changes. This is an internal optimization that may improve query planning and provide more accurate memory estimates in EXPLAIN output.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
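For readers who want a concrete picture of the proportional scaling described in the pull request (bytes_per_row × output_rows), here is a minimal sketch. It is not the code from the PR. The `Precision` enum is a simplified local stand-in for DataFusion's type so that the snippet compiles on its own. The function reuses the helper's name, but its signature is an assumption, as is the choice to return `Absent` when the input row count is zero.

```rust
/// Simplified local stand-in for DataFusion's `Precision<usize>`.
/// Defined here only so the sketch is self-contained (an assumption,
/// not the library's actual definition).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

impl Precision {
    /// Returns the wrapped value, if any, regardless of exactness.
    fn value(self) -> Option<usize> {
        match self {
            Precision::Exact(v) | Precision::Inexact(v) => Some(v),
            Precision::Absent => None,
        }
    }
}

/// Hypothetical signature: estimates the output byte size as
/// (input_bytes / input_rows) * output_rows.
#[inline]
fn calculate_scaled_byte_size(
    input_rows: Precision,
    input_bytes: Precision,
    output_rows: Precision,
) -> Precision {
    match (input_rows.value(), input_bytes.value(), output_rows.value()) {
        // The `in_rows > 0` guard avoids a division by zero.
        (Some(in_rows), Some(in_bytes), Some(out_rows)) if in_rows > 0 => {
            let bytes_per_row = in_bytes as f64 / in_rows as f64;
            // The result is an estimate, so it is always marked Inexact.
            Precision::Inexact((bytes_per_row * out_rows as f64) as usize)
        }
        // Missing statistics (or zero input rows, by assumption): no estimate.
        _ => Precision::Absent,
    }
}
```

Under these assumptions, an input of 1000 rows and 8000 bytes that aggregates down to 10 rows yields `Precision::Inexact(80)`, while any missing input statistic yields `Precision::Absent`.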
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
