2010YOUY01 commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2191389061
########## datafusion/physical-plan/src/sorts/streaming_merge.rs: ########## @@ -131,14 +168,42 @@ impl<'a> StreamingMergeBuilder<'a> { enable_round_robin_tie_breaker, } = self; - // Early return if streams or expressions are empty: - if streams.is_empty() { - return internal_err!("Streams cannot be empty for streaming merge"); - } + // Early return if expressions are empty: let Some(expressions) = expressions else { return internal_err!("Sort expressions cannot be empty for streaming merge"); }; + if !sorted_spill_files.is_empty() { Review Comment: The current code flow is ``` Sort/Aggregate -> StreamingMerge -> MultiLevelMerge (with potential internal re-spills to ensure the final merge can proceed under memory limit) -> StreamingMerge ``` Then the first `StreamingMerge` -> `MultiLevelMerge` indirection implemented here seems redundant. How about let sort/aggregate directly use `MultiLevelMergeBuilder` instead? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org