2010YOUY01 commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2191389061


##########
datafusion/physical-plan/src/sorts/streaming_merge.rs:
##########
@@ -131,14 +168,42 @@ impl<'a> StreamingMergeBuilder<'a> {
             enable_round_robin_tie_breaker,
         } = self;
 
-        // Early return if streams or expressions are empty:
-        if streams.is_empty() {
-            return internal_err!("Streams cannot be empty for streaming 
merge");
-        }
+        // Early return if expressions are empty:
         let Some(expressions) = expressions else {
             return internal_err!("Sort expressions cannot be empty for 
streaming merge");
         };
 
+        if !sorted_spill_files.is_empty() {

Review Comment:
   The current code flow is
   ```
   Sort/Aggregate
   -> StreamingMerge
   -> MultiLevelMerge (with potential internal re-spills to ensure the final 
merge can proceed under memory limit)
   -> StreamingMerge
   ```
   Then the first `StreamingMerge` -> `MultiLevelMerge` indirection implemented 
here seems redundant.
   
   How about let sort/aggregate directly use `MultiLevelMergeBuilder` instead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to