alamb commented on code in PR #7180:
URL: https://github.com/apache/arrow-datafusion/pull/7180#discussion_r1286384488


##########
datafusion/core/src/physical_plan/sorts/sort.rs:
##########
@@ -465,16 +484,23 @@ impl ExternalSorter {
         // This is a very rough heuristic and likely could be refined further
         if self.reservation.size() < 1048576 {
             // Concatenate memory batches together and sort
-            let batch = concat_batches(&self.schema, &self.in_mem_batches)?;
+            let (_, batches): (Vec<bool>, Vec<RecordBatch>) =
+                std::mem::take(&mut self.in_mem_batches).into_iter().unzip();
+            let batch = concat_batches(&self.schema, &batches)?;
             self.in_mem_batches.clear();
-            return self.sort_batch_stream(batch, metrics);
+            // Even if all individual batches were themselves sorted the 
resulting concatenated one
+            // isn't guaranteed to be sorted, so we must perform sorting on 
the stream.
+            return self.sort_batch_stream(batch, false, metrics);

Review Comment:
   Another approach might be be to not use the `concat` in place heuristic if 
there are any previously sorted batches -- now the code only checks for the 
overall size less than 1MB -- it could also check if there was any true in 
`in_mem_batches` and if there are use the merge path below



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to