alamb opened a new issue #362:
URL: https://github.com/apache/arrow-datafusion/issues/362


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   To unlock sort-based optimizations in DataFusion (and IOx), we often want to 
take several sorted input streams and merge them together so that the output is 
still sorted in the same way.
   
   One major use case we have in IOx is that we will have data in several 
streams (e.g. parquet files, or in memory) that are already sorted and we want 
to merge those streams together into a single stream with the same sort order.
   
   As another example, it would be really nice to have a repartitioned sort 
operation (so that we are able to sort on partitions of an input in parallel) 
but there is currently no way to combine several sorted streams together. 
   
   Also, to implement something like a parallel sort-merge-join, 
https://github.com/apache/arrow-datafusion/issues/141, again having the ability 
to repartition / merge / retain sort is likely important. 
   
   **Describe the solution you'd like**
   I would like a `SortPreservingMerge` operator that takes as input a list of 
`SortExprs` and some number of input streams. The input streams are guaranteed 
to be sorted according to the sort exprs.
   
   When the operator is run, it produces an output stream that is also sorted on 
`SortExprs`.
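   
   To make the requested behavior concrete, here is a minimal sketch of a 
sort-preserving k-way merge over in-memory inputs. It is illustrative only and 
not a proposed DataFusion API: `merge_sorted` is a hypothetical name, the inputs 
are plain `Vec<i64>` values rather than streams of record batches, and the sort 
order is assumed to be ascending on a single integer key rather than an 
arbitrary list of `SortExprs`.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge inputs that are each already sorted ascending into one sorted output.
/// Hypothetical helper for illustration; not part of DataFusion.
pub fn merge_sorted(inputs: Vec<Vec<i64>>) -> Vec<i64> {
    let total: usize = inputs.iter().map(|v| v.len()).sum();
    let mut output = Vec::with_capacity(total);

    // One cursor per input, pointing at that input's current head element.
    let mut cursors = vec![0usize; inputs.len()];

    // Min-heap of (value, input index). `Reverse` turns the std max-heap
    // into a min-heap; ties on value are broken by input index.
    let mut heap = BinaryHeap::new();
    for (idx, input) in inputs.iter().enumerate() {
        if let Some(&first) = input.first() {
            heap.push(Reverse((first, idx)));
        }
    }

    // Repeatedly emit the smallest head, then refill from the same input.
    while let Some(Reverse((value, idx))) = heap.pop() {
        output.push(value);
        cursors[idx] += 1;
        if let Some(&next) = inputs[idx].get(cursors[idx]) {
            heap.push(Reverse((next, idx)));
        }
    }

    output
}
```

   The heap holds at most one element per input, so merging `n` total rows 
from `k` inputs takes O(n log k) comparisons. A real operator would presumably 
do the same thing over asynchronous streams, comparing rows with the sort 
expressions instead of a single integer key.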
   
   **Describe alternatives you've considered**
   TBD
   
   **Additional context**
   I think @tustvold plans to implement an operator similar to this in IOx. 
Depending on how that goes, we will contemplate putting it into DataFusion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

