NGA-TRAN opened a new issue, #10257:
URL: https://github.com/apache/datafusion/issues/10257

   ### Is your feature request related to a problem or challenge?
   
   We have run into an issue in IOx, described 
[here](https://github.com/influxdata/arrow-datafusion/pull/4): a `Union` 
is converted to `Interleave` if its input data can be interleaved. The job of 
`Interleave` is to send corresponding input partitions to the right output. 
 If I understand correctly, its purpose is to keep data grouped in the same 
partitions, which is useful if the downstream operators want data in that 
shape. As a result, pushing a sort down is not useful there, because we do not 
need/want the data sorted in that case.
   
   However, for our IOx use case, even though the input data to the Union can be 
interleaved, a `Projection` above it (see [the plan 
here](https://github.com/influxdata/arrow-datafusion/pull/4#issue-2223275385)) adds 
different constants ("m0" and "m1" in the example). Once that constant is added 
as a column to the output, the data can no longer be interleaved, even though the 
`output_partitioning` says it can. Furthermore, we do want that data to be 
sorted and hence need the sort push-down.
   
   Given the opposite needs described in the two paragraphs above, converting Union 
to Interleave is not always wanted, even if the data can be interleaved. This ticket is a 
request for a way to prevent that conversion.
   
   ### Describe the solution you'd like
   
   After chatting with @alamb, we propose adding an `option` to the config to 
tell `enforce_distribution` not to use Interleave [at this 
step](https://github.com/apache/datafusion/blob/f8c623fe045d70a87eac8dc8620b74ff73be56d5/datafusion/core/src/physical_optimizer/enforce_distribution.rs#L1195)
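   
   To make the proposal concrete, here is a minimal, self-contained sketch of the decision such an option would control. It does not use DataFusion's real types or API: `UnionStrategy`, `OptimizerOptions`, the flag name `prefer_existing_union`, and `choose_union_strategy` are all hypothetical names chosen for illustration.

```rust
/// Which operator the optimizer keeps for a union's inputs.
/// (Illustrative type, not part of DataFusion.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnionStrategy {
    /// Keep the plain `Union`, so a sort can still be pushed below it.
    Union,
    /// Replace the `Union` with `Interleave`, preserving the input partitioning.
    Interleave,
}

/// Hypothetical optimizer options; the flag name is an assumption.
#[derive(Debug, Clone, Copy, Default)]
struct OptimizerOptions {
    /// When true, never convert `Union` to `Interleave`.
    /// Defaults to false, which keeps today's behavior.
    prefer_existing_union: bool,
}

/// Decide what to do with a `Union` whose inputs may be interleavable.
fn choose_union_strategy(
    inputs_can_interleave: bool,
    options: &OptimizerOptions,
) -> UnionStrategy {
    if inputs_can_interleave && !options.prefer_existing_union {
        UnionStrategy::Interleave
    } else {
        UnionStrategy::Union
    }
}
```

   With the flag left at its default, interleavable inputs are still converted as they are today; setting it to `true` keeps the `Union` in place, which is what the IOx plan needs.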
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


