westonpace opened a new issue, #14287:
URL: https://github.com/apache/datafusion/issues/14287

   ### Is your feature request related to a problem or challenge?
   
   `RepartitionExec` is often used to fan out batches from a single partition 
into multiple partitions.  For example, if we are scanning a very big parquet 
file we use the `RepartitionExec` to take the batches we receive from the 
Parquet file and fan it out to multiple partitions so that the data can be 
processed in parallel.  Note: these `RepartitionExec` are often not setup by 
hand but rather inserted by the plan optimizer.
   
   The current approach sets up a channel per partition and (I believe) emits 
batches in a round-robin order.  This works well when the consumer is faster 
than the producer (typical in simple queries) or the workload is evenly 
balanced.  However, when the workload is skewed this leads to problems.
   
   * The query can use too much memory because the data builds up in the 
channels of the slower consumers.  (Note: if all consumers are slow then all 
outputs will fill and that _does_ trigger the producer to pause).
   * There are potential performance disadvantages because we have cores that 
are ready to do processing but their queue is empty and meanwhile there are 
cores that are busy and have deep queues.
   
   ### Describe the solution you'd like
   
   Work stealing queues come to mind.  I think there's a some literature on 
putting these to use in databases.  They can be designed fairly efficiently.  
Maybe there are some solid Rust implementations (building one from scratch 
might be a bit annoying).
   
   Otherwise, a simple and slow mutex-bound MPMC queue might be a nice 
alternative to at least avoid the memory issues (if not fix the performance 
issues).
   
   There could be plenty of other approaches as well.
   
   ### Describe alternatives you've considered
   
   I don't have a good workaround at the moment.
   
   ### Additional context
   
   A smattering of Discord conversation: 
https://discord.com/channels/885562378132000778/1331914577935597579


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to