alamb commented on issue #22090:
URL: https://github.com/apache/datafusion/issues/22090#issuecomment-4415084807

   > RepartitionExec's distribution channels (distributor_channels.rs) only throttle producers when every output channel has at least one buffered item
   
   This is by design, to avoid deadlocks.
   
   > The producer should be throttled when total buffered memory crosses a configured threshold, regardless of how many channels are technically non-empty.
   
   If one of the channels is empty, are you guaranteed that the other, non-empty channels can make progress? For example, consider the classic diamond plan:
   
   ```
   SortPreservingMerge (or some other operator                
         where consumption is a function                      
          of the values in the streams)                       
                                                              
          ┌──────────────────────┐                            
          │         Merge        │                            
          └──────────────────────┘                            
              ▲       ▲      ▲                                
              │       │      │                                
              │       │      │                                
              │       │      │                                
              │       │      │                                
              │       │      │                                
              │       │      │                                
              │       │      │                                
              │       │      │                                
         ┌───┐│       │      │ ┌───┐                          
         │   ││       │      │ │   │                          
         │   ││       │      │ │   │ Channel 1 and 3 are full 
         └───┘│       │      │ └───┘       / memory full      
          ┌───┴───────┴──────┴───┐   but a batch is needed in 
          │      Repartition     │       Channel 2 to make    
          └──────────────────────┘           progress         
                                                              
   ```
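
To make the scenario in the diagram concrete, here is a minimal sketch in Rust. It is not the code in `distributor_channels.rs`: the `Channels` type, its methods, the byte sizes, and the 1000-byte limit are all hypothetical, and the consumer is simplified to "needs one buffered batch from every input before it can emit anything". Under those assumptions, a memory-threshold rule can stop the producer while channel 2 is still empty, whereas the "all channels non-empty" rule only stops it once the merge is able to drain.

```rust
use std::collections::VecDeque;

/// Hypothetical model of a repartition's output channels (illustration only).
/// Each queue entry is the size in bytes of one buffered batch.
struct Channels {
    queues: Vec<VecDeque<u64>>,
}

impl Channels {
    fn new(n: usize) -> Self {
        Self { queues: vec![VecDeque::new(); n] }
    }

    fn push(&mut self, channel: usize, batch_bytes: u64) {
        self.queues[channel].push_back(batch_bytes);
    }

    fn buffered_bytes(&self) -> u64 {
        self.queues.iter().flatten().sum()
    }

    /// Rule described in the issue: throttle only when every channel
    /// holds at least one buffered batch.
    fn throttle_when_all_non_empty(&self) -> bool {
        self.queues.iter().all(|q| !q.is_empty())
    }

    /// Proposed rule: throttle once total buffered bytes reach a limit.
    fn throttle_on_memory(&self, limit_bytes: u64) -> bool {
        self.buffered_bytes() >= limit_bytes
    }

    /// Simplified merge-like consumer: it needs a batch from every input
    /// before it can decide which row comes next.
    fn merge_can_progress(&self) -> bool {
        self.queues.iter().all(|q| !q.is_empty())
    }
}

/// The plan is stuck if the producer is paused while the consumer
/// is still waiting for input that only the producer can supply.
fn stalled(producer_throttled: bool, ch: &Channels) -> bool {
    producer_throttled && !ch.merge_can_progress()
}

fn main() {
    // Channels 1 and 3 hold data; channel 2 (index 1) is empty.
    let mut ch = Channels::new(3);
    ch.push(0, 600);
    ch.push(2, 600);

    let by_memory = ch.throttle_on_memory(1000);
    let by_non_empty = ch.throttle_when_all_non_empty();

    println!("memory rule stalls the plan: {}", stalled(by_memory, &ch));
    println!("non-empty rule stalls the plan: {}", stalled(by_non_empty, &ch));

    assert!(stalled(by_memory, &ch));
    assert!(!stalled(by_non_empty, &ch));
}
```

In this toy model the "all channels non-empty" predicate is the same condition under which the merge can make progress, which is why throttling on it cannot wedge the plan, while throttling on total bytes alone can.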
   
   If you have one consumer falling behind, I think a better strategy might be to apply back pressure at the consumer end (rather than at the Repartition).
   
   What are the consumers in this case?

