alamb commented on issue #22090:
URL: https://github.com/apache/datafusion/issues/22090#issuecomment-4425078087

   > Thanks for the quick reply!
   
   No problem!
   
   > With my proposal, the buffered bytes are capped at a certain value, but 
every time a record batch is processed by a consumer, its bytes are released, 
freeing up headroom. This essentially means the RepartitionExec stays "open" as 
long as it receives and releases memory at roughly the same rate. So yes, it 
can theoretically lead to more deadlocks. 
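To make the proposed mechanism concrete, here is a minimal sketch of a byte-capped buffer in plain Rust, using only std.

- `ByteCappedBuffer`, `push`, `pop` and `used_bytes` are hypothetical names, not DataFusion APIs.
- `Vec<u8>` stands in for a `RecordBatch`.
- The producer waits while the cap is reached. Each consumed batch releases its bytes and wakes the producer.
- As an assumption of this sketch, a batch larger than the cap is still admitted into an empty buffer.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};

/// Hypothetical byte-capped buffer (not a DataFusion type).
/// `Vec<u8>` stands in for a `RecordBatch`; its length is its size in bytes.
pub struct ByteCappedBuffer {
    cap_bytes: usize,
    state: Mutex<BufferState>,
    not_full: Condvar,
    not_empty: Condvar,
}

struct BufferState {
    used_bytes: usize,
    batches: VecDeque<Vec<u8>>,
}

impl ByteCappedBuffer {
    pub fn new(cap_bytes: usize) -> Self {
        Self {
            cap_bytes,
            state: Mutex::new(BufferState { used_bytes: 0, batches: VecDeque::new() }),
            not_full: Condvar::new(),
            not_empty: Condvar::new(),
        }
    }

    /// Producer side: waits until the batch fits under the cap.
    /// A batch larger than the cap is still admitted into an empty buffer
    /// so that it cannot block forever (an assumption of this sketch).
    pub fn push(&self, batch: Vec<u8>) {
        let size = batch.len();
        let mut state = self.state.lock().unwrap();
        while state.used_bytes > 0 && state.used_bytes + size > self.cap_bytes {
            state = self.not_full.wait(state).unwrap();
        }
        state.used_bytes += size;
        state.batches.push_back(batch);
        self.not_empty.notify_one();
    }

    /// Consumer side: takes the oldest batch and releases its bytes,
    /// freeing headroom for the producer.
    pub fn pop(&self) -> Vec<u8> {
        let mut state = self.state.lock().unwrap();
        loop {
            if let Some(batch) = state.batches.pop_front() {
                state.used_bytes -= batch.len();
                self.not_full.notify_all();
                return batch;
            }
            state = self.not_empty.wait(state).unwrap();
        }
    }

    /// Bytes currently held in the buffer.
    pub fn used_bytes(&self) -> usize {
        self.state.lock().unwrap().used_bytes
    }
}
```

Because every `pop` frees headroom, the buffer stays "open" only as long as consumption keeps pace with production. If no consumer calls `pop`, a `push` that would exceed the cap blocks indefinitely.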
   
   This is not just a theory: we have seen such deadlocks in production at 
InfluxData, typically on very skewed datasets.
   
   > That said, the situation under which it "locks" is generally an 
undesirable one to begin with: it's the case where a RepartitionExec 
continuously receives more bytes than it can release. 
   
   I agree the situation is undesirable, though I disagree about the root cause. 
The root cause is that some downstream operator is consuming the output 
partitions at very different rates.
   
   
   > This can of course be legitimately required by certain downstream 
consumers, as your example shows. But there is typically a limit to how much 
data downstream consumers need in order to make progress, and that's what this 
cap is meant to enforce.
   
   I don't see how the cap will ensure that the downstream consumer can make 
progress.
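The following deterministic model illustrates why a byte cap alone does not guarantee progress. It is a hypothetical sketch, not DataFusion's actual RepartitionExec logic, and it assumes:

- a single cap shared by all output partitions;
- skewed input in which partition 0's batches arrive first;
- a downstream operator that must read all of partition 1 before it reads partition 0.

`simulate` and `Outcome` are illustrative names.

```rust
use std::collections::VecDeque;

/// Result of the hypothetical simulation below.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Completed,
    Deadlocked { buffered_bytes: usize },
}

/// Hypothetical model (not DataFusion code) of one producer feeding skewed
/// input into a buffer whose bytes, summed over all output partitions, are
/// capped at `cap_bytes`. `input` lists `(partition, bytes)` in arrival order.
/// The downstream operator reads partitions strictly in `consume_order`,
/// draining each one completely before moving to the next.
pub fn simulate(input: &[(usize, usize)], consume_order: &[usize], cap_bytes: usize) -> Outcome {
    let mut buffer: VecDeque<(usize, usize)> = VecDeque::new();
    let mut used_bytes = 0;
    let mut next_input = 0;
    let mut order_idx = 0;
    while order_idx < consume_order.len() {
        let want = consume_order[order_idx];
        // Consumer: take the oldest buffered batch of the partition it is
        // reading, releasing its bytes.
        if let Some(pos) = buffer.iter().position(|&(p, _)| p == want) {
            let (_, bytes) = buffer.remove(pos).unwrap();
            used_bytes -= bytes;
            continue;
        }
        // The partition is finished once none of its batches remain upstream.
        if !input[next_input..].iter().any(|&(p, _)| p == want) {
            order_idx += 1;
            continue;
        }
        // Producer: emit the next batch if it fits under the shared cap.
        let (p, bytes) = input[next_input];
        if used_bytes == 0 || used_bytes + bytes <= cap_bytes {
            buffer.push_back((p, bytes));
            used_bytes += bytes;
            next_input += 1;
            continue;
        }
        // Neither side can move: the buffer is full of batches nobody is reading.
        return Outcome::Deadlocked { buffered_bytes: used_bytes };
    }
    Outcome::Completed
}
```

Consider six 4-byte batches for partition 0 followed by one for partition 1. With a cap of 8 bytes and order `[1, 0]`, the buffer fills with partition 0 data and the model reports a deadlock. The same input completes with order `[0, 1]`, or with a cap large enough to hold all of partition 0. Whether the cap "stays open" depends on how the consumers' rates line up with the input, not on the cap value alone.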
   
   > A bit of context on what I'm trying to achieve: I'm trying to run TPC-H 
query 17 at SF=100 in a memory-constrained environment with either 4 GB or 8 GB 
of memory. I'm using SortMergeJoin as the join algorithm, and I'm experimenting 
with different (including custom) memory pools.
   
   I wonder if the problem is that some of the SMJ partitions have chosen to 
spill but others have not (yet), resulting in different consumption rates. 
Maybe you could look into forcing all SMJ partitions to spill if any one does.
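One hypothetical way to coordinate this is a flag shared by all partitions of the join. The first partition that exceeds its in-memory budget raises it, and every partition checks it before buffering more data in memory.

`SharedSpillSignal` and `PartitionState` are illustrative names, not existing DataFusion types. The spill itself is modeled as a byte counter rather than real disk I/O.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical flag shared by every partition of one join (not a DataFusion type).
#[derive(Clone, Default)]
pub struct SharedSpillSignal(Arc<AtomicBool>);

impl SharedSpillSignal {
    /// Raised by the first partition that exceeds its in-memory budget.
    pub fn trigger(&self) {
        self.0.store(true, Ordering::Release);
    }

    /// Checked by every partition before it buffers more data in memory.
    pub fn should_spill(&self) -> bool {
        self.0.load(Ordering::Acquire)
    }
}

/// Hypothetical per-partition buffering state; spilling is modeled as a byte counter.
pub struct PartitionState {
    signal: SharedSpillSignal,
    budget_bytes: usize,
    in_memory_bytes: usize,
    spilled_bytes: usize,
}

impl PartitionState {
    pub fn new(signal: SharedSpillSignal, budget_bytes: usize) -> Self {
        Self { signal, budget_bytes, in_memory_bytes: 0, spilled_bytes: 0 }
    }

    /// Buffers `bytes` of input. Spills everything held in memory if this
    /// partition would exceed its budget or if any other partition has spilled.
    pub fn buffer(&mut self, bytes: usize) {
        if self.in_memory_bytes + bytes > self.budget_bytes {
            self.signal.trigger();
        }
        if self.signal.should_spill() {
            // Stand-in for writing the buffered rows plus the new batch to disk.
            self.spilled_bytes += self.in_memory_bytes + bytes;
            self.in_memory_bytes = 0;
        } else {
            self.in_memory_bytes += bytes;
        }
    }

    pub fn in_memory_bytes(&self) -> usize {
        self.in_memory_bytes
    }

    pub fn spilled_bytes(&self) -> usize {
        self.spilled_bytes
    }
}
```

With a single shared flag, a partition that is still under its own budget switches to spilling as soon as any sibling does. The partitions then drain their input at more similar rates, at the cost of extra disk I/O for the partitions that would otherwise have fit in memory.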
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

