pantShrey commented on PR #21882:
URL: https://github.com/apache/datafusion/pull/21882#issuecomment-4435286903

   Hey @adriangb, Andrew suggested I reach out to you since you originally 
authored `repartition::test::test_preserve_order_with_spilling`. I'm currently 
hitting a wall with it while migrating the spilling architecture to async 
streams.
   
   The test is currently caught in a memory-accounting catch-22. Here is what 
happens:
   
   * If I set the memory pool limit tight enough to force a spill, 
`RepartitionMerge` panics during initialization. It needs to reserve some 
memory to set up its streams, but exhausts the pool before completing its 
unspillable setup.
   
   * However, if I increase the pool limit to give Merge enough headroom to 
initialize safely and then scale up the data volume to force overflow, the 
`RepartitionExec` producers greedily consume the additional memory first. This 
either ends up starving Merge again or allows the query to complete entirely in 
memory without triggering a spill.
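
To make the two cases concrete, here is a toy model that uses only the Rust standard library. It is a sketch under stated assumptions, not DataFusion code: `ToyPool`, `Outcome`, and `simulate` are hypothetical names, and the "producers reserve first, then merge reserves a fixed overhead" ordering and all byte counts are invented for illustration.

```rust
/// Toy bounded pool (hypothetical; NOT DataFusion's `MemoryPool`).
struct ToyPool {
    limit: usize,
    used: usize,
}

impl ToyPool {
    fn new(limit: usize) -> Self {
        Self { limit, used: 0 }
    }

    /// Reserve `bytes`, or fail without changing state if the limit would be exceeded.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.used + bytes > self.limit {
            return Err(format!(
                "cannot reserve {} B: {} of {} B already in use",
                bytes, self.used, self.limit
            ));
        }
        self.used += bytes;
        Ok(())
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// Merge could not reserve its unspillable setup memory.
    MergeStarved,
    /// Everything fit in memory, so no spill path was exercised.
    NoSpill,
    /// Merge initialized and at least one producer batch had to spill.
    Spilled { batches: usize },
}

/// Assumed ordering: producers greedily reserve one batch at a time and
/// spill any batch that does not fit; afterwards merge tries to reserve
/// its fixed setup overhead from whatever is left.
fn simulate(limit: usize, merge_overhead: usize, batch_bytes: usize, num_batches: usize) -> Outcome {
    let mut pool = ToyPool::new(limit);
    let mut spilled = 0;
    for _ in 0..num_batches {
        if pool.try_grow(batch_bytes).is_err() {
            spilled += 1; // this batch would go to disk instead of memory
        }
    }
    if pool.try_grow(merge_overhead).is_err() {
        return Outcome::MergeStarved;
    }
    if spilled == 0 {
        Outcome::NoSpill
    } else {
        Outcome::Spilled { batches: spilled }
    }
}
```

In this model, with a 100 B merge overhead, `simulate(64, 100, 32, 4)` and `simulate(1000, 100, 250, 5)` both starve merge, and `simulate(1000, 100, 250, 3)` finishes without spilling. Only `simulate(1000, 100, 300, 5)` spills with merge still able to initialize, because the producers happen to leave exactly 100 B free. That narrow window is what makes the limit so hard to tune.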
   
   I was able to trigger a spill once by setting the test memory limit to 
608 bytes, but even then the test did not pass reliably.
   
   Is there a correct or idiomatic way to configure this test (batch sizes, 
data volume, memory pool limits, etc.) to reliably force a `RepartitionExec` 
spill without violating the Merge operator’s baseline initialization overhead? 
Or am I approaching this incorrectly and missing something obvious?
   
   I would really appreciate any guidance you could provide.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

