pantShrey commented on PR #21882: URL: https://github.com/apache/datafusion/pull/21882#issuecomment-4435286903
Hey @adriangb, Andrew suggested I reach out to you since you originally authored `repartition::test::test_preserve_order_with_spilling`. I'm currently hitting a wall with it while migrating the spilling architecture to async streams.

The test is currently stuck in a memory-accounting deadlock. Here’s what is happening:

* If I set the memory pool limit tight enough to force a spill, `RepartitionMerge` panics during initialization. It needs to reserve some memory to set up its streams, but exhausts the pool before completing its unspillable setup.
* However, if I increase the pool limit to give Merge enough headroom to initialize safely and then scale up the data volume to force overflow, the `RepartitionExec` producers greedily consume the additional memory first. This either ends up starving Merge again or allows the query to complete entirely in memory without triggering a spill.

I was able to trigger a spill once by setting the test memory limit to 608 B, but even that was not sufficient for the test to pass reliably.

Is there a correct or idiomatic way to configure this test (batch sizes, data volume, memory pool limits, etc.) to reliably force a `RepartitionExec` spill without violating the Merge operator’s baseline initialization overhead? Or am I approaching this incorrectly and missing something obvious?

I would really appreciate any guidance you could provide.
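The bind described above can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyPool`, `simulate`, and `Outcome` are hypothetical names, not DataFusion APIs; the byte counts are made up rather than taken from the test; and the model assumes the producers reserve memory before the merge does, which may not match the real scheduling.

```rust
/// Toy stand-in for a bounded memory pool (hypothetical, not DataFusion's `MemoryPool`).
struct ToyPool {
    limit: usize,
    used: usize,
}

impl ToyPool {
    fn new(limit: usize) -> Self {
        Self { limit, used: 0 }
    }

    /// Reserve `bytes`, failing if the pool limit would be exceeded.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.used + bytes > self.limit {
            Err(format!(
                "cannot reserve {} B: {} of {} B already in use",
                bytes, self.used, self.limit
            ))
        } else {
            self.used += bytes;
            Ok(())
        }
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// The merge could not reserve its unspillable setup memory.
    MergeInitFailed,
    /// Everything fit in memory, so no spill was exercised.
    NoSpill,
    /// At least one batch spilled and the merge still initialized.
    Spilled,
}

/// Assumption: producers greedily buffer every batch first, spilling a batch
/// whenever the reservation fails; only then does the merge try to reserve
/// `merge_init` bytes for its setup.
fn simulate(limit: usize, batch_bytes: usize, batches: usize, merge_init: usize) -> Outcome {
    let mut pool = ToyPool::new(limit);
    let mut spilled = false;

    for _ in 0..batches {
        if pool.try_grow(batch_bytes).is_err() {
            // The batch goes to disk, so it holds no pool memory.
            spilled = true;
        }
    }

    if pool.try_grow(merge_init).is_err() {
        return Outcome::MergeInitFailed;
    }
    if spilled {
        Outcome::Spilled
    } else {
        Outcome::NoSpill
    }
}

fn main() {
    // Tight limit: producers fill the pool, the merge cannot initialize.
    println!("{:?}", simulate(600, 200, 5, 400));
    // Larger limit, same data: everything fits, nothing spills.
    println!("{:?}", simulate(2000, 200, 5, 400));
    // Larger limit, scaled-up data: producers take the headroom again.
    println!("{:?}", simulate(2000, 200, 20, 400));
}
```

In this model the merge's setup reservation is larger than a single batch, so whenever the producers overflow the pool, the leftover space is smaller than one batch and the merge cannot fit. None of the three configurations reaches `Outcome::Spilled`, which mirrors the two failure modes in the bullets above.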
