2010YOUY01 commented on PR #15610:
URL: https://github.com/apache/datafusion/pull/15610#issuecomment-3034782742

   > > tested my fuzz tests with this pr and all of them are failing currently
   > 
   > Update: I think the failure is not due to this PR's implementation, 
instead it's caused by `FairMemoryPool`'s limitation.
   > 
   > After manually setting `max_spill_merge_degree` to 2, first 3 tests 
passed, the 4th one failed with
   > 
   > ```
   > Error: ResourcesExhausted("Additional allocation failed with top memory 
consumers (across reservations) as: mock_memory_consumer#2(can spill: false) 
consumed 1695480 bytes, ExternalSorterMerge[0]#1(can spill: false) consumed 
401664 bytes. Error: Failed to allocate additional 297024 bytes for 
ExternalSorterMerge[0] with 0 bytes already allocated for this reservation - 8 
bytes remain available for the total pool
   > ```
   > 
   > I believe it's the limitation of `FairSpillPool` that non-spillable 
consumers are not able to back off, and it can block spilling consumers from 
normal execution. (and this specific test is also possible to pass due to some 
complex interactions if the runtime memory consumers are set up differently)
   > 
   > I'll try to come up with a minimal reproducer later.
   
   I spotted a critical issue in this PR:
   
   This approach works if the memory pool is only used by a single query, since 
we can guarantee during different stage in sort operator (partial sort -> SPM), 
the memory limit for a partition is roughly the same.
   
   However, if a memory pool is shared among many queries (new query join the 
pool, get finished, and leave): given a certain partition, its memory budget 
can vary in its life cycle. For example: for a sort operator, when doing 
partial sorting it has 2GB memory budget, but when it moves to SPM phase, it 
only got 1GB because a new memory-consuming neighbor has recently joined.
   
   As a result, SPM has to be inherently spillable.
   
   I plan to close this PR and help to get 
https://github.com/apache/datafusion/pull/15700 merged, I have a rough idea to 
do a patch for my previous concern, I'll share my ideas there.
   
   cc @rluvaton @ding-young 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to