Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/19184
  
    Hi @mridulm , sorry for the late response. I agree with you that the scenario here is different from the shuffle case, but the underlying structure and the way data is spilled are the same, so the problem is the same. On the shuffle side we can control how much data is held in memory before spilling, which avoids creating too many spill files; but, as you mentioned, we cannot do that here.
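    
    To illustrate the point about holding more data in memory, here is a rough sketch (not Spark's actual spilling code; the class and parameter names like `SpillableBuffer` and `memoryThreshold` are made up for illustration) of how a larger in-memory threshold directly reduces the number of spill files produced:

```scala
import java.io.{File, PrintWriter}
import scala.collection.mutable.ArrayBuffer

// Buffers records in memory and writes a sorted run to disk once the
// configurable threshold is exceeded. A larger threshold means fewer
// spill files, and so fewer file handles to open during the final merge.
class SpillableBuffer(memoryThreshold: Int) {
  private val buffer = ArrayBuffer.empty[Int]
  val spillFiles = ArrayBuffer.empty[File]

  def insert(record: Int): Unit = {
    buffer += record
    if (buffer.size >= memoryThreshold) spill()
  }

  private def spill(): Unit = {
    val file = File.createTempFile("spill-", ".txt")
    val out = new PrintWriter(file)
    try buffer.sorted.foreach(out.println) finally out.close()
    spillFiles += file
    buffer.clear()
  }
}
```

    In this sketch, raising `memoryThreshold` by 1000x cuts the number of spill files by roughly the same factor for the same input; that is the knob we have on the shuffle side but not here.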
    
    Yes, it is not necessary to open all the files beforehand. But since we're using a priority queue to do the merge sort, it is very likely that all the file handles will be opened anyway (see the sketch below). So this fix only reduces the chance of hitting the "too many open files" issue. Maybe we can call it an interim fix, what do you think?
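    
    Here is a rough sketch (assumed for illustration, not the actual code in this PR; `mergeSortedFiles` is a made-up name) of a priority-queue k-way merge, showing why every spill file tends to be open at once: each file's reader must be opened to contribute its head element before the merge can even start.

```scala
import java.io.File
import scala.collection.mutable.PriorityQueue
import scala.io.Source

// K-way merge of sorted spill files. Every file's reader is opened up front
// so that its head element can seed the priority queue -- with thousands of
// spill files this is where "too many open files" shows up.
def mergeSortedFiles(files: Seq[File]): Iterator[Int] = {
  val readers = files.map(f => Source.fromFile(f).getLines().map(_.toInt).buffered)
  // Min-heap ordered by each reader's current head element.
  val queue = PriorityQueue.empty[BufferedIterator[Int]](
    Ordering.by[BufferedIterator[Int], Int](_.head).reverse)
  readers.filter(_.hasNext).foreach(queue.enqueue(_))

  new Iterator[Int] {
    def hasNext: Boolean = queue.nonEmpty
    def next(): Int = {
      val reader = queue.dequeue()
      val value = reader.next()
      if (reader.hasNext) queue.enqueue(reader)  // re-insert with its new head
      value
    }
  }
}
```

    The queue only holds one iterator per file, but each iterator keeps its underlying file handle open until it is exhausted, so the number of simultaneously open files equals the number of spill files.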
    
    


