cloud-fan commented on pull request #32210:
URL: https://github.com/apache/spark/pull/32210#issuecomment-826239123


   After more thought, I'm not sure this is the right direction to go. 
Falling back to SMJ (sort-merge join) clearly wastes the partially built hash map.
   
   If a partition is only slightly too large for the in-memory hash map, I 
think spilling the hash map might be a better choice. If a partition is much 
too large for the in-memory hash map, it seems we could use the same technique 
as skew join handling: split the partition into multiple smaller ones so that 
each of them fits in memory.
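
   To make the splitting idea concrete, here is a rough sketch in plain Python. 
It is not Spark code: the function names and the row-count budget are 
hypothetical, and it assumes an inner equi-join on the first tuple element. The 
oversized build-side partition is cut into chunks that each fit the budget, and 
the probe side is scanned once per chunk.

```python
from collections import defaultdict


def split_build_side(rows, max_rows_per_chunk):
    """Cut an oversized build-side partition into chunks within the budget.

    max_rows_per_chunk is a hypothetical stand-in for a memory limit.
    """
    return [
        rows[i:i + max_rows_per_chunk]
        for i in range(0, len(rows), max_rows_per_chunk)
    ]


def chunked_hash_join(build_rows, probe_rows, max_rows_per_chunk):
    """Inner join on the key (first tuple element), one small hash map per chunk."""
    out = []
    for chunk in split_build_side(build_rows, max_rows_per_chunk):
        # Build a hash map that only has to hold this chunk.
        table = defaultdict(list)
        for key, build_value in chunk:
            table[key].append(build_value)
        # The probe side is re-scanned for every chunk; that is the cost
        # of splitting instead of holding one large map.
        for key, probe_value in probe_rows:
            for build_value in table.get(key, ()):
                out.append((key, build_value, probe_value))
    return out


build = [(1, "a"), (2, "b"), (1, "c"), (3, "d")]
probe = [(1, "x"), (3, "y"), (4, "z")]
result = chunked_hash_join(build, probe, max_rows_per_chunk=2)
```

   Every build row lands in exactly one chunk, so each matching pair is 
emitted once and the result matches a single-map hash join. The trade-off is 
the repeated probe-side scan, which is why this seems better suited to the 
"much larger" case than to the "a bit larger" case.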


