[GitHub] [spark] c21 commented on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

GitBox Sun, 12 Jul 2020 17:51:07 -0700


c21 commented on pull request #29079:
URL: https://github.com/apache/spark/pull/29079#issuecomment-657304351



   @viirya, I see your point for coalescing reduces parallelism to cause more 
OOM on build side. I agree this can happen. All in all, this is a 
disable-by-default feature, and user can selectively enable it depending on 
their table size. But I think it's worth to have as it indeed helped our users 
in production for using shuffled hash join on bucketed tables.
   
   Re OOM issue in shuffled hash join - I think we can add a fallback mechanism 
when building hash map and fall back to sort merge join if the size of hash map 
being too big to OOM (i.e. rethink 
https://issues.apache.org/jira/browse/SPARK-21505), we have been running this 
feature in production for years, and it works well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] c21 commented on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

Reply via email to