c21 commented on pull request #29079: URL: https://github.com/apache/spark/pull/29079#issuecomment-657304351
@viirya, I see your point for coalescing reduces parallelism to cause more OOM on build side. I agree this can happen. All in all, this is a disable-by-default feature, and user can selectively enable it depending on their table size. But I think it's worth to have as it indeed helped our users in production for using shuffled hash join on bucketed tables. Re OOM issue in shuffled hash join - I think we can add a fallback mechanism when building hash map and fall back to sort merge join if the size of hash map being too big to OOM (i.e. rethink https://issues.apache.org/jira/browse/SPARK-21505), we have been running this feature in production for years, and it works well. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org