GitHub user rdblue reopened a pull request: https://github.com/apache/spark/pull/19394
[SPARK-22170][SQL] Reduce memory consumption in broadcast joins.

## What changes were proposed in this pull request?

This updates the broadcast join code path to lazily decompress pages and iterate through UnsafeRows to prevent all rows from being held in memory while the broadcast table is being built.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rdblue/spark broadcast-driver-memory

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19394.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19394
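
For readers unfamiliar with the idea, here is a minimal sketch of the general technique the description names: decompress one page at a time and iterate its rows, rather than materializing every row first. It is not the patch's code and does not use Spark's APIs. The names `compress_pages`, `iter_rows`, and `build_lookup`, the `(key, value)` row shape, and the use of `zlib` and `pickle` as the page format are all assumptions made for illustration.

```python
import pickle
import zlib
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical row shape for this sketch: (join key, payload).
Row = Tuple[int, str]


def compress_pages(rows: Iterable[Row], page_size: int) -> List[bytes]:
    """Pack rows into pages of at most page_size rows, compressing each page on its own."""
    pages: List[bytes] = []
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == page_size:
            pages.append(zlib.compress(pickle.dumps(batch)))
            batch = []
    if batch:  # flush the final, possibly short, page
        pages.append(zlib.compress(pickle.dumps(batch)))
    return pages


def iter_rows(pages: Iterable[bytes]) -> Iterator[Row]:
    """Yield rows lazily: only one decompressed page is alive at a time."""
    for page in pages:
        for row in pickle.loads(zlib.decompress(page)):
            yield row


def build_lookup(pages: Iterable[bytes]) -> Dict[int, List[str]]:
    """Build a key -> values table by streaming rows instead of collecting them first."""
    table: Dict[int, List[str]] = {}
    for key, value in iter_rows(pages):
        table.setdefault(key, []).append(value)
    return table
```

In this sketch the peak extra memory while building the table is roughly one decompressed page plus the table itself, instead of the full list of decompressed rows plus the table.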