GitHub user bersprockets opened a pull request: https://github.com/apache/spark/pull/21899
[SPARK-24912][SQL] Don't obscure source of OOM during broadcast join

## What changes were proposed in this pull request?

This PR shows the stack trace of the original OutOfMemoryError that occurs while building or broadcasting a HashedRelation, rather than the stack trace of the new OutOfMemoryError created during error handling.

Currently, when an OOM occurs while broadcasting a table, the stack trace points at a line in the error handling code:
<pre>
java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value
  at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:122)
  at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:76)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withExecutionId$1.apply(SQLExecution.scala:101)
</pre>
With this PR, it shows the original stack trace:
<pre>
java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value
  at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.grow(HashedRelation.scala:628)
  at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.append(HashedRelation.scala:570)
  at org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:865)
</pre>
While sometimes the line at which the exception is thrown is merely a victim of memory pressure, sometimes it is a participant in the problem, as it was in the exception above.

## How was this patch tested?
Manually tested a case where a broadcast join caused an OOM, and a case where it did not.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bersprockets/spark SPARK-24912

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21899.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21899

----
commit ca19596c9c09e2cc0ef5667d84f1d289b8184b91
Author: Bruce Robbins <bersprockets@...>
Date:   2018-07-26T01:11:07Z

    Initial commit
----
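The mechanism the PR describes can be sketched in plain JVM code: when wrapping an OutOfMemoryError with a friendlier message, copy the original error's stack trace onto the new error via `Throwable.setStackTrace` so the reported frames point at the real allocation site rather than the handler. The class and method names below are hypothetical, for illustration only, and are not Spark's actual implementation.

```java
public class OomStackTraceDemo {

    // Simulates the original OOM thrown deep inside relation construction.
    static void buildRelation() {
        throw new OutOfMemoryError("simulated: unable to acquire memory");
    }

    // Before the fix: the rethrown error is a fresh OutOfMemoryError, so
    // its stack trace points at this catch block, obscuring the source.
    static void broadcastObscured() {
        try {
            buildRelation();
        } catch (OutOfMemoryError oe) {
            throw new OutOfMemoryError(
                "Not enough memory to build and broadcast the table");
        }
    }

    // After the fix: reuse the original error's stack trace, so the
    // reported frames show where the allocation actually failed.
    static void broadcastPreserving() {
        try {
            buildRelation();
        } catch (OutOfMemoryError oe) {
            OutOfMemoryError wrapped = new OutOfMemoryError(
                "Not enough memory to build and broadcast the table");
            wrapped.setStackTrace(oe.getStackTrace());
            throw wrapped;
        }
    }

    // Helper: run an action and report the innermost frame of the OOM it throws.
    static String topFrame(Runnable action) {
        try {
            action.run();
            return "none";
        } catch (OutOfMemoryError e) {
            return e.getStackTrace()[0].getMethodName();
        }
    }

    public static void main(String[] args) {
        // The obscured trace starts in the handler; the preserving one
        // starts in buildRelation, where the failure really happened.
        System.out.println(topFrame(OomStackTraceDemo::broadcastObscured));
        System.out.println(topFrame(OomStackTraceDemo::broadcastPreserving));
    }
}
```

Both versions still surface the actionable message about `spark.sql.autoBroadcastJoinThreshold` and `spark.driver.memory`; the only difference is which frames the user sees when diagnosing the failure.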