GitHub user bersprockets opened a pull request:

    https://github.com/apache/spark/pull/21899

    [SPARK-24912][SQL] Don't obscure source of OOM during broadcast join

    ## What changes were proposed in this pull request?
    
    This PR preserves the stack trace of the original OutOfMemoryError that 
occurs while building or broadcasting a HashedRelation, rather than reporting 
the stack trace of the replacement OutOfMemoryError created during error 
handling.
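    
    For illustration, here is a minimal sketch of the idea (not the PR's exact 
code): catch the original error and copy its stack trace onto the replacement 
OutOfMemoryError before rethrowing, so the report points at the true 
allocation site instead of the handler. `buildAndBroadcast` is a hypothetical 
stand-in for building the HashedRelation, and the message is truncated:
    <pre>
    object OomRethrowSketch {
      // Hypothetical stand-in for building/broadcasting the HashedRelation.
      private def buildAndBroadcast(): Unit =
        throw new OutOfMemoryError("Unable to acquire memory")

      def main(args: Array[String]): Unit = {
        try {
          buildAndBroadcast()
        } catch {
          case original: OutOfMemoryError =>
            val wrapped = new OutOfMemoryError(
              "Not enough memory to build and broadcast the table to all " +
              "worker nodes. ...")
            // Replace the handler's frames with the original's, so the trace
            // shows e.g. LongToUnsafeRowMap.grow rather than the catch block
            // in BroadcastExchangeExec.
            wrapped.setStackTrace(original.getStackTrace)
            throw wrapped
        }
      }
    }
    </pre>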
    
    Currently, when an OOM occurs while broadcasting a table, the stack trace 
points at the error-handling code in BroadcastExchangeExec rather than at the 
allocation that failed:
    <pre>
    java.lang.OutOfMemoryError: Not enough memory to build and broadcast the 
table to all worker nodes. As a workaround, you can either disable broadcast by 
setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver 
memory by setting spark.driver.memory to a higher value
      at 
org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:122)
      at 
org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:76)
      at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withExecutionId$1.apply(SQLExecution.scala:101)
    </pre>
    With this PR, the stack trace of the original error is shown:
    <pre>
    java.lang.OutOfMemoryError: Not enough memory to build and broadcast the 
table to all worker nodes. As a workaround, you can either disable broadcast by 
setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver 
memory by setting spark.driver.memory to a higher value
      at 
org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.grow(HashedRelation.scala:628)
      at 
org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.append(HashedRelation.scala:570)
      at 
org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:865)
    </pre>
    
    While the line on which an exception is thrown is sometimes just an 
innocent victim, it can also be a participant in the problem, as was the case 
in the exception above.
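    
    For reference, the two workarounds named in the error message map to the 
following settings (a sketch; it assumes `spark` is an active SparkSession, 
and the 4g value is just an example):
    <pre>
    // Disable broadcast joins entirely:
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)

    // Or give the driver more memory at submit time (cannot be changed
    // at runtime):
    //   spark-submit --conf spark.driver.memory=4g ...
    </pre>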
    
    ## How was this patch tested?
    
    Manually tested a case where a broadcast join caused an OOM, and a case 
where it did not.
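    
    A sketch of the kind of manual reproduction involved (the data sizes and 
driver memory here are assumptions, not the exact test setup): start the 
driver with a small heap, e.g. `spark-shell --driver-memory 512m`, then force 
a broadcast of a table too large for it:
    <pre>
    import org.apache.spark.sql.functions.broadcast

    val big = spark.range(50000000L).selectExpr("id", "id AS k")
    val small = spark.range(100L)

    // broadcast() hints that the large side should be broadcast; with a
    // tiny driver heap this should fail while building the
    // LongHashedRelation, surfacing the OOM path fixed by this PR.
    small.join(broadcast(big), "id").count()
    </pre>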


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bersprockets/spark SPARK-24912

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21899.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21899
    
----
commit ca19596c9c09e2cc0ef5667d84f1d289b8184b91
Author: Bruce Robbins <bersprockets@...>
Date:   2018-07-26T01:11:07Z

    Initial commit

----

