Github user bersprockets commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21899#discussion_r211047556

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ---
    @@ -118,12 +119,20 @@ case class BroadcastExchangeExec(
               // SparkFatalException, which is a subclass of Exception. ThreadUtils.awaitResult
               // will catch this exception and re-throw the wrapped fatal throwable.
             case oe: OutOfMemoryError =>
    -          throw new SparkFatalException(
    +          val sizeMessage = if (dataSize != -1) {
    +            s"${SparkLauncher.DRIVER_MEMORY} by at least the estimated size of the " +
    --- End diff --

@hvanhovell That's what was being obscured :). In testing this, I've seen the OOM thrown from various places. Here are the three cases I have seen first hand.

1st case:
<pre>
java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to
all worker nodes. As a workaround, you can either disable broadcast by setting
spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by
setting spark.driver.memory to a higher value.
    at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.grow(HashedRelation.scala:628)
    at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.append(HashedRelation.scala:570)
    at org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:865)
</pre>

At that line is an allocation:
<pre>
val newPage = new Array[Long](newNumWords.toInt)
</pre>

2nd case:
<pre>
java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to
all worker nodes. As a workaround, you can either disable broadcast by setting
spark.sql.autoBroadcastJoinThreshold to -1 or increase spark.driver.memory by at
least the estimated size of the relation (96468992 bytes).
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$3.apply(TorrentBroadcast.scala:286)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$3.apply(TorrentBroadcast.scala:286)
</pre>

3rd case:
<pre>
java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to
all worker nodes. As a workaround, you can either disable broadcast by setting
spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by
setting spark.driver.memory to a higher value.
    at org.apache.spark.unsafe.memory.MemoryBlock.allocateFromObject(MemoryBlock.java:118)
    at org.apache.spark.sql.catalyst.expressions.UnsafeRow.getUTF8String(UnsafeRow.java:420)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.execution.joins.UnsafeHashedRelation$.apply(HashedRelation.scala:311)
</pre>

At that line is also an allocation:
<pre>
mb = new ByteArrayMemoryBlock(array, offset, length);
</pre>
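FWIW, here is a rough sketch of how I'd expect the full catch block to read with this change. It's reconstructed from the truncated diff above plus the message text in these traces, so the exact wording and structure in the final patch may differ; it also assumes `dataSize` stays -1 until the relation has been built:

<pre>
case oe: OutOfMemoryError =>
  // If we got far enough to estimate the relation size, suggest a concrete
  // driver-memory increase; otherwise fall back to the generic advice.
  val sizeMessage = if (dataSize != -1) {
    s"${SparkLauncher.DRIVER_MEMORY} by at least the estimated size of the " +
      s"relation ($dataSize bytes)"
  } else {
    s"${SparkLauncher.DRIVER_MEMORY} to a higher value"
  }
  throw new SparkFatalException(
    new OutOfMemoryError("Not enough memory to build and broadcast the table to all " +
      "worker nodes. As a workaround, you can either disable broadcast by setting " +
      s"${SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key} to -1 or increase " + sizeMessage)
      .initCause(oe.getCause))
</pre>

The identifiers (SparkLauncher.DRIVER_MEMORY, SQLConf.AUTO_BROADCASTJOIN_THRESHOLD, dataSize) are the ones already referenced in the diff and the existing message text; everything else is just my guess at the shape.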