[ https://issues.apache.org/jira/browse/SPARK-39983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575506#comment-17575506 ]
Apache Spark commented on SPARK-39983:
--------------------------------------

User 'alex-balikov' has created a pull request for this issue:
https://github.com/apache/spark/pull/37413

> Should not cache unserialized broadcast relations on the driver
> ---------------------------------------------------------------
>
>                 Key: SPARK-39983
>                 URL: https://issues.apache.org/jira/browse/SPARK-39983
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Alex Balikov
>            Priority: Minor
>
> In TorrentBroadcast.writeBlocks we store the unserialized broadcast object in
> addition to its serialized version:
> {code:java}
> private def writeBlocks(value: T): Int = {
>   import StorageLevel._
>   // Store a copy of the broadcast variable in the driver so that tasks run on the driver
>   // do not create a duplicate copy of the broadcast variable's value.
>   val blockManager = SparkEnv.get.blockManager
>   if (!blockManager.putSingle(broadcastId, value, MEMORY_AND_DISK, tellMaster = false)) {
>     throw new SparkException(s"Failed to store $broadcastId in BlockManager")
>   }
>   // ... (rest of the method omitted in this excerpt)
> }
> {code}
> In the case of broadcast relations, these objects can be fairly large (60 MB in
> one observed case) and are not strictly necessary on the driver.
> Add an option to not keep the unserialized versions of the objects.
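
For readers of this thread, the behavior requested in the issue can be illustrated with a small self-contained sketch. It does not use Spark and is not the code from the pull request: ToyBlockStore, BroadcastSketch and the keepUnserializedCopy flag are hypothetical names invented for this illustration, and plain Java serialization stands in for Spark's serializer. The idea shown is only that the serialized pieces are always written, while the deserialized copy on the driver becomes optional.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.collection.mutable

// Hypothetical stand-in for the driver-side block store; NOT Spark's BlockManager.
final class ToyBlockStore {
  private val objects = mutable.Map.empty[String, Any]
  private val pieces = mutable.Map.empty[String, Array[Byte]]

  // Keep the deserialized object itself (what putSingle does in the quoted snippet).
  def putSingle(id: String, value: Any): Boolean = { objects.update(id, value); true }

  // Keep one serialized chunk of the object.
  def putBytes(id: String, data: Array[Byte]): Boolean = { pieces.update(id, data); true }

  def hasObject(id: String): Boolean = objects.contains(id)

  def pieceCount(broadcastId: String): Int =
    pieces.keys.count(_.startsWith(broadcastId + "_piece"))
}

object BroadcastSketch {
  // Plain Java serialization, used here only to obtain some bytes to split.
  private def serialize(value: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    buffer.toByteArray
  }

  // Writes the serialized pieces and returns how many were written.
  // keepUnserializedCopy is an assumed flag name: when false, the driver
  // holds only the serialized form, which is the option the issue asks for.
  def writeBlocks(
      store: ToyBlockStore,
      broadcastId: String,
      value: AnyRef,
      keepUnserializedCopy: Boolean,
      blockSize: Int = 4096): Int = {
    if (keepUnserializedCopy && !store.putSingle(broadcastId, value)) {
      throw new IllegalStateException(s"Failed to store $broadcastId")
    }
    val chunks = serialize(value).grouped(blockSize).toSeq
    chunks.zipWithIndex.foreach { case (chunk, i) =>
      if (!store.putBytes(s"${broadcastId}_piece$i", chunk)) {
        throw new IllegalStateException(s"Failed to store piece $i of $broadcastId")
      }
    }
    chunks.length
  }

  def main(args: Array[String]): Unit = {
    val store = new ToyBlockStore
    val relation = Vector.range(0, 10000) // stand-in for a large broadcast relation
    val n = writeBlocks(store, "broadcast_0", relation, keepUnserializedCopy = false)
    println(s"pieces written: $n, unserialized copy kept: ${store.hasObject("broadcast_0")}")
  }
}
```

With the flag set to false, a task running on the driver would have to rebuild the value from the stored pieces on demand. That is the trade-off implied by the issue: lower steady-state memory on the driver in exchange for a possible deserialization cost.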