[ https://issues.apache.org/jira/browse/SPARK-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925067#comment-15925067 ]
Apache Spark commented on SPARK-19556:
--------------------------------------

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/17295

> Broadcast data is not encrypted when I/O encryption is on
> ---------------------------------------------------------
>
>                 Key: SPARK-19556
>                 URL: https://issues.apache.org/jira/browse/SPARK-19556
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Marcelo Vanzin
>
> {{TorrentBroadcast}} uses a couple of "back doors" into the block manager to
> write and read data:
> {code}
> if (!blockManager.putBytes(pieceId, bytes, MEMORY_AND_DISK_SER, tellMaster = true)) {
>   throw new SparkException(s"Failed to store $pieceId of $broadcastId in local BlockManager")
> }
> {code}
> {code}
> bm.getLocalBytes(pieceId) match {
>   case Some(block) =>
>     blocks(pid) = block
>     releaseLock(pieceId)
>   case None =>
>     bm.getRemoteBytes(pieceId) match {
>       case Some(b) =>
>         if (checksumEnabled) {
>           val sum = calcChecksum(b.chunks(0))
>           if (sum != checksums(pid)) {
>             throw new SparkException(s"corrupt remote block $pieceId of $broadcastId:" +
>               s" $sum != ${checksums(pid)}")
>           }
>         }
>         // We found the block from remote executors/driver's BlockManager, so put the block
>         // in this executor's BlockManager.
>         if (!bm.putBytes(pieceId, b, StorageLevel.MEMORY_AND_DISK_SER, tellMaster = true)) {
>           throw new SparkException(
>             s"Failed to store $pieceId of $broadcastId in local BlockManager")
>         }
>         blocks(pid) = b
>       case None =>
>         throw new SparkException(s"Failed to get $pieceId of $broadcastId")
>     }
> }
> {code}
> The thing these block manager methods have in common is that they bypass the
> encryption code; so broadcast data is stored unencrypted in the block
> manager, causing unencrypted data to be written to disk if those blocks need
> to be evicted from memory.
> The correct fix here is actually not to change {{TorrentBroadcast}}, but to
> fix the block manager so that:
> - data stored in memory is not encrypted
> - data written to disk is encrypted
> This would simplify the code paths that use BlockManager / SerializerManager
> APIs (e.g. see SPARK-19520), but requires some tricky changes inside the
> BlockManager to still be able to use file channels to avoid reading whole
> blocks back into memory so they can be decrypted.
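
As a rough illustration of the behaviour the description asks for (plaintext in memory, ciphertext on disk), here is a minimal, self-contained Scala sketch. It is not Spark code: `ToyBlockStore` and every method on it are hypothetical names made up for this example, and it uses the JDK's `javax.crypto` with AES/CTR as a stand-in for Spark's I/O encryption. It also reads the whole block back into memory on a disk hit, which is precisely the cost the description says a real fix should avoid.

```scala
import java.io.{ByteArrayOutputStream, DataInputStream}
import java.nio.file.{Files, Path}
import java.security.SecureRandom
import javax.crypto.{Cipher, CipherInputStream, CipherOutputStream}
import javax.crypto.spec.{IvParameterSpec, SecretKeySpec}
import scala.collection.mutable

// Hypothetical toy store for illustration only; NOT Spark's BlockManager.
// `key` must be a valid AES key (16, 24 or 32 bytes).
final class ToyBlockStore(key: Array[Byte], dir: Path) {
  private val transformation = "AES/CTR/NoPadding"
  private val ivLength = 16
  private val memory = mutable.Map.empty[String, Array[Byte]]
  private val random = new SecureRandom()

  private def fileFor(id: String): Path = dir.resolve(id)

  private def newCipher(mode: Int, iv: Array[Byte]): Cipher = {
    val cipher = Cipher.getInstance(transformation)
    cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv))
    cipher
  }

  // Blocks held in memory stay unencrypted.
  def putBytes(id: String, bytes: Array[Byte]): Unit =
    memory.update(id, bytes.clone())

  // Eviction is the only path to disk, so encryption happens here.
  // File layout: 16-byte random IV followed by the ciphertext.
  def evictToDisk(id: String): Unit =
    memory.remove(id).foreach { bytes =>
      val iv = new Array[Byte](ivLength)
      random.nextBytes(iv)
      val out = Files.newOutputStream(fileFor(id))
      try {
        out.write(iv)
        val encrypted = new CipherOutputStream(out, newCipher(Cipher.ENCRYPT_MODE, iv))
        encrypted.write(bytes)
        encrypted.close() // flushes the cipher and closes the file stream
      } finally out.close()
    }

  // Memory hits return plaintext directly; disk hits are decrypted on read.
  def getBytes(id: String): Option[Array[Byte]] =
    memory.get(id).map(_.clone()).orElse(readFromDisk(id))

  private def readFromDisk(id: String): Option[Array[Byte]] = {
    val file = fileFor(id)
    if (!Files.exists(file)) None
    else {
      val in = Files.newInputStream(file)
      try {
        val iv = new Array[Byte](ivLength)
        new DataInputStream(in).readFully(iv)
        val decrypted = new CipherInputStream(in, newCipher(Cipher.DECRYPT_MODE, iv))
        val buffer = new ByteArrayOutputStream()
        val chunk = new Array[Byte](8192)
        var n = decrypted.read(chunk)
        while (n != -1) {
          buffer.write(chunk, 0, n)
          n = decrypted.read(chunk)
        }
        Some(buffer.toByteArray)
      } finally in.close()
    }
  }
}
```

With a store created over a temporary directory, calling `putBytes`, then `evictToDisk`, then `getBytes` for the same id should return the original bytes, while the file on disk holds only the IV and ciphertext. Callers never touch the cipher, which is the simplification the description is after.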