Marcelo Vanzin created SPARK-19556:
--------------------------------------

             Summary: Broadcast data is not encrypted when I/O encryption is on
                 Key: SPARK-19556
                 URL: https://issues.apache.org/jira/browse/SPARK-19556
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.1.0
            Reporter: Marcelo Vanzin
{{TorrentBroadcast}} uses a couple of "back doors" into the block manager to write and read data:

{code}
    if (!blockManager.putBytes(pieceId, bytes, MEMORY_AND_DISK_SER, tellMaster = true)) {
      throw new SparkException(s"Failed to store $pieceId of $broadcastId in local BlockManager")
    }
{code}

{code}
      bm.getLocalBytes(pieceId) match {
        case Some(block) =>
          blocks(pid) = block
          releaseLock(pieceId)
        case None =>
          bm.getRemoteBytes(pieceId) match {
            case Some(b) =>
              if (checksumEnabled) {
                val sum = calcChecksum(b.chunks(0))
                if (sum != checksums(pid)) {
                  throw new SparkException(s"corrupt remote block $pieceId of $broadcastId:" +
                    s" $sum != ${checksums(pid)}")
                }
              }
              // We found the block from remote executors/driver's BlockManager, so put the block
              // in this executor's BlockManager.
              if (!bm.putBytes(pieceId, b, StorageLevel.MEMORY_AND_DISK_SER, tellMaster = true)) {
                throw new SparkException(
                  s"Failed to store $pieceId of $broadcastId in local BlockManager")
              }
              blocks(pid) = b
            case None =>
              throw new SparkException(s"Failed to get $pieceId of $broadcastId")
          }
      }
{code}

What these block manager methods have in common is that they bypass the encryption code. As a result, broadcast data is stored unencrypted in the block manager, and unencrypted data is written to disk whenever those blocks need to be evicted from memory.

The correct fix here is actually not to change {{TorrentBroadcast}}, but to fix the block manager so that:

- data stored in memory is not encrypted
- data written to disk is encrypted

This would simplify the code paths that use the BlockManager / SerializerManager APIs (e.g. see SPARK-19520), but it requires some tricky changes inside the BlockManager so that it can still use file channels, avoiding reading whole blocks back into memory just to decrypt them.
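To make the proposed split concrete, here is a minimal sketch of a store that keeps blocks as plaintext in memory and encrypts them only when they are evicted to disk, decrypting them incrementally on read instead of loading the whole block first. This is not Spark's BlockManager API: {{EncryptingBlockStore}}, {{put}}, {{evictToDisk}}, {{open}} and {{readAll}} are hypothetical names, and plain {{javax.crypto}} with AES/CTR and a per-block random IV stored as a file header is an assumed stand-in for whatever cipher setup the real fix would use.

{code}
import java.io.{BufferedInputStream, BufferedOutputStream, ByteArrayInputStream,
  ByteArrayOutputStream, DataInputStream, File, FileInputStream, FileOutputStream, InputStream}
import java.security.SecureRandom
import javax.crypto.{Cipher, CipherInputStream, CipherOutputStream, SecretKey}
import javax.crypto.spec.IvParameterSpec
import scala.collection.mutable

// Hypothetical, simplified block store (NOT Spark's BlockManager):
// plaintext in memory, AES/CTR-encrypted on disk.
class EncryptingBlockStore(dir: File, key: SecretKey) {
  private val memory = mutable.Map.empty[String, Array[Byte]]
  private val random = new SecureRandom()
  private val Transformation = "AES/CTR/NoPadding" // assumed cipher choice
  private val IvLength = 16

  private def fileFor(blockId: String): File = new File(dir, blockId)

  /** Stores a block in memory, unencrypted. */
  def put(blockId: String, bytes: Array[Byte]): Unit = memory(blockId) = bytes

  /** Moves a block from memory to disk, encrypting it on the way out. */
  def evictToDisk(blockId: String): Unit = {
    val bytes = memory.remove(blockId).getOrElse(
      throw new NoSuchElementException(s"$blockId is not in memory"))
    val iv = new Array[Byte](IvLength)
    random.nextBytes(iv)
    val cipher = Cipher.getInstance(Transformation)
    cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv))
    val fileOut = new BufferedOutputStream(new FileOutputStream(fileFor(blockId)))
    val out = new CipherOutputStream(fileOut, cipher)
    try {
      fileOut.write(iv) // the IV is not secret; write it in the clear before any ciphertext
      out.write(bytes)
    } finally out.close() // finishes the cipher and closes the underlying file
  }

  /** Opens a block for reading; disk blocks are decrypted as the stream is consumed. */
  def open(blockId: String): InputStream = memory.get(blockId) match {
    case Some(bytes) => new ByteArrayInputStream(bytes)
    case None =>
      val in = new BufferedInputStream(new FileInputStream(fileFor(blockId)))
      val iv = new Array[Byte](IvLength)
      new DataInputStream(in).readFully(iv) // read the IV header
      val cipher = Cipher.getInstance(Transformation)
      cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv))
      new CipherInputStream(in, cipher)
  }

  /** Convenience helper: reads a whole block (memory or disk) into an array. */
  def readAll(blockId: String): Array[Byte] = {
    val in = open(blockId)
    try {
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](8192)
      var n = in.read(buf)
      while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
      out.toByteArray
    } finally in.close()
  }
}
{code}

With this split, callers like {{TorrentBroadcast}} would never have to choose between an encrypting and a non-encrypting path: they always see plaintext, and only the eviction path touches the cipher. The stream returned by {{open}} decrypts in chunks, which is the property the issue asks to preserve; a real implementation would additionally need to integrate with the block manager's file-channel based transfers.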
-- 
This message was sent by Atlassian JIRA
(v6.3.15#6346)