Marcelo Vanzin created SPARK-19556:
--------------------------------------

             Summary: Broadcast data is not encrypted when I/O encryption is on
                 Key: SPARK-19556
                 URL: https://issues.apache.org/jira/browse/SPARK-19556
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.1.0
            Reporter: Marcelo Vanzin


{{TorrentBroadcast}} uses a couple of "back doors" into the block manager to 
write and read data:

{code}
      if (!blockManager.putBytes(pieceId, bytes, MEMORY_AND_DISK_SER, tellMaster = true)) {
        throw new SparkException(s"Failed to store $pieceId of $broadcastId in local BlockManager")
      }
{code}

{code}
      bm.getLocalBytes(pieceId) match {
        case Some(block) =>
          blocks(pid) = block
          releaseLock(pieceId)
        case None =>
          bm.getRemoteBytes(pieceId) match {
            case Some(b) =>
              if (checksumEnabled) {
                val sum = calcChecksum(b.chunks(0))
                if (sum != checksums(pid)) {
                  throw new SparkException(s"corrupt remote block $pieceId of $broadcastId:" +
                    s" $sum != ${checksums(pid)}")
                }
              }
              // We found the block from remote executors/driver's BlockManager, so put the block
              // in this executor's BlockManager.
              if (!bm.putBytes(pieceId, b, StorageLevel.MEMORY_AND_DISK_SER, tellMaster = true)) {
                throw new SparkException(
                  s"Failed to store $pieceId of $broadcastId in local BlockManager")
              }
              blocks(pid) = b
            case None =>
              throw new SparkException(s"Failed to get $pieceId of $broadcastId")
          }
      }
{code}

What these block manager methods ({{putBytes}}, {{getLocalBytes}}, 
{{getRemoteBytes}}) have in common is that they bypass the encryption code. As a 
result, broadcast data is stored unencrypted in the block manager, and if those 
blocks are evicted from memory, the unencrypted data is written to disk.

The correct fix here is actually not to change {{TorrentBroadcast}}, but to fix 
the block manager so that:

- data stored in memory is not encrypted
- data written to disk is encrypted
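The proposed contract can be illustrated with a toy block store. This is a minimal 
sketch, assuming AES in CTR mode from the standard {{javax.crypto}} API and a random 
key held by the store; {{ToyBlockStore}}, {{putBytes}}, {{evictToDisk}} and 
{{getBytes}} are hypothetical names, and the store is much simpler than Spark's 
actual BlockManager (no serialization, locking, or reporting to the master).

```scala
import java.nio.file.{Files, Path}
import java.security.SecureRandom
import javax.crypto.{Cipher, KeyGenerator, SecretKey}
import javax.crypto.spec.IvParameterSpec
import scala.collection.mutable

// Hypothetical toy store illustrating the proposed contract:
// plaintext in memory, ciphertext on disk. Not Spark's implementation.
class ToyBlockStore(dir: Path) {
  private val Transformation = "AES/CTR/NoPadding"
  private val IvLength = 16

  private val key: SecretKey = {
    val gen = KeyGenerator.getInstance("AES")
    gen.init(128)
    gen.generateKey()
  }
  private val random = new SecureRandom()
  private val memory = mutable.Map.empty[String, Array[Byte]]

  // Blocks are kept unencrypted while they live in memory.
  def putBytes(id: String, bytes: Array[Byte]): Unit = {
    memory.update(id, bytes)
  }

  // On eviction, encrypt before anything touches the disk. A fresh random IV
  // is written in front of the ciphertext so the block can be decrypted later.
  def evictToDisk(id: String): Unit = {
    val plain = memory.remove(id).getOrElse(throw new NoSuchElementException(id))
    val iv = new Array[Byte](IvLength)
    random.nextBytes(iv)
    val cipher = Cipher.getInstance(Transformation)
    cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv))
    Files.write(file(id), iv ++ cipher.doFinal(plain))
  }

  // Serve from memory if present; otherwise decrypt the on-disk copy.
  def getBytes(id: String): Option[Array[Byte]] =
    memory.get(id).orElse {
      val f = file(id)
      if (!Files.exists(f)) None
      else {
        val data = Files.readAllBytes(f)
        val cipher = Cipher.getInstance(Transformation)
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(data.take(IvLength)))
        Some(cipher.doFinal(data.drop(IvLength)))
      }
    }

  // Location of a block's on-disk copy.
  def file(id: String): Path = dir.resolve(id)
}
```

Storing the IV in front of the ciphertext is one common choice; it lets each evicted 
block be decrypted on its own without keeping per-block state in memory.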

This would simplify the code paths that use the BlockManager / SerializerManager 
APIs (e.g. see SPARK-19520), but it requires some tricky changes inside the 
BlockManager so that it can still use file channels, instead of reading whole 
blocks back into memory just to decrypt them.
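One way to keep disk reads incremental is to wrap a {{FileChannel}} in a cipher 
stream, so that only a fixed-size buffer is decrypted at a time rather than the 
whole block. A minimal sketch, assuming AES/CTR via the standard {{javax.crypto}} 
API; {{StreamingBlockCrypto}}, {{writeEncrypted}} and {{readDecrypted}} are 
hypothetical helpers, not Spark APIs.

```scala
import java.io.OutputStream
import java.nio.channels.{Channels, FileChannel}
import java.nio.file.{Path, StandardOpenOption}
import javax.crypto.{Cipher, CipherInputStream, CipherOutputStream, SecretKey}
import javax.crypto.spec.IvParameterSpec

// Hypothetical helpers, not Spark APIs: stream a block through AES/CTR over a
// FileChannel so only a small buffer is held in memory at any time.
object StreamingBlockCrypto {
  private val Transformation = "AES/CTR/NoPadding"

  private def cipher(mode: Int, key: SecretKey, iv: Array[Byte]): Cipher = {
    val c = Cipher.getInstance(Transformation)
    c.init(mode, key, new IvParameterSpec(iv))
    c
  }

  // Encrypt each chunk as it is written to `path`; no full copy is buffered.
  def writeEncrypted(path: Path, key: SecretKey, iv: Array[Byte],
                     chunks: Iterator[Array[Byte]]): Unit = {
    val channel = FileChannel.open(path, StandardOpenOption.CREATE,
      StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)
    val out = new CipherOutputStream(Channels.newOutputStream(channel),
      cipher(Cipher.ENCRYPT_MODE, key, iv))
    try chunks.foreach(c => out.write(c)) finally out.close()
  }

  // Decrypt the file in bufferSize pieces, handing each plaintext piece to
  // `sink`. Returns the total number of plaintext bytes produced.
  def readDecrypted(path: Path, key: SecretKey, iv: Array[Byte],
                    sink: OutputStream, bufferSize: Int = 8192): Long = {
    val channel = FileChannel.open(path, StandardOpenOption.READ)
    val in = new CipherInputStream(Channels.newInputStream(channel),
      cipher(Cipher.DECRYPT_MODE, key, iv))
    val buf = new Array[Byte](bufferSize)
    var total = 0L
    try {
      var n = in.read(buf)
      while (n != -1) {
        sink.write(buf, 0, n)
        total += n
        n = in.read(buf)
      }
    } finally in.close()
    total
  }
}
```

Wrapping the channel this way gives up zero-copy transfers such as 
{{FileChannel.transferTo}}, since every byte has to pass through the cipher; this is 
the kind of trade-off the BlockManager changes would need to deal with.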



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
