GitHub user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1165#discussion_r14808775
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -463,16 +463,16 @@ private[spark] class BlockManager(
                   val values = dataDeserialize(blockId, bytes)
                   if (level.deserialized) {
                     // Cache the values before returning them
    -                // TODO: Consider creating a putValues that also takes in a iterator?
    -                val valuesBuffer = new ArrayBuffer[Any]
    -                valuesBuffer ++= values
    -                memoryStore.putValues(blockId, valuesBuffer, level, returnValues = true).data
    -                  match {
    -                    case Left(values2) =>
    -                      return Some(new BlockResult(values2, DataReadMethod.Disk, info.size))
    -                    case _ =>
    -                      throw new SparkException("Memory store did not return back an iterator")
    -                  }
    +                val putResult = memoryStore.putValues(blockId, values, level, returnValues = true)
    +                putResult.data match {
    +                  case Left(it) =>
    +                    return Some(new BlockResult(it, DataReadMethod.Disk, info.size))
    +                  case Right(b) =>
    +                    return Some(new BlockResult(
    +                      dataDeserialize(blockId, b),
    --- End diff --
    
    We should talk more about what's going on here tomorrow. I read this function for 15 minutes and couldn't figure out what was going on. For one thing, though, AFAIK we only reach this branch if we've read a block from disk. Will we just overwrite the existing on-disk block again in this scenario? Also, is there any reason to return `dataDeserialize(blockId, b)` here instead of just returning `values`?


