Github user ConeyLiu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19285#discussion_r139938279

    --- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
    @@ -233,17 +235,13 @@ private[spark] class MemoryStore(
         }

         if (keepUnrolling) {
    -      // We successfully unrolled the entirety of this block
    -      val arrayValues = vector.toArray
    -      vector = null
    -      val entry =
    -        new DeserializedMemoryEntry[T](arrayValues, SizeEstimator.estimate(arrayValues), classTag)
    -      val size = entry.size
    +      // get the precise size
    +      val size = estimateSize(true)
    --- End diff --

    Previously, `putIteratorAsValues` seemed fine, but `putIteratorAsBytes` did not check the memory again after unrolling the iterator. Now the unified `putIterator` is copied from the previous `putIteratorAsValues`. For `SizeTrackingVector`, we can call `arrayValues.toIterator` to get an iterator again after calling `SizeTrackingVector.toArray`. But for `ChunkedByteBufferOutputStream`, we can't get back to the stream after calling `ChunkedByteBufferOutputStream.toChunkedByteBuffer` (and `PartiallySerializedBlock` needs a stream).
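    To make the asymmetry concrete, here is a rough sketch (not code from this PR) contrasting the two unroll buffers. It assumes access to the Spark-internal classes `SizeTrackingVector` and `ChunkedByteBufferOutputStream` (both are `private[spark]`, so it only compiles inside an `org.apache.spark` package), and the object and variable names are made up for illustration:

    ```scala
    package org.apache.spark.storage.memory

    import java.nio.ByteBuffer

    import org.apache.spark.util.collection.SizeTrackingVector
    import org.apache.spark.util.io.ChunkedByteBufferOutputStream

    object UnrollBufferAsymmetry {
      def main(args: Array[String]): Unit = {
        // Deserialized path: the unroll buffer is a SizeTrackingVector. Materializing
        // it with toArray is harmless, because a fresh iterator can always be taken
        // from the resulting array afterwards.
        val vector = new SizeTrackingVector[Int]
        (1 to 10).foreach(vector += _)
        val arrayValues = vector.toArray
        val replay = arrayValues.toIterator     // a new iterator is still available
        println(replay.sum)                     // 55

        // Serialized path: the unroll buffer is a ChunkedByteBufferOutputStream. Once
        // it is closed and converted with toChunkedByteBuffer, nothing more can be
        // written to it, so there is no way to go back to a writable stream (which is
        // what PartiallySerializedBlock would need).
        val bbos = new ChunkedByteBufferOutputStream(64 * 1024, (size: Int) => ByteBuffer.allocate(size))
        bbos.write(Array.fill[Byte](128)(1))
        bbos.close()
        val buffer = bbos.toChunkedByteBuffer   // bbos can no longer be used as a stream
        println(buffer.size)                    // 128
      }
    }
    ```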