Wenchen Fan created SPARK-1912:
----------------------------------

             Summary: Compression memory issue during shuffle
                 Key: SPARK-1912
                 URL: https://issues.apache.org/jira/browse/SPARK-1912
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
            Reporter: Wenchen Fan


When we need to read a compressed block, we first create a compression stream 
instance (LZF or Snappy) and use it to wrap that block.
Say a reducer task needs to read 1000 local shuffle blocks. It currently 
prepares to read all 1000 blocks up front, which means creating 1000 
compression stream instances to wrap them. But initializing each compression 
stream allocates some memory, so having many of these instances alive at the 
same time becomes a problem.
In practice the reducer reads the shuffle blocks one by one, so why create all 
the compression instances up front? Could we do it lazily instead, creating 
the compression instance for a block only when that block is first read?
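A minimal sketch of the idea (the codec trait and method names here are 
illustrative placeholders, not Spark's actual internals):

{code}
import java.io.InputStream

// Stand-in for an LZF/Snappy codec that wraps a raw block stream.
trait CompressionCodec {
  def compressedInputStream(raw: InputStream): InputStream
}

// Eager approach (current behaviour): every block is wrapped up front,
// so 1000 compression streams are allocated before any is read.
def wrapEagerly(blocks: Seq[InputStream], codec: CompressionCodec): Seq[InputStream] =
  blocks.map(codec.compressedInputStream)

// Lazy approach (the suggestion): wrap each block only when the reducer
// actually reaches it, so at most one compression stream is live at a time.
def wrapLazily(blocks: Seq[InputStream], codec: CompressionCodec): Iterator[InputStream] =
  blocks.iterator.map(codec.compressedInputStream)
{code}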



--
This message was sent by Atlassian JIRA
(v6.2#6252)
