[ https://issues.apache.org/jira/browse/SPARK-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia resolved SPARK-1912.
----------------------------------
    Resolution: Fixed

> Compression memory issue during reduce
> --------------------------------------
>
>                 Key: SPARK-1912
>                 URL: https://issues.apache.org/jira/browse/SPARK-1912
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>             Fix For: 1.1.0
>
>
> When we need to read a compressed block, we first create a compression
> stream instance (LZF or Snappy) and use it to wrap that block.
> Say a reducer task needs to read 1000 local shuffle blocks: it first
> prepares to read all 1000 blocks, which means creating 1000 compression
> stream instances to wrap them. But initializing a compression instance
> allocates some memory, so having many compression instances alive at
> the same time is a problem.
> In practice, the reducer reads the shuffle blocks one by one, so why
> create all the compression instances up front? We can do it lazily:
> create the compression instance for a block only when that block is
> first read.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
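The lazy approach described in the issue can be sketched as follows. This is a minimal illustration, not Spark's actual fix: it uses `java.util.zip.GZIPInputStream` as a stand-in for the LZF/Snappy streams, and all class and method names here (`LazyInputStream`, `demo`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, not Spark's actual implementation: an InputStream
// wrapper that defers creating the underlying decompression stream until
// the first read, so "preparing" N blocks allocates no buffers up front.
public class LazyDecompression {

    static class LazyInputStream extends InputStream {
        private final Supplier<InputStream> factory;
        private InputStream delegate;

        LazyInputStream(Supplier<InputStream> factory) {
            this.factory = factory;
        }

        private InputStream delegate() {
            if (delegate == null) {
                delegate = factory.get(); // allocated only on first use
            }
            return delegate;
        }

        @Override public int read() throws IOException {
            return delegate().read();
        }

        @Override public void close() throws IOException {
            if (delegate != null) {
                delegate.close(); // never read => nothing was allocated
            }
        }
    }

    /** Returns {streams created after prepare, streams created after one read}. */
    static int[] demo() throws IOException {
        // Compress one sample block with GZIP (stand-in for LZF/Snappy).
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write("block data".getBytes(StandardCharsets.UTF_8));
        }
        byte[] block = buf.toByteArray();

        AtomicInteger created = new AtomicInteger();
        // "Prepare" 1000 blocks: no decompression stream exists yet.
        LazyInputStream[] streams = new LazyInputStream[1000];
        for (int i = 0; i < streams.length; i++) {
            streams[i] = new LazyInputStream(() -> {
                created.incrementAndGet();
                try {
                    return new GZIPInputStream(new ByteArrayInputStream(block));
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
        }
        int afterPrepare = created.get(); // nothing allocated yet

        streams[0].read();                // first read triggers creation
        int afterRead = created.get();    // only the block being read

        return new int[] { afterPrepare, afterRead };
    }

    public static void main(String[] args) throws IOException {
        int[] counts = demo();
        System.out.println("created after prepare: " + counts[0]
                + ", after first read: " + counts[1]);
    }
}
```

With this wrapper, preparing all 1000 blocks costs nothing in decompression buffers; each stream's memory is only allocated when the reducer actually starts consuming that block.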