[ https://issues.apache.org/jira/browse/SPARK-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SaintBacchus updated SPARK-6056:
--------------------------------
    Description: 
No matter whether `preferDirectBufs` is set or the number of Netty threads is
limited, Spark cannot bound its off-heap memory use.
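For reference, a minimal sketch of the settings this refers to, assuming the
standard spark.shuffle.io.* transport options and spark.yarn.executor.memoryOverhead;
the values are only examples, not the ones from the failing job, and none of them
bounds Netty's direct-buffer use:
```scala
// Illustrative settings only (example values, not from the failing job).
val conf = new org.apache.spark.SparkConf()
  .set("spark.shuffle.io.preferDirectBufs", "false")   // ask Netty to prefer heap buffers
  .set("spark.shuffle.io.serverThreads", "1")          // limit Netty server threads
  .set("spark.shuffle.io.clientThreads", "1")          // limit Netty client threads
  .set("spark.yarn.executor.memoryOverhead", "384")    // YARN off-heap headroom, in MB
```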
At line 269 of 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty allocates an
off-heap buffer of the same size as the heap buffer it is about to write. So for
every buffer you transfer, an equal amount of off-heap memory is allocated.
Once the total allocated off-heap memory exceeds the overhead capacity configured
for YARN, the executor is killed.
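Roughly, the copy looks like the sketch below; this is a paraphrase of the
direct-buffer logic in that class, not the verbatim Netty source, and
copyToDirect is just an illustrative name:
```scala
import io.netty.buffer.{ByteBuf, ByteBufAllocator}

// Paraphrase of AbstractNioByteChannel's behaviour in netty-4.0.23.Final:
// before a heap ByteBuf is written to the socket, it is copied into a
// direct (off-heap) buffer of the same readable size.
def copyToDirect(alloc: ByteBufAllocator, heapBuf: ByteBuf): ByteBuf = {
  val readableBytes = heapBuf.readableBytes()
  val directBuf = alloc.directBuffer(readableBytes)   // off-heap allocation, same size as the heap data
  directBuf.writeBytes(heapBuf, heapBuf.readerIndex(), readableBytes)
  heapBuf.release()                                    // heap copy is released, but the direct copy now exists
  directBuf
}
```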
I wrote a simple snippet to reproduce it:
```scala
import org.apache.spark.SparkEnv
import org.apache.spark.storage.RDDBlockId

// Cache 11 arrays of 10 MB each across 10 partitions.
val bufferRdd = sc.makeRDD(0 to 10, 10)
  .map(_ => new Array[Byte](10 * 1024 * 1024))
  .persist()
bufferRdd.count()

// Fetch one cached block through the BlockManager; the transfer goes through
// Netty and allocates an equally sized off-heap buffer.
val part = bufferRdd.partitions(0)
val sparkEnv = SparkEnv.get
val blockMgr = sparkEnv.blockManager
val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
val resultIt = blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
val len = resultIt.map(_.length).sum
```
If multiple threads fetch the blocks and compute `len` like this concurrently,
physical memory use quickly exceeds the limit set by spark.yarn.executor.memoryOverhead.
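A rough sketch of that multi-threaded variant is below; the thread count of 16
is arbitrary, bufferRdd is the cached RDD from the snippet above, and this only
shows the access pattern, not the exact test code:
```scala
import org.apache.spark.SparkEnv
import org.apache.spark.storage.RDDBlockId

// Illustrative only: hammer the BlockManager from several threads so many
// transfers are in flight at once; each one pins another same-sized direct buffer.
val threads = (1 to 16).map { _ =>
  new Thread(new Runnable {
    override def run(): Unit = {
      val blockMgr = SparkEnv.get.blockManager
      bufferRdd.partitions.foreach { part =>
        blockMgr.get(RDDBlockId(bufferRdd.id, part.index)).foreach { block =>
          block.data.asInstanceOf[Iterator[Array[Byte]]].map(_.length).sum
        }
      }
    }
  })
}
threads.foreach(_.start())
threads.foreach(_.join())
```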

> Unlimited off-heap memory use causes the RM to kill the container
> ------------------------------------------------------------------
>
>                 Key: SPARK-6056
>                 URL: https://issues.apache.org/jira/browse/SPARK-6056
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.2.1
>            Reporter: SaintBacchus
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
