[ https://issues.apache.org/jira/browse/SPARK-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15541946#comment-15541946 ]

Xing Shi commented on SPARK-17465:
----------------------------------

Resolved.

In every task, the method _currentUnrollMemory_ is called several times. Each 
call scans all keys of _unrollMemoryMap_ and _pendingUnrollMemoryMap_, so its 
processing time is proportional to the size of those maps.
https://github.com/apache/spark/blob/v1.6.0/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala#L540-L542
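
For reference, the method is essentially a sum over both per-task maps (a 
simplified paraphrase of the linked v1.6.0 source, not the exact code), so each 
call costs time proportional to the number of entries:
{code:borderStyle=solid}
// Simplified sketch of MemoryStore.currentUnrollMemory in v1.6.0: the cost of
// every call grows with the number of keys kept in the two maps.
def currentUnrollMemory: Long = memoryManager.synchronized {
  unrollMemoryMap.values.sum + pendingUnrollMemoryMap.values.sum
}
{code}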

I have measured the processing time of _currentUnrollMemory_, and it accounts 
exactly for the increase in task time compared with before the upgrade.

Hope this helps anyone who sees a similar increase in processing time after 
upgrading Spark to 1.6.0 :)

> Inappropriate memory management in `org.apache.spark.storage.MemoryStore` may 
> lead to memory leak
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-17465
>                 URL: https://issues.apache.org/jira/browse/SPARK-17465
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.0, 1.6.1, 1.6.2
>            Reporter: Xing Shi
>            Assignee: Xing Shi
>             Fix For: 1.6.3, 2.0.1, 2.1.0
>
>
> After upgrading Spark from 1.5.0 to 1.6.0, I found what appears to be a 
> memory leak in my Spark Streaming application.
> Here is the head of the heap histogram of my application, which had been 
> running for about 160 hours:
> {code:borderStyle=solid}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:         28094       71753976  [B
>    2:       1188086       28514064  java.lang.Long
>    3:       1183844       28412256  scala.collection.mutable.DefaultEntry
>    4:        102242       13098768  <methodKlass>
>    5:        102242       12421000  <constMethodKlass>
>    6:          8184        9199032  <constantPoolKlass>
>    7:            38        8391584  [Lscala.collection.mutable.HashEntry;
>    8:          8184        7514288  <instanceKlassKlass>
>    9:          6651        4874080  <constantPoolCacheKlass>
>   10:         37197        3438040  [C
>   11:          6423        2445640  <methodDataKlass>
>   12:          8773        1044808  java.lang.Class
>   13:         36869         884856  java.lang.String
>   14:         15715         848368  [[I
>   15:         13690         782808  [S
>   16:         18903         604896  java.util.concurrent.ConcurrentHashMap$HashEntry
>   17:            13         426192  [Lscala.concurrent.forkjoin.ForkJoinTask;
> {code}
> It shows that *scala.collection.mutable.DefaultEntry* and *java.lang.Long* 
> have unexpectedly large instance counts. In fact, the counts started growing 
> as soon as the streaming process began, and keep growing in proportion to the 
> total number of tasks.
> After some further investigation, I found that the problem is caused by 
> inappropriate memory management in the _releaseUnrollMemoryForThisTask_ and 
> _unrollSafely_ methods of the class 
> [org.apache.spark.storage.MemoryStore|https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala].
> In Spark 1.6.x, a _releaseUnrollMemoryForThisTask_ operation is performed 
> only when the parameter _memoryToRelease_ is greater than 0:
> https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala#L530-L537
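> The logic at those lines is roughly as follows (a simplified sketch, not the 
> exact source); both the release and the removal of the map entry sit behind 
> the _memoryToRelease_ > 0 guard:
> {code:borderStyle=solid}
> // Simplified sketch of releaseUnrollMemoryForThisTask in branch-1.6.
> val memoryToRelease = math.min(memory, unrollMemoryMap(taskAttemptId))
> if (memoryToRelease > 0) {
>   unrollMemoryMap(taskAttemptId) -= memoryToRelease
>   if (unrollMemoryMap(taskAttemptId) == 0) {
>     unrollMemoryMap.remove(taskAttemptId)   // only reached when memoryToRelease > 0
>   }
>   memoryManager.releaseUnrollMemory(memoryToRelease)
> }
> {code}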
> But if a task successfully unrolled all its blocks in memory via the 
> _unrollSafely_ method, the amount of memory recorded in _unrollMemoryMap_ is 
> set to zero:
> https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala#L322
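> The relevant step looks roughly like this (simplified sketch): on a 
> successful unroll, the task's memory is transferred to 
> _pendingUnrollMemoryMap_, which leaves the _unrollMemoryMap_ entry at 0 
> instead of removing it:
> {code:borderStyle=solid}
> // Sketch of the transfer inside unrollSafely (simplified): the task's entry
> // stays in unrollMemoryMap with value 0.
> val amountToTransfer = currentUnrollMemoryForThisTask
> unrollMemoryMap(taskAttemptId) -= amountToTransfer   // entry becomes 0
> pendingUnrollMemoryMap(taskAttemptId) =
>   pendingUnrollMemoryMap.getOrElse(taskAttemptId, 0L) + amountToTransfer
> {code}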
> As a result, the memory recorded in _unrollMemoryMap_ is released, but the 
> corresponding key is never removed from the hash map. The hash table keeps 
> growing as new tasks come in. Although the growth is comparatively slow 
> (a few dozen bytes per task), it can eventually result in an OOM after weeks 
> or months.
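> The net effect can be reproduced in isolation with a plain mutable HashMap 
> (illustrative only; the task attempt ids and byte counts below are made up):
> {code:borderStyle=solid}
> // Illustration of the leak pattern: each entry drops to 0 but is never
> // removed, so the map keeps one (Long -> DefaultEntry) pair per task attempt.
> import scala.collection.mutable
>
> val unrollMemoryMap = mutable.HashMap[Long, Long]()
> for (taskAttemptId <- 0L until 1000000L) {
>   unrollMemoryMap(taskAttemptId) = 1024L   // memory reserved for unrolling
>   unrollMemoryMap(taskAttemptId) = 0L      // "released", but the key remains
>   // a guard like `if (memoryToRelease > 0)` never fires, so no remove()
> }
> println(unrollMemoryMap.size)              // 1000000 stale entries
> {code}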


