[ 
https://issues.apache.org/jira/browse/SPARK-35610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355685#comment-17355685
 ] 

Apache Spark commented on SPARK-35610:
--------------------------------------

User 'attilapiros' has created a pull request for this issue:
https://github.com/apache/spark/pull/32748

> Memory leak in Spark interpreter 
> ---------------------------------
>
>                 Key: SPARK-35610
>                 URL: https://issues.apache.org/jira/browse/SPARK-35610
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Tests
>    Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>            Reporter: Attila Zsolt Piros
>            Assignee: Attila Zsolt Piros
>            Priority: Major
>
> I have identified this leak by running the Livy tests (I know it is close to 
> the attic but this leak causes a constant OOM there) and it is in our Spark 
> unit tests as well. 
> This leak can be identified by checking the number of LeakyEntry in case of 
> Scala 2.12.14 (and ZipEntry for Scala 2.12.10) instances which with its 
> related data can take up a considerable amount of memory (as those are 
> created from the jars which are on the classpath).
> I have my own tool for instrumenting JVM code 
> ([trace-agent|https://github.com/attilapiros/trace-agent]) and with that I am 
> able to call JVM diagnostic commands at specific methods. Let me show how it 
> in action.
>  
> It has a single text file embedded into the tool's jar called action.txt. 
> In this case actions.txt content is:
> {noformat}
> $ unzip -q -c trace-agent-0.0.7.jar actions.txt
> diagnostic_command org.apache.spark.repl.ReplSuite runInterpreter  
> cmd:gcClassHistogram,limit_output_lines:8,where:beforeAndAfter,with_gc:true
> diagnostic_command org.apache.spark.repl.ReplSuite afterAll  
> cmd:gcClassHistogram,limit_output_lines:8,where:after,with_gc:true
> {noformat}
> Which creates a class histogram at the beginning and at the end of 
> org.apache.spark.repl.ReplSuite#runInterpreter() (after triggering a GC which 
> might not finish as GC is done in a separate thread..) and one histogram in 
> the end of the afterAll() method.
> And the histograms are the followings on master branch:
> {noformat}
> $ ./build/sbt ";project repl;set Test/javaOptions += 
> \"-javaagent:/Users/attilazsoltpiros/git/attilapiros/memoryLeak/trace-agent-0.0.7.jar\";
>  testOnly" |grep "ZipEntry\|LeakyEntry"
>    3:        197089        9460272  scala.reflect.io.FileZipArchive$LeakyEntry
>    3:        197089        9460272  scala.reflect.io.FileZipArchive$LeakyEntry
>    3:        197089        9460272  scala.reflect.io.FileZipArchive$LeakyEntry
>    3:        197089        9460272  scala.reflect.io.FileZipArchive$LeakyEntry
>    3:        197089        9460272  scala.reflect.io.FileZipArchive$LeakyEntry
>    3:        197089        9460272  scala.reflect.io.FileZipArchive$LeakyEntry
>    3:        394178       18920544  scala.reflect.io.FileZipArchive$LeakyEntry
>    3:        394178       18920544  scala.reflect.io.FileZipArchive$LeakyEntry
>    3:        591267       28380816  scala.reflect.io.FileZipArchive$LeakyEntry
>    3:        591267       28380816  scala.reflect.io.FileZipArchive$LeakyEntry
>    3:        788356       37841088  scala.reflect.io.FileZipArchive$LeakyEntry
>    3:        788356       37841088  scala.reflect.io.FileZipArchive$LeakyEntry
>    3:        985445       47301360  scala.reflect.io.FileZipArchive$LeakyEntry
>    3:        985445       47301360  scala.reflect.io.FileZipArchive$LeakyEntry
>    3:       1182534       56761632  scala.reflect.io.FileZipArchive$LeakyEntry
>    3:       1182534       56761632  scala.reflect.io.FileZipArchive$LeakyEntry
>    3:       1379623       66221904  scala.reflect.io.FileZipArchive$LeakyEntry
>    3:       1379623       66221904  scala.reflect.io.FileZipArchive$LeakyEntry
>    3:       1576712       75682176  scala.reflect.io.FileZipArchive$LeakyEntry
> {noformat}
> Where the header of the table is:
> {noformat}
> num     #instances         #bytes  class name
> {noformat}
> So the LeakyEntry in the end is about 75MB (173MB in case of Scala 2.12.10 
> and before for another class called ZipEntry) but the first item (a char/byte 
> arrays) and the second item (strings) in the histogram also relates to this 
> leak:
> {noformat}
> $ ./build/sbt ";project repl;set Test/javaOptions += 
> \"-javaagent:/Users/attilazsoltpiros/git/attilapiros/memoryLeak/trace-agent-0.0.7.jar\";
>  testOnly" |grep "1:\|2:\|3:"
>    1:          2701        3496112  [B
>    2:         21855        2607192  [C
>    3:          4885         537264  java.lang.Class
>    1:        480323       55970208  [C
>    2:        480499       11531976  java.lang.String
>    3:        197089        9460272  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:        481825       56148024  [C
>    2:        481998       11567952  java.lang.String
>    3:        197089        9460272  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:        487056       57550344  [C
>    2:        487179       11692296  java.lang.String
>    3:        197089        9460272  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:        487054       57551008  [C
>    2:        487176       11692224  java.lang.String
>    3:        197089        9460272  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:        927823      107139160  [C
>    2:        928072       22273728  java.lang.String
>    3:        394178       18920544  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:        927793      107129328  [C
>    2:        928041       22272984  java.lang.String
>    3:        394178       18920544  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:       1361851      155555608  [C
>    2:       1362261       32694264  java.lang.String
>    3:        591267       28380816  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:       1361683      155493464  [C
>    2:       1362092       32690208  java.lang.String
>    3:        591267       28380816  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:       1803074      205157728  [C
>    2:       1803268       43278432  java.lang.String
>    3:        788356       37841088  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:       1802385      204938224  [C
>    2:       1802579       43261896  java.lang.String
>    3:        788356       37841088  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:       2236631      253636592  [C
>    2:       2237029       53688696  java.lang.String
>    3:        985445       47301360  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:       2236536      253603008  [C
>    2:       2236933       53686392  java.lang.String
>    3:        985445       47301360  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:       2668892      301893920  [C
>    2:       2669510       64068240  java.lang.String
>    3:       1182534       56761632  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:       2668759      301846376  [C
>    2:       2669376       64065024  java.lang.String
>    3:       1182534       56761632  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:       3101238      350101048  [C
>    2:       3102073       74449752  java.lang.String
>    3:       1379623       66221904  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:       3101240      350101104  [C
>    2:       3102075       74449800  java.lang.String
>    3:       1379623       66221904  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:       3533785      398371760  [C
>    2:       3534835       84836040  java.lang.String
>    3:       1576712       75682176  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:       3533759      398367088  [C
>    2:       3534807       84835368  java.lang.String
>    3:       1576712       75682176  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:       3967049      446893400  [C
>    2:       3968314       95239536  java.lang.String
>    3:       1773801       85142448  scala.reflect.io.FileZipArchive$LeakyEntry
> [info] - SPARK-26633: ExecutorClassLoader.getResourceAsStream find REPL 
> classes (8 seconds, 248 milliseconds)
> Setting default log level to "ERROR".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
>    1:       3966423      446709584  [C
>    2:       3967682       95224368  java.lang.String
>    3:       1773801       85142448  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:       4399583      495097208  [C
>    2:       4401050      105625200  java.lang.String
>    3:       1970890       94602720  scala.reflect.io.FileZipArchive$LeakyEntry
>    1:       4399578      495070064  [C
>    2:       4401040      105624960  java.lang.String
>    3:       1970890       94602720  scala.reflect.io.FileZipArchive$LeakyEntry
> {noformat}
> The last three is about 700MB altogether.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to