[ https://issues.apache.org/jira/browse/SPARK-35610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Zsolt Piros updated SPARK-35610:
---------------------------------------
    Description: 

I have identified this leak by running the Livy tests (I know it is close to the attic, but this leak causes a constant OOM there) and it is present in our Spark unit tests as well.

This leak can be identified by checking the number of LeakyEntry instances in the case of Scala 2.12.14 (ZipEntry for Scala 2.12.10), which can take up a considerable amount of memory (as those are created from the jars which are on the classpath).

I have my own tool to instrument JVM code ([trace-agent|https://github.com/attilapiros/trace-agent]) and with that I am able to call JVM diagnostic commands at specific methods. It has a single text file embedded into the tool's jar called actions.txt.

In this case the actions.txt content is:
{noformat}
$ unzip -q -c trace-agent-0.0.7.jar actions.txt
diagnostic_command org.apache.spark.repl.ReplSuite runInterpreter cmd:gcClassHistogram,limit_output_lines:8,where:beforeAndAfter,with_gc:true
{noformat}

This creates a class histogram at the beginning and at the end of org.apache.spark.repl.ReplSuite#runInterpreter() (after triggering a GC, which might not finish, as the GC runs in a separate thread).

And the histograms on the master branch are the following:
{noformat}
$ ./build/sbt ";project repl;set Test/javaOptions += \"-javaagent:/Users/attilazsoltpiros/git/attilapiros/memoryLeak/trace-agent-0.0.7.jar\"; testOnly" |grep "ZipEntry\|LeakyEntry"
   3:   197089   9460272   scala.reflect.io.FileZipArchive$LeakyEntry
   3:   197089   9460272   scala.reflect.io.FileZipArchive$LeakyEntry
   3:   197089   9460272   scala.reflect.io.FileZipArchive$LeakyEntry
   3:   197089   9460272   scala.reflect.io.FileZipArchive$LeakyEntry
   3:   197089   9460272   scala.reflect.io.FileZipArchive$LeakyEntry
   3:   197089   9460272   scala.reflect.io.FileZipArchive$LeakyEntry
   3:   394178   18920544   scala.reflect.io.FileZipArchive$LeakyEntry
   3:   394178   18920544   scala.reflect.io.FileZipArchive$LeakyEntry
   3:   591267   28380816   scala.reflect.io.FileZipArchive$LeakyEntry
   3:   591267   28380816   scala.reflect.io.FileZipArchive$LeakyEntry
   3:   788356   37841088   scala.reflect.io.FileZipArchive$LeakyEntry
   3:   788356   37841088   scala.reflect.io.FileZipArchive$LeakyEntry
   3:   985445   47301360   scala.reflect.io.FileZipArchive$LeakyEntry
   3:   985445   47301360   scala.reflect.io.FileZipArchive$LeakyEntry
   3:   1182534   56761632   scala.reflect.io.FileZipArchive$LeakyEntry
   3:   1182534   56761632   scala.reflect.io.FileZipArchive$LeakyEntry
   3:   1379623   66221904   scala.reflect.io.FileZipArchive$LeakyEntry
   3:   1379623   66221904   scala.reflect.io.FileZipArchive$LeakyEntry
   3:   1576712   75682176   scala.reflect.io.FileZipArchive$LeakyEntry
{noformat}

Where the header of the table is:
{noformat}
num   #instances   #bytes   class name
{noformat}

So the LeakyEntry instances altogether take up about 75MB (173MB in the case of Scala 2.12.10 and before, where it is ZipEntry), but the first item in the histogram, the char/byte arrays, also relates to this leak:
{noformat}
$ ./build/sbt ";project repl;set Test/javaOptions += \"-javaagent:/Users/attilazsoltpiros/git/attilapiros/memoryLeak/trace-agent-0.0.7.jar\"; testOnly" |grep "1:\|2:\|3:"
   1:   2701   3496112   [B
   2:   21855   2607192   [C
   3:   4885   537264   java.lang.Class
   1:   480323   55970208   [C
   2:   480499   11531976   java.lang.String
   3:   197089   9460272   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   481825   56148024   [C
   2:   481998   11567952   java.lang.String
   3:   197089   9460272   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   487056   57550344   [C
   2:   487179   11692296   java.lang.String
   3:   197089   9460272   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   487054   57551008   [C
   2:   487176   11692224   java.lang.String
   3:   197089   9460272   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   927823   107139160   [C
   2:   928072   22273728   java.lang.String
   3:   394178   18920544   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   927793   107129328   [C
   2:   928041   22272984   java.lang.String
   3:   394178   18920544   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   1361851   155555608   [C
   2:   1362261   32694264   java.lang.String
   3:   591267   28380816   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   1361683   155493464   [C
   2:   1362092   32690208   java.lang.String
   3:   591267   28380816   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   1803074   205157728   [C
   2:   1803268   43278432   java.lang.String
   3:   788356   37841088   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   1802385   204938224   [C
   2:   1802579   43261896   java.lang.String
   3:   788356   37841088   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   2236631   253636592   [C
   2:   2237029   53688696   java.lang.String
   3:   985445   47301360   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   2236536   253603008   [C
   2:   2236933   53686392   java.lang.String
   3:   985445   47301360   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   2668892   301893920   [C
   2:   2669510   64068240   java.lang.String
   3:   1182534   56761632   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   2668759   301846376   [C
   2:   2669376   64065024   java.lang.String
   3:   1182534   56761632   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   3101238   350101048   [C
   2:   3102073   74449752   java.lang.String
   3:   1379623   66221904   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   3101240   350101104   [C
   2:   3102075   74449800   java.lang.String
   3:   1379623   66221904   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   3533785   398371760   [C
   2:   3534835   84836040   java.lang.String
   3:   1576712   75682176   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   3533759   398367088   [C
   2:   3534807   84835368   java.lang.String
   3:   1576712   75682176   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   3967049   446893400   [C
   2:   3968314   95239536   java.lang.String
   3:   1773801   85142448   scala.reflect.io.FileZipArchive$LeakyEntry
[info] - SPARK-26633: ExecutorClassLoader.getResourceAsStream find REPL classes (8 seconds, 248 milliseconds)
Setting default log level to "ERROR".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   1:   3966423   446709584   [C
   2:   3967682   95224368   java.lang.String
   3:   1773801   85142448   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   4399583   495097208   [C
   2:   4401050   105625200   java.lang.String
   3:   1970890   94602720   scala.reflect.io.FileZipArchive$LeakyEntry
   1:   4399578   495070064   [C
   2:   4401040   105624960   java.lang.String
   3:   1970890   94602720   scala.reflect.io.FileZipArchive$LeakyEntry
{noformat}

This is 495MB.
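Just to make the magnitudes explicit, here is a quick sanity check of the figures above (not part of the test output; the values are copied from the last rows of the histogram):

{noformat}
// The per-instance size of a LeakyEntry is constant across the rows above, and the
// final char[] total is where the "495MB" figure comes from (decimal MB, ~472 MiB).
val bytesPerLeakyEntry = 94602720L / 1970890L           // = 48 bytes per LeakyEntry
val leakyEntryMiB      = 94602720L / 1024.0 / 1024.0    // ~90 MiB of LeakyEntry at the end
val charArrayMiB       = 495097208L / 1024.0 / 1024.0   // ~472 MiB of char[] ("This is 495MB")
{noformat}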
  was:
I have identified this leak by running the Livy tests (I know it is close to the attic, but this leak causes a constant OOM there) and it is present in our Spark unit tests as well.

This leak can be identified by checking the number of ZipEntry instances, which can take up a considerable amount of memory (as those are created from the jars which are on the classpath).

I have my own tool to instrument JVM code ([trace-agent|https://github.com/attilapiros/trace-agent]) and with that I am able to call JVM diagnostic commands at specific methods. It has a single text file embedded into the tool's jar called actions.txt.

In this case the actions.txt content is:
{noformat}
$ unzip -q -c trace-agent-0.0.7.jar actions.txt
diagnostic_command org.apache.spark.repl.ReplSuite runInterpreter cmd:gcClassHistogram,limit_output_lines:8,where:beforeAndAfter,with_gc:true
{noformat}

This creates a class histogram at the beginning and at the end of org.apache.spark.repl.ReplSuite#runInterpreter() (after triggering a GC, which might not finish, as the GC runs in a separate thread).

And the histograms on the master branch are the following:
{noformat}
$ ./build/sbt ";project repl;set Test/javaOptions += \"-javaagent:/Users/attilazsoltpiros/git/attilapiros/memoryLeak/trace-agent-0.0.7.jar\"; testOnly" | grep "ZipEntry"
   2:   196797   15743760   java.util.zip.ZipEntry
   2:   196797   15743760   java.util.zip.ZipEntry
   2:   393594   31487520   java.util.zip.ZipEntry
   2:   393594   31487520   java.util.zip.ZipEntry
   2:   590391   47231280   java.util.zip.ZipEntry
   2:   590391   47231280   java.util.zip.ZipEntry
   2:   787188   62975040   java.util.zip.ZipEntry
   2:   787188   62975040   java.util.zip.ZipEntry
   2:   983985   78718800   java.util.zip.ZipEntry
   2:   983985   78718800   java.util.zip.ZipEntry
   2:   1180782   94462560   java.util.zip.ZipEntry
   2:   1180782   94462560   java.util.zip.ZipEntry
   2:   1377579   110206320   java.util.zip.ZipEntry
   2:   1377579   110206320   java.util.zip.ZipEntry
   2:   1574376   125950080   java.util.zip.ZipEntry
   2:   1574376   125950080   java.util.zip.ZipEntry
   2:   1771173   141693840   java.util.zip.ZipEntry
   2:   1771173   141693840   java.util.zip.ZipEntry
   2:   1967970   157437600   java.util.zip.ZipEntry
Setting default log level to "ERROR".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   2:   1967970   157437600   java.util.zip.ZipEntry
   2:   2164767   173181360   java.util.zip.ZipEntry
{noformat}

Where the header of the table is:
{noformat}
num   #instances   #bytes   class name
{noformat}

So the ZipEntry instances altogether take up about 173MB, but the first item in the histogram, the char/byte arrays, also relates to this leak:
{noformat}
$ ./build/sbt ";project repl;set Test/javaOptions += \"-javaagent:/Users/attilazsoltpiros/git/attilapiros/memoryLeak/trace-agent-0.0.7.jar\"; testOnly" | grep "1:"
   1:   2619   3185752   [B
   1:   480784   55931000   [C
   1:   480969   55954072   [C
   1:   912647   104092392   [C
   1:   912552   104059536   [C
   1:   1354362   153683280   [C
   1:   1354332   153673448   [C
   1:   1789703   202088704   [C
   1:   1789676   202079056   [C
   1:   2232868   251789104   [C
   1:   2232248   251593392   [C
   1:   2667318   300297664   [C
   1:   2667203   300256912   [C
   1:   3100253   348498384   [C
   1:   3100250   348498896   [C
   1:   3533763   396801848   [C
   1:   3533725   396789720   [C
   1:   3967515   445141784   [C
   1:   3967459   445128328   [C
   1:   4401309   493509768   [C
Setting default log level to "ERROR".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   1:   4401236   493496752   [C
   1:   4836168   541965464   [C
{noformat}

This is 541MB.


> Memory leak in Spark interpreter
> ---------------------------------
>
>                 Key: SPARK-35610
>                 URL: https://issues.apache.org/jira/browse/SPARK-35610
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Tests
>    Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>            Reporter: Attila Zsolt Piros
>            Assignee: Attila Zsolt Piros
>            Priority: Major
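As a side note on the gcClassHistogram mechanism used for both measurements above: roughly the same data can also be captured in-process, without the javaagent, by invoking the HotSpot DiagnosticCommand MBean, the programmatic counterpart of {{jcmd <pid> GC.class_histogram}}. A minimal Scala sketch, assuming a HotSpot JVM; the object and method names here are only illustrative and are not part of trace-agent or Spark:

{noformat}
import java.lang.management.ManagementFactory
import javax.management.ObjectName

// Minimal sketch (HotSpot-only): fetch the same data as `jcmd <pid> GC.class_histogram`
// through the DiagnosticCommand MBean and keep only the first few lines,
// similar to limit_output_lines in actions.txt above.
object ClassHistogram {
  def top(lines: Int): String = {
    val server = ManagementFactory.getPlatformMBeanServer
    val full = server.invoke(
      new ObjectName("com.sun.management:type=DiagnosticCommand"),
      "gcClassHistogram",
      Array[AnyRef](Array.empty[String]),  // no extra arguments for the command
      Array("[Ljava.lang.String;")
    ).asInstanceOf[String]
    full.split('\n').take(lines).mkString("\n")
  }
}

// e.g. around the suspected method:
//   println(ClassHistogram.top(8)); runInterpreter(...); println(ClassHistogram.top(8))
{noformat}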
--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org