[ https://issues.apache.org/jira/browse/SPARK-35610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Zsolt Piros updated SPARK-35610:
---------------------------------------
Description:

I have identified this leak by running the Livy tests (I know Livy is close to the Attic, but this leak causes a constant OOM there), and it is present in our Spark unit tests as well.

This leak can be identified by checking the number of ZipEntry instances, which can take up a considerable amount of memory (as those are created from the jars on the classpath).

I have my own tool to instrument JVM code ([trace-agent|https://github.com/attilapiros/trace-agent]), and with that I am able to call JVM diagnostic commands at specific methods. It is configured by a single text file embedded in the tool's jar, called actions.txt. In this case the actions.txt content is:

{noformat}
$ unzip -q -c trace-agent-0.0.7.jar actions.txt
diagnostic_command org.apache.spark.repl.ReplSuite runInterpreter cmd:gcClassHistogram,limit_output_lines:8,where:beforeAndAfter,with_gc:true
{noformat}

This creates a class histogram at the beginning and at the end of org.apache.spark.repl.ReplSuite#runInterpreter() (after triggering a GC, which might not have finished by the time the histogram is taken, as GC runs in a separate thread).

On the master branch the histograms are the following:

{noformat}
$ ./build/sbt ";project repl;set Test/javaOptions += \"-javaagent:/Users/attilazsoltpiros/git/attilapiros/memoryLeak/trace-agent-0.0.7.jar\"; testOnly" | grep "ZipEntry"
   2:        196797       15743760  java.util.zip.ZipEntry
   2:        196797       15743760  java.util.zip.ZipEntry
   2:        393594       31487520  java.util.zip.ZipEntry
   2:        393594       31487520  java.util.zip.ZipEntry
   2:        590391       47231280  java.util.zip.ZipEntry
   2:        590391       47231280  java.util.zip.ZipEntry
   2:        787188       62975040  java.util.zip.ZipEntry
   2:        787188       62975040  java.util.zip.ZipEntry
   2:        983985       78718800  java.util.zip.ZipEntry
   2:        983985       78718800  java.util.zip.ZipEntry
   2:       1180782       94462560  java.util.zip.ZipEntry
   2:       1180782       94462560  java.util.zip.ZipEntry
   2:       1377579      110206320  java.util.zip.ZipEntry
   2:       1377579      110206320  java.util.zip.ZipEntry
   2:       1574376      125950080  java.util.zip.ZipEntry
   2:       1574376      125950080  java.util.zip.ZipEntry
   2:       1771173      141693840  java.util.zip.ZipEntry
   2:       1771173      141693840  java.util.zip.ZipEntry
   2:       1967970      157437600  java.util.zip.ZipEntry
Setting default log level to "ERROR".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   2:       1967970      157437600  java.util.zip.ZipEntry
   2:       2164767      173181360  java.util.zip.ZipEntry
{noformat}

Where the header of the table is:

{noformat}
 num     #instances         #bytes  class name
{noformat}

So the ZipEntry instances altogether take about 173MB, but the first item in the histogram, the char/byte arrays, also relates to this leak:

{noformat}
$ ./build/sbt ";project repl;set Test/javaOptions += \"-javaagent:/Users/attilazsoltpiros/git/attilapiros/memoryLeak/trace-agent-0.0.7.jar\"; testOnly" | grep "1:"
   1:          2619        3185752  [B
   1:        480784       55931000  [C
   1:        480969       55954072  [C
   1:        912647      104092392  [C
   1:        912552      104059536  [C
   1:       1354362      153683280  [C
   1:       1354332      153673448  [C
   1:       1789703      202088704  [C
   1:       1789676      202079056  [C
   1:       2232868      251789104  [C
   1:       2232248      251593392  [C
   1:       2667318      300297664  [C
   1:       2667203      300256912  [C
   1:       3100253      348498384  [C
   1:       3100250      348498896  [C
   1:       3533763      396801848  [C
   1:       3533725      396789720  [C
   1:       3967515      445141784  [C
   1:       3967459      445128328  [C
   1:       4401309      493509768  [C
Setting default log level to "ERROR".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   1:       4401236      493496752  [C
   1:       4836168      541965464  [C
{noformat}

This is 541MB.
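As a side note, for reproducing these numbers without the agent: the gcClassHistogram command configured above is the standard HotSpot diagnostic command, which is also reachable from test code through the platform DiagnosticCommand MBean. A minimal sketch follows; the object and method names are made up for illustration, only the MBean name and operation are the standard HotSpot ones:

{code:scala}
import java.lang.management.ManagementFactory
import javax.management.ObjectName

// Minimal sketch: invokes the same gcClassHistogram diagnostic command that
// trace-agent triggers, via the standard HotSpot DiagnosticCommand MBean,
// and extracts the ZipEntry row of the histogram.
object ZipEntryHistogram {
  def zipEntryLine(): Option[String] = {
    val server = ManagementFactory.getPlatformMBeanServer
    val name = new ObjectName("com.sun.management:type=DiagnosticCommand")
    val histogram = server.invoke(
      name,
      "gcClassHistogram",
      Array[AnyRef](Array.empty[String]), // no extra arguments for the command
      Array("[Ljava.lang.String;")
    ).asInstanceOf[String]
    histogram.split("\n").find(_.contains("java.util.zip.ZipEntry"))
  }
}
{code}

Printing the result of ZipEntryHistogram.zipEntryLine() before and after each runInterpreter() call gives the same growing counts as the grep above.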
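The growth pattern above (roughly a fixed increment of ZipEntry instances per runInterpreter() call) is what one would expect when every interpreter run opens the classpath jars through a classloader that stays reachable and is never closed. A standalone sketch of that mechanism and its remedy, using a plain URLClassLoader rather than Spark's actual REPL classloader (all names here are made up):

{code:scala}
import java.io.File
import java.net.URLClassLoader
import scala.collection.mutable.ArrayBuffer

// Illustration of the general leak mechanism, not Spark's actual REPL code:
// each run opens the classpath jars through a new URLClassLoader; as long as
// the loader stays reachable and is never closed, the ZipEntry/char[] data it
// caches for every jar stays live, so the heap grows by a fixed amount per run.
object InterpreterLeakSketch {
  private val jars = sys.props("java.class.path")
    .split(File.pathSeparator)
    .collect { case p if p.endsWith(".jar") => new File(p).toURI.toURL }

  private val retained = ArrayBuffer.empty[URLClassLoader] // keeps loaders reachable

  def leakyRun(): Unit = {
    val loader = new URLClassLoader(jars, null)
    loader.findResource("META-INF/MANIFEST.MF") // forces the jars to be opened
    retained += loader                          // never closed: entries accumulate
  }

  def fixedRun(): Unit = {
    val loader = new URLClassLoader(jars, null)
    try loader.findResource("META-INF/MANIFEST.MF")
    finally loader.close() // releases the opened JarFiles and their cached entries
  }
}
{code}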
> Memory leak in Spark interpreter
> ---------------------------------
>
>                 Key: SPARK-35610
>                 URL: https://issues.apache.org/jira/browse/SPARK-35610
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Tests
>    Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>            Reporter: Attila Zsolt Piros
>            Assignee: Attila Zsolt Piros
>            Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org