[ https://issues.apache.org/jira/browse/SPARK-35610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Zsolt Piros updated SPARK-35610:
---------------------------------------
Description:

I have identified this leak by running the Livy tests (I know Livy is close to the Attic, but this leak causes a constant OOM there), and it is present in our Spark unit tests as well.

This leak can be identified by checking the number of ZipEntry instances, which can take up a considerable amount of memory (as those are created from the jars on the classpath).

I have my own tool to instrument JVM code ([trace-agent|https://github.com/attilapiros/trace-agent]), and with that I am able to call JVM diagnostic commands at specific methods. It is configured by a single text file embedded in the tool's jar, called actions.txt. In this case the actions.txt content is:

{noformat}
$ unzip -q -c trace-agent-0.0.7.jar actions.txt
diagnostic_command org.apache.spark.repl.ReplSuite runInterpreter cmd:gcClassHistogram,limit_output_lines:8,where:beforeAndAfter,with_gc:true
{noformat}

This creates a class histogram at the beginning and at the end of org.apache.spark.repl.ReplSuite#runInterpreter() (after triggering a GC, which might not have finished by the time the histogram is taken, as GC runs in a separate thread).

On the master branch the histograms are the following:

{noformat}
$ ./build/sbt ";project repl;set Test/javaOptions += \"-javaagent:/Users/attilazsoltpiros/git/attilapiros/memoryLeak/trace-agent-0.0.7.jar\"; testOnly" | grep "ZipEntry"
   2:        196797       15743760  java.util.zip.ZipEntry
   2:        196797       15743760  java.util.zip.ZipEntry
   2:        393594       31487520  java.util.zip.ZipEntry
   2:        393594       31487520  java.util.zip.ZipEntry
   2:        590391       47231280  java.util.zip.ZipEntry
   2:        590391       47231280  java.util.zip.ZipEntry
   2:        787188       62975040  java.util.zip.ZipEntry
   2:        787188       62975040  java.util.zip.ZipEntry
   2:        983985       78718800  java.util.zip.ZipEntry
   2:        983985       78718800  java.util.zip.ZipEntry
   2:       1180782       94462560  java.util.zip.ZipEntry
   2:       1180782       94462560  java.util.zip.ZipEntry
   2:       1377579      110206320  java.util.zip.ZipEntry
   2:       1377579      110206320  java.util.zip.ZipEntry
   2:       1574376      125950080  java.util.zip.ZipEntry
   2:       1574376      125950080  java.util.zip.ZipEntry
   2:       1771173      141693840  java.util.zip.ZipEntry
   2:       1771173      141693840  java.util.zip.ZipEntry
   2:       1967970      157437600  java.util.zip.ZipEntry
Setting default log level to "ERROR".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   2:       1967970      157437600  java.util.zip.ZipEntry
   2:       2164767      173181360  java.util.zip.ZipEntry
{noformat}

Where the header of the table is:

{noformat}
 num     #instances         #bytes  class name
{noformat}

So the ZipEntry instances altogether take about 173MB, but the first item in the histogram, the char/byte arrays, also relates to this leak:

{noformat}
$ ./build/sbt ";project repl;set Test/javaOptions += \"-javaagent:/Users/attilazsoltpiros/git/attilapiros/memoryLeak/trace-agent-0.0.7.jar\"; testOnly" | grep "1:"
   1:          2619        3185752  [B
   1:        480784       55931000  [C
   1:        480969       55954072  [C
   1:        912647      104092392  [C
   1:        912552      104059536  [C
   1:       1354362      153683280  [C
   1:       1354332      153673448  [C
   1:       1789703      202088704  [C
   1:       1789676      202079056  [C
   1:       2232868      251789104  [C
   1:       2232248      251593392  [C
   1:       2667318      300297664  [C
   1:       2667203      300256912  [C
   1:       3100253      348498384  [C
   1:       3100250      348498896  [C
   1:       3533763      396801848  [C
   1:       3533725      396789720  [C
   1:       3967515      445141784  [C
   1:       3967459      445128328  [C
   1:       4401309      493509768  [C
Setting default log level to "ERROR".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   1:       4401236      493496752  [C
   1:       4836168      541965464  [C
{noformat}

This is 541MB.
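As a side note, for reproducing these numbers without the agent: the gcClassHistogram command configured above is the standard HotSpot diagnostic command, which is also reachable from test code through the platform DiagnosticCommand MBean. A minimal sketch follows; the object and method names are made up for illustration, only the MBean name and operation are the standard HotSpot ones:

{code:scala}
import java.lang.management.ManagementFactory
import javax.management.ObjectName

// Minimal sketch: invokes the same gcClassHistogram diagnostic command that
// trace-agent triggers, via the standard HotSpot DiagnosticCommand MBean,
// and extracts the ZipEntry row of the histogram.
object ZipEntryHistogram {
  def zipEntryLine(): Option[String] = {
    val server = ManagementFactory.getPlatformMBeanServer
    val name = new ObjectName("com.sun.management:type=DiagnosticCommand")
    val histogram = server.invoke(
      name,
      "gcClassHistogram",
      Array[AnyRef](Array.empty[String]), // no extra arguments for the command
      Array("[Ljava.lang.String;")
    ).asInstanceOf[String]
    histogram.split("\n").find(_.contains("java.util.zip.ZipEntry"))
  }
}
{code}

Printing the result of ZipEntryHistogram.zipEntryLine() before and after each runInterpreter() call gives the same growing counts as the grep above.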
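The growth pattern above (roughly a fixed increment of ZipEntry instances per runInterpreter() call) is what one would expect when every interpreter run opens the classpath jars through a classloader that stays reachable and is never closed. A standalone sketch of that mechanism and its remedy, using a plain URLClassLoader rather than Spark's actual REPL classloader (all names here are made up):

{code:scala}
import java.io.File
import java.net.URLClassLoader
import scala.collection.mutable.ArrayBuffer

// Illustration of the general leak mechanism, not Spark's actual REPL code:
// each run opens the classpath jars through a new URLClassLoader; as long as
// the loader stays reachable and is never closed, the ZipEntry/char[] data it
// caches for every jar stays live, so the heap grows by a fixed amount per run.
object InterpreterLeakSketch {
  private val jars = sys.props("java.class.path")
    .split(File.pathSeparator)
    .collect { case p if p.endsWith(".jar") => new File(p).toURI.toURL }

  private val retained = ArrayBuffer.empty[URLClassLoader] // keeps loaders reachable

  def leakyRun(): Unit = {
    val loader = new URLClassLoader(jars, null)
    loader.findResource("META-INF/MANIFEST.MF") // forces the jars to be opened
    retained += loader                          // never closed: entries accumulate
  }

  def fixedRun(): Unit = {
    val loader = new URLClassLoader(jars, null)
    try loader.findResource("META-INF/MANIFEST.MF")
    finally loader.close() // releases the opened JarFiles and their cached entries
  }
}
{code}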
> Memory leak in Spark interpreter
> ---------------------------------
>
>                 Key: SPARK-35610
>                 URL: https://issues.apache.org/jira/browse/SPARK-35610
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Tests
>    Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>            Reporter: Attila Zsolt Piros
>            Assignee: Attila Zsolt Piros
>            Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org