[ https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940417#comment-16940417 ]
George Papa edited comment on SPARK-29055 at 9/29/19 1:47 PM:
--------------------------------------------------------------

[~kabhwan] I have also tested with the latest Spark 2.4.4 and saw the same behavior. Initially, I tested with production applications in a testing environment: I ran them for many hours, but the storage memory never decreased. Because those production applications are based on Spark Streaming and Structured Streaming, it is difficult for someone else to reproduce the same environment, so I created a simple application (the snippet code). You can run the snippet code with a Spark version <= 2.3.2 and observe the storage memory of the executors and the driver, then run the same code with 2.3.3 or newer and observe the difference. I think the starting point is to reproduce the same behavior with a very simple application like the one in the snippet code. In my view the problem is not related to a specific Spark API (batch, Spark Streaming, Structured Streaming, etc.) but is something more fundamental in these Spark versions. SPARK-27648 looks like the same issue, but I don't think the problem is related to Structured Streaming (I initially hit it with a Structured Streaming application, but later understood that it was not caused by the Structured Streaming API).


Memory leak in Spark
--------------------

                Key: SPARK-29055
                URL: https://issues.apache.org/jira/browse/SPARK-29055
            Project: Spark
         Issue Type: Bug
         Components: Block Manager, Spark Core
   Affects Versions: 2.3.3
           Reporter: George Papa
           Priority: Major
        Attachments: test_csvs.zip

I used Spark 2.1.1 and then upgraded to newer versions. Starting with Spark 2.3.3, I observed in the Spark UI that the driver memory is {color:#ff0000}increasing continuously{color}. In more detail, the driver memory and the executors' memory show the same used storage memory, and after each iteration the storage memory increases. You can reproduce this behavior by running the snippet code below. The example is very simple, without any dataframe persistence, yet the memory consumption is not stable as it was in former Spark versions (specifically, up to Spark 2.3.2). I also tested with the Spark Streaming and Structured Streaming APIs and saw the same behavior.
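To put numbers on this growth rather than watching the Spark UI, the storage memory of each executor can be polled from Spark's monitoring REST API while the test runs. A minimal sketch, assuming the driver UI is reachable on the default port 4040 and that the requests package is installed:

{code:python}
import time

import requests  # assumption: available in the test environment

DRIVER_UI = "http://localhost:4040"  # assumption: default Spark UI address on the driver

while True:
    # List the application(s) registered with this UI.
    for app in requests.get(DRIVER_UI + "/api/v1/applications").json():
        # Each ExecutorSummary carries the storage-memory figures shown in the UI.
        executors = requests.get(
            DRIVER_UI + "/api/v1/applications/{0}/executors".format(app["id"])).json()
        for ex in executors:
            print("{0} executor={1} memoryUsed={2} maxMemory={3}".format(
                time.strftime("%H:%M:%S"), ex["id"], ex["memoryUsed"], ex["maxMemory"]))
    time.sleep(15)
{code}

Comparing the logged values of a run on Spark <= 2.3.2 with a run on 2.3.3 or newer should make the steadily growing storage memory described above easy to see.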
I tested with an existing application which reads from a Kafka source, does some aggregations, persists dataframes, and then unpersists them. The persist and unpersist work correctly: I can see the dataframes in the Storage tab of the Spark UI, and after the unpersist all dataframes are removed. However, after the unpersist the executors' memory is not zero; it has the same value as the driver memory. This behavior also affects application performance, because the executors' memory grows as the driver's does, and after a while the persisted dataframes no longer fit in the executors' memory and I get spill to disk.

Another error I got after a long run was {color:#ff0000}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}, but I don't know whether it is related to the behavior above.

*HOW TO REPRODUCE THIS BEHAVIOR:*

Create a very simple application (streaming count_file.py) in order to reproduce this behavior. It reads CSV files from a directory, counts the rows, and then removes the processed files.

{code:python}
import time
import os

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql import types as T

target_dir = "..."  # directory containing the test CSV files (path elided)

spark = SparkSession.builder.appName("DataframeCount").getOrCreate()

while True:
    for f in os.listdir(target_dir):
        # Read one CSV file and count its rows; nothing is persisted.
        df = spark.read.load(target_dir + f, format="csv")
        print("Number of records: {0}".format(df.count()))
        # Remove the processed file, as described above.
        os.remove(target_dir + f)
        time.sleep(15)
{code}

Submit code:

{code:bash}
spark-submit \
  --master spark://xxx.xxx.xx.xxx \
  --deploy-mode client \
  --executor-memory 4g \
  --executor-cores 3 \
  streaming count_file.py
{code}

*TESTED CASES WITH THE SAME BEHAVIOR:*
 * Default settings (spark-defaults.conf)
 * {{spark.cleaner.periodicGC.interval}} set to 1min (or less); see the sketch at the end of this message
 * {{spark.cleaner.referenceTracking.blocking}} set to false
 * Running the application in cluster mode
 * Increasing/decreasing the resources of the executors and the driver
 * extraJavaOptions on the driver and executors: -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=12

*DEPENDENCIES*
 * Operating system: Ubuntu 16.04.3 LTS
 * Java: jdk1.8.0_131 (also tested with jdk1.8.0_221)
 * Python: Python 2.7.12

*NOTE:* In Spark 2.1.1 the driver memory consumption (Storage Memory tab) was extremely low, and after the ContextCleaner and BlockManager ran, the memory decreased.
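For completeness, the two cleaner settings listed under TESTED CASES can also be set programmatically when the test session is created. A minimal sketch (the values are only examples and, per the list above, did not stop the storage-memory growth):

{code:python}
from pyspark.sql import SparkSession

# Build the test session with the ContextCleaner settings that were tried above.
spark = (SparkSession.builder
         .appName("DataframeCount")
         .config("spark.cleaner.periodicGC.interval", "1min")
         .config("spark.cleaner.referenceTracking.blocking", "false")
         .getOrCreate())
{code}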