Peter Liu created SPARK-16333:
---------------------------------

             Summary: Excessive Spark history event/json data (5GB!)
                 Key: SPARK-16333
                 URL: https://issues.apache.org/jira/browse/SPARK-16333
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.0.0
         Environment: Seen on both x86 (Intel(R) Xeon(R) E5-2699) and ppc 
(Habanero, Model: 8348-21C) platforms, running Red Hat Enterprise Linux Server 
release 7.2 (Maipo) with Spark 2.0.0-preview (May-24, 2016 build)
            Reporter: Peter Liu


With Spark 2.0.0-preview (May-24 build), the history event data (the JSON file) 
generated for each Spark application run (see below) can be as large as 5 GB. 
Under Spark 1.6.1, exactly the same application run with the same 1 TB of input 
data produces only 14 MB, so the file is now several hundred times larger.

-rwxrwx--- 1 root root 5.3G Jun 30 09:39 app-20160630091959-0000
-rwxrwx--- 1 root root 5.3G Jun 30 09:56 app-20160630094213-0000
-rwxrwx--- 1 root root 5.3G Jun 30 10:13 app-20160630095856-0000
-rwxrwx--- 1 root root 5.3G Jun 30 10:30 app-20160630101556-0000
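
To narrow down which events account for the growth, one option is to tally the 
bytes per event type in one of these files. The following is only a sketch, not 
part of the original test: it assumes the event log is uncompressed and holds 
one JSON object per line with an "Event" field naming the listener event. The 
helper name bytes_per_event is made up for illustration.

```python
import json
import sys
from collections import Counter


def bytes_per_event(lines):
    """Sum the UTF-8 size of each line, grouped by its "Event" field.

    Assumes one JSON object per line (hypothetical helper, for illustration).
    """
    sizes = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        try:
            event = json.loads(line).get("Event", "<no Event field>")
        except (ValueError, AttributeError):
            # Not valid JSON, or valid JSON that is not an object.
            event = "<unparseable>"
        sizes[event] += len(line.encode("utf-8"))
    return sizes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Stream the file line by line; a 5 GB log should not be read at once.
    with open(sys.argv[1], encoding="utf-8") as log:
        for event, size in bytes_per_event(log).most_common(10):
            print(f"{size:>15,d}  {event}")
```

Saved under any file name and run with the path of an event log as its only 
argument (for example app-20160630091959-0000), it prints the ten event types 
that take the most space. Running it on a Spark 1.6.1 log from the same 
workload would give a baseline for comparison.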

The test was run with Sparkbench V2, using the SQL RDD workload (see GitHub: 
https://github.com/SparkTC/spark-bench).



