Kushal Mahajan created SPARK-28575:
--------------------------------------

             Summary: Spark job time increasing when upgrading Spark from 2.1.1 to 2.3.1
                 Key: SPARK-28575
                 URL: https://issues.apache.org/jira/browse/SPARK-28575
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 2.3.1
            Reporter: Kushal Mahajan


I was running a Spark job on a standalone cluster with Spark 2.1.1. After the 
standalone cluster was upgraded from Spark 2.1.1 to 2.3.1, the job's 
performance dropped considerably (it became roughly 3 to 4 times slower). On 
investigation, I found a considerable time lag (ranging from 30 seconds to 2 
minutes) between the start times of consecutive Spark actions, excluding the 
time taken by the actions themselves. This is visible in the start time of each 
job on the Spark UI page. The lag was not present in Spark 2.1.1. Can anybody 
tell me what the issue is here?

PS: I am reading multiple text files from S3 using wholeTextFiles, creating 
multiple DataFrames from those text files, and writing them out to S3 in CSV 
format.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
