[ 
https://issues.apache.org/jira/browse/SPARK-54037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Pusty updated SPARK-54037:
---------------------------------
    Summary: Throughput deteriorated after migration from spark 3.5.5 to spark 
4.0.0  (was: Throughput deteriorated after migration from spark 3.5.5 to spark 
4.0.0w)

> Throughput deteriorated after migration from spark 3.5.5 to spark 4.0.0
> -----------------------------------------------------------------------
>
>                 Key: SPARK-54037
>                 URL: https://issues.apache.org/jira/browse/SPARK-54037
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Adrian Pusty
>            Priority: Major
>
> My team recently updated our Spark dependency from 3.5.5 to 4.0.0.
> This included switching to spark-4.0.0-bin-hadoop3.tgz, updating our pom.xml
> files, and changing import statements (org.apache.spark.sql ->
> org.apache.spark.sql.classic).
> After this change, our throughput (calculated as rows transferred per second)
> dropped significantly in both of our scenarios: 1. read from file, write to
> database and 2. read from database, write to database.
> I compared application versions built against Spark 3.5.5 and 4.0.0 in
> cluster mode and local mode, and ran one comparison (using a synthetic
> file) in spark-shell only.
> In spark-shell, throughput was more or less the same for 3.5.5 and 4.0.0,
> but with our app in cluster / local mode, both scenarios had better
> throughput with 3.5.5.
> With 4.0.0 I observed longer delays (compared with 3.5.5) between the
> log lines
> "Running task x in stage y"
> and
> "Finished task x in stage y".
> Is this throughput degradation a known issue? Could it be related to
> SPARK-48456 ([M1] Performance benchmark)?
> (I'll also mention that we are using checkpointing, in case that is
> relevant here.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

