[
https://issues.apache.org/jira/browse/SPARK-54037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adrian Pusty updated SPARK-54037:
---------------------------------
Summary: Throughput deteriorated after migration from spark 3.5.5 to spark
4.0.0 (was: Throughput deteriorated after migration from spark 3.5.5 to spark
4.0.0w)
> Throughput deteriorated after migration from spark 3.5.5 to spark 4.0.0
> -----------------------------------------------------------------------
>
> Key: SPARK-54037
> URL: https://issues.apache.org/jira/browse/SPARK-54037
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Adrian Pusty
> Priority: Major
>
> My team recently updated our Spark dependency from 3.5.5 to 4.0.0.
> This involved switching to spark-4.0.0-bin-hadoop3.tgz, updating our pom.xml
> files, and changing import statements (org.apache.spark.sql ->
> org.apache.spark.sql.classic).
> After this change our throughput (calculated as rows transferred per second)
> dropped significantly in both of our scenarios: (1) read from file, write to
> database and (2) read from database, write to database.
> I compared the application built against Spark 3.5.5 and against 4.0.0 in
> cluster mode and in local mode, and ran one further comparison (using a
> synthetic file) in spark-shell only.
> In spark-shell the throughput was more or less the same for 3.5.5 and 4.0.0,
> but when our app ran in cluster or local mode, both scenarios had better
> throughput with 3.5.5.
> I have observed that with 4.0.0 there are longer delays (when compared with
> 3.5.5) between log lines
> "Running task x in stage y"
> and
> "Finished task x in stage y".
> Is this throughput degradation a known issue? Could it be related to
> SPARK-48456 ([M1] Performance benchmark)?
> (I'll also mention that we are using checkpointing, in case that matters
> here.)
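
One way to quantify the delay described above is to pair each "Running task x in stage y" line with its "Finished task x in stage y" line and subtract the timestamps. The following is a minimal sketch, not part of the original report. It assumes the log lines start with a `yy/MM/dd HH:mm:ss` timestamp (the layout in Spark's default log4j2 template, which a custom logging config may change). The helper name `task_durations` and the sample lines are hypothetical.

```python
import re
from datetime import datetime

# Assumption: each line begins with a "yy/MM/dd HH:mm:ss" timestamp.
# Adjust the pattern and the strptime format if the logging config differs.
LINE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) .*?"
    r"(?P<event>Running|Finished) task (?P<task>\S+) in stage (?P<stage>\S+)"
)


def task_durations(lines):
    """Return {(stage, task): seconds} between the Running and Finished lines."""
    started = {}
    durations = {}
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue  # not a task start/finish line
        key = (m["stage"], m["task"])
        ts = datetime.strptime(m["ts"], "%y/%m/%d %H:%M:%S")
        if m["event"] == "Running":
            started[key] = ts
        elif key in started:
            durations[key] = (ts - started.pop(key)).total_seconds()
    return durations


# Hypothetical sample lines, made up for illustration only.
sample = [
    "25/10/27 10:00:00 INFO Executor: Running task 0.0 in stage 1.0 (TID 5)",
    "25/10/27 10:00:07 INFO Executor: Finished task 0.0 in stage 1.0 (TID 5). 1234 bytes result sent to driver",
]
print(task_durations(sample))  # {('1.0', '0.0'): 7.0}
```

Running this over executor logs from the 3.5.5 and 4.0.0 runs of the same job would give per-task durations that can be compared directly. With the assumed timestamp format the resolution is one second, so it only helps when the delays are at least that large.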