[ 
https://issues.apache.org/jira/browse/SPARK-52362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piyush Chamria updated SPARK-52362:
-----------------------------------
    Attachment: SparkPerfBenchmark.scala

> S3 (CSV read) → S3 (CSV write) task shows 16% performance regression in Spark 
> 3.5.1 over 3.3.1 with TPC-H lineitem.tbl dataset
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-52362
>                 URL: https://issues.apache.org/jira/browse/SPARK-52362
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output, Spark Core, SQL
>    Affects Versions: 3.5.1
>         Environment: * OS: RHEL8
>  * Hardware: Intel Xeon Gold 6258R @ 2.70GHz, 8 cores (VMware VM)
>  * Spark versions tested: 3.3.1 vs 3.5.1 (vanilla Spark)
>  * Hadoop: 3.2.3
>  * Executor configuration: Single executor
>            Reporter: Piyush Chamria
>            Priority: Major
>         Attachments: SparkPerfBenchmark.scala
>
>
> *Issue Summary:*
> Spark 3.5.1 shows a performance regression of ~16% compared with 3.3.1 for 
> S3 CSV read → S3 CSV write operations.
> *Test Methodology:*
> Simple pass-through task: S3 (CSV read) → S3 (CSV write)
> *Benchmark Results:*
> - Dataset: TPC-H lineitem.tbl
> - Performance degradation: ~16%
>  
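
For readers without the attachment, the pass-through task described in the issue might look roughly like the Scala sketch below. This is not the contents of SparkPerfBenchmark.scala. The object name, the command-line arguments, the s3a:// path layout, the wall-clock timing around the job, and the regressionPercent helper are all assumptions made for illustration. The "|" separator reflects the pipe-delimited format of TPC-H .tbl files.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch of an S3 CSV read -> S3 CSV write pass-through job.
// Not the reporter's benchmark; names and paths are illustrative only.
object CsvPassThroughSketch {

  // Relative slowdown of candidateSec against baselineSec, in percent.
  // Assumes the reported figure refers to wall-clock time (not stated in the issue).
  def regressionPercent(baselineSec: Double, candidateSec: Double): Double =
    (candidateSec - baselineSec) / baselineSec * 100.0

  def main(args: Array[String]): Unit = {
    require(args.length == 2, "usage: CsvPassThroughSketch <inputPath> <outputPath>")
    val inputPath = args(0)  // e.g. s3a://<bucket>/lineitem.tbl (assumed layout)
    val outputPath = args(1) // e.g. s3a://<bucket>/out/ (assumed layout)

    val spark = SparkSession.builder()
      .appName("csv-pass-through-sketch")
      .getOrCreate()

    val start = System.nanoTime()

    // Read the pipe-delimited file and write it straight back out, untransformed.
    spark.read
      .option("sep", "|")
      .csv(inputPath)
      .write
      .mode("overwrite")
      .option("sep", "|")
      .csv(outputPath)

    val elapsedSec = (System.nanoTime() - start) / 1e9
    println(f"Elapsed: $elapsedSec%.1f s")

    spark.stop()
  }
}
```

Running the same job under Spark 3.3.1 and 3.5.1 and passing the two elapsed times to regressionPercent would give a figure comparable to the ~16% in the report, provided the report measured wall-clock time, which the issue does not state.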



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
