[ https://issues.apache.org/jira/browse/SPARK-52362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Piyush Chamria updated SPARK-52362:
-----------------------------------
    Attachment: SparkPerfBenchmark.scala

> S3 (CSV read) → S3 (CSV write) task shows 16% performance regression in Spark
> 3.5.1 over 3.3.1 with TPC-H lineitem.tbl dataset
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-52362
>                 URL: https://issues.apache.org/jira/browse/SPARK-52362
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output, Spark Core, SQL
>   Affects Versions: 3.5.1
>         Environment: * OS: RHEL8
> * Hardware: Intel Xeon Gold 6258R @ 2.70GHz, 8 cores (VMware VM)
> * Spark versions tested: 3.3.1 vs 3.5.1 (VanillaSpark)
> * Hadoop: 3.2.3
> * Executor configuration: Single executor
>            Reporter: Piyush Chamria
>            Priority: Major
>         Attachments: SparkPerfBenchmark.scala
>
>
> *Issue Summary:*
> Performance regression of ~16% observed in Spark 3.5.1 compared to 3.3.1 for
> S3 CSV read → S3 CSV write operations.
> *Test Methodology:*
> Simple pass-through task: S3 (CSV read) → S3 (CSV write)
> *Benchmark Results:*
> Dataset : TPC-H lineitem.tbl
> - Performance degradation: ~16%
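
For readers without access to the attachment, the pass-through task described under Test Methodology can be sketched roughly as below. This is not the attached SparkPerfBenchmark.scala, whose contents are not shown in this message; it is a minimal illustration under stated assumptions. The object name, the two helper functions, and both S3 paths are hypothetical, and the "|" delimiter is assumed because TPC-H .tbl files are conventionally pipe-separated.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch of an S3 CSV read -> S3 CSV write pass-through timing run.
object PassThroughBenchmarkSketch {

  // Runs `body` once and returns its result with the wall-clock time in milliseconds.
  def timeMs[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e6)
  }

  // Percent slowdown of a candidate run relative to a baseline (positive means slower).
  def regressionPercent(baselineMs: Double, candidateMs: Double): Double =
    (candidateMs - baselineMs) / baselineMs * 100.0

  def main(args: Array[String]): Unit = {
    // Assumed placeholder locations; substitute real bucket paths.
    val input  = args.lift(0).getOrElse("s3a://example-bucket/tpch/lineitem.tbl")
    val output = args.lift(1).getOrElse("s3a://example-bucket/out/lineitem-copy")

    val spark = SparkSession.builder()
      .appName("PassThroughBenchmarkSketch")
      .getOrCreate()

    try {
      // Read the delimited file and write it straight back out, with no transformation.
      val (_, elapsedMs) = timeMs {
        spark.read
          .option("delimiter", "|") // assumption: pipe-separated TPC-H .tbl layout
          .csv(input)
          .write
          .mode("overwrite")
          .option("delimiter", "|")
          .csv(output)
      }
      println(f"Pass-through took $elapsedMs%.0f ms")
    } finally {
      spark.stop()
    }
  }
}
```

Submitting the same job once per Spark version (3.3.1 and 3.5.1) and passing the two timings to regressionPercent would give a figure comparable to the ~16% quoted above, assuming the same dataset, cluster, and executor settings for both runs.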