xuanzhiang created SPARK-40499:
----------------------------------

             Summary: Spark 3.2.1 percentile_approx query much slower than Spark 2.4.0
                 Key: SPARK-40499
                 URL: https://issues.apache.org/jira/browse/SPARK-40499
             Project: Spark
          Issue Type: Bug
          Components: Shuffle
    Affects Versions: 3.2.1
         Environment: !image-2022-09-20-16-57-01-881.png!
            Reporter: xuanzhiang

spark.sql(
  s"""
     |SELECT
     |  Info,
     |  PERCENTILE_APPROX(cost, 0.5) cost_p50,
     |  PERCENTILE_APPROX(cost, 0.9) cost_p90,
     |  PERCENTILE_APPROX(cost, 0.95) cost_p95,
     |  PERCENTILE_APPROX(cost, 0.99) cost_p99,
     |  PERCENTILE_APPROX(cost, 0.999) cost_p999
     |FROM
     |  textData
     |""".stripMargin)

 * With Spark 2.4.0, the aggregation used objHashAggregator and stage 2 pulled its shuffle data very quickly. With Spark 3.2.1 and the old shuffle, however, pulling 140M of shuffle data took 3 hours.