[ https://issues.apache.org/jira/browse/SPARK-32690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yang Jie updated SPARK-32690:
-----------------------------
    Attachment: image-2020-08-24-19-30-55-380.png

> Spark-32550 affects the performance of some cases
> -------------------------------------------------
>
>                 Key: SPARK-32690
>                 URL: https://issues.apache.org/jira/browse/SPARK-32690
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yang Jie
>            Priority: Major
>         Attachments: image-2020-08-24-19-30-17-712.png, image-2020-08-24-19-30-55-380.png
>
>
> I found that [SPARK-32550|https://github.com/apache/spark/pull/29366] slows down some cases. A typical case is "deterministic cardinality estimation" in HyperLogLogPlusPlusSuite when rsd is 0.001. The code that is significantly slower is:
>
> [https://github.com/apache/spark/blob/08b951b1cb58cea2c34703e43202fe7c84725c8a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlusSuite.scala#L41]
>
> The results of the comparison before and after SPARK-32550 was merged are as follows:
> | |After SPARK-32550 create createBuffer|After SPARK-32550 end to end|Before SPARK-32550 create input|Before SPARK-32550 end to end|
> |rsd 0.001, n 1000|52715513243|53004810687|195807999|773977677|
> |rsd 0.001, n 5000|51881246165|52519358215|13689949|249974855|
> |rsd 0.001, n 10000|52234282788|52374639172|14199071|183452846|
> |rsd 0.001, n 50000|55503517122|55664035449|15219394|584477125|
> |rsd 0.001, n 100000|51862662845|52116774177|19662834|166483678|
> |rsd 0.001, n 500000|51619226715|52183189526|178048012|16681330|
> |rsd 0.001, n 1000000|54861366981|54976399142|226178708|18826340|
> |rsd 0.001, n 5000000|52023602143|52354615149|388173579|15446409|
> |rsd 0.001, n 10000000|53008591660|53601392304|533454460|16033032|
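
The raw timings in the table are easier to compare as ratios. The following is a minimal sketch in Python that divides each "After SPARK-32550 end to end" value by the matching "Before SPARK-32550 end to end" value. The names `END_TO_END` and `slowdown_factors` are illustrative only and are not part of Spark. The report does not state the timing unit, so only the unit-free ratio is computed.

```python
# End-to-end timings copied from the table in this message (rsd = 0.001).
# The unit is not stated in the report, so only ratios are derived here.
# Each entry: (n, after SPARK-32550, before SPARK-32550).
END_TO_END = [
    (1_000, 53004810687, 773977677),
    (5_000, 52519358215, 249974855),
    (10_000, 52374639172, 183452846),
    (50_000, 55664035449, 584477125),
    (100_000, 52116774177, 166483678),
    (500_000, 52183189526, 16681330),
    (1_000_000, 54976399142, 18826340),
    (5_000_000, 52354615149, 15446409),
    (10_000_000, 53601392304, 16033032),
]


def slowdown_factors(rows):
    """Return {n: after / before} for every (n, after, before) row."""
    return {n: after / before for n, after, before in rows}


if __name__ == "__main__":
    for n, factor in slowdown_factors(END_TO_END).items():
        print(f"n={n:>10}: {factor:,.1f}x slower after SPARK-32550")
```

Every row yields a factor well above 1, which matches the report's claim that the test became significantly slower after the change.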