[ 
https://issues.apache.org/jira/browse/SPARK-32690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-32690:
-----------------------------
    Attachment: image-2020-08-24-19-30-55-380.png

> Spark-32550 affects the performance of some cases
> -------------------------------------------------
>
>                 Key: SPARK-32690
>                 URL: https://issues.apache.org/jira/browse/SPARK-32690
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yang Jie
>            Priority: Major
>         Attachments: image-2020-08-24-19-30-17-712.png, 
> image-2020-08-24-19-30-55-380.png
>
>
> I found that [SPARK-32550|https://github.com/apache/spark/pull/29366] 
> degraded the performance of some cases. A typical case is "deterministic 
> cardinality estimation" in HyperLogLogPlusPlusSuite when rsd is 0.001. 
> The code that became significantly slower is:
>  
> [https://github.com/apache/spark/blob/08b951b1cb58cea2c34703e43202fe7c84725c8a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlusSuite.scala#L41]
>  
> The comparison results before and after SPARK-32550 was merged are as follows:
> | |After SPARK-32550 create createBuffer|After SPARK-32550 end to end|Before SPARK-32550 create input|Before SPARK-32550 end to end|
> |rsd 0.001, n 1000|52715513243|53004810687|195807999|773977677|
> |rsd 0.001, n 5000|51881246165|52519358215|13689949|249974855|
> |rsd 0.001, n 10000|52234282788|52374639172|14199071|183452846|
> |rsd 0.001, n 50000|55503517122|55664035449|15219394|584477125|
> |rsd 0.001, n 100000|51862662845|52116774177|19662834|166483678|
> |rsd 0.001, n 500000|51619226715|52183189526|178048012|16681330|
> |rsd 0.001, n 1000000|54861366981|54976399142|226178708|18826340|
> |rsd 0.001, n 5000000|52023602143|52354615149|388173579|15446409|
> |rsd 0.001, n 10000000|53008591660|53601392304|533454460|16033032|
>  
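
To make the size of the regression easier to read, the end-to-end columns of the table above can be turned into slowdown ratios. The following is only a rough sketch: the figures are copied from the table, the report does not state their units, and the names END_TO_END, slowdown and slowdown_by_n are made up for illustration. For n of 500000 and above, the "before" end-to-end values are smaller than the "before" create values, so those rows may be transposed in the report and their ratios should be treated with caution.

```python
# Figures copied from the table in the report. Units are not stated there,
# so only unitless ratios are computed.
# Each entry maps n -> (after end to end, before end to end).
END_TO_END = {
    1000: (53004810687, 773977677),
    5000: (52519358215, 249974855),
    10000: (52374639172, 183452846),
    50000: (55664035449, 584477125),
    100000: (52116774177, 166483678),
    500000: (52183189526, 16681330),
    1000000: (54976399142, 18826340),
    5000000: (52354615149, 15446409),
    10000000: (53601392304, 16033032),
}


def slowdown(after: int, before: int) -> float:
    """Return how many times larger the 'after' figure is than the 'before' one."""
    return after / before


def slowdown_by_n(rows):
    """Compute the end-to-end slowdown ratio for every n in the table."""
    return {n: slowdown(after, before) for n, (after, before) in rows.items()}


if __name__ == "__main__":
    for n, ratio in slowdown_by_n(END_TO_END).items():
        print(f"rsd 0.001, n {n}: about {ratio:.1f}x slower end to end")
```

Every row comes out well above 1, which matches the report's description of the affected code as significantly slower.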



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
