[ https://issues.apache.org/jira/browse/SPARK-22208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193078#comment-16193078 ]
Apache Spark commented on SPARK-22208:
--------------------------------------

User 'wzhfy' has created a pull request for this issue:
https://github.com/apache/spark/pull/19438

> Improve percentile_approx by not rounding up targetError and starting from index 0
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-22208
>                 URL: https://issues.apache.org/jira/browse/SPARK-22208
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Zhenhua Wang
>
> percentile_approx never returns the first element when the percentile is in
> (relativeError, 1/N], where relativeError defaults to 1/10000 and N is the
> total number of elements. Ideally, every percentile in [0, 1/N] should
> return the first element as the answer.
>
> For example, given the input data 1 to 10, if a user queries the 10% (or an
> even lower) percentile, it should return 1, because the first value 1 already
> reaches 10%. Currently it returns 2.
>
> In the paper, targetError is not rounded up, and the search index starts
> from 0 instead of 1. By following the paper, we should be able to fix the
> cases mentioned above.
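
To make the expected result concrete, here is a minimal sketch of an exact, rank-based percentile lookup on the 1 to 10 example. It is not Spark's percentile_approx implementation and does not model relativeError or targetError; the function name rank_percentile and its parameters are hypothetical, chosen only for illustration. It shows that when the scan starts at index 0, every percentile in [0, 1/N] returns the first element.

```python
def rank_percentile(sorted_values, percentage):
    """Return the first element whose cumulative fraction reaches `percentage`.

    Hypothetical reference helper: an exact lookup on already-sorted data,
    not the approximate summary that Spark builds internally.
    """
    if not sorted_values:
        raise ValueError("sorted_values must not be empty")
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be in [0, 1]")
    n = len(sorted_values)
    # Scan from index 0 so the first element is a valid answer.
    for i, value in enumerate(sorted_values):
        # Element i covers the fraction (i + 1) / n of the data.
        if (i + 1) / n >= percentage:
            return value
    # Guard against floating-point rounding near percentage == 1.0.
    return sorted_values[-1]


data = list(range(1, 11))           # input data 1 to 10, so N = 10
print(rank_percentile(data, 0.10))  # prints 1
```

With this definition, a query for 5% or 10% on the data above yields 1, while a query just above 1/N, such as 11%, yields 2.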