[ https://issues.apache.org/jira/browse/KYLIN-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835371#comment-17835371 ]
pengfei.zhan edited comment on KYLIN-5787 at 4/9/24 12:17 PM:
--------------------------------------------------------------

h1. The old behavior
|| ||*percentile*||*percentile_approx*||
|Precomputation|t-digest|t-digest|
|Runtime computation|QuantileSummaries|QuantileSummaries|
|Pushdown / spark-sql|Sort and take the exact value|QuantileSummaries|

h1. Design

Add the configuration "kylin.query.percentile-approx-algorithm". Its default value is null, which keeps the current behavior unchanged. The setting is not supported at the project level, and Kylin must be restarted for a change to take effect.

When the optional value "t-digest" is configured, the behavior is as follows:
|| ||*percentile*||*percentile_approx*||
|Precomputation|t-digest|t-digest|
|Runtime computation|t-digest|t-digest|
|Pushdown|Sort and take the exact value|t-digest|
|spark-sql|Sort and take the exact value|QuantileSummaries|

"Runtime computation" means that the query needs extra aggregation on top of the layout (also called a cuboid).

For more information, please refer to: https://cn.kyligence.io/resources/kyligence-public-seminar-190403/
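The two behavior tables in this comment can be read as a lookup from (execution path, function, configured value) to the algorithm used. The following is only a minimal sketch of that reading, not Kylin's implementation: the names `resolve_algorithm`, `DEFAULT_BEHAVIOR`, and `T_DIGEST_OVERRIDES` and the path labels are hypothetical, and rejecting values other than "t-digest" is an assumption, since the comment does not say how other values are handled.

```python
from typing import Optional

# Labels for the three outcomes named in the tables.
EXACT = "sort and take the exact value"
T_DIGEST = "t-digest"
QUANTILE_SUMMARIES = "QuantileSummaries"

# Behavior when kylin.query.percentile-approx-algorithm is unset (null).
# Keys are (execution path, SQL function); the path names are hypothetical labels.
DEFAULT_BEHAVIOR = {
    ("precomputation", "percentile"): T_DIGEST,
    ("precomputation", "percentile_approx"): T_DIGEST,
    ("runtime", "percentile"): QUANTILE_SUMMARIES,
    ("runtime", "percentile_approx"): QUANTILE_SUMMARIES,
    ("pushdown", "percentile"): EXACT,
    ("pushdown", "percentile_approx"): QUANTILE_SUMMARIES,
    ("spark-sql", "percentile"): EXACT,
    ("spark-sql", "percentile_approx"): QUANTILE_SUMMARIES,
}

# Cells that change when the option is set to "t-digest".
T_DIGEST_OVERRIDES = {
    ("runtime", "percentile"): T_DIGEST,
    ("runtime", "percentile_approx"): T_DIGEST,
    ("pushdown", "percentile_approx"): T_DIGEST,
}


def resolve_algorithm(path: str, function: str, configured: Optional[str] = None) -> str:
    """Return the algorithm the tables list for one path/function pair."""
    key = (path, function)
    if key not in DEFAULT_BEHAVIOR:
        raise KeyError(f"unknown path/function pair: {key}")
    if configured is None:
        return DEFAULT_BEHAVIOR[key]
    if configured == "t-digest":
        return T_DIGEST_OVERRIDES.get(key, DEFAULT_BEHAVIOR[key])
    # Assumption: values other than "t-digest" are rejected in this sketch.
    raise ValueError(f"unsupported value: {configured!r}")


if __name__ == "__main__":
    for (path, function) in DEFAULT_BEHAVIOR:
        print(path, function,
              resolve_algorithm(path, function),
              resolve_algorithm(path, function, "t-digest"),
              sep=" | ")
```

Read this way, setting the option changes only three cells: both runtime-computation cells and percentile_approx under pushdown. Plain spark-sql keeps QuantileSummaries for percentile_approx in both tables.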
> Use t-digest as spark percentile_approx function
> ------------------------------------------------
>
>                 Key: KYLIN-5787
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5787
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine, Query Engine
>    Affects Versions: 5.0-beta
>            Reporter: pengfei.zhan
>            Assignee: pengfei.zhan
>            Priority: Critical
>             Fix For: 5.0-beta
>
>
> The underlying implementation of the percentile_approx function in KYLIN is
> the open-source t-digest.
> The underlying implementation of the percentile_approx function in spark is
> spark's own PercentileDigest (based on QuantileSummaries).
> Different implementations lead to different results.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)