[ 
https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743363#comment-16743363
 ] 

KANG-SEN LU commented on KYLIN-2620:
------------------------------------

Fixing this bug would limit selection of the TopN measure to queries that are
better served by the TopN cube.

However, the cube cost evaluation algorithm in function
influenceCapabilityCheck() of

core-metadata/src/main/java/org/apache/kylin/measure/topn/TopNMeasureType.java

must be enhanced for the case where more than one cube is associated with the
same data model.

The current problem is that when "select sum(x) from fact_table" is issued and
two cube specs can both answer the query, Kylin prefers the TopN cube, even
though that means retrieving a limited number of rows from "group by col_id"
and aggregating them afterwards. That is not only inefficient but also
incorrect.
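
To illustrate why the result is incorrect, here is a minimal Java sketch. It
is not Kylin code: the class TopNSumDemo, its methods and the sample values are
made up for illustration, and the top-N list is modelled simply as "keep the
largest N groups", which is an assumption rather than Kylin's actual TopN
implementation.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class, not part of Kylin.
final class TopNSumDemo {

    // Exact answer to "select sum(x) from fact_table": add up every group.
    static long exactSum(Map<String, Long> sumByGroup) {
        long total = 0;
        for (long v : sumByGroup.values()) {
            total += v;
        }
        return total;
    }

    // Answer rebuilt from a capacity-limited top-N list: only the `capacity`
    // largest groups are kept, so the contribution of the others is lost.
    static long sumFromTopN(Map<String, Long> sumByGroup, int capacity) {
        return sumByGroup.values().stream()
                .sorted(Comparator.reverseOrder())
                .limit(capacity)
                .mapToLong(Long::longValue)
                .sum();
    }

    public static void main(String[] args) {
        // Made-up per-col_id sums of x.
        Map<String, Long> sumByColId = new LinkedHashMap<>();
        sumByColId.put("a", 50L);
        sumByColId.put("b", 30L);
        sumByColId.put("c", 15L);
        sumByColId.put("d", 5L);

        System.out.println(exactSum(sumByColId));       // prints 100
        System.out.println(sumFromTopN(sumByColId, 2)); // prints 80
    }
}
```

With a capacity of 2, the groups "c" and "d" never reach the final
aggregation, so the total comes out 20 short.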

> Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
> ----------------------------------------------------------------
>
>                 Key: KYLIN-2620
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2620
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: Lin Tingmao
>            Priority: Major
>
> When running the following query
> select sum(measure) from table group by col_id
> if there exists a TOPN(measure, group by col_id) measure,
> TopNMeasureType.isTopNCompatibleSum() will pass, so the SUM is rewritten
> to TOPN. This confuses users, since they may expect an accurate result for
> every distinct value of the group by column(s).
> Kylin should check if "ORDER BY col_id LIMIT topncapacity" is present in the 
> query to determine whether to rewrite.
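
The check proposed in the issue description could be sketched roughly as
below. This is only an illustration under stated assumptions: QueryShape and
canRewriteSumAsTopN are hypothetical names, not Kylin classes or methods, and
which ORDER BY expressions should qualify is deliberately left out.

```java
// Hypothetical sketch, not part of Kylin.
final class TopNRewriteGuard {

    // Simplified, made-up view of the parts of a query the check needs.
    static final class QueryShape {
        final boolean hasOrderBy;
        final Integer limit; // null when the query has no LIMIT clause

        QueryShape(boolean hasOrderBy, Integer limit) {
            this.hasOrderBy = hasOrderBy;
            this.limit = limit;
        }
    }

    // Allow the SUM -> TOPN rewrite only when the query has both an ORDER BY
    // and a LIMIT that fits within the TopN capacity (assumed rule).
    static boolean canRewriteSumAsTopN(QueryShape query, int topNCapacity) {
        return query.hasOrderBy
                && query.limit != null
                && query.limit <= topNCapacity;
    }
}
```

Under this rule, a plain "select sum(measure) from table group by col_id"
without ORDER BY and LIMIT would not be rewritten and would fall back to a
regular SUM measure.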



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
