[jira] [Commented] (KYLIN-3961) Optimize TopN measure merge function to reduce TopNCounter errors
[ https://issues.apache.org/jira/browse/KYLIN-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850322#comment-16850322 ]

ASF GitHub Bot commented on KYLIN-3961:
---

nichunen commented on pull request #612: KYLIN-3961 Optimize TopNCounter's merge function to reduce TopNCounter's error size.
URL: https://github.com/apache/kylin/pull/612

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Optimize TopN measure merge function to reduce TopNCounter errors
> ---
>
> Key: KYLIN-3961
> URL: https://issues.apache.org/jira/browse/KYLIN-3961
> Project: Kylin
> Issue Type: Improvement
> Components: Measure - TopN
> Affects Versions: v2.5.2
> Environment: Huawei FusionInsight
> Reporter: zhao jintao
> Assignee: zhao jintao
> Priority: Major
> Labels: easyfix
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Hi Team:
> I use the "Top-N" measure for queries such as "select sum(AAA) from BBB group by CCC,DDD". It performs much better than a cube without "Top-N".
> In my system, Kylin takes just 0.2s to answer the query from a cube with the "Top-N" measure; without the "Top-N" measure it may take 10s.
> But I find that the Top-N measure can be optimized to reduce errors.
> I used the Kylin demo to test "TopN".
> I built two cubes using "KYLIN_SALES". The first cube has three dimensions, "SELLER_ID", "BUYER_ID" and "PART_DT", and one measure, "SUM(PRICE)". The second cube has one dimension, "PART_DT", and two measures, "SUM(PRICE)" and "TOPN(10)". For "TOPN(10)", the "ORDER|SUM by Column" is "PRICE", the "Group by Column" is "SELLER_ID" and "BUYER_ID", and the "Return Type" is "Top 10". Then I built both cubes from "2012-01-01" to "2014-01-01".
> I used the same SQL to query the two cubes and found a large discrepancy between their results.
> The top 5 "SUM PRICE" values from the first cube, without "TopN", are "167.7269", "99.9908", "99.9888", "99.9865", "99.978".
> The top 5 "SUM PRICE" values from the second cube, with "TopN", are "179.27699...", "167.6320...", "167.3050...", "167.2069...", "166.7429...".
> Has anyone met the same problem?
>
> Best regards.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (KYLIN-3961) Optimize TopN measure merge function to reduce TopNCounter errors
[ https://issues.apache.org/jira/browse/KYLIN-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826545#comment-16826545 ]

zhao jintao commented on KYLIN-3961:
---

Here are three query results from three cubes built on the same data source. All of them use the same SQL.

The first result is from a cube without the TopN measure.
!image001.png!

The second result is from a cube with the TopN measure, built with the current version of the code.
!image002.png!

The third result is from a cube with the TopN measure, built with my modified code.
!image003.png!

I think the third cube, with the optimized code, gives better results than the second cube, with the current code.
[jira] [Commented] (KYLIN-3961) Optimize TopN measure merge function to reduce TopNCounter errors
[ https://issues.apache.org/jira/browse/KYLIN-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820142#comment-16820142 ]

ASF GitHub Bot commented on KYLIN-3961:
---

zhaojintaozhao commented on pull request #612: KYLIN-3961 Optimize TopNCounter's merge function to reduce TopNCounter's error size.
URL: https://github.com/apache/kylin/pull/612

Sometimes a TopN measure query returns a result with a large error. I optimized TopNCounter's merge function to reduce TopNCounter's errors when using the TopN measure. This optimization works well in my Kylin system and reduces TopNCounter's error size.
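
The overestimated sums reported in this issue are the kind of error one would expect when truncated top-N summaries are merged across segments. The following is a minimal sketch of that effect, not Kylin's TopNCounter code and not the change in pull request #612. It assumes a simplified summary that keeps only the `capacity` largest sums per segment and, on merge, pads a key missing from one side with that side's smallest retained value (an upper bound on what the key could have had there). The names `truncate`, `merge`, `segment_1`, `segment_2` and `CAPACITY` are hypothetical and exist only for this illustration.

```python
def truncate(totals, capacity):
    """Keep only the `capacity` largest entries (simplified per-segment summary)."""
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:capacity])


def merge(a, b, capacity):
    """Merge two truncated summaries.

    Assumption for this sketch: a key missing from a full summary is padded
    with that summary's minimum retained value, so merged sums can only be
    equal to or larger than the exact sums.
    """
    pad_a = min(a.values()) if len(a) >= capacity else 0
    pad_b = min(b.values()) if len(b) >= capacity else 0
    merged = {k: a.get(k, pad_a) + b.get(k, pad_b) for k in a.keys() | b.keys()}
    return truncate(merged, capacity)


# Two hypothetical segments (for example, two build periods) with per-seller sums.
segment_1 = {"s1": 100, "s2": 90, "s3": 1}
segment_2 = {"s3": 100, "s4": 90, "s1": 1}
CAPACITY = 2

# Exact answer: add the full per-segment sums, then take the top entries.
all_keys = segment_1.keys() | segment_2.keys()
exact = truncate(
    {k: segment_1.get(k, 0) + segment_2.get(k, 0) for k in all_keys}, CAPACITY
)

# Approximate answer: truncate each segment first, then merge the summaries.
approx = merge(
    truncate(segment_1, CAPACITY), truncate(segment_2, CAPACITY), CAPACITY
)

print("exact :", exact)
print("approx:", approx)
```

In this toy data the exact top sums are 101 for both "s1" and "s3", while the merged summaries report 190 for each, because the true contribution of 1 from the other segment was dropped and replaced by a padding value of 90. A larger capacity, or a merge that pads less aggressively, narrows the gap at the cost of more memory or of possibly underestimating.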