[jira] [Commented] (KYLIN-3961) Optimize TopN measure merge function to reduce TopNCounter errors

2019-05-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850322#comment-16850322
 ] 

ASF GitHub Bot commented on KYLIN-3961:
---

nichunen commented on pull request #612: KYLIN-3961 Optimize TopNCounter's 
merge function to reduce TopNCounter's error size.
URL: https://github.com/apache/kylin/pull/612
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Optimize TopN measure merge function to reduce TopNCounter errors
> ---
>
> Key: KYLIN-3961
> URL: https://issues.apache.org/jira/browse/KYLIN-3961
> Project: Kylin
> Issue Type: Improvement
> Components: Measure - TopN
> Affects Versions: v2.5.2
> Environment: Huawei FusionInsight
> Reporter: zhao jintao
> Assignee: zhao jintao
> Priority: Major
> Labels: easyfix
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Hi Team,
> I use the "Top-N" measure to answer queries such as "select sum(AAA) from
> BBB group by CCC,DDD". It performs much better than a cube without "Top-N".
> In my system, Kylin takes just 0.2s to answer the query against a cube with
> the "Top-N" measure; without the "Top-N" measure it can take 10s.
> However, I found that the Top-N measure can be optimized to reduce errors.
> I used the Kylin demo data to test "TopN".
> I built two cubes on "KYLIN_SALES". The first cube has three dimensions,
> "SELLER_ID", "BUYER_ID" and "PART_DT", and one measure, "SUM(PRICE)". The
> second cube has one dimension, "PART_DT", and two measures, "SUM(PRICE)" and
> "TOPN(10)". For "TOPN(10)", the "ORDER|SUM by Column" is "PRICE", the
> "Group by Column" is "SELLER_ID" and "BUYER_ID", and the "Return Type" is
> "Top 10". I then ran the build for the range "2012-01-01" to "2014-01-01".
> I ran the same SQL against both cubes and found a large discrepancy between
> the two results.
> The top 5 "SUM PRICE" values from the first cube (without "TopN") are
> "167.7269", "99.9908", "99.9888", "99.9865", "99.978".
> The top 5 "SUM PRICE" values from the second cube (with "TopN") are
> "179.27699...", "167.6320...", "167.3050...", "167.2069...", "166.7429...".
> Has anyone else run into the same problem?
>  
> Best regards.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KYLIN-3961) Optimize TopN measure merge function to reduce TopNCounter errors

2019-04-25 Thread zhao jintao (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826545#comment-16826545
 ] 

zhao jintao commented on KYLIN-3961:


Here are 3 query results from 3 cubes built on the same data source. All of
them use the same SQL.

The first result is from the cube without the TopN measure.

!image001.png!

The second result is from the cube with the TopN measure, using the current
version of the code.

!image002.png!

The third result is from the cube with the TopN measure, using my modified
code.

!image003.png!

I think the third cube, with the optimization, may be better than the second
cube, with the current code.






[jira] [Commented] (KYLIN-3961) Optimize TopN measure merge function to reduce TopNCounter errors

2019-04-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820142#comment-16820142
 ] 

ASF GitHub Bot commented on KYLIN-3961:
---

zhaojintaozhao commented on pull request #612: KYLIN-3961 Optimize 
TopNCounter's merge function to reduce TopNCounter's error size.
URL: https://github.com/apache/kylin/pull/612
 
 
   Sometimes a TopN measure query returns a result with a large error.
   I optimized TopNCounter's merge function to reduce TopNCounter's errors
   when using the TopN measure.
   This optimization works well in my Kylin system and reduces TopNCounter's
   error size.
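
For readers wondering where an error of this kind can come from, here is a minimal Python sketch. It is not Kylin's TopNCounter code and not the change proposed in the pull request. It assumes a Space-Saving-style merge of per-segment summaries that were each truncated to a fixed capacity: when a key is missing from a full summary, the merge credits it with that summary's smallest retained sum, which is an upper bound on what the key could have contributed there. The function names (`exact_top`, `summarize`, `merge_padded`), the capacity of 2 and the sample rows are all made up for illustration.

```python
from collections import defaultdict


def exact_top(rows, n):
    """Sum values per key over all rows and return the n largest sums."""
    totals = defaultdict(float)
    for key, value in rows:
        totals[key] += value
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def summarize(rows, capacity):
    """Per-segment summary: exact sums, truncated to `capacity` keys."""
    return dict(exact_top(rows, capacity))


def merge_padded(a, b, capacity):
    """Merge two truncated summaries (illustrative, not Kylin's code).

    A key missing from a full summary is credited with that summary's
    smallest retained sum, an upper bound on its unseen contribution.
    """
    pad_a = min(a.values()) if len(a) >= capacity else 0.0
    pad_b = min(b.values()) if len(b) >= capacity else 0.0
    merged = {k: a.get(k, pad_a) + b.get(k, pad_b) for k in set(a) | set(b)}
    top = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return dict(top[:capacity])


# Hypothetical (seller, price) rows for two segments.
segment_1 = [("s1", 100.0), ("s2", 90.0), ("s3", 10.0)]
segment_2 = [("s3", 95.0), ("s4", 80.0), ("s1", 5.0)]

exact = dict(exact_top(segment_1 + segment_2, 4))
approx = merge_padded(summarize(segment_1, 2), summarize(segment_2, 2), 2)

for key, value in approx.items():
    print(key, "approx:", value, "exact:", exact[key])
```

In this made-up case the merged top 2 are s3 and s1 with sums of 185 and 180, while their exact sums are both 105. With non-negative values the padded sums never fall below the exact ones, so the error appears as overestimation. The figures in the issue are also higher with TopN, but this sketch does not establish that the same mechanism is the cause there.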
 





