[jira] [Commented] (KYLIN-2899) Enable segment level query cache

2018-10-29 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667050#comment-16667050
 ] 

ASF subversion and git services commented on KYLIN-2899:


Commit 7b58b161a1d3264e744e3e78b0cffbde5e830e67 in kylin's branch 
refs/heads/master from Ma,Gang
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=7b58b16 ]

KYLIN-2899 Introduce segment level query cache


> Enable segment level query cache
> 
>
> Key: KYLIN-2899
> URL: https://issues.apache.org/jira/browse/KYLIN-2899
> Project: Kylin
>  Issue Type: Sub-task
>  Components: Query Engine
>Affects Versions: v2.1.0
>Reporter: Zhong Yanghong
>Assignee: Ma Gang
>Priority: Major
> Fix For: v2.6.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KYLIN-2899) Enable segment level query cache

2018-06-25 Thread Shaofeng SHI (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16523239#comment-16523239
 ] 

Shaofeng SHI commented on KYLIN-2899:
-

Gang, thanks for the info. I will take a look.

> Enable segment level query cache
> 
>
> Key: KYLIN-2899
> URL: https://issues.apache.org/jira/browse/KYLIN-2899
> Project: Kylin
>  Issue Type: Sub-task
>  Components: Query Engine
>Affects Versions: v2.1.0
>Reporter: Zhong Yanghong
>Assignee: Ma Gang
>Priority: Major
>






[jira] [Commented] (KYLIN-2899) Enable segment level query cache

2018-06-24 Thread Shaofeng SHI (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16521809#comment-16521809
 ] 

Shaofeng SHI commented on KYLIN-2899:
-

Gang, are you working on this? Do you have an ETA for it?

> Enable segment level query cache
> 
>
> Key: KYLIN-2899
> URL: https://issues.apache.org/jira/browse/KYLIN-2899
> Project: Kylin
>  Issue Type: Sub-task
>  Components: Query Engine
>Affects Versions: v2.1.0
>Reporter: Zhong Yanghong
>Assignee: Ma Gang
>Priority: Major
>






[jira] [Commented] (KYLIN-2899) Enable segment level query cache

2018-04-02 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423397#comment-16423397
 ] 

liyang commented on KYLIN-2899:
---

Very good document! Thanks Ma Gang!

Very helpful for understanding.

> Enable segment level query cache
> 
>
> Key: KYLIN-2899
> URL: https://issues.apache.org/jira/browse/KYLIN-2899
> Project: Kylin
>  Issue Type: Sub-task
>  Components: Query Engine
>Affects Versions: v2.1.0
>Reporter: Zhong Yanghong
>Assignee: Ma Gang
>Priority: Major
>






[jira] [Commented] (KYLIN-2899) Enable segment level query cache

2018-03-27 Thread Ma Gang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416822#comment-16416822
 ] 

Ma Gang commented on KYLIN-2899:


Adding a draft design and some performance test results here.
h2. Motivation


Currently Kylin uses the SQL text as the cache key. When a query arrives and 
its result exists in the cache, Kylin returns the cached result directly and 
does not need to query HBase. When a new segment is built or an existing 
segment is refreshed, all related cache entries must be evicted. For frequently 
built cubes, such as streaming cubes, cache misses increase dramatically, which 
may degrade query performance.

Since most historical segments of a Kylin cube are immutable, the same query 
against a historical segment should always return the same result, so that 
result does not need to be evicted when a new segment is built. We therefore 
decided to implement a segment-level cache. It complements the existing 
front-end cache; the idea is similar to the level 1/level 2 caches in an 
operating system.
h2. Design
h3. How to enable


By default, the segment-level cache is disabled. It is enabled only when all of 
the following conditions are satisfied:
1. The "kylin.query.segment-cache-enabled" config is set to true; it can be set 
at cube level. 
2. Memcached is configured in Kylin. Segment query results can be very large 
and may consume a lot of memory if no external cache is enabled.
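
The two conditions above can be sketched roughly as follows. This is only an 
illustration: the class name, the method name and the memcachedConfigured flag 
are hypothetical stand-ins, and how Kylin actually detects a configured 
memcached is not described here. Only the property name comes from the design.

```java
import java.util.Properties;

// Hypothetical sketch of the enablement check; not the actual Kylin code.
final class SegmentCacheSwitchSketch {
    // Property name taken from the design above.
    static final String ENABLED_KEY = "kylin.query.segment-cache-enabled";

    // memcachedConfigured is an assumed stand-in for "memcached is configured in Kylin".
    static boolean isSegmentCacheEnabled(Properties cubeConfig, boolean memcachedConfigured) {
        // A missing property is treated as "false", so the cache is off by default.
        boolean flag = Boolean.parseBoolean(cubeConfig.getProperty(ENABLED_KEY, "false"));
        // Both conditions must hold.
        return flag && memcachedConfigured;
    }

    public static void main(String[] args) {
        Properties cubeConfig = new Properties();
        cubeConfig.setProperty(ENABLED_KEY, "true");
        System.out.println(isSegmentCacheEnabled(cubeConfig, true));
        System.out.println(isSegmentCacheEnabled(cubeConfig, false));
    }
}
```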
h3. What is cached


The cache key is \{cubeName} + "_" + \{segmentUUID} + "_" + \{serialized 
GTScanRequest string}.
The cache value is a SegmentQueryResult:

 
{code:java}
// result byte arrays for all regions of the segment
private Collection<byte[]> regionResults;

// segment query stats, stored for cube planner usage
private byte[] cubeSegmentStatisticsBytes;
{code}
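
The cache key format described above can be sketched as below. The class and 
method names are hypothetical, and the serialized GTScanRequest is passed in as 
a plain string because its serialization is not covered here.

```java
// Hypothetical sketch of the cache key layout; not the actual Kylin code.
final class SegmentCacheKeySketch {
    // Joins the three parts with "_" in the order given in the design.
    static String build(String cubeName, String segmentUuid, String serializedScanRequest) {
        return cubeName + "_" + segmentUuid + "_" + serializedScanRequest;
    }

    public static void main(String[] args) {
        // Placeholder values: the same request against two segments yields two keys,
        // so each segment's result is cached independently.
        String k1 = build("my_cube", "uuid-1", "req");
        String k2 = build("my_cube", "uuid-2", "req");
        System.out.println(k1);
        System.out.println(k1.equals(k2));
    }
}
```

Because the segment UUID is part of the key, building a new segment adds new 
keys and leaves the entries of existing segments untouched.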
 
h3. How it works


Before calling the segment endpoint RPC, if the segment-level cache is enabled, 
Kylin tries to get the SegmentQueryResult from the cache. If the result exists, 
it is returned directly; otherwise Kylin calls the endpoint RPC to get the 
result and then saves it to the cache for future use. If the query result is 
very large, it is chunked automatically.
Cached results are not evicted explicitly; eviction depends on the TTL 
configuration and the LRU mechanism of memcached. By default the TTL is set to 
7 days.
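
The lookup flow above is a cache-aside pattern and can be sketched as below. 
This is a simplified illustration only: an in-memory map stands in for 
memcached, a Supplier stands in for the endpoint RPC, all names are 
hypothetical, and TTL, LRU eviction and chunking are left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of the lookup flow; not the class names used in Kylin.
final class SegmentCacheLookupSketch {
    // In-memory stand-in for memcached (no TTL, LRU, or chunking here).
    private final Map<String, byte[]> store = new HashMap<>();
    // Counts how often the stand-in endpoint RPC was actually invoked.
    int rpcCalls = 0;

    byte[] getSegmentResult(String cacheKey, Supplier<byte[]> endpointRpc) {
        byte[] cached = store.get(cacheKey);
        if (cached != null) {
            return cached;                // cache hit: skip the RPC
        }
        rpcCalls++;
        byte[] fresh = endpointRpc.get(); // cache miss: call the endpoint
        store.put(cacheKey, fresh);       // save for future queries
        return fresh;
    }

    public static void main(String[] args) {
        SegmentCacheLookupSketch cache = new SegmentCacheLookupSketch();
        Supplier<byte[]> rpc = () -> "rows".getBytes(StandardCharsets.UTF_8);
        cache.getSegmentResult("cube_seg_req", rpc);
        cache.getSegmentResult("cube_seg_req", rpc);
        // Only the first call reaches the RPC.
        System.out.println("rpc calls: " + cache.rpcCalls); // prints "rpc calls: 1"
    }
}
```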
h2. Performance


Memcached performs very well: it typically takes 1-10 ms to get data from 
memcached, and no further aggregation or filtering is needed, so most of the 
time the segment-level cache is faster than the HBase coprocessor RPC. The gain 
is largest for queries that need heavy aggregation or filtering in the HBase 
region server and cannot use a fuzzy key; in some cases the performance 
improves by more than 10 times. Some test results follow:

Query1:
select s1, s2, s3, s4, s5, s6, s7, s8, sum(pcount) c from 
shop_exp_path_analytics_flat where site_id = 0 AND device = 'Mobile' AND s5 = 
'Checkout: Success' group by s1, s2, s3, s4, s5, s6, s7, s8

Numbers for this query:
total scan count: 2348
hit cuboid row count: 10,063,375
without segment-level cache: 2.247s
with segment-level cache: 0.16s

Query2:
hit cuboid row count: 800,317,603
total scan count: 62347166
without segment-level cache: 12.823s
with segment-level cache: 0.519s

Query3:
total scan count: 64
result row count: 58
without segment-level cache: 0.173s
with segment-level cache: 0.153s
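
For reference, the speedup implied by each pair of timings above can be 
computed as below. The class and method names are just for illustration; the 
ratio assumes both timings of a query are in the same unit.

```java
import java.util.Locale;

// Illustrative helper: speedup = time without cache / time with cache.
final class SegmentCacheSpeedupSketch {
    static double speedup(double withoutCache, double withCache) {
        return withoutCache / withCache;
    }

    public static void main(String[] args) {
        // Timings copied from the three test queries above.
        double[][] timings = { {2.247, 0.16}, {12.823, 0.519}, {0.173, 0.153} };
        for (int i = 0; i < timings.length; i++) {
            double s = speedup(timings[i][0], timings[i][1]);
            System.out.println(String.format(Locale.ROOT, "Query%d: %.1fx", i + 1, s));
        }
        // Prints:
        // Query1: 14.0x
        // Query2: 24.7x
        // Query3: 1.1x
    }
}
```

Query3 scans very little data, so the cache brings almost no gain there; the 
large gains are on the queries with heavy scans.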

> Enable segment level query cache
> 
>
> Key: KYLIN-2899
> URL: https://issues.apache.org/jira/browse/KYLIN-2899
> Project: Kylin
>  Issue Type: Sub-task
>  Components: Query Engine
>Affects Versions: v2.1.0
>Reporter: Zhong Yanghong
>Assignee: Ma Gang
>Priority: Major
>






[jira] [Commented] (KYLIN-2899) Enable segment level query cache

2018-02-27 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379763#comment-16379763
 ] 

liyang commented on KYLIN-2899:
---

Segment level cache is a new idea. Can we have an elaboration of design and 
some performance comparison here?

> Enable segment level query cache
> 
>
> Key: KYLIN-2899
> URL: https://issues.apache.org/jira/browse/KYLIN-2899
> Project: Kylin
>  Issue Type: Sub-task
>  Components: Query Engine
>Affects Versions: v2.1.0
>Reporter: Zhong Yanghong
>Assignee: Ma Gang
>Priority: Major
>






[jira] [Commented] (KYLIN-2899) Enable segment level query cache

2018-02-01 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348361#comment-16348361
 ] 

Shaofeng SHI commented on KYLIN-2899:
-

Good idea! 

> Enable segment level query cache
> 
>
> Key: KYLIN-2899
> URL: https://issues.apache.org/jira/browse/KYLIN-2899
> Project: Kylin
>  Issue Type: Sub-task
>  Components: Query Engine
>Affects Versions: v2.1.0
>Reporter: Zhong Yanghong
>Assignee: Ma Gang
>Priority: Major
> Fix For: v2.3.0
>
>






[jira] [Commented] (KYLIN-2899) Enable segment level query cache

2018-01-31 Thread Ma Gang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348084#comment-16348084
 ] 

Ma Gang commented on KYLIN-2899:


By default, it is disabled. It is enabled only when all of the following 
conditions are satisfied:

1.  The "kylin.query.segment-cache-enabled" config is set to true; it can be 
set at cube level.

2.  Memcached is configured in Kylin.

The segment-level cache can improve query performance for cubes that are built 
frequently, such as streaming cubes, whose global cache entries are 
invalidated frequently.

 

> Enable segment level query cache
> 
>
> Key: KYLIN-2899
> URL: https://issues.apache.org/jira/browse/KYLIN-2899
> Project: Kylin
>  Issue Type: Sub-task
>  Components: Query Engine
>Affects Versions: v2.1.0
>Reporter: Zhong Yanghong
>Assignee: Ma Gang
>Priority: Major
> Fix For: v2.3.0
>
>






[jira] [Commented] (KYLIN-2899) Enable segment level query cache

2018-01-24 Thread Zhong Yanghong (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338747#comment-16338747
 ] 

Zhong Yanghong commented on KYLIN-2899:
---

Yes, waiting for other tasks of [KYLIN-2895]

> Enable segment level query cache
> 
>
> Key: KYLIN-2899
> URL: https://issues.apache.org/jira/browse/KYLIN-2899
> Project: Kylin
>  Issue Type: Sub-task
>  Components: Query Engine
>Affects Versions: v2.1.0
>Reporter: Zhong Yanghong
>Assignee: Ma Gang
>Priority: Major
> Fix For: v2.3.0
>
>






[jira] [Commented] (KYLIN-2899) Enable segment level query cache

2018-01-24 Thread Billy Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338673#comment-16338673
 ] 

Billy Liu commented on KYLIN-2899:
--

Hello [~magang] Are you still working on this issue? 

> Enable segment level query cache
> 
>
> Key: KYLIN-2899
> URL: https://issues.apache.org/jira/browse/KYLIN-2899
> Project: Kylin
>  Issue Type: Sub-task
>  Components: Query Engine
>Affects Versions: v2.1.0
>Reporter: Zhong Yanghong
>Assignee: Ma Gang
>Priority: Major
> Fix For: v2.3.0
>
>



