[jira] [Commented] (KYLIN-2899) Enable segment level query cache

2018-03-27 Thread Ma Gang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416822#comment-16416822
 ] 

Ma Gang commented on KYLIN-2899:


Adding a draft design and some performance test results here.
h2. Motivation


Currently Kylin uses the SQL text as the cache key. When a query arrives and 
its result is already in the cache, Kylin returns the cached result directly 
and does not need to query HBase. When a new segment is built or an existing 
segment is refreshed, all related cache entries have to be evicted. For cubes 
that are built frequently, such as streaming cubes, the cache miss rate 
increases dramatically, which may degrade query performance.

Since most historical segments of a Kylin cube are immutable, the same query 
against a historical segment should always return the same result, so its 
cached result does not need to be evicted when a new segment is built. We have 
therefore decided to implement a segment-level cache. It complements the 
existing front-end cache; the idea is similar to the level 1/level 2 caches in 
a CPU.
h2. Design
h3. How to enable


By default, the segment-level cache is disabled. It is enabled only when all of 
the following conditions are satisfied:
1. The "kylin.query.segment-cache-enabled" config is set to true; it can be set 
at the cube level. 
2. Memcached is configured in Kylin. Segment query results can be very large 
and may consume a lot of memory if no external cache is enabled.
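
The two conditions above can be read as a simple predicate. The following is a minimal sketch, not Kylin's actual code: the class SegmentCacheSwitch, the method name, and the plain Map used to hold the effective cube-level configuration are assumptions made for illustration; only the config name comes from this note.

```java
import java.util.Map;

// Hypothetical helper (not part of Kylin): decides whether the segment-level cache is active.
final class SegmentCacheSwitch {
    // Config name taken from the design note.
    static final String ENABLED_KEY = "kylin.query.segment-cache-enabled";

    // cubeConfig: effective properties for the cube (cube-level overrides already applied).
    // memcachedConfigured: whether an external memcached is set up for Kylin.
    static boolean isSegmentCacheEnabled(Map<String, String> cubeConfig, boolean memcachedConfigured) {
        // The flag defaults to false, matching "disabled by default".
        boolean flag = Boolean.parseBoolean(cubeConfig.getOrDefault(ENABLED_KEY, "false"));
        // Both conditions must hold.
        return flag && memcachedConfigured;
    }
}
```

With the flag set to true but no memcached configured, the predicate stays false, which reflects the concern that large segment results should not end up in local memory.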
h3. What is cached


The cache key is \{cubeName} + "_" + \{segmentUUID} + "_" + \{serialized 
GTScanRequest string}.
The cache value is a SegmentQueryResult:

 
{code:java}
// result byte arrays for all regions of the segment
private Collection<byte[]> regionResults;

// segment query statistics, stored for cube planner usage
private byte[] cubeSegmentStatisticsBytes;
{code}
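
As an illustration of the key layout described above, here is a minimal sketch of how such a key could be assembled. The class and method names are hypothetical, and the serialized GTScanRequest is treated as an opaque string supplied by the caller.

```java
// Hypothetical helper (not part of Kylin): builds the segment-level cache key.
final class SegmentCacheKeys {
    // Layout from the design note: {cubeName}_{segmentUUID}_{serialized GTScanRequest string}
    static String build(String cubeName, String segmentUuid, String serializedScanRequest) {
        return cubeName + "_" + segmentUuid + "_" + serializedScanRequest;
    }
}
```

Because the segment UUID is part of the key, results of an untouched historical segment keep the same key when another segment is built, while a different scan request on the same segment yields a different key.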
 
h3. How it works


Before calling the segment endpoint RPC, if the segment-level cache is enabled, 
Kylin tries to get the SegmentQueryResult from the cache. If the result exists, 
it is returned directly; otherwise Kylin calls the endpoint RPC to get the 
result and then saves it to the cache for future use. If the query result is 
very big, it is chunked automatically.
Cached results are not evicted explicitly; eviction depends on the TTL 
configuration and the LRU mechanism of memcached. By default the TTL is set to 
7 days.
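
The flow above is a look-aside cache with chunking. The sketch below shows one way it could work and is not the actual implementation: ExternalCache is a hypothetical stand-in for a memcached client, InMemoryCache ignores TTL and LRU eviction, and the chunk layout (a header entry holding the chunk count plus entries named key#0, key#1, ...) is an assumption, since the note does not describe how chunking is done.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a memcached client.
interface ExternalCache {
    byte[] get(String key);
    void put(String key, byte[] value, int ttlSeconds);
}

// In-memory substitute so the sketch runs without memcached; TTL and LRU are not modelled.
final class InMemoryCache implements ExternalCache {
    private final Map<String, byte[]> store = new HashMap<>();
    public byte[] get(String key) { return store.get(key); }
    public void put(String key, byte[] value, int ttlSeconds) { store.put(key, value); }
}

final class SegmentCacheLookup {
    static final int TTL_SECONDS = 7 * 24 * 3600; // default TTL from the note: 7 days
    private final ExternalCache cache;
    private final int chunkSize; // assumed maximum size of one cache entry, in bytes

    SegmentCacheLookup(ExternalCache cache, int chunkSize) {
        this.cache = cache;
        this.chunkSize = chunkSize;
    }

    // Returns the cached result if present; otherwise runs the RPC and caches its result.
    byte[] query(String key, Supplier<byte[]> endpointRpc) {
        byte[] cached = readChunked(key);
        if (cached != null) {
            return cached;
        }
        byte[] fresh = endpointRpc.get();
        writeChunked(key, fresh);
        return fresh;
    }

    private void writeChunked(String key, byte[] value) {
        int chunks = (value.length + chunkSize - 1) / chunkSize;
        for (int i = 0; i < chunks; i++) {
            int from = i * chunkSize;
            int to = Math.min(value.length, from + chunkSize);
            cache.put(key + "#" + i, Arrays.copyOfRange(value, from, to), TTL_SECONDS);
        }
        // The header is written last, so it only appears once all chunks are stored.
        cache.put(key, ByteBuffer.allocate(4).putInt(chunks).array(), TTL_SECONDS);
    }

    private byte[] readChunked(String key) {
        byte[] header = cache.get(key);
        if (header == null) {
            return null; // cache miss
        }
        int chunks = ByteBuffer.wrap(header).getInt();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < chunks; i++) {
            byte[] part = cache.get(key + "#" + i);
            if (part == null) {
                return null; // a chunk expired or was evicted: treat as a miss
            }
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }
}
```

There is no explicit eviction path in the sketch, which mirrors the design: stale entries simply age out through the TTL or are pushed out by LRU on the memcached side.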
h2. Performance


Memcached performs very well: it often takes 1-10 ms to get data from 
memcached, and no further aggregation or filtering is needed, so most of the 
time the performance is better than the HBase coprocessor RPC. This is 
especially true for queries that need heavy aggregation or filtering in the 
HBase region server and cannot use a fuzzy key; sometimes performance improves 
by more than 10 times. Some test results are below:

Query1:
select s1, s2, s3, s4, s5, s6, s7, s8, sum(pcount) c from 
shop_exp_path_analytics_flat where site_id = 0 AND device = 'Mobile' AND s5 = 
'Checkout: Success' group by s1, s2, s3, s4, s5, s6, s7, s8

Numbers for this query:
total scan count: 2348
hit cuboid row count: 10,063,375
without segment-level cache: 2.247s
with segment-level cache: 0.16s

Query2:
hit cuboid row count: 800,317,603
total scan count: 62,347,166
without segment-level cache: 12.823s
with segment-level cache: 0.519s

Query3:
total scan count: 64
result row count: 58
without segment-level cache: 0.173s
with segment-level cache: 0.153s

> Enable segment level query cache
> 
>
> Key: KYLIN-2899
> URL: https://issues.apache.org/jira/browse/KYLIN-2899
> Project: Kylin
>  Issue Type: Sub-task
>  Components: Query Engine
>Affects Versions: v2.1.0
>Reporter: Zhong Yanghong
>Assignee: Ma Gang
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KYLIN-3300) Upgrade jackson-databind to 2.6.7.1 with security issue fixed

2018-03-27 Thread Billy Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billy Liu updated KYLIN-3300:
-
Summary: Upgrade jackson-databind to 2.6.7.1 with security issue fixed  
(was: Upgrade jackson-databind)

> Upgrade jackson-databind to 2.6.7.1 with security issue fixed
> -
>
> Key: KYLIN-3300
> URL: https://issues.apache.org/jira/browse/KYLIN-3300
> Project: Kylin
>  Issue Type: Improvement
>  Components: Integration
>Affects Versions: v2.2.0, v2.3.0
>Reporter: Shaofeng SHI
>Assignee: Shaofeng SHI
>Priority: Major
> Fix For: v2.3.1
>
> Attachments: KYLIN-3300.master.001.patch
>
>
> jackson-databind 2.6.3 and 2.6.5 are reported to have a security issue 
> (CVE-2017-7525); need to upgrade.





[jira] [Updated] (KYLIN-3301) Upgrade opensaml to 2.6.6 with security issue fixed

2018-03-27 Thread Billy Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billy Liu updated KYLIN-3301:
-
Summary: Upgrade opensaml to 2.6.6 with security issue fixed  (was: Upgrade 
opensaml)

> Upgrade opensaml to 2.6.6 with security issue fixed
> ---
>
> Key: KYLIN-3301
> URL: https://issues.apache.org/jira/browse/KYLIN-3301
> Project: Kylin
>  Issue Type: Improvement
>  Components: Integration
>Affects Versions: v2.2.0, v2.3.0
>Reporter: Shaofeng SHI
>Assignee: Shaofeng SHI
>Priority: Major
> Fix For: v2.3.1
>
> Attachments: KYLIN-3301.master.002.patch
>
>
> opensaml 2.6.4 is reported to have a security issue (CVE-2015-1796); need to upgrade.





[jira] [Created] (KYLIN-3322) TopN requires a SUM to work

2018-03-27 Thread liyang (JIRA)
liyang created KYLIN-3322:
-

 Summary: TopN requires a SUM to work
 Key: KYLIN-3322
 URL: https://issues.apache.org/jira/browse/KYLIN-3322
 Project: Kylin
  Issue Type: Bug
Reporter: liyang
Assignee: liyang


Currently, if a user creates a TopN measure of sellers by sum of price, the 
user is also required to create a SUM(price) measure. Otherwise, an NPE is 
thrown at query time.





[jira] [Created] (KYLIN-3321) Set MALLOC_ARENA_MAX in script

2018-03-27 Thread Ted Yu (JIRA)
Ted Yu created KYLIN-3321:
-

 Summary: Set MALLOC_ARENA_MAX in script
 Key: KYLIN-3321
 URL: https://issues.apache.org/jira/browse/KYLIN-3321
 Project: Kylin
  Issue Type: Task
Reporter: Ted Yu


conf/setenv.sh would be a good place to set MALLOC_ARENA_MAX, which helps 
prevent native memory OOM.

See https://github.com/prestodb/presto/issues/8993





[jira] [Resolved] (KYLIN-3293) FixedLenHexDimEnc return a wrong code length leads to cut bytes error.

2018-03-27 Thread Billy Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billy Liu resolved KYLIN-3293.
--
   Resolution: Fixed
Fix Version/s: v2.4.0

> FixedLenHexDimEnc return a wrong code length leads to cut bytes error.
> --
>
> Key: KYLIN-3293
> URL: https://issues.apache.org/jira/browse/KYLIN-3293
> Project: Kylin
>  Issue Type: Bug
>Reporter: jiatao.tao
>Assignee: jiatao.tao
>Priority: Major
> Fix For: v2.4.0
>
>
> FixedLenHexDimEnc loses its byteLen when deserializing, so in 
> GTRecord#loadColumnsFromColumnBlocks the bytes after cutting are wrong.
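
The kind of bug described here, a derived field that is not restored on deserialization, can be illustrated with a small sketch. HexLenEncoding below is a hypothetical, simplified class and not Kylin's FixedLenHexDimEnc; the assumption that the byte length is derived as two hex characters per byte is chosen to match the unit test in the pull request for this issue, which expects a length of 3 for a hex length of 6.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified stand-in for a fixed-length hex encoding; not Kylin's class.
final class HexLenEncoding {
    private int hexLength; // number of hex characters
    private int byteLen;   // derived field: two hex characters per byte (assumption)

    HexLenEncoding(int hexLength) {
        this.hexLength = hexLength;
        this.byteLen = (hexLength + 1) / 2;
    }

    private HexLenEncoding() { }

    int byteLen() { return byteLen; }

    // Only hexLength is serialized; byteLen must be recomputed when reading.
    void writeTo(ByteBuffer out) { out.putShort((short) hexLength); }

    // Buggy variant: restores hexLength only, so byteLen stays at its default of 0.
    static HexLenEncoding readLossy(ByteBuffer in) {
        HexLenEncoding e = new HexLenEncoding();
        e.hexLength = in.getShort();
        return e;
    }

    // Fixed variant: goes through the constructor, which recomputes byteLen.
    static HexLenEncoding readFixed(ByteBuffer in) {
        return new HexLenEncoding(in.getShort());
    }
}
```

Any code that cuts a record into columns by the reported length would read the wrong bytes after the lossy round trip, because the length it sees is 0 instead of 3.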





[jira] [Updated] (KYLIN-3320) CubeStatsReader cannot print stats properly for some cube

2018-03-27 Thread Zhong Yanghong (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhong Yanghong updated KYLIN-3320:
--
Fix Version/s: v2.4.0

> CubeStatsReader cannot print stats properly for some cube 
> --
>
> Key: KYLIN-3320
> URL: https://issues.apache.org/jira/browse/KYLIN-3320
> Project: Kylin
>  Issue Type: Improvement
>  Components: Tools, Build and Test
>Reporter: Ma Gang
>Assignee: Ma Gang
>Priority: Minor
> Fix For: v2.4.0
>
> Attachments: fix_KYLIN-3320.patch
>
>
> For cubes that have cuboid_bytes set in the CubeInstance, the cuboid 
> stats cannot be printed properly using the CubeStatsReader tool.





[jira] [Commented] (KYLIN-3318) Kylin 2.3 UI top n group by only show dimension columns

2018-03-27 Thread Billy Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415363#comment-16415363
 ] 

Billy Liu commented on KYLIN-3318:
--

[~Shaofengshi] +1, very smart solution. 

> Kylin 2.3 UI top n group by only show dimension columns
> ---
>
> Key: KYLIN-3318
> URL: https://issues.apache.org/jira/browse/KYLIN-3318
> Project: Kylin
>  Issue Type: Bug
>Reporter: Le Anh Vu
>Priority: Major
>
> In Kylin 2.3.0 Web UI, when I use a TopN measure, the group by column drop-down 
> only shows dimension columns. Is this the expected behavior or a bug?





[jira] [Commented] (KYLIN-3318) Kylin 2.3 UI top n group by only show dimension columns

2018-03-27 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415252#comment-16415252
 ] 

Shaofeng SHI commented on KYLIN-3318:
-

Hi Le Anh Vu, you can keep "user_id" as a dimension in the data model, but 
exclude it from the dimension list of the cube. Then the cube won't have this 
UHC dimension.

> Kylin 2.3 UI top n group by only show dimension columns
> ---
>
> Key: KYLIN-3318
> URL: https://issues.apache.org/jira/browse/KYLIN-3318
> Project: Kylin
>  Issue Type: Bug
>Reporter: Le Anh Vu
>Priority: Major
>
> In Kylin 2.3.0 Web UI, when I use a TopN measure, the group by column drop-down 
> only shows dimension columns. Is this the expected behavior or a bug?





[jira] [Commented] (KYLIN-3318) Kylin 2.3 UI top n group by only show dimension columns

2018-03-27 Thread Le Anh Vu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415214#comment-16415214
 ] 

Le Anh Vu commented on KYLIN-3318:
--

I have a UHC column (user_id) on which I want to use COUNT_DISTINCT and a TOPN 
of the user_ids with the largest SUM measure, but I never filter on user_id. If 
I set user_id as a dimension, I think it will likely be included in the row key 
combinations, which will make the number of row keys grow very large.

> Kylin 2.3 UI top n group by only show dimension columns
> ---
>
> Key: KYLIN-3318
> URL: https://issues.apache.org/jira/browse/KYLIN-3318
> Project: Kylin
>  Issue Type: Bug
>Reporter: Le Anh Vu
>Priority: Major
>
> In Kylin 2.3.0 Web UI, when I use a TopN measure, the group by column drop-down 
> only shows dimension columns. Is this the expected behavior or a bug?





[jira] [Commented] (KYLIN-3293) FixedLenHexDimEnc return a wrong code length leads to cut bytes error.

2018-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415201#comment-16415201
 ] 

ASF GitHub Bot commented on KYLIN-3293:
---

yiming187 closed pull request #123: KYLIN-3293, fix FixedLenHexDimEnc that 
return a wrong code length lea…
URL: https://github.com/apache/kylin/pull/123
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/core-cube/src/main/java/org/apache/kylin/cube/gridtable/TrimmedCubeCodeSystem.java
 
b/core-cube/src/main/java/org/apache/kylin/cube/gridtable/TrimmedCubeCodeSystem.java
index a0b230e9b0..261e501d8c 100644
--- 
a/core-cube/src/main/java/org/apache/kylin/cube/gridtable/TrimmedCubeCodeSystem.java
+++ 
b/core-cube/src/main/java/org/apache/kylin/cube/gridtable/TrimmedCubeCodeSystem.java
@@ -51,7 +51,7 @@ public void encodeColumnValue(int col, Object value, int 
roundingFlag, ByteBuffe
 serializer.serialize(value, buf);
 }
 
-private static void writeDimensionEncoding(DimensionEncoding encoding, 
ByteBuffer out) {
+public static void writeDimensionEncoding(DimensionEncoding encoding, 
ByteBuffer out) {
 try {
 if (encoding == null) {
 BytesUtil.writeVInt(1, out);
@@ -71,7 +71,7 @@ private static void writeDimensionEncoding(DimensionEncoding 
encoding, ByteBuffe
 }
 }
 
-private static DimensionEncoding readDimensionEncoding(ByteBuffer in) {
+public static DimensionEncoding readDimensionEncoding(ByteBuffer in) {
 try {
 int isNull = BytesUtil.readVInt(in);
 if (isNull == 1) {
diff --git 
a/core-cube/src/test/java/org/apache/kylin/gridtable/TrimmedCubeCodeSystemTest.java
 
b/core-cube/src/test/java/org/apache/kylin/gridtable/TrimmedCubeCodeSystemTest.java
new file mode 100644
index 00..dc3c762119
--- /dev/null
+++ 
b/core-cube/src/test/java/org/apache/kylin/gridtable/TrimmedCubeCodeSystemTest.java
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kylin.gridtable;
+
+import static 
org.apache.kylin.cube.gridtable.TrimmedCubeCodeSystem.readDimensionEncoding;
+import static 
org.apache.kylin.cube.gridtable.TrimmedCubeCodeSystem.writeDimensionEncoding;
+
+import java.nio.ByteBuffer;
+
+import org.apache.kylin.dimension.DimensionEncoding;
+import org.apache.kylin.dimension.FixedLenHexDimEnc;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TrimmedCubeCodeSystemTest {
+@Test
+public void testFixLenHexEncSerDser() {
+FixedLenHexDimEnc enc = new FixedLenHexDimEnc(6);
+ByteBuffer buff = ByteBuffer.allocate(1024);
+writeDimensionEncoding(enc, buff);
+buff.flip();
+DimensionEncoding dimensionEncoding = readDimensionEncoding(buff);
+Assert.assertEquals(3, 
dimensionEncoding.asDataTypeSerializer().peekLength(null));
+}
+}
diff --git 
a/core-metadata/src/main/java/org/apache/kylin/dimension/FixedLenHexDimEnc.java 
b/core-metadata/src/main/java/org/apache/kylin/dimension/FixedLenHexDimEnc.java
index a931450a0b..1d7e3c983e 100644
--- 
a/core-metadata/src/main/java/org/apache/kylin/dimension/FixedLenHexDimEnc.java
+++ 
b/core-metadata/src/main/java/org/apache/kylin/dimension/FixedLenHexDimEnc.java
@@ -40,7 +40,7 @@
  * 
  * 1. "" will become null encode and decode
  * 2. "AB" will become "AB00"
- * 
+ *
  * 
  * Due to these limitations hex representation of hash values(with no padding, 
better with even characters) is more suitable
  */
@@ -166,7 +166,7 @@ public void encode(String valueStr, byte[] output, int 
outputOffset) {
 byte[] value = Bytes.toBytes(valueStr);
 int valueLen = value.length;
 int endOffset = outputOffset + bytelen;
-
+
 if (valueLen > hexLength) {
 if (avoidVerbose++ % 1 == 0) {
 logger.warn("Expect at most " + hexLength + " bytes, but got " 
+ valueLen + ", will truncate, 

[jira] [Commented] (KYLIN-3293) FixedLenHexDimEnc return a wrong code length leads to cut bytes error.

2018-03-27 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415200#comment-16415200
 ] 

ASF subversion and git services commented on KYLIN-3293:


Commit 8350de4493ac792a62878a1aebf6588b3119e4bb in kylin's branch 
refs/heads/master from [~Aron.tao]
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=8350de4 ]

KYLIN-3293, fix FixedLenHexDimEnc that return a wrong code length leads to cut 
bytes error.


> FixedLenHexDimEnc return a wrong code length leads to cut bytes error.
> --
>
> Key: KYLIN-3293
> URL: https://issues.apache.org/jira/browse/KYLIN-3293
> Project: Kylin
>  Issue Type: Bug
>Reporter: jiatao.tao
>Assignee: jiatao.tao
>Priority: Major
>
> FixedLenHexDimEnc loses its byteLen when deserializing, so in 
> GTRecord#loadColumnsFromColumnBlocks the bytes after cutting are wrong.





[jira] [Commented] (KYLIN-3318) Kylin 2.3 UI top n group by only show dimension columns

2018-03-27 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415155#comment-16415155
 ] 

Shaofeng SHI commented on KYLIN-3318:
-

I think this is expected. Logically, the "group by" column in Top-N is a 
dimension. Treating it as a measure column was a mistake; as I remember, it has 
now been corrected.

[~Zhixiong Chen] can double-check this.

> Kylin 2.3 UI top n group by only show dimension columns
> ---
>
> Key: KYLIN-3318
> URL: https://issues.apache.org/jira/browse/KYLIN-3318
> Project: Kylin
>  Issue Type: Bug
>Reporter: Le Anh Vu
>Priority: Major
>
> In Kylin 2.3.0 Web UI, when I use a TopN measure, the group by column drop-down 
> only shows dimension columns. Is this the expected behavior or a bug?





[jira] [Closed] (KYLIN-3316) Reported NPE after cube build

2018-03-27 Thread peng.jianhua (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

peng.jianhua closed KYLIN-3316.
---
   Resolution: Duplicate
Fix Version/s: (was: v2.4.0)

> Reported NPE after cube build
> -
>
> Key: KYLIN-3316
> URL: https://issues.apache.org/jira/browse/KYLIN-3316
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v2.3.0
>Reporter: TianZhiwei
>Assignee: TianZhiwei
>Priority: Major
>  Labels: build
> Attachments: 0001-KYLIN-3316-modify-CubingJob.updateMetrics.patch
>
>
> This does not affect the completion of the build task, and it can be 
> reproduced with any build task.


