
[jira] [Updated] (KYLIN-2604) Use global dict as the default encoding for precise distinct count in web

2017-07-19 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-2604:
--
Attachment: (was: KYLIN-2604.patch)

> Use global dict as the default encoding for precise distinct count in web
> -
>
> Key: KYLIN-2604
> URL: https://issues.apache.org/jira/browse/KYLIN-2604
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>Priority: Minor
> Attachments: KYLIN-2604.patch
>
>
> We should use the global dict as the default encoding for precise distinct count 
> in the web UI, which is easier for users.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (KYLIN-2604) Use global dict as the default encoding for precise distinct count in web

2017-07-19 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-2604:
--
Attachment: KYLIN-2604.patch

Updated the patch: AdvancedDict is now regenerated whenever the measures change.


[jira] [Issue Comment Deleted] (KYLIN-2604) Use global dict as the default encoding for precise distinct count in web

2017-07-19 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-2604:
--
Comment: was deleted

(was: Update the patch.)


[jira] [Commented] (KYLIN-1926) Loosen the constraint on FK-PK data type matching

2017-07-19 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094130#comment-16094130
 ] 

kangkaisen commented on KYLIN-1926:
---

Yes, I know.

I don't understand why compatible data types (int and tinyint) result in a 
wrong execution plan.

> Loosen the constraint on FK-PK data type matching
> -
>
> Key: KYLIN-1926
> URL: https://issues.apache.org/jira/browse/KYLIN-1926
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: all
>Reporter: Shaofeng SHI
>Assignee: Shaofeng SHI
>Priority: Minor
> Fix For: v1.5.4
>
> Attachments: 0001-KYLIN-1926-FK-PK-data-type-matching.patch
>
>
> If the lookup table's PK data type isn't equal to the fact table's FK data type, Kylin 
> reports an error saying "Primary key are not consistent with Foreign key". 
> This constraint is too strong. Users should be allowed to disable this check.




[jira] [Commented] (KYLIN-2653) Spark cubing support HBase cluster with kerberos

2017-07-22 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097154#comment-16097154
 ] 

kangkaisen commented on KYLIN-2653:
---

liyang, thanks for your review.

I agree with you.

1 There is no doubt that {{KylinConfig}} and {{KylinConfigBase}} should be 
read-only. I have rolled back the signatures of {{getAllProperties}} and 
{{reloadKylinConfig}}, but I changed {{getAllProperties}} in KylinConfigExt 
to public. I think this is reasonable because we need at least one way to get 
all the properties, and this operation is read-only. What do you think?

2 This commit doesn't invoke {{KylinConfigBase.setProperty()}}.
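
A minimal sketch of the "public but read-only" idea in point 1, under stated assumptions: `ReadOnlyConfig`, `getOptional`, and the key `example.key` are hypothetical names for illustration, not Kylin's actual classes or properties. The getter hands out a copy, so callers can read every property without being able to change the config's internal state.

```java
import java.util.Properties;

// Hypothetical stand-in for a config class; this is not Kylin's KylinConfigExt.
final class ReadOnlyConfig {
    private final Properties properties = new Properties();

    ReadOnlyConfig(Properties source) {
        // Copy on the way in so later changes to `source` do not leak in.
        properties.putAll(source);
    }

    // Public and read-only: callers receive a copy, never the internal object.
    public Properties getAllProperties() {
        Properties copy = new Properties();
        copy.putAll(properties);
        return copy;
    }

    // Reads a single property, or returns null when the key is absent.
    public String getOptional(String key) {
        return properties.getProperty(key);
    }

    public static void main(String[] args) {
        Properties source = new Properties();
        source.setProperty("example.key", "original");
        ReadOnlyConfig config = new ReadOnlyConfig(source);

        // Mutating the returned copy leaves the config untouched.
        config.getAllProperties().setProperty("example.key", "changed");
        System.out.println(config.getOptional("example.key"));
    }
}
```

The copy costs one allocation per call, which is usually acceptable for a getter that is only used to hand a snapshot of the configuration to another component.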

> Spark cubing support HBase cluster with kerberos
> 
>
> Key: KYLIN-2653
> URL: https://issues.apache.org/jira/browse/KYLIN-2653
> Project: Kylin
>  Issue Type: Bug
>  Components: Spark Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>
> Currently, Spark cubing doesn't support an HBase cluster with Kerberos.
> As a temporary measure, we could support an HBase cluster with Kerberos in YARN client mode, 
> because that is easy.
> In the long term, we should avoid accessing HBase in Spark cubing.




[jira] [Commented] (KYLIN-2744) Should return correct type for SUM measure in web

2017-07-22 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097168#comment-16097168
 ] 

kangkaisen commented on KYLIN-2744:
---

Thanks liyang and Billy for your comments.

I think the column type and the SUM measure type should be the same in the web UI, 
and the back end should use the current Ingester and Aggregator to handle the SUM 
measure, so that the final result is consistent with Presto and Hive. This is 
reasonable, user-friendly, and easy.

As for the SUM of a double type not being fully precise: if users want a fully 
precise result, they should use the decimal type in Hive.
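
The precision point can be seen with plain Java arithmetic. This is only an illustration of binary floating point versus exact decimal arithmetic; `SumPrecisionDemo` is a made-up class and has nothing to do with Kylin's Ingester or Aggregator.

```java
import java.math.BigDecimal;

final class SumPrecisionDemo {
    // Sums `count` copies of 0.1 using binary floating point (double).
    static double sumAsDouble(int count) {
        double total = 0.0;
        for (int i = 0; i < count; i++) {
            total += 0.1;
        }
        return total;
    }

    // Sums `count` copies of 0.1 using exact decimal arithmetic.
    static BigDecimal sumAsDecimal(int count) {
        BigDecimal total = BigDecimal.ZERO;
        BigDecimal step = new BigDecimal("0.1");
        for (int i = 0; i < count; i++) {
            total = total.add(step);
        }
        return total;
    }

    public static void main(String[] args) {
        // The double sum drifts slightly away from 1.0; the decimal sum does not.
        System.out.println(sumAsDouble(10) == 1.0);
        System.out.println(sumAsDecimal(10).compareTo(BigDecimal.ONE) == 0);
    }
}
```

The first line printed is `false` and the second is `true`, which is why a decimal column is the safer choice when an exact SUM is required.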

> Should return correct type for SUM measure in web
> -
>
> Key: KYLIN-2744
> URL: https://issues.apache.org/jira/browse/KYLIN-2744
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: KYLIN-2744.patch
>
>
> Currently, Kylin returns the decimal type for the SUM measure of a double column, 
> which leads to wrong results. So we should return the correct type for the SUM 
> measure in the web UI.




[jira] [Commented] (KYLIN-2672) Only clean necessary cache for CubeMigrationCLI

2017-07-22 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097175#comment-16097175
 ] 

kangkaisen commented on KYLIN-2672:
---

KYLIN-2717 is great.

After KYLIN-2717, we can reload only the related tables. I will update this 
patch when you finish KYLIN-2717. Thank you.

> Only clean necessary cache for CubeMigrationCLI
> ---
>
> Key: KYLIN-2672
> URL: https://issues.apache.org/jira/browse/KYLIN-2672
> Project: Kylin
>  Issue Type: Improvement
>  Components: Tools, Build and Test
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: KYLIN-2672.patch
>
>
> Currently, we simply clear ALL caches in CubeMigrationCLI, which makes 
> some queries slower in the prod env when we have many tables, models, and cubes and 
> migrate cubes often.
> So we should clean only the necessary cache in CubeMigrationCLI.




[jira] [Commented] (KYLIN-2740) FileNotFoundException on base cuboid build if GlobalDictionary is used

2017-07-23 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097539#comment-16097539
 ] 

kangkaisen commented on KYLIN-2740:
---

Hi, sterligovak. Thank you.

KYLIN-2506 has fixed this issue. After KYLIN-2506, the GlobalDictionary is more 
robust.

> FileNotFoundException on base cuboid build if GlobalDictionary is used
> --
>
> Key: KYLIN-2740
> URL: https://issues.apache.org/jira/browse/KYLIN-2740
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.0.0
>Reporter: Alexander Sterligov
>Assignee: kangkaisen
> Attachments: KYLIN-2740-patch
>
>
> 2017-07-13 15:25:20,515 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.RuntimeException: 
> java.io.FileNotFoundException: No such file or directory: 
> 'home/production/bi/kylin/kylin_metadata/resources/GlobalDict/dict/MART.STAR_MAIN_EVENT/DEVICE_ID/version_1499959477799/.index'
>   at 
> org.apache.kylin.dict.DictionaryManager.getDictionaryInfo(DictionaryManager.java:129)
>   at org.apache.kylin.cube.CubeManager.getDictionary(CubeManager.java:264)
>   at org.apache.kylin.cube.CubeSegment.getDictionary(CubeSegment.java:329)
>   at 
> org.apache.kylin.cube.CubeSegment.buildDictionaryMap(CubeSegment.java:321)
>   at 
> org.apache.kylin.engine.mr.common.BaseCuboidBuilder.<init>(BaseCuboidBuilder.java:86)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.setup(BaseCuboidMapperBase.java:70)
>   at 
> org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.setup(HiveToBaseCuboidMapper.java:36)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:796)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.io.FileNotFoundException: No such file or directory: 
> 'home/production/bi/kylin/kylin_metadata/resources/GlobalDict/dict/MART.STAR_MAIN_EVENT/DEVICE_ID/version_1499959477799/.index'
> The reason for the exception is that flushIndex in 
> org.apache.kylin.dict.AppendTrieDictionary flushes and closes the file after 
> CachedTreeMap is committed. The .index file is left in the working directory.




[jira] [Commented] (KYLIN-2706) Should disable Storage limit push down when singleValuesD doesn't containsAll othersD

2017-07-23 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097583#comment-16097583
 ] 

kangkaisen commented on KYLIN-2706:
---

After discussing this with [~mahongbin], we think the root cause of this issue is 
a bug in the Comparator of SortedIteratorMergerWithLimit.
I think we only need to compare the group columns in 
SortedIteratorMergerWithLimit.
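
One way to picture why the comparison key matters is the sketch below. It is not Kylin's SortedIteratorMergerWithLimit and not the attached patch; `GroupLimitMergeSketch`, `Row`, and `mergeWithGroupLimit` are hypothetical names. The merge compares rows by group key only and counts the limit against distinct groups, so all partial rows of a group survive and can still be aggregated. For a query without GROUP BY, every row has the same (empty) group key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class GroupLimitMergeSketch {
    // A storage record: the group-by key plus one partial SUM value.
    static final class Row {
        final String groupKey;
        final long partialSum;

        Row(String groupKey, long partialSum) {
            this.groupKey = groupKey;
            this.partialSum = partialSum;
        }
    }

    // Rows are ordered by group columns only; measure values never take part.
    static final Comparator<Row> BY_GROUP = Comparator.comparing((Row r) -> r.groupKey);

    // One sorted input together with its current head row.
    private static final class Cursor {
        Row head;
        final Iterator<Row> rest;

        Cursor(Iterator<Row> it) {
            this.rest = it;
            this.head = it.next();
        }

        boolean advance() {
            if (!rest.hasNext()) {
                return false;
            }
            head = rest.next();
            return true;
        }
    }

    // Merges inputs sorted by group key and stops after `limit` distinct groups,
    // not after `limit` rows.
    static List<Row> mergeWithGroupLimit(List<List<Row>> sortedInputs, int limit) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> BY_GROUP.compare(a.head, b.head));
        for (List<Row> input : sortedInputs) {
            if (!input.isEmpty()) {
                heap.add(new Cursor(input.iterator()));
            }
        }
        List<Row> out = new ArrayList<>();
        int groupsSeen = 0;
        Row previous = null;
        while (!heap.isEmpty()) {
            Cursor cursor = heap.poll();
            Row row = cursor.head;
            boolean newGroup = previous == null || BY_GROUP.compare(previous, row) != 0;
            if (newGroup) {
                if (groupsSeen == limit) {
                    break; // the next group would exceed the limit
                }
                groupsSeen++;
            }
            out.add(row);
            previous = row;
            if (cursor.advance()) {
                heap.add(cursor);
            }
        }
        return out;
    }

    // Final aggregation over the merged rows.
    static long sum(List<Row> rows) {
        long total = 0;
        for (Row r : rows) {
            total += r.partialSum;
        }
        return total;
    }

    public static void main(String[] args) {
        // Three storage partitions, no GROUP BY: every row has the empty group key.
        List<List<Row>> inputs = Arrays.asList(
                Arrays.asList(new Row("", 10)),
                Arrays.asList(new Row("", 20)),
                Arrays.asList(new Row("", 30)));
        System.out.println(sum(mergeWithGroupLimit(inputs, 1)));
    }
}
```

With a row-based limit of 1, only one of the three partial sums would reach the aggregation step; with the group-based limit all three do, and the example prints 60.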

> Should disable Storage limit push down when singleValuesD doesn't containsAll 
> othersD
> -
>
> Key: KYLIN-2706
> URL: https://issues.apache.org/jira/browse/KYLIN-2706
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.1.0
>
> Attachments: KYLIN-2706.patch
>
>
> Storage limit push down should be disabled for the SQL below, because it will 
> return more than one record from the HBase tables, but 
> SortedIteratorMergerWithLimit returns only one record, which gives a wrong 
> result.
> {code:java}
> SELECT sum(A) 
> FROM TABLE 
> WHERE date_id >= 20170624 and date_id <= 20170626 
> limit 1
> {code}
> We should disable Storage limit push down when singleValuesD doesn't 
> containsAll othersD




[jira] [Updated] (KYLIN-2706) Should disable Storage limit push down when singleValuesD doesn't containsAll othersD

2017-07-23 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-2706:
--
Attachment: KYLIN-2706.patch

This patch fixes the bug in the comparator of SortedIteratorMergerWithLimit.


[jira] [Updated] (KYLIN-2706) Should disable Storage limit push down when singleValuesD doesn't containsAll othersD

2017-07-23 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-2706:
--
Attachment: (was: KYLIN-2706.patch)


[jira] [Updated] (KYLIN-2706) Fix the bug for the comparator in SortedIteratorMergerWithLimit

2017-07-23 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-2706:
--
Summary: Fix the bug for the comparator in SortedIteratorMergerWithLimit  
(was: Should disable Storage limit push down when singleValuesD doesn't 
containsAll othersD)


[jira] [Commented] (KYLIN-1926) Loosen the constraint on FK-PK data type matching

2017-07-23 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097588#comment-16097588
 ] 

kangkaisen commented on KYLIN-1926:
---

OK. Thanks liyang.


[jira] [Commented] (KYLIN-2653) Spark cubing support HBase cluster with kerberos

2017-07-23 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097589#comment-16097589
 ] 

kangkaisen commented on KYLIN-2653:
---

Update the commit: 
https://github.com/apache/kylin/commit/d8d0395a80cc50fcb59bab4d402c7675aef6cd22


[jira] [Reopened] (KYLIN-1926) Loosen the constraint on FK-PK data type matching

2017-07-26 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen reopened KYLIN-1926:
---


[jira] [Commented] (KYLIN-1926) Loosen the constraint on FK-PK data type matching

2017-07-26 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101734#comment-16101734
 ] 

kangkaisen commented on KYLIN-1926:
---

We can reproduce this issue easily by changing KYLIN_SALES.BUYER_ID from 
bigint to int.

Then we run this SQL:
{code:java}
select SUM(price) 
from KYLIN_SALES
inner join KYLIN_ACCOUNT
on KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID
{code}


Finally, the "No model found" error occurs.

The logic chain for this error is:

1 The data types of KYLIN_SALES.BUYER_ID and KYLIN_ACCOUNT.ACCOUNT_ID are 
inconsistent

2 Calcite casts BUYER_ID from int to bigint

3 Calcite applies pushDownJoinConditions to the join

4 Calcite creates a Project with all KYLIN_SALES columns plus the BUYER_ID cast column

5 Kylin adds the column starting with _KY_ to context.allColumns

6 real.getAllColumnDescs() doesn't contain all of context.allColumns, because 
real.getAllColumnDescs() doesn't contain the column starting with _KY_ in ModelChooser

7 The "No model found" error is raised in ModelChooser
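
Steps 5 to 7 above boil down to a set-containment check. The sketch below shows that check in isolation; `ModelMatchSketch`, its method names, and the column name `_KY_CAST_BUYER_ID` are invented for illustration, and only the `_KY_` prefix comes from the comment. Ignoring the synthetic column is shown as one conceivable remedy, not as the fix Kylin adopted.

```java
import java.util.Set;
import java.util.stream.Collectors;

final class ModelMatchSketch {
    // Prefix taken from the comment above; everything else here is hypothetical.
    static final String INTERNAL_PREFIX = "_KY_";

    // Strict check: every column the query context references must exist in the model.
    static boolean matchesStrict(Set<String> modelColumns, Set<String> contextColumns) {
        return modelColumns.containsAll(contextColumns);
    }

    // Lenient check: drop synthetic columns before comparing.
    static boolean matchesIgnoringInternal(Set<String> modelColumns, Set<String> contextColumns) {
        Set<String> realColumns = contextColumns.stream()
                .filter(column -> !column.startsWith(INTERNAL_PREFIX))
                .collect(Collectors.toSet());
        return modelColumns.containsAll(realColumns);
    }
}
```

With model columns `{BUYER_ID, PRICE}` and context columns `{BUYER_ID, PRICE, _KY_CAST_BUYER_ID}`, the strict check fails, which corresponds to the "No model found" outcome, while the lenient check succeeds.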


[jira] [Commented] (KYLIN-1926) Loosen the constraint on FK-PK data type matching

2017-07-26 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101743#comment-16101743
 ] 

kangkaisen commented on KYLIN-1926:
---

So, I'm sure this is a bug in Kylin.

I notice that Kylin handles columns starting with "_KY_" specially in 
SqlToRelConverter.hackSelectStar, but it doesn't handle this case.


[jira] [Comment Edited] (KYLIN-1926) Loosen the constraint on FK-PK data type matching

2017-07-26 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101734#comment-16101734
 ] 

kangkaisen edited comment on KYLIN-1926 at 7/26/17 2:40 PM:


We can reproduce this issue easily by changing KYLIN_SALES.BUYER_ID from 
bigint to int.

Then we run this SQL:
{code:java}
select SUM(price) 
from KYLIN_SALES
inner join KYLIN_ACCOUNT
on KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID
{code}


Finally, the "No model found" error occurs.

The logic chain for this error is:

1 The data types of KYLIN_SALES.BUYER_ID and KYLIN_ACCOUNT.ACCOUNT_ID are 
inconsistent

2 Calcite casts BUYER_ID from int to bigint

3 Calcite applies pushDownJoinConditions to the join

4 Calcite creates a Project with all KYLIN_SALES columns plus the BUYER_ID cast column

5 Kylin adds the column starting with _KY_ to context.allColumns in 
OLAPProjectRel

6 real.getAllColumnDescs() doesn't contain all of context.allColumns, because 
real.getAllColumnDescs() doesn't contain the column starting with _KY_ in ModelChooser

7 The "No model found" error is raised in ModelChooser



[jira] [Resolved] (KYLIN-2653) Spark cubing support HBase cluster with kerberos

2017-07-27 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen resolved KYLIN-2653.
---
   Resolution: Fixed
Fix Version/s: v2.1.0


[jira] [Closed] (KYLIN-2740) FileNotFoundException on base cuboid build if GlobalDictionary is used

2017-07-27 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen closed KYLIN-2740.
-
Resolution: Duplicate


[jira] [Created] (KYLIN-2764) Build the dict for UHC column with MR

2017-07-27 Thread kangkaisen (JIRA)

kangkaisen created KYLIN-2764:
-

 Summary: Build the dict for UHC column with MR
 Key: KYLIN-2764
 URL: https://issues.apache.org/jira/browse/KYLIN-2764
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen


KYLIN-2217 builds the dict for normal columns with MR, but the dict for UHC columns 
is still built in the JobServer. As in KYLIN-2217, we could also use MR to build the 
dict for UHC columns, which would relieve the memory pressure on the JobServer, improve 
its job concurrency, and speed up the procedure when there are multiple UHC columns.

The MR input is the output of "Extract Fact Table Distinct Columns"; the MR 
output is the UHC column dict. Because it is very hard to build a global dict with 
multiple reducers, I use one reducer per UHC column and allocate enough 
memory to the reducer. According to my test, 8G memory is enough.
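
The "one reducer per UHC column" routing can be sketched without any Hadoop dependency. This is an assumption-laden illustration, not the proposed patch: `UhcReducerAssignmentSketch` and its methods are hypothetical, and the column names are assumed to be distinct.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class UhcReducerAssignmentSketch {
    // Gives every UHC column its own reducer index: the i-th column goes to reducer i.
    static Map<String, Integer> assignReducers(List<String> uhcColumns) {
        Map<String, Integer> assignment = new LinkedHashMap<>();
        for (int i = 0; i < uhcColumns.size(); i++) {
            assignment.put(uhcColumns.get(i), i);
        }
        return assignment;
    }

    // The job needs exactly as many reducers as there are UHC columns.
    static int reducerCount(List<String> uhcColumns) {
        return uhcColumns.size();
    }

    // Routes a value of the given column to its dedicated reducer.
    static int partitionFor(String column, Map<String, Integer> assignment) {
        Integer reducer = assignment.get(column);
        if (reducer == null) {
            throw new IllegalArgumentException("Not a UHC column: " + column);
        }
        return reducer;
    }
}
```

Because every value of a column reaches the same reducer, that reducer can build the column's dictionary without coordinating with other reducers, at the price of needing enough memory for the whole column.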




[jira] [Commented] (KYLIN-2653) Spark cubing support HBase cluster with kerberos

2017-07-27 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104358#comment-16104358
 ] 

kangkaisen commented on KYLIN-2653:
---

Yes. HBase jars can be removed.


[jira] [Updated] (KYLIN-2706) Fix the bug for the comparator in SortedIteratorMergerWithLimit

2017-07-30 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-2706:
--
Fix Version/s: (was: v2.1.0)


[jira] [Commented] (KYLIN-2706) Fix the bug for the comparator in SortedIteratorMergerWithLimit

2017-07-30 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106374#comment-16106374
 ] 

kangkaisen commented on KYLIN-2706:
---

No. This patch still needs to be reviewed.

> Fix the bug for the comparator in SortedIteratorMergerWithLimit
> ---
>
> Key: KYLIN-2706
> URL: https://issues.apache.org/jira/browse/KYLIN-2706
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: KYLIN-2706.patch
>
>
> For the SQL below, storage limit push-down should be disabled, because the query
> returns more than one record from the HBase tables, but
> SortedIteratorMergerWithLimit returns only one record, which produces a wrong
> result.
> {code:sql}
> SELECT sum(A)
> FROM TABLE
> WHERE date_id >= 20170624 and date_id <= 20170626
> limit 1
> {code}
> We should disable storage limit push-down when singleValuesD does not
> contain all of othersD.




[jira] [Commented] (KYLIN-2765) Eliminate restriction on Global Dictionary of Dim columns

2017-08-01 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16108767#comment-16108767
 ] 

kangkaisen commented on KYLIN-2765:
---

What is the goal of this JIRA? Is it to use a Global Dict for dimension columns,
or to let one column support two types of dict when it needs to be both a
dimension and a measure at the same time?

> Eliminate restriction on Global Dictionary of Dim columns
> -
>
> Key: KYLIN-2765
> URL: https://issues.apache.org/jira/browse/KYLIN-2765
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine, Metadata, Query Engine
>Reporter: Roger Shi
>Assignee: Dong Li
>
> A cube dimension column is not allowed to be used in an accurate count-distinct
> measure. Global Dictionary encoding is a kind of dict in metadata. Dict encoding
> was created for dimensions at the beginning, so Global Dictionary is a special
> one for measures.
> To eliminate the restriction, there are two possible ways in my view. One is to
> move the Global Dictionary metadata out of the dict section to a new section such
> as "measure dict" (it does not exist yet, so we would create it). The other is to
> handle Global Dictionary differently in both the cubing engine and the query engine.
> There might be other, better methods. Let's discuss here and find a good way
> forward.




[jira] [Commented] (KYLIN-2653) Spark cubing support HBase cluster with kerberos

2017-08-07 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116526#comment-16116526
 ] 

kangkaisen commented on KYLIN-2653:
---

Hi, liyang. In that case, how could we get all of the Kylin config? By reflection?

> Spark cubing support HBase cluster with kerberos
> 
>
> Key: KYLIN-2653
> URL: https://issues.apache.org/jira/browse/KYLIN-2653
> Project: Kylin
>  Issue Type: Bug
>  Components: Spark Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.2.0
>
>
> Currently, Spark cubing doesn't support an HBase cluster with Kerberos.
> As a temporary measure, we could support an HBase cluster with Kerberos in YARN
> client mode, because that is easy.
> In the long term, we should avoid accessing HBase in Spark cubing.




[jira] [Updated] (KYLIN-2604) Use global dict as the default encoding for precise distinct count in web

2017-08-09 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-2604:
--
Fix Version/s: (was: v2.1.0)
   v2.2.0

> Use global dict as the default encoding for precise distinct count in web
> -
>
> Key: KYLIN-2604
> URL: https://issues.apache.org/jira/browse/KYLIN-2604
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>Priority: Minor
> Fix For: v2.2.0
>
> Attachments: KYLIN-2604.patch
>
>
> We should use global dict as the default encoding for precise distinct count
> in the web UI, which is easier for users.




[jira] [Commented] (KYLIN-2604) Use global dict as the default encoding for precise distinct count in web

2017-08-09 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119756#comment-16119756
 ] 

kangkaisen commented on KYLIN-2604:
---

Thanks, zhixiong. I am sorry that KYLIN-2604 has to be delayed to 2.2.0 because
KYLIN-2622 is still open.

> Use global dict as the default encoding for precise distinct count in web
> -
>
> Key: KYLIN-2604
> URL: https://issues.apache.org/jira/browse/KYLIN-2604
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>Priority: Minor
> Fix For: v2.2.0
>
> Attachments: KYLIN-2604.patch
>
>
> We should use global dict as the default encoding for precise distinct count
> in the web UI, which is easier for users.




[jira] [Commented] (KYLIN-2706) Fix the bug for the comparator in SortedIteratorMergerWithLimit

2017-08-09 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119781#comment-16119781
 ] 

kangkaisen commented on KYLIN-2706:
---

Thanks hongbin. 
This is the commit:  
https://github.com/apache/kylin/commit/659eeaedd571c837df3beae44456dadde3036c3d

> Fix the bug for the comparator in SortedIteratorMergerWithLimit
> ---
>
> Key: KYLIN-2706
> URL: https://issues.apache.org/jira/browse/KYLIN-2706
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: KYLIN-2706.patch
>
>
> For the SQL below, storage limit push-down should be disabled, because the query
> returns more than one record from the HBase tables, but
> SortedIteratorMergerWithLimit returns only one record, which produces a wrong
> result.
> {code:sql}
> SELECT sum(A)
> FROM TABLE
> WHERE date_id >= 20170624 and date_id <= 20170626
> limit 1
> {code}
> We should disable storage limit push-down when singleValuesD does not
> contain all of othersD.




[jira] [Resolved] (KYLIN-2706) Fix the bug for the comparator in SortedIteratorMergerWithLimit

2017-08-09 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen resolved KYLIN-2706.
---
   Resolution: Fixed
Fix Version/s: v2.2.0

> Fix the bug for the comparator in SortedIteratorMergerWithLimit
> ---
>
> Key: KYLIN-2706
> URL: https://issues.apache.org/jira/browse/KYLIN-2706
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.2.0
>
> Attachments: KYLIN-2706.patch
>
>
> For the SQL below, storage limit push-down should be disabled, because the query
> returns more than one record from the HBase tables, but
> SortedIteratorMergerWithLimit returns only one record, which produces a wrong
> result.
> {code:sql}
> SELECT sum(A)
> FROM TABLE
> WHERE date_id >= 20170624 and date_id <= 20170626
> limit 1
> {code}
> We should disable storage limit push-down when singleValuesD does not
> contain all of othersD.




[jira] [Commented] (KYLIN-2653) Spark cubing support HBase cluster with kerberos

2017-08-15 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128214#comment-16128214
 ] 

kangkaisen commented on KYLIN-2653:
---

OK, I see. Thank you, liyang.

> Spark cubing support HBase cluster with kerberos
> 
>
> Key: KYLIN-2653
> URL: https://issues.apache.org/jira/browse/KYLIN-2653
> Project: Kylin
>  Issue Type: Bug
>  Components: Spark Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.2.0
>
>
> Currently, Spark cubing doesn't support an HBase cluster with Kerberos.
> As a temporary measure, we could support an HBase cluster with Kerberos in YARN
> client mode, because that is easy.
> In the long term, we should avoid accessing HBase in Spark cubing.




[jira] [Commented] (KYLIN-2606) Only return counter for precise count_distinct if query is exactAggregate

2017-08-18 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133967#comment-16133967
 ] 

kangkaisen commented on KYLIN-2606:
---

Thanks hongbin.

> Only return counter for precise count_distinct if query is exactAggregate
> -
>
> Key: KYLIN-2606
> URL: https://issues.apache.org/jira/browse/KYLIN-2606
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>
> If the query is an exactAggregation query and has memory-hungry measures, we
> could return the final result directly to speed up the query and to reduce the
> RPC data size and the memory usage in the query server.
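
The idea can be illustrated with a small Python sketch (not Kylin's Java code). A Python set stands in for the bitmap behind a precise count-distinct measure, and the function name and response layout are assumptions made for this example.

```python
# A Python set stands in for the bitmap behind a precise count-distinct measure.
def storage_response(bitmap, exact_aggregation):
    """Return what the storage side would send to the query server.

    If no further merging is needed (exact aggregation), the count alone is
    enough; otherwise the full bitmap must travel so it can be merged later.
    """
    if exact_aggregation:
        return {"kind": "counter", "value": len(bitmap)}
    return {"kind": "bitmap", "value": sorted(bitmap)}


user_ids = set(range(100_000))  # illustrative data

exact = storage_response(user_ids, exact_aggregation=True)
merged_later = storage_response(user_ids, exact_aggregation=False)

print(exact["kind"], exact["value"])
print(merged_later["kind"], len(merged_later["value"]))
```

In the exact case a single integer is returned. In the other case every member of the bitmap has to be shipped and held in memory by the receiver.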




[jira] [Resolved] (KYLIN-2606) Only return counter for precise count_distinct if query is exactAggregate

2017-08-18 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen resolved KYLIN-2606.
---
   Resolution: Fixed
Fix Version/s: v2.2.0

> Only return counter for precise count_distinct if query is exactAggregate
> -
>
> Key: KYLIN-2606
> URL: https://issues.apache.org/jira/browse/KYLIN-2606
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.2.0
>
>
> If the query is an exactAggregation query and has memory-hungry measures, we
> could return the final result directly to speed up the query and to reduce the
> RPC data size and the memory usage in the query server.




[jira] [Commented] (KYLIN-2622) AppendTrieDictionary support not global

2017-09-02 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151476#comment-16151476
 ] 

kangkaisen commented on KYLIN-2622:
---

This is the commit: 
https://github.com/apache/kylin/commit/ec5dd54e9ea5e373569cd65cab322a17716718ff

> AppendTrieDictionary support not global
> ---
>
> Key: KYLIN-2622
> URL: https://issues.apache.org/jira/browse/KYLIN-2622
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>
> Currently, AppendTrieDictionary only supports global dict, which means the
> dict grows continuously. But for a cube that doesn't have a Partition Date
> Column and doesn't need aggregate queries across segments, we could
> build the AppendTrieDictionary from an empty dict every time.




[jira] [Commented] (KYLIN-2622) AppendTrieDictionary support not global

2017-09-02 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151477#comment-16151477
 ] 

kangkaisen commented on KYLIN-2622:
---

The main idea is to add a new DictionaryBuilder, SegmentAppendTrieDictBuilder,
which builds the AppendTrieDictionary from an empty dict every time, in a
different HDFS dir, so SegmentAppendTrieDictBuilder needs no lock and supports
concurrency.
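
The trade-off between the two modes can be sketched in Python (not Kylin's Java code). `AppendOnlyDict` is a toy stand-in invented for this example and does not model the real trie or its HDFS layout.

```python
class AppendOnlyDict:
    """Toy dictionary that assigns the next integer id to each unseen value."""

    def __init__(self):
        self.ids = {}

    def encode(self, value):
        if value not in self.ids:
            self.ids[value] = len(self.ids)
        return self.ids[value]


segments = [["u1", "u2"], ["u2", "u3"], ["u4"]]  # illustrative segment data

# Global mode: one shared dict, so ids are stable across segments, but the
# dict keeps growing and concurrent builds would have to coordinate.
global_dict = AppendOnlyDict()
for values in segments:
    for v in values:
        global_dict.encode(v)

# Per-segment mode: every build starts from an empty dict, so builds are
# independent, but the same value may get different ids in different segments.
segment_dicts = []
for values in segments:
    d = AppendOnlyDict()
    for v in values:
        d.encode(v)
    segment_dicts.append(d)

print(len(global_dict.ids), [len(d.ids) for d in segment_dicts])
print(segment_dicts[0].ids["u2"], segment_dicts[1].ids["u2"])
```

Because "u2" gets a different id in each per-segment dict, encoded values cannot be merged across segments. That is why this mode only suits cubes that never aggregate across segments.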

> AppendTrieDictionary support not global
> ---
>
> Key: KYLIN-2622
> URL: https://issues.apache.org/jira/browse/KYLIN-2622
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>
> Currently, AppendTrieDictionary only supports global dict, which means the
> dict grows continuously. But for a cube that doesn't have a Partition Date
> Column and doesn't need aggregate queries across segments, we could
> build the AppendTrieDictionary from an empty dict every time.




[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-09-02 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151478#comment-16151478
 ] 

kangkaisen commented on KYLIN-2764:
---

This is the commit: 
https://github.com/apache/kylin/commit/2607e18b5e17d2a68f4079a76b8c990f144cbbd6.

The core idea is simple, but there are four special points we should note:

1. An FK column in the fact table could be a UHC column.
2. We cannot get the correct HDFS working dir from KylinConfig in MR.
3. One or all of the UHC columns may be NULL.
4. There may be a timeout in the setup phase of the Reducer because of the global
dict copy and lock.

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>
> KYLIN-2217 builds the dict for normal columns with MR, but the dict for UHC
> columns is still built in the JobServer. As in KYLIN-2217, we could also use MR to
> build the dict for UHC columns, which would relieve the memory pressure on the
> JobServer, improve its job concurrency, and speed up the build when there are
> multiple UHC columns.
> The MR input is the output of "Extract Fact Table Distinct Columns", and the MR
> output is the UHC column dict. Because it is very hard to build a global dict with
> multiple reducers, I use one reducer per UHC column and allocate enough
> memory to the reducer. According to my test, 8 GB of memory is enough.
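
The "one reducer per UHC column" routing can be sketched in plain Python (not the actual MR job). The column names, sample values, and the sorted-set dict below are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical UHC columns; reducer i handles column i.
uhc_columns = ["USER_ID", "DEVICE_ID"]
reducer_of = {col: i for i, col in enumerate(uhc_columns)}

# Illustrative (column, value) pairs, as if read from the distinct-values output.
records = [
    ("USER_ID", "u2"), ("DEVICE_ID", "d1"),
    ("USER_ID", "u1"), ("DEVICE_ID", "d1"), ("USER_ID", "u2"),
]

# "Shuffle": route every value of a column to that column's single reducer.
reducer_input = defaultdict(list)
for col, value in records:
    reducer_input[reducer_of[col]].append(value)

# "Reduce": each reducer builds the whole dict for its column on its own.
dicts = {}
for col, reducer_id in reducer_of.items():
    distinct = sorted(set(reducer_input[reducer_id]))
    dicts[col] = {value: idx for idx, value in enumerate(distinct)}

print(dicts)
```

Since each column's values all reach one reducer, no id assignment has to be coordinated between reducers. The cost is that a single reducer needs enough memory for the whole column.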




[jira] [Updated] (KYLIN-2764) Build the dict for UHC column with MR

2017-09-02 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-2764:
--
Attachment: job-memory-before.png
job-memory-after.png

This commit has been running for a long time in our prod env.

The two pictures show that this commit remarkably reduces memory usage for the
Kylin JobServer and, in addition, remarkably improves its concurrency. After
applying this commit, we removed one of our three JobServers.

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 builds the dict for normal columns with MR, but the dict for UHC
> columns is still built in the JobServer. As in KYLIN-2217, we could also use MR to
> build the dict for UHC columns, which would relieve the memory pressure on the
> JobServer, improve its job concurrency, and speed up the build when there are
> multiple UHC columns.
> The MR input is the output of "Extract Fact Table Distinct Columns", and the MR
> output is the UHC column dict. Because it is very hard to build a global dict with
> multiple reducers, I use one reducer per UHC column and allocate enough
> memory to the reducer. According to my test, 8 GB of memory is enough.




[jira] [Updated] (KYLIN-2604) Use global dict as the default encoding for precise distinct count in web

2017-09-03 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-2604:
--
Attachment: 
KYLIN-2602-Non-Int-type-precise-count-distinct-measure-must-set-advanced-dict.patch

This patch adds a check in the web UI. Hi, Zhixiong, could you please review this
patch? Thank you.

> Use global dict as the default encoding for precise distinct count in web
> -
>
> Key: KYLIN-2604
> URL: https://issues.apache.org/jira/browse/KYLIN-2604
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>Priority: Minor
> Fix For: v2.2.0
>
> Attachments: 
> KYLIN-2602-Non-Int-type-precise-count-distinct-measure-must-set-advanced-dict.patch,
>  KYLIN-2604.patch
>
>
> We should use global dict as the default encoding for precise distinct count
> in the web UI, which is easier for users.




[jira] [Created] (KYLIN-2838) Should get storageType in changeHtableHost of CubeMigrationCLI

2017-09-03 Thread kangkaisen (JIRA)

kangkaisen created KYLIN-2838:
-

 Summary: Should get storageType in changeHtableHost of 
CubeMigrationCLI
 Key: KYLIN-2838
 URL: https://issues.apache.org/jira/browse/KYLIN-2838
 Project: Kylin
  Issue Type: Bug
  Components: Tools, Build and Test
Affects Versions: v2.1.0
Reporter: kangkaisen
Assignee: kangkaisen
 Fix For: v2.2.0


We should get storageType in changeHtableHost of CubeMigrationCLI, not 
engineType.




[jira] [Resolved] (KYLIN-2838) Should get storageType in changeHtableHost of CubeMigrationCLI

2017-09-03 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen resolved KYLIN-2838.
---
Resolution: Fixed

> Should get storageType in changeHtableHost of CubeMigrationCLI
> --
>
> Key: KYLIN-2838
> URL: https://issues.apache.org/jira/browse/KYLIN-2838
> Project: Kylin
>  Issue Type: Bug
>  Components: Tools, Build and Test
>Affects Versions: v2.1.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.2.0
>
>
> We should get storageType in changeHtableHost of CubeMigrationCLI, not 
> engineType.




[jira] [Commented] (KYLIN-2838) Should get storageType in changeHtableHost of CubeMigrationCLI

2017-09-03 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151762#comment-16151762
 ] 

kangkaisen commented on KYLIN-2838:
---

This is the commit: 
https://github.com/apache/kylin/commit/78543d6e970cfb9dc85bcc48775681afcdb1c0e9

> Should get storageType in changeHtableHost of CubeMigrationCLI
> --
>
> Key: KYLIN-2838
> URL: https://issues.apache.org/jira/browse/KYLIN-2838
> Project: Kylin
>  Issue Type: Bug
>  Components: Tools, Build and Test
>Affects Versions: v2.1.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.2.0
>
>
> We should get storageType in changeHtableHost of CubeMigrationCLI, not 
> engineType.




[jira] [Closed] (KYLIN-2838) Should get storageType in changeHtableHost of CubeMigrationCLI

2017-09-03 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen closed KYLIN-2838.
-

> Should get storageType in changeHtableHost of CubeMigrationCLI
> --
>
> Key: KYLIN-2838
> URL: https://issues.apache.org/jira/browse/KYLIN-2838
> Project: Kylin
>  Issue Type: Bug
>  Components: Tools, Build and Test
>Affects Versions: v2.1.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.2.0
>
>
> We should get storageType in changeHtableHost of CubeMigrationCLI, not 
> engineType.




[jira] [Commented] (KYLIN-2604) Use global dict as the default encoding for precise distinct count in web

2017-09-03 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151765#comment-16151765
 ] 

kangkaisen commented on KYLIN-2604:
---

RoaringBitmap takes int values as its input, so int values do not need dict
encoding. In other words, a precise distinct count measure on an int column does
not need a global dict.
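
A minimal Python sketch of this point (not Kylin's code): a Python set of ints stands in for the RoaringBitmap, and a plain dict stands in for the global dict. Both stand-ins and the function name are assumptions made for the example.

```python
def precise_distinct_count(values):
    """Count distinct values via an integer bitmap (a Python set stands in)."""
    bitmap = set()
    dictionary = {}  # only used for non-int values
    for v in values:
        if isinstance(v, int):
            bitmap.add(v)  # ints can be added to the bitmap directly
        else:
            # Non-int values must first be mapped to an int id.
            bitmap.add(dictionary.setdefault(v, len(dictionary)))
    return len(bitmap), len(dictionary)


int_count, int_dict_size = precise_distinct_count([7, 7, 42, 1000])
str_count, str_dict_size = precise_distinct_count(["a", "b", "a", "c"])
print(int_count, int_dict_size, str_count, str_dict_size)
```

For the int input the dictionary stays empty, while the string input needs one dictionary entry per distinct value before it can be counted.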

> Use global dict as the default encoding for precise distinct count in web
> -
>
> Key: KYLIN-2604
> URL: https://issues.apache.org/jira/browse/KYLIN-2604
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>Priority: Minor
> Fix For: v2.2.0
>
> Attachments: 
> KYLIN-2602-Non-Int-type-precise-count-distinct-measure-must-set-advanced-dict.patch,
>  KYLIN-2604.patch
>
>
> We should use global dict as the default encoding for precise distinct count
> in the web UI, which is easier for users.




[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-09-04 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16152630#comment-16152630
 ] 

kangkaisen commented on KYLIN-2764:
---

I have rebased KYLIN-2622 and KYLIN-2764 on the master branch. Both are about
global dict, so I put the two commits on one branch, 2622-2764, and ran the IT
for them together.

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 builds the dict for normal columns with MR, but the dict for UHC
> columns is still built in the JobServer. As in KYLIN-2217, we could also use MR to
> build the dict for UHC columns, which would relieve the memory pressure on the
> JobServer, improve its job concurrency, and speed up the build when there are
> multiple UHC columns.
> The MR input is the output of "Extract Fact Table Distinct Columns", and the MR
> output is the UHC column dict. Because it is very hard to build a global dict with
> multiple reducers, I use one reducer per UHC column and allocate enough
> memory to the reducer. According to my test, 8 GB of memory is enough.




[jira] [Commented] (KYLIN-2841) LIMIT is buggy with subquery

2017-09-20 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174348#comment-16174348
 ] 

kangkaisen commented on KYLIN-2841:
---

Hi, [~zhengd], thank you.

I think using context.afterAggregate may be enough, so we need not add an
afterOuterAggregate variable. What do you think?

> LIMIT is buggy with subquery
> 
>
> Key: KYLIN-2841
> URL: https://issues.apache.org/jira/browse/KYLIN-2841
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.1.0
>Reporter: Mu Kong
>Assignee: zhengdong
>  Labels: scope
> Attachments: 0001-KYLIN-2841-LIMIT-is-buggy-with-subquery.patch
>
>
> Hi, all.
> I found that LIMIT in the web UI does not seem to behave as expected.
> When I run a query like the following:
> {code:sql}
> SELECT
>   SUM(col3) AS col4,
>   SUM(col5) AS total_col5,
>   col1
> FROM
> (
>   SELECT
>     col1,
>     col2,
>     MAX(col3) AS col3,
>     COUNT(*) AS col5
>   FROM db.table
>   WHERE col6 = 'somestring'
>   GROUP BY col1, col2
> )
> GROUP BY col1
> {code}
> When I specify the limit as 50, the result has 19 records, and when I specify
> the limit as 50, there are 90+ records in the result and each record has
> higher col4 and total_col5.
> But for a query that doesn't have a subquery, the result remains the same no
> matter how I change the limit.
> I guess that for a query with a subquery, the limit somehow limits the number of
> rows from the inner query instead of the result of the outer query.
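
The suspected behaviour can be reproduced with a small Python sketch (not Kylin's code). The rows and the two helper functions are invented for illustration; they mimic the inner and outer GROUP BY of the query above.

```python
from collections import defaultdict

# Illustrative rows of (col1, col2, col3); not taken from the report.
rows = [
    ("a", "x", 1), ("a", "y", 2), ("b", "x", 3), ("b", "y", 4), ("c", "x", 5),
]


def inner_query(data):
    """GROUP BY col1, col2 with MAX(col3) and COUNT(*)."""
    groups = defaultdict(list)
    for col1, col2, col3 in data:
        groups[(col1, col2)].append(col3)
    return [(c1, c2, max(v), len(v)) for (c1, c2), v in sorted(groups.items())]


def outer_query(inner_rows):
    """GROUP BY col1 with SUM(col3) and SUM(col5)."""
    totals = defaultdict(lambda: [0, 0])
    for col1, _col2, col3, col5 in inner_rows:
        totals[col1][0] += col3
        totals[col1][1] += col5
    return [(t[0], t[1], c1) for c1, t in sorted(totals.items())]


limit = 3
correct = outer_query(inner_query(rows))[:limit]  # limit on the outer result
buggy = outer_query(inner_query(rows)[:limit])    # limit leaks into the subquery
print(correct)
print(buggy)
```

When the limit is applied to the inner rows, the outer query sees fewer groups and smaller sums. This matches the symptom that a smaller limit yields fewer records with lower col4 and total_col5.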




[jira] [Commented] (KYLIN-2841) LIMIT is buggy with subquery

2017-09-21 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16176016#comment-16176016
 ] 

kangkaisen commented on KYLIN-2841:
---

Yes, you are right! I was wrong. I am sorry. Thank you.

> LIMIT is buggy with subquery
> 
>
> Key: KYLIN-2841
> URL: https://issues.apache.org/jira/browse/KYLIN-2841
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.1.0
>Reporter: Mu Kong
>Assignee: zhengdong
>  Labels: scope
> Attachments: 0001-KYLIN-2841-LIMIT-is-buggy-with-subquery.patch
>
>
> Hi, all.
> I found that LIMIT in the web UI does not seem to behave as expected.
> When I run a query like the following:
> {code:sql}
> SELECT
>   SUM(col3) AS col4,
>   SUM(col5) AS total_col5,
>   col1
> FROM
> (
>   SELECT
>     col1,
>     col2,
>     MAX(col3) AS col3,
>     COUNT(*) AS col5
>   FROM db.table
>   WHERE col6 = 'somestring'
>   GROUP BY col1, col2
> )
> GROUP BY col1
> {code}
> When I specify the limit as 50, the result has 19 records, and when I specify
> the limit as 50, there are 90+ records in the result and each record has
> higher col4 and total_col5.
> But for a query that doesn't have a subquery, the result remains the same no
> matter how I change the limit.
> I guess that for a query with a subquery, the limit somehow limits the number of
> rows from the inner query instead of the result of the outer query.




[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-09-23 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16177724#comment-16177724
 ] 

kangkaisen commented on KYLIN-2764:
---

liyang, thanks very much for your review.

In KYLIN-2135, we use multiple reducers to speed up "Extract Fact Table
Distinct Columns" for UHC columns. This is the reason why I couldn't build the
global dict in {{FactDistinctColumnsReducer}}.

In addition to this point, do you have any other suggestions? 

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 builds the dict for normal columns with MR, but the dict for UHC
> columns is still built in the JobServer. As in KYLIN-2217, we could also use MR to
> build the dict for UHC columns, which would relieve the memory pressure on the
> JobServer, improve its job concurrency, and speed up the build when there are
> multiple UHC columns.
> The MR input is the output of "Extract Fact Table Distinct Columns", and the MR
> output is the UHC column dict. Because it is very hard to build a global dict with
> multiple reducers, I use one reducer per UHC column and allocate enough
> memory to the reducer. According to my test, 8 GB of memory is enough.




[jira] [Commented] (KYLIN-2622) AppendTrieDictionary support not global

2017-09-23 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16177729#comment-16177729
 ] 

kangkaisen commented on KYLIN-2622:
---

Thanks very much for your review.

I believe this feature is useful, and it is widely used in our prod env. I will
write a post about global dict and precise distinct count; in that post, I will
use data and facts to explain why this feature is necessary.

> AppendTrieDictionary support not global
> ---
>
> Key: KYLIN-2622
> URL: https://issues.apache.org/jira/browse/KYLIN-2622
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.2.0
>
>
> Currently, AppendTrieDictionary only supports global dict, which means the
> dict grows continuously. But for a cube that doesn't have a Partition Date
> Column and doesn't need aggregate queries across segments, we could
> build the AppendTrieDictionary from an empty dict every time.




[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-09-23 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178081#comment-16178081
 ] 

kangkaisen commented on KYLIN-2764:
---

If the UHC columns have hundreds of billions of rows and we use one reducer to
handle them, the {{FactDistinctColumnsReducer}} will be very slow. In
other words, if we could use multiple reducers to build one global dict, we would
not need to add a new UHCDictionaryJob, but it is very hard to build a global dict
with multiple reducers.

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 builds the dict for normal columns with MR, but the dict for UHC
> columns is still built in the JobServer. As in KYLIN-2217, we could also use MR to
> build the dict for UHC columns, which would relieve the memory pressure on the
> JobServer, improve its job concurrency, and speed up the build when there are
> multiple UHC columns.
> The MR input is the output of "Extract Fact Table Distinct Columns", and the MR
> output is the UHC column dict. Because it is very hard to build a global dict with
> multiple reducers, I use one reducer per UHC column and allocate enough
> memory to the reducer. According to my test, 8 GB of memory is enough.




[jira] [Commented] (KYLIN-2180) Add project config and make config priority become "cube > project > server"

2017-10-15 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205374#comment-16205374
 ] 

kangkaisen commented on KYLIN-2180:
---

Hi, julian.

I don't see anywhere that I change the ACL in this patch. Could you point out the
specific code?

> Add project config and make config priority become "cube > project > server"
> 
>
> Key: KYLIN-2180
> URL: https://issues.apache.org/jira/browse/KYLIN-2180
> Project: Kylin
>  Issue Type: New Feature
>  Components: Metadata
>Affects Versions: v1.5.4.1
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.0.0
>
> Attachments: KYLIN-2180-refactor-ProjectRequest.patch, 
> KYLIN-2180.patch
>
>
> There are cases where we want to override the global kylin.properties in the
> scope of a project, e.g. the queue name of a Hadoop job.
> In the end, the config priority for Kylin should be "cube > project > server",
> which I think is reasonable.
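
The intended lookup order can be sketched with Python's `collections.ChainMap` (this is not how Kylin implements it). The property names and values are hypothetical and only serve the example.

```python
from collections import ChainMap

# Hypothetical property names and values, for illustration only.
server_config = {"job.queue": "default", "job.retry": "0"}
project_config = {"job.queue": "project_queue"}
cube_config = {"job.retry": "3"}

# ChainMap searches its maps left to right, so the first map wins:
# cube overrides project, and project overrides server.
effective = ChainMap(cube_config, project_config, server_config)

print(effective["job.queue"], effective["job.retry"])
```

A key set at the cube level hides the same key at the project and server levels. A key set only at the server level is still visible as the fallback.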




[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-10-21 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214197#comment-16214197
 ] 

kangkaisen commented on KYLIN-2764:
---

Thank you very much, liyang and shaofeng.

Shaofeng, please let me do the merge work, thank you.

I don't have further changes, but there is an issue in the 2764 branch:
After KYLIN-2800
https://github.com/apache/kylin/commit/ac77008ee81d4dcc2956b1a2cfd6eaa7ae9fc5d9
the first point I raised in my earlier comment no longer applies:
{quote}
1. The FK column in fact table could be UHC column.
{quote}
So the latest commit in the 2764 branch could be simplified. This is the commit
that applies KYLIN-2800:
https://github.com/apache/kylin/commit/48f3fb1953a413acfdd405539a7cfd211a5e85de.

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.3.0
>
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 builds the dict for normal columns with MR, but the dict for UHC
> columns is still built in the JobServer. Like KYLIN-2217, we could also use MR
> to build the dict for UHC columns, which would relieve the memory pressure on
> the JobServer, improve its job concurrency, and speed up the procedure when
> there are multiple UHC columns.
> The MR input is the output of "Extract Fact Table Distinct Columns"; the MR
> output is the UHC column dict. Because it is very hard to build a global dict
> with multiple reducers, I use one reducer per UHC column and allocate enough
> memory to each reducer. According to my test, 8G of memory is enough.
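
The "one reducer per UHC column" routing described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the patch: it does not use Hadoop's Partitioner API, and the class UhcPartitionSketch and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical routing helper; it does not use Hadoop's Partitioner API.
final class UhcPartitionSketch {
    // Maps the index of a UHC column to the reducer that builds its dict.
    // Assumes the job is configured with exactly one reducer per UHC column.
    static int reducerFor(int uhcColumnIndex, int numReducers) {
        if (uhcColumnIndex < 0 || uhcColumnIndex >= numReducers) {
            throw new IllegalArgumentException(
                "no reducer for column index " + uhcColumnIndex);
        }
        return uhcColumnIndex;
    }

    // Groups (columnIndex, value) records by reducer, as the shuffle would.
    static List<List<String>> route(int[] columnIndexes, String[] values,
                                    int numReducers) {
        if (columnIndexes.length != values.length) {
            throw new IllegalArgumentException("index/value length mismatch");
        }
        List<List<String>> perReducer = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            perReducer.add(new ArrayList<>());
        }
        for (int i = 0; i < values.length; i++) {
            int reducer = reducerFor(columnIndexes[i], numReducers);
            perReducer.get(reducer).add(values[i]);
        }
        return perReducer;
    }
}
```

Since every value of a given column lands on the same reducer, that reducer alone sees the whole column and no dictionary state has to be merged across reducers.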




[jira] [Commented] (KYLIN-2744) Should return correct type for SUM measure in web

2017-10-21 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214201#comment-16214201
 ] 

kangkaisen commented on KYLIN-2744:
---

Hi, Zhixiong.  OK, I see.  I will update the patch later. 

> Should return correct type for SUM measure in web
> -
>
> Key: KYLIN-2744
> URL: https://issues.apache.org/jira/browse/KYLIN-2744
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: KYLIN-2744.patch
>
>
> Currently, Kylin returns the decimal type for a SUM measure on a double
> column, which leads to wrong results. So we should return the correct type
> for the SUM measure in web.




[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-10-22 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214609#comment-16214609
 ] 

kangkaisen commented on KYLIN-2764:
---

OK. Thank you, shaofeng!

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.3.0
>
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 builds the dict for normal columns with MR, but the dict for UHC
> columns is still built in the JobServer. Like KYLIN-2217, we could also use MR
> to build the dict for UHC columns, which would relieve the memory pressure on
> the JobServer, improve its job concurrency, and speed up the procedure when
> there are multiple UHC columns.
> The MR input is the output of "Extract Fact Table Distinct Columns"; the MR
> output is the UHC column dict. Because it is very hard to build a global dict
> with multiple reducers, I use one reducer per UHC column and allocate enough
> memory to each reducer. According to my test, 8G of memory is enough.




[jira] [Created] (KYLIN-2992) Avoid OOM in CubeHFileJob.Reducer

2017-11-02 Thread kangkaisen (JIRA)

kangkaisen created KYLIN-2992:
-

 Summary: Avoid OOM in  CubeHFileJob.Reducer
 Key: KYLIN-2992
 URL: https://issues.apache.org/jira/browse/KYLIN-2992
 Project: Kylin
  Issue Type: Improvement
  Components: Storage - HBase
Affects Versions: v2.1.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Major


Following HBASE-13897, we could also improve CubeHFileJob.Reducer to avoid OOM.




[jira] [Commented] (KYLIN-2992) Avoid OOM in CubeHFileJob.Reducer

2017-11-02 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16237137#comment-16237137
 ] 

kangkaisen commented on KYLIN-2992:
---

The main idea is to move the sort from the reducer to the shuffle. There are
two key points:

# implement a {{RowKeyWritable}} class that compares KeyValue with 
KeyValue.KVComparator()
# construct a KeyValue based on the cuboid with KeyValue.createFirstOnRow

For more detail, refer to this Chinese blog post:
https://blog.bcmeng.com/post/kylin-hfile-improve.html
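
The idea of letting the shuffle do the sort can be sketched as below. This is a simplified stand-in, not the code from the commit: it uses neither Hadoop's WritableComparable nor HBase's KeyValue.KVComparator, and the class RowKeySketch with its unsigned lexicographic byte comparison is an assumption made for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for the RowKeyWritable idea: a key type that knows how to compare
// itself, so the framework's sort (here Collections.sort) orders the keys.
// Assumption: plain unsigned lexicographic byte order replaces
// KeyValue.KVComparator, which is not used in this sketch.
final class RowKeySketch implements Comparable<RowKeySketch> {
    private final byte[] bytes;

    RowKeySketch(byte[] bytes) {
        this.bytes = bytes.clone();
    }

    byte[] bytes() {
        return bytes.clone();
    }

    @Override
    public int compareTo(RowKeySketch other) {
        int common = Math.min(bytes.length, other.bytes.length);
        for (int i = 0; i < common; i++) {
            // Mask to compare bytes as unsigned values (0..255).
            int a = bytes[i] & 0xFF;
            int b = other.bytes[i] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        // A key that is a prefix of another sorts first.
        return bytes.length - other.bytes.length;
    }

    // Sorts the keys the way a shuffle would before handing them to a reducer.
    static List<RowKeySketch> shuffleSort(List<RowKeySketch> keys) {
        List<RowKeySketch> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        return sorted;
    }
}
```

When keys reach the reducer already in order, the reducer can write each one out as it reads it rather than collecting and sorting them in memory first, which is presumably where the memory saving comes from.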

> Avoid OOM in  CubeHFileJob.Reducer
> --
>
> Key: KYLIN-2992
> URL: https://issues.apache.org/jira/browse/KYLIN-2992
> Project: Kylin
>  Issue Type: Improvement
>  Components: Storage - HBase
>Affects Versions: v2.1.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>Priority: Major
>
> Following HBASE-13897, we could also improve CubeHFileJob.Reducer to avoid
> OOM.




[jira] [Created] (KYLIN-2993) Add special mr config for base cuboid step

2017-11-02 Thread kangkaisen (JIRA)

kangkaisen created KYLIN-2993:
-

 Summary: Add special mr config for base cuboid step
 Key: KYLIN-2993
 URL: https://issues.apache.org/jira/browse/KYLIN-2993
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Affects Versions: v2.1.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Major


According to http://kylin.apache.org/blog/2016/08/01/count-distinct-in-kylin/, 
if users currently want to enlarge the MR memory for the global dict, they must 
use kylin.engine.mr.config-override., which enlarges the memory of all MR jobs. 
In fact, we only need to enlarge the memory for "Build Base Cuboid", so we 
could add a special MR config for the base cuboid step.
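
A step-specific override of this kind could be resolved roughly as follows. This is a sketch under assumptions, not the patch itself: the general prefix comes from the description above, while the base-cuboid prefix, the class StepConfigOverrideSketch and the method effectiveOverrides are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical resolution of MR overrides for one job step.
final class StepConfigOverrideSketch {
    // Prefix mentioned in the issue description.
    static final String GENERAL_PREFIX = "kylin.engine.mr.config-override.";
    // Assumed prefix for the base cuboid step; the real name may differ.
    static final String BASE_CUBOID_PREFIX =
        "kylin.engine.mr.base-cuboid-config-override.";

    // Returns the Hadoop properties to apply to a step: the general overrides
    // first, then the base-cuboid overrides on top when the step needs them.
    static Map<String, String> effectiveOverrides(Map<String, String> kylinProps,
                                                  boolean baseCuboidStep) {
        Map<String, String> result = new LinkedHashMap<>();
        copyWithPrefix(kylinProps, GENERAL_PREFIX, result);
        if (baseCuboidStep) {
            copyWithPrefix(kylinProps, BASE_CUBOID_PREFIX, result);
        }
        return result;
    }

    // Copies entries whose key starts with the prefix, stripping the prefix.
    private static void copyWithPrefix(Map<String, String> source, String prefix,
                                       Map<String, String> target) {
        for (Map.Entry<String, String> entry : source.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                target.put(entry.getKey().substring(prefix.length()),
                           entry.getValue());
            }
        }
    }
}
```

With such a split, a larger value for a Hadoop setting like mapreduce.map.memory.mb would apply only to the base cuboid step, and every other step would keep the general value.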




[jira] [Created] (KYLIN-2994) Handle NPE when load dict in DictionaryManager

2017-11-02 Thread kangkaisen (JIRA)

kangkaisen created KYLIN-2994:
-

 Summary: Handle NPE when load dict in DictionaryManager
 Key: KYLIN-2994
 URL: https://issues.apache.org/jira/browse/KYLIN-2994
 Project: Kylin
  Issue Type: Bug
  Components: Metadata
Affects Versions: v2.1.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor


Currently, the argument {{resourcePath}} in 
{{DictionaryManager.getDictionaryInfo}} can be null, which leads to an NPE when 
loading the dict.




[jira] [Created] (KYLIN-2995) Set SparkContext.hadoopConfiguration to HadoopUtil in Spark Cubing

2017-11-02 Thread kangkaisen (JIRA)

kangkaisen created KYLIN-2995:
-

 Summary: Set SparkContext.hadoopConfiguration to HadoopUtil in 
Spark Cubing
 Key: KYLIN-2995
 URL: https://issues.apache.org/jira/browse/KYLIN-2995
 Project: Kylin
  Issue Type: Bug
  Components: Spark Engine
Affects Versions: v2.1.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Major


Currently, we load metadata from HDFS in 
SparkCubing: {{AbstractHadoopJob.loadKylinConfigFromHdfs}}. But HadoopUtil 
uses a new Configuration; we should use SparkContext.hadoopConfiguration instead.




[jira] [Created] (KYLIN-2996) DeployCoprocessorCLI Log failed tables info

2017-11-02 Thread kangkaisen (JIRA)

kangkaisen created KYLIN-2996:
-

 Summary: DeployCoprocessorCLI Log failed tables info
 Key: KYLIN-2996
 URL: https://issues.apache.org/jira/browse/KYLIN-2996
 Project: Kylin
  Issue Type: Improvement
  Components: Storage - HBase
Affects Versions: v2.1.0
Reporter: kangkaisen
Assignee: kangkaisen


Updating the coprocessor can fail, although rarely; we should report the failed 
tables to the user in the final output.




[jira] [Created] (KYLIN-2997) Allow change engineType even if there are segments in cube

2017-11-03 Thread kangkaisen (JIRA)

kangkaisen created KYLIN-2997:
-

 Summary: Allow change engineType even if there are segments in cube
 Key: KYLIN-2997
 URL: https://issues.apache.org/jira/browse/KYLIN-2997
 Project: Kylin
  Issue Type: Bug
  Components: Metadata, Web 
Affects Versions: v2.1.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Major


Currently, the cube signature contains engineType, so if users want to switch 
engines, they must purge the cube first. I think this is unreasonable because 
the engine doesn't affect queries or existing segments.




[jira] [Created] (KYLIN-2998) Kill spark app when job was discarded

2017-11-03 Thread kangkaisen (JIRA)

kangkaisen created KYLIN-2998:
-

 Summary: Kill spark app when job was discarded
 Key: KYLIN-2998
 URL: https://issues.apache.org/jira/browse/KYLIN-2998
 Project: Kylin
  Issue Type: Improvement
  Components: Spark Engine
Affects Versions: v2.1.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Major


Currently, when we discard a Spark job, the Spark job keeps running, and 
when we restart the JobServer, the SparkExecutable submits a new Spark job. We 
should handle Spark jobs the same way as MR jobs.




[jira] [Updated] (KYLIN-2998) Kill spark app when cube job was discarded

2017-11-03 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-2998:
--
Summary: Kill spark app when cube job was discarded  (was: Kill spark app 
when job was discarded)

> Kill spark app when cube job was discarded
> --
>
> Key: KYLIN-2998
> URL: https://issues.apache.org/jira/browse/KYLIN-2998
> Project: Kylin
>  Issue Type: Improvement
>  Components: Spark Engine
>Affects Versions: v2.1.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>Priority: Major
>
> Currently, when we discard a Spark job, the Spark job keeps running, and 
> when we restart the JobServer, the SparkExecutable submits a new Spark job. 
> We should handle Spark jobs the same way as MR jobs.




[jira] [Created] (KYLIN-2999) One click migrate cube in web

2017-11-03 Thread kangkaisen (JIRA)

kangkaisen created KYLIN-2999:
-

 Summary: One click migrate cube in web
 Key: KYLIN-2999
 URL: https://issues.apache.org/jira/browse/KYLIN-2999
 Project: Kylin
  Issue Type: New Feature
  Components: Tools, Build and Test, Web 
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Major


Currently, cube migration must be done by the Kylin admin, which wastes a lot 
of the admin's time. So we should allow users to migrate a cube with one 
click in web. Of course, this should be configurable.




[jira] [Created] (KYLIN-3000) Add a tool supporting migrate Cubedesc across different HBase cluster

2017-11-03 Thread kangkaisen (JIRA)

kangkaisen created KYLIN-3000:
-

 Summary: Add a tool supporting migrate Cubedesc across different 
HBase cluster
 Key: KYLIN-3000
 URL: https://issues.apache.org/jira/browse/KYLIN-3000
 Project: Kylin
  Issue Type: New Feature
  Components: Tools, Build and Test
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Major


Add a tool that supports migrating Cubedesc across different HBase clusters.




[jira] [Created] (KYLIN-3002) Use Spark as default engine for none-global-dict cube

2017-11-03 Thread kangkaisen (JIRA)

kangkaisen created KYLIN-3002:
-

 Summary: Use Spark as default engine for none-global-dict cube
 Key: KYLIN-3002
 URL: https://issues.apache.org/jira/browse/KYLIN-3002
 Project: Kylin
  Issue Type: Improvement
  Components: Web 
Reporter: kangkaisen
Assignee: kangkaisen


After KYLIN-2997, like KYLIN-2963, we could use Spark as default engine for 
none-global-dict cube.




[jira] [Commented] (KYLIN-2992) Avoid OOM in CubeHFileJob.Reducer

2017-11-04 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238898#comment-16238898
 ] 

kangkaisen commented on KYLIN-2992:
---

This is the commit: 
https://github.com/apache/kylin/commit/b837071a6048433a0ec1708f358a62a8e90c2d1a
This commit has passed the IT.

> Avoid OOM in  CubeHFileJob.Reducer
> --
>
> Key: KYLIN-2992
> URL: https://issues.apache.org/jira/browse/KYLIN-2992
> Project: Kylin
>  Issue Type: Improvement
>  Components: Storage - HBase
>Affects Versions: v2.1.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>Priority: Major
>
> Following HBASE-13897, we could also improve CubeHFileJob.Reducer to avoid
> OOM.




[jira] [Commented] (KYLIN-3002) Use Spark as default engine for none-global-dict cube

2017-11-05 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239828#comment-16239828
 ] 

kangkaisen commented on KYLIN-3002:
---

I agree with you. Thanks for the reminder. I can do this work.

> Use Spark as default engine for none-global-dict cube
> -
>
> Key: KYLIN-3002
> URL: https://issues.apache.org/jira/browse/KYLIN-3002
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Reporter: kangkaisen
>Assignee: kangkaisen
>Priority: Trivial
>
> After KYLIN-2997, like KYLIN-2963, we could use Spark as default engine for 
> none-global-dict cube.




[jira] [Updated] (KYLIN-3002) Use Spark as default engine in web

2017-11-05 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-3002:
--
Summary: Use Spark as default engine in web  (was: Use Spark as default 
engine for none-global-dict cube)

> Use Spark as default engine in web
> --
>
> Key: KYLIN-3002
> URL: https://issues.apache.org/jira/browse/KYLIN-3002
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Reporter: kangkaisen
>Assignee: kangkaisen
>Priority: Trivial
>
> After KYLIN-2997, like KYLIN-2963, we could use Spark as default engine for 
> none-global-dict cube.




[jira] [Commented] (KYLIN-2992) Avoid OOM in CubeHFileJob.Reducer

2017-11-14 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16251339#comment-16251339
 ] 

kangkaisen commented on KYLIN-2992:
---

Thanks, liyang. Will you review this commit?

> Avoid OOM in  CubeHFileJob.Reducer
> --
>
> Key: KYLIN-2992
> URL: https://issues.apache.org/jira/browse/KYLIN-2992
> Project: Kylin
>  Issue Type: Improvement
>  Components: Storage - HBase
>Affects Versions: v2.1.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>
> Following HBASE-13897, we could also improve CubeHFileJob.Reducer to avoid
> OOM.




[jira] [Commented] (KYLIN-3055) NullPointerException in MutableRoaringBitmap.or

2017-12-01 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274249#comment-16274249
 ] 

kangkaisen commented on KYLIN-3055:
---

Hi Chuqian, thank you very much. This bug was introduced by my KYLIN-2606.

This patch looks good to me. I will test it and merge it to master.

> NullPointerException in MutableRoaringBitmap.or
> ---
>
> Key: KYLIN-3055
> URL: https://issues.apache.org/jira/browse/KYLIN-3055
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.2.0
>Reporter: yuchuqian
>Assignee: yuchuqian
> Fix For: v2.3.0
>
> Attachments: KYLIN-3055.patch
>
>
> 2017-11-21 19:55:17,363 ERROR [Query b1fbcd45-6524-4b1e-8844-1d6d6277a1bf-120] service.QueryService:459 : Exception while executing query
> java.sql.SQLException: Error while executing SQL "select part_dt,
> intersect_count(item_count, part_dt, array[date'2012-01-01']) as first_day,
> intersect_count(item_count, part_dt, array[date'2012-01-02']) as second_day,
> intersect_count(item_count, part_dt, array[date'2012-01-03']) as third_day,
> intersect_count(item_count, part_dt, array[date'2012-01-01',date'2012-01-02']) as retention_oneday,
> intersect_count(item_count, part_dt, array[date'2012-01-01',date'2012-01-02',date'2012-01-03']) as retention_twoday
> from kylin_sales
> where part_dt in (date'2012-01-01',date'2012-01-02',date'2012-01-03')
> group by PART_DT
> LIMIT 5": null
> at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
> at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
> at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
> at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:218)
> at org.apache.kylin.rest.service.QueryService.execute(QueryService.java:834)
> at org.apache.kylin.rest.service.QueryService.queryWithSqlMassage(QueryService.java:561)
> at org.apache.kylin.rest.service.QueryService.query(QueryService.java:181)
> at org.apache.kylin.rest.service.QueryService.doQueryWithCache(QueryService.java:415)
> at org.apache.kylin.rest.controller.QueryController.query(QueryController.java:78)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
> ..
> Caused by: java.lang.NullPointerException
> at org.roaringbitmap.buffer.MutableRoaringBitmap.or(MutableRoaringBitmap.java:1041)
> at org.apache.kylin.measure.bitmap.RoaringBitmapCounter.orWith(RoaringBitmapCounter.java:72)
> at org.apache.kylin.measure.bitmap.BitmapIntersectDistinctCountAggFunc$RetentionPartialResult.add(BitmapIntersectDistinctCountAggFunc.java:57)
> at org.apache.kylin.measure.bitmap.BitmapIntersectDistinctCountAggFunc.add(BitmapIntersectDistinctCountAggFunc.java:90)
> at Baz$4.apply(ANONYMOUS.java:136)
> at Baz$4.apply(ANONYMOUS.java:158)
> at Baz$4.apply(ANONYMOUS.java)
> at org.apache.calcite.linq4j.EnumerableDefaults.groupBy_(EnumerableDefaults.java:832)
> at org.apache.calcite.linq4j.EnumerableDefaults.groupBy(EnumerableDefaults.java:761)
> at org.apache.calcite.linq4j.DefaultEnumerable.groupBy(DefaultEnumerable.java:302)
> at Baz.bind(Baz.java:99)
> How to re-produce:
> 1. run $KYLIN_HOME/bin/sample.sh 
> 2. then create a cube like 
> {
>   "uuid": "9554f6f6-74dc-489e-b780-2f48f281576c",
>   "last_modified": 1511247707372,
>   "version": "2.2.0.0",
>   "name": "test",
>   "is_draft": false,
>   "model_name": "kylin_sales_model",
>   "description": "",
>   "null_string": null,
>   "dimensions": [
>     {
>       "name": "PART_DT",
>       "table": "KYLIN_SALES",
>       "column": "PART_DT",
>       "derived": null
>     },
>     {
>       "name": "LEAF_CATEG_ID",
>       "table": "KYLIN_SALES",
>       "column": "LEAF_CATEG_ID",
>       "derived": null
>     },
>     {
>       "name": "LSTG_SITE_ID",
>       "table": "KYLIN_SALES",
>       "column": "LSTG_SITE_ID",
>       "derived": null
>     },
>     {
>       "name": "CAL_DT",
>       "table": "KYLIN_CAL_DT",
>       "column": null,
>       "derived": [
>         "CAL_DT"
>       ]
>     },
>     {
>       "name": "LEAF_CATEG_ID",
>       "table": "KYLIN_CATEGORY_GROUPINGS",
>       "column": null,
>       "derived": [
>         "LEAF_CATEG_ID"
>       ]
>     },
>     {
>       "name": "USER_DEFINED_FIELD1",
>       "table": "KYLIN_CATEGORY_GROUPINGS",
>       "column": null,
>       "derived": [
>         "USER_DEFINED_FIELD1"
>       ]
>

[jira] [Assigned] (KYLIN-3055) NullPointerException in MutableRoaringBitmap.or

2017-12-03 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen reassigned KYLIN-3055:
-

Assignee: kangkaisen  (was: yuchuqian)

> NullPointerException in MutableRoaringBitmap.or
> ---
>
> Key: KYLIN-3055
> URL: https://issues.apache.org/jira/browse/KYLIN-3055
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.2.0
>Reporter: yuchuqian
>Assignee: kangkaisen
> Fix For: v2.3.0
>
> Attachments: KYLIN-3055.patch
>
>

[jira] [Assigned] (KYLIN-3055) NullPointerException in MutableRoaringBitmap.or

2017-12-03 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen reassigned KYLIN-3055:
-

Assignee: yuchuqian  (was: kangkaisen)

> NullPointerException in MutableRoaringBitmap.or
> ---
>
> Key: KYLIN-3055
> URL: https://issues.apache.org/jira/browse/KYLIN-3055
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.2.0
>Reporter: yuchuqian
>Assignee: yuchuqian
> Fix For: v2.3.0
>
> Attachments: KYLIN-3055.patch
>
>

[jira] [Commented] (KYLIN-3055) NullPointerException in MutableRoaringBitmap.or

2017-12-04 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276662#comment-16276662
 ] 

kangkaisen commented on KYLIN-3055:
---

Hi Chuqian, I have added a test case for this bug and it passed the IT. Thank you.

Could you re-submit the patch with your commit author info, or give me your 
commit author info directly? I will commit your patch to the master branch.

> NullPointerException in MutableRoaringBitmap.or
> ---
>
> Key: KYLIN-3055
> URL: https://issues.apache.org/jira/browse/KYLIN-3055
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.2.0
>Reporter: yuchuqian
>Assignee: yuchuqian
> Fix For: v2.3.0
>
> Attachments: KYLIN-3055.patch
>
>
> 2017-11-21 19:55:17,363 ERROR [Query b1fbcd45-6524-4b1e-8844-1d6d6277a1bf-120] service.QueryService:459 : Exception while executing query
> java.sql.SQLException: Error while executing SQL "select part_dt,
> intersect_count(item_count, part_dt, array[date'2012-01-01']) as first_day,
> intersect_count(item_count, part_dt, array[date'2012-01-02']) as second_day,
> intersect_count(item_count, part_dt, array[date'2012-01-03']) as third_day,
> intersect_count(item_count, part_dt, array[date'2012-01-01',date'2012-01-02']) as retention_oneday,
> intersect_count(item_count, part_dt, array[date'2012-01-01',date'2012-01-02',date'2012-01-03']) as retention_twoday
> from kylin_sales
> where part_dt in (date'2012-01-01',date'2012-01-02',date'2012-01-03')
> group by PART_DT
> LIMIT 5": null
> at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
> at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
> at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
> at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:218)
> at org.apache.kylin.rest.service.QueryService.execute(QueryService.java:834)
> at org.apache.kylin.rest.service.QueryService.queryWithSqlMassage(QueryService.java:561)
> at org.apache.kylin.rest.service.QueryService.query(QueryService.java:181)
> at org.apache.kylin.rest.service.QueryService.doQueryWithCache(QueryService.java:415)
> at org.apache.kylin.rest.controller.QueryController.query(QueryController.java:78)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
> ..
> Caused by: java.lang.NullPointerException
> at org.roaringbitmap.buffer.MutableRoaringBitmap.or(MutableRoaringBitmap.java:1041)
> at org.apache.kylin.measure.bitmap.RoaringBitmapCounter.orWith(RoaringBitmapCounter.java:72)
> at org.apache.kylin.measure.bitmap.BitmapIntersectDistinctCountAggFunc$RetentionPartialResult.add(BitmapIntersectDistinctCountAggFunc.java:57)
> at org.apache.kylin.measure.bitmap.BitmapIntersectDistinctCountAggFunc.add(BitmapIntersectDistinctCountAggFunc.java:90)
> at Baz$4.apply(ANONYMOUS.java:136)
> at Baz$4.apply(ANONYMOUS.java:158)
> at Baz$4.apply(ANONYMOUS.java)
> at org.apache.calcite.linq4j.EnumerableDefaults.groupBy_(EnumerableDefaults.java:832)
> at org.apache.calcite.linq4j.EnumerableDefaults.groupBy(EnumerableDefaults.java:761)
> at org.apache.calcite.linq4j.DefaultEnumerable.groupBy(DefaultEnumerable.java:302)
> at Baz.bind(Baz.java:99)
> How to re-produce:
> 1. run $KYLIN_HOME/bin/sample.sh 
> 2. then create a cube like 
> {
>   "uuid": "9554f6f6-74dc-489e-b780-2f48f281576c",
>   "last_modified": 1511247707372,
>   "version": "2.2.0.0",
>   "name": "test",
>   "is_draft": false,
>   "model_name": "kylin_sales_model",
>   "description": "",
>   "null_string": null,
>   "dimensions": [
> {
>   "name": "PART_DT",
>   "table": "KYLIN_SALES",
>   "column": "PART_DT",
>   "derived": null
> },
> {
>   "name": "LEAF_CATEG_ID",
>   "table": "KYLIN_SALES",
>   "column": "LEAF_CATEG_ID",
>   "derived": null
> },
> {
>   "name": "LSTG_SITE_ID",
>   "table": "KYLIN_SALES",
>   "column": "LSTG_SITE_ID",
>   "derived": null
> },
> {
>   "name": "CAL_DT",
>   "table": "KYLIN_CAL_DT",
>   "column": null,
>   "derived": [
> "CAL_DT"
>   ]
> },
> {
>   "name": "LEAF_CATEG_ID",
>   "table": "KYLIN_CATEGORY_GROUPINGS",
>   "column": null,
>   "derived": [
> "LEAF_CATEG_ID"
>   ]
> },
> {
>   "name": "USER_DEFINED_FIELD1",
>   "table": "KYLIN_CATEGORY_GROUPINGS",
>   "

[jira] [Resolved] (KYLIN-2992) Avoid OOM in CubeHFileJob.Reducer

2017-12-04 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen resolved KYLIN-2992.
---
Resolution: Fixed

> Avoid OOM in  CubeHFileJob.Reducer
> --
>
> Key: KYLIN-2992
> URL: https://issues.apache.org/jira/browse/KYLIN-2992
> Project: Kylin
>  Issue Type: Improvement
>  Components: Storage - HBase
>Affects Versions: v2.1.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.3.0
>
>
> As in HBASE-13897, we could also improve CubeHFileJob.Reducer to avoid 
> OOM.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (KYLIN-2997) Allow change engineType even if there are segments in cube

2017-12-04 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-2997:
--
Attachment: KYLIN-2997.patch

This is the patch

> Allow change engineType even if there are segments in cube
> --
>
> Key: KYLIN-2997
> URL: https://issues.apache.org/jira/browse/KYLIN-2997
> Project: Kylin
>  Issue Type: Bug
>  Components: Metadata, Web 
>Affects Versions: v2.1.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: KYLIN-2997.patch
>
>
> Currently, the cube signature contains engineType, so if users want to switch 
> engines, they must purge the cube first. I think this is unreasonable 
> because the engine doesn't affect queries or existing segments.




[jira] [Updated] (KYLIN-2996) DeployCoprocessorCLI Log failed tables info

2017-12-04 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-2996:
--
Attachment: KYLIN-2996.patch

This is the patch

> DeployCoprocessorCLI Log failed tables info
> ---
>
> Key: KYLIN-2996
> URL: https://issues.apache.org/jira/browse/KYLIN-2996
> Project: Kylin
>  Issue Type: Improvement
>  Components: Storage - HBase
>Affects Versions: v2.1.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>Priority: Trivial
> Attachments: KYLIN-2996.patch
>
>
> Currently, updating the coprocessor can occasionally fail; we should report the 
> failed tables to the user in the final output.




[jira] [Updated] (KYLIN-2993) Add special mr config for base cuboid step

2017-12-04 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-2993:
--
Attachment: KYLIN-2993.patch

This is the patch

> Add special mr config for base cuboid step
> --
>
> Key: KYLIN-2993
> URL: https://issues.apache.org/jira/browse/KYLIN-2993
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.1.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: KYLIN-2993.patch
>
>
> Refer to http://kylin.apache.org/blog/2016/08/01/count-distinct-in-kylin/. 
> Currently, if users want to enlarge MR memory for global dict, they must use 
> kylin.engine.mr.config-override., which enlarges the memory of all MR 
> jobs. In fact, we only need to enlarge the memory for "Build Base Cuboid", so 
> we could add a special MR config for the base cuboid step.
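
The proposal above can be sketched in plain Java, with no Kylin dependencies: overrides are collected by key prefix, and a step-specific prefix is layered on top only for the base cuboid step. The prefix `kylin.engine.mr.base-cuboid-config-override.` and all class and method names here are assumptions for illustration; the attached patch may name things differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

class MrConfigOverrideSketch {
    // Prefix quoted in the issue; it applies to every MR job.
    static final String GENERAL_PREFIX = "kylin.engine.mr.config-override.";
    // Hypothetical prefix for the "Build Base Cuboid" step only (an assumption).
    static final String BASE_CUBOID_PREFIX = "kylin.engine.mr.base-cuboid-config-override.";

    // Collect every property under the prefix, with the prefix stripped from the key.
    static Map<String, String> byPrefix(Properties props, String prefix) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                result.put(key.substring(prefix.length()), props.getProperty(key));
            }
        }
        return result;
    }

    // Effective Hadoop settings for one step: general overrides first,
    // then base-cuboid overrides on top when the step is the base cuboid build.
    static Map<String, String> effectiveConfig(Properties props, boolean isBaseCuboidStep) {
        Map<String, String> conf = byPrefix(props, GENERAL_PREFIX);
        if (isBaseCuboidStep) {
            conf.putAll(byPrefix(props, BASE_CUBOID_PREFIX));
        }
        return conf;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(GENERAL_PREFIX + "mapreduce.map.memory.mb", "2048");
        props.setProperty(BASE_CUBOID_PREFIX + "mapreduce.map.memory.mb", "8192");
        System.out.println("base cuboid step: " + effectiveConfig(props, true));
        System.out.println("other steps:      " + effectiveConfig(props, false));
    }
}
```

With this layering, only the base cuboid step sees the larger memory setting, while every other MR job keeps the general value.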




[jira] [Updated] (KYLIN-2994) Handle NPE when load dict in DictionaryManager

2017-12-04 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-2994:
--
Attachment: KYLIN-2994.patch

This is the patch.

> Handle NPE when load dict in DictionaryManager
> --
>
> Key: KYLIN-2994
> URL: https://issues.apache.org/jira/browse/KYLIN-2994
> Project: Kylin
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: v2.1.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>Priority: Minor
> Attachments: KYLIN-2994.patch
>
>
> Currently, the argument {{resourcePath}} in 
> {{DictionaryManager.getDictionaryInfo}} could be NULL




[jira] [Commented] (KYLIN-2604) Use global dict as the default encoding for precise distinct count in web

2017-12-04 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276755#comment-16276755
 ] 

kangkaisen commented on KYLIN-2604:
---

Hi Zhixiong, if you don't have any other questions or advice, I will merge 
KYLIN-2602-Non-Int-type-precise-count-distinct-measure-must-set-advanced-dict.patch
 to master.

> Use global dict as the default encoding for precise distinct count in web
> -
>
> Key: KYLIN-2604
> URL: https://issues.apache.org/jira/browse/KYLIN-2604
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>Priority: Minor
> Fix For: v2.2.0
>
> Attachments: 
> KYLIN-2602-Non-Int-type-precise-count-distinct-measure-must-set-advanced-dict.patch,
>  KYLIN-2604.patch
>
>
> We should use global dict as the default encoding for precise distinct count 
> in the web UI, which is easier for users.




[jira] [Commented] (KYLIN-2604) Use global dict as the default encoding for precise distinct count in web

2017-12-04 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277914#comment-16277914
 ] 

kangkaisen commented on KYLIN-2604:
---

OK, Thanks Zhixiong. I have merged the second patch to master.

> Use global dict as the default encoding for precise distinct count in web
> -
>
> Key: KYLIN-2604
> URL: https://issues.apache.org/jira/browse/KYLIN-2604
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>Priority: Minor
> Fix For: v2.2.0
>
> Attachments: 
> KYLIN-2602-Non-Int-type-precise-count-distinct-measure-must-set-advanced-dict.patch,
>  KYLIN-2604.patch
>
>
> We should use global dict as the default encoding for precise distinct count 
> in the web UI, which is easier for users.




[jira] [Updated] (KYLIN-2995) Set SparkContext.hadoopConfiguration to HadoopUtil in Spark Cubing

2017-12-05 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-2995:
--
Attachment: KYLIN-2995.patch

This is the patch. This patch has passed IT.

> Set SparkContext.hadoopConfiguration to HadoopUtil in Spark Cubing
> -
>
> Key: KYLIN-2995
> URL: https://issues.apache.org/jira/browse/KYLIN-2995
> Project: Kylin
>  Issue Type: Bug
>  Components: Spark Engine
>Affects Versions: v2.1.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: KYLIN-2995.patch
>
>
> Currently, we load metadata from HDFS in 
> SparkCubing:{{AbstractHadoopJob.loadKylinConfigFromHdfs}}, but HadoopUtil 
> uses a new Configuration; we should use SparkContext.hadoopConfiguration instead.




[jira] [Updated] (KYLIN-2999) One click migrate cube in web

2017-12-05 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-2999:
--
Attachment: KYLIN-2999.patch

This is the patch

> One click migrate cube in web
> -
>
> Key: KYLIN-2999
> URL: https://issues.apache.org/jira/browse/KYLIN-2999
> Project: Kylin
>  Issue Type: New Feature
>  Components: Tools, Build and Test, Web 
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: KYLIN-2999.patch
>
>
> Currently, cube migration must be done by the Kylin Admin, which wastes 
> a lot of the Admin's time. So, we should allow users to migrate a cube with one 
> click in the web UI. Of course, this is configurable.




[jira] [Commented] (KYLIN-3055) NullPointerException in MutableRoaringBitmap.or

2017-12-06 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280098#comment-16280098
 ] 

kangkaisen commented on KYLIN-3055:
---

Hi Chuqian, this is the commit: 
https://github.com/apache/kylin/commit/9265e150d80519d3e4f532c5f106e6718543daba.
 Thank you.
More contributions are welcome!

> NullPointerException in MutableRoaringBitmap.or
> ---
>
> Key: KYLIN-3055
> URL: https://issues.apache.org/jira/browse/KYLIN-3055
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.2.0
>Reporter: yuchuqian
>Assignee: yuchuqian
> Fix For: v2.3.0
>
> Attachments: KYLIN-3055.patch
>
>
> 2017-11-21 19:55:17,363 ERROR [Query 
> b1fbcd45-6524-4b1e-8844-1d6d6277a1bf-120] service.QueryService:459 : 
> Exception while executing query
> java.sql.SQLException: Error while executing SQL "select part_dt,
> intersect_count(item_count, part_dt, array[date'2012-01-01']) as first_day,
> intersect_count(item_count, part_dt, array[date'2012-01-02']) as second_day,
> intersect_count(item_count, part_dt, array[date'2012-01-03']) as third_day,
> intersect_count(item_count, part_dt, 
> array[date'2012-01-01',date'2012-01-02']) as retention_oneday,
> intersect_count(item_count, part_dt, 
> array[date'2012-01-01',date'2012-01-02',date'2012-01-03']) as retention_twoday
> from kylin_sales
> where part_dt in (date'2012-01-01',date'2012-01-02',date'2012-01-03')
> group by PART_DT
> LIMIT 5": null
> at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
> at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
> at 
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
> at 
> org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:218)
> at org.apache.kylin.rest.service.QueryService.execute(QueryService.java:834)
> at 
> org.apache.kylin.rest.service.QueryService.queryWithSqlMassage(QueryService.java:561)
> at org.apache.kylin.rest.service.QueryService.query(QueryService.java:181)
> at 
> org.apache.kylin.rest.service.QueryService.doQueryWithCache(QueryService.java:415)
> at 
> org.apache.kylin.rest.controller.QueryController.query(QueryController.java:78)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
> ..
> Caused by: java.lang.NullPointerException
> at 
> org.roaringbitmap.buffer.MutableRoaringBitmap.or(MutableRoaringBitmap.java:1041)
> at 
> org.apache.kylin.measure.bitmap.RoaringBitmapCounter.orWith(RoaringBitmapCounter.java:72)
> at 
> org.apache.kylin.measure.bitmap.BitmapIntersectDistinctCountAggFunc$RetentionPartialResult.add(BitmapIntersectDistinctCountAggFunc.java:57)
> at 
> org.apache.kylin.measure.bitmap.BitmapIntersectDistinctCountAggFunc.add(BitmapIntersectDistinctCountAggFunc.java:90)
> at Baz$4.apply(ANONYMOUS.java:136)
> at Baz$4.apply(ANONYMOUS.java:158)
> at Baz$4.apply(ANONYMOUS.java)
> at 
> org.apache.calcite.linq4j.EnumerableDefaults.groupBy_(EnumerableDefaults.java:832)
> at 
> org.apache.calcite.linq4j.EnumerableDefaults.groupBy(EnumerableDefaults.java:761)
> at 
> org.apache.calcite.linq4j.DefaultEnumerable.groupBy(DefaultEnumerable.java:302)
> at Baz.bind(Baz.java:99)
> How to re-produce:
> 1. run $KYLIN_HOME/bin/sample.sh 
> 2. then create a cube like 
> {
>   "uuid": "9554f6f6-74dc-489e-b780-2f48f281576c",
>   "last_modified": 1511247707372,
>   "version": "2.2.0.0",
>   "name": "test",
>   "is_draft": false,
>   "model_name": "kylin_sales_model",
>   "description": "",
>   "null_string": null,
>   "dimensions": [
> {
>   "name": "PART_DT",
>   "table": "KYLIN_SALES",
>   "column": "PART_DT",
>   "derived": null
> },
> {
>   "name": "LEAF_CATEG_ID",
>   "table": "KYLIN_SALES",
>   "column": "LEAF_CATEG_ID",
>   "derived": null
> },
> {
>   "name": "LSTG_SITE_ID",
>   "table": "KYLIN_SALES",
>   "column": "LSTG_SITE_ID",
>   "derived": null
> },
> {
>   "name": "CAL_DT",
>   "table": "KYLIN_CAL_DT",
>   "column": null,
>   "derived": [
> "CAL_DT"
>   ]
> },
> {
>   "name": "LEAF_CATEG_ID",
>   "table": "KYLIN_CATEGORY_GROUPINGS",
>   "column": null,
>   "derived": [
> "LEAF_CATEG_ID"
>   ]
> },
> {
>   "name": "USER_DEFINED_FIELD1",
>   "table": "KYLIN_CATEGORY_GROUPINGS",
>   "column": null,
>   "derived": [
> "USER_DEFINED_FIELD1"
>   ]

[jira] [Resolved] (KYLIN-3055) NullPointerException in MutableRoaringBitmap.or

2017-12-06 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen resolved KYLIN-3055.
---
Resolution: Fixed

> NullPointerException in MutableRoaringBitmap.or
> ---
>
> Key: KYLIN-3055
> URL: https://issues.apache.org/jira/browse/KYLIN-3055
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.2.0
>Reporter: yuchuqian
>Assignee: yuchuqian
> Fix For: v2.3.0
>
> Attachments: KYLIN-3055.patch
>
>
> 2017-11-21 19:55:17,363 ERROR [Query 
> b1fbcd45-6524-4b1e-8844-1d6d6277a1bf-120] service.QueryService:459 : 
> Exception while executing query
> java.sql.SQLException: Error while executing SQL "select part_dt,
> intersect_count(item_count, part_dt, array[date'2012-01-01']) as first_day,
> intersect_count(item_count, part_dt, array[date'2012-01-02']) as second_day,
> intersect_count(item_count, part_dt, array[date'2012-01-03']) as third_day,
> intersect_count(item_count, part_dt, 
> array[date'2012-01-01',date'2012-01-02']) as retention_oneday,
> intersect_count(item_count, part_dt, 
> array[date'2012-01-01',date'2012-01-02',date'2012-01-03']) as retention_twoday
> from kylin_sales
> where part_dt in (date'2012-01-01',date'2012-01-02',date'2012-01-03')
> group by PART_DT
> LIMIT 5": null
> at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
> at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
> at 
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
> at 
> org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:218)
> at org.apache.kylin.rest.service.QueryService.execute(QueryService.java:834)
> at 
> org.apache.kylin.rest.service.QueryService.queryWithSqlMassage(QueryService.java:561)
> at org.apache.kylin.rest.service.QueryService.query(QueryService.java:181)
> at 
> org.apache.kylin.rest.service.QueryService.doQueryWithCache(QueryService.java:415)
> at 
> org.apache.kylin.rest.controller.QueryController.query(QueryController.java:78)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
> ..
> Caused by: java.lang.NullPointerException
> at 
> org.roaringbitmap.buffer.MutableRoaringBitmap.or(MutableRoaringBitmap.java:1041)
> at 
> org.apache.kylin.measure.bitmap.RoaringBitmapCounter.orWith(RoaringBitmapCounter.java:72)
> at 
> org.apache.kylin.measure.bitmap.BitmapIntersectDistinctCountAggFunc$RetentionPartialResult.add(BitmapIntersectDistinctCountAggFunc.java:57)
> at 
> org.apache.kylin.measure.bitmap.BitmapIntersectDistinctCountAggFunc.add(BitmapIntersectDistinctCountAggFunc.java:90)
> at Baz$4.apply(ANONYMOUS.java:136)
> at Baz$4.apply(ANONYMOUS.java:158)
> at Baz$4.apply(ANONYMOUS.java)
> at 
> org.apache.calcite.linq4j.EnumerableDefaults.groupBy_(EnumerableDefaults.java:832)
> at 
> org.apache.calcite.linq4j.EnumerableDefaults.groupBy(EnumerableDefaults.java:761)
> at 
> org.apache.calcite.linq4j.DefaultEnumerable.groupBy(DefaultEnumerable.java:302)
> at Baz.bind(Baz.java:99)
> How to re-produce:
> 1. run $KYLIN_HOME/bin/sample.sh 
> 2. then create a cube like 
> {
>   "uuid": "9554f6f6-74dc-489e-b780-2f48f281576c",
>   "last_modified": 1511247707372,
>   "version": "2.2.0.0",
>   "name": "test",
>   "is_draft": false,
>   "model_name": "kylin_sales_model",
>   "description": "",
>   "null_string": null,
>   "dimensions": [
> {
>   "name": "PART_DT",
>   "table": "KYLIN_SALES",
>   "column": "PART_DT",
>   "derived": null
> },
> {
>   "name": "LEAF_CATEG_ID",
>   "table": "KYLIN_SALES",
>   "column": "LEAF_CATEG_ID",
>   "derived": null
> },
> {
>   "name": "LSTG_SITE_ID",
>   "table": "KYLIN_SALES",
>   "column": "LSTG_SITE_ID",
>   "derived": null
> },
> {
>   "name": "CAL_DT",
>   "table": "KYLIN_CAL_DT",
>   "column": null,
>   "derived": [
> "CAL_DT"
>   ]
> },
> {
>   "name": "LEAF_CATEG_ID",
>   "table": "KYLIN_CATEGORY_GROUPINGS",
>   "column": null,
>   "derived": [
> "LEAF_CATEG_ID"
>   ]
> },
> {
>   "name": "USER_DEFINED_FIELD1",
>   "table": "KYLIN_CATEGORY_GROUPINGS",
>   "column": null,
>   "derived": [
> "USER_DEFINED_FIELD1"
>   ]
> },
> {
>   "name": "USER_DEFINED_FIELD3",
>   "table": "KYLIN_CATEGORY_GROUPINGS",
>   "column": null,
>   "derived": [
> "USER_DEFINED_FIELD3"
>   ]

[jira] [Commented] (KYLIN-2999) One click migrate cube in web

2017-12-06 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280117#comment-16280117
 ] 

kangkaisen commented on KYLIN-2999:
---

This feature has the following set of configurations:

||Property||Description||Default value||isRequired||
|kylin.tool.auto-migrate-cube.enabled|Whether to enable this feature|false|true|
|kylin.tool.auto-migrate-cube.src-config|The kylin.properties file path for the source server|""|true|
|kylin.tool.auto-migrate-cube.dest-config|The kylin.properties file path for the target server|""|true|
|kylin.tool.auto-migrate-cube.copy-acl|Whether to copy the cube ACL to the target server|true|false|
|kylin.tool.auto-migrate-cube.purge-src-cube|Whether to purge the cube from the source server after the migration|true|false|
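
As a rough illustration of how the properties in the table above might be checked, here is a minimal Java sketch. It assumes that the two file-path properties only need to be set once the feature is enabled; that reading of the "isRequired" column, and the class and method names, are assumptions and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class AutoMigrateConfigSketch {
    static final String PREFIX = "kylin.tool.auto-migrate-cube.";

    // Returns a list of problems; an empty list means the settings look usable.
    static List<String> validate(Properties props) {
        List<String> problems = new ArrayList<>();
        boolean enabled = Boolean.parseBoolean(props.getProperty(PREFIX + "enabled", "false"));
        if (!enabled) {
            return problems; // feature is off, nothing else to check
        }
        // Both kylin.properties paths default to "" in the table, so an empty value counts as missing.
        for (String required : new String[] {"src-config", "dest-config"}) {
            String value = props.getProperty(PREFIX + required, "").trim();
            if (value.isEmpty()) {
                problems.add(PREFIX + required + " must be set when the feature is enabled");
            }
        }
        return problems;
    }

    // Optional flags fall back to the defaults listed in the table.
    static boolean copyAcl(Properties props) {
        return Boolean.parseBoolean(props.getProperty(PREFIX + "copy-acl", "true"));
    }

    static boolean purgeSourceCube(Properties props) {
        return Boolean.parseBoolean(props.getProperty(PREFIX + "purge-src-cube", "true"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(PREFIX + "enabled", "true");
        props.setProperty(PREFIX + "src-config", "/path/to/source/kylin.properties");
        // dest-config is deliberately left unset, so one problem is reported.
        System.out.println(validate(props));
    }
}
```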

> One click migrate cube in web
> -
>
> Key: KYLIN-2999
> URL: https://issues.apache.org/jira/browse/KYLIN-2999
> Project: Kylin
>  Issue Type: New Feature
>  Components: Tools, Build and Test, Web 
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: KYLIN-2999.patch
>
>
> Currently, cube migration must be done by the Kylin Admin, which wastes 
> a lot of the Admin's time. So, we should allow users to migrate a cube with one 
> click in the web UI. Of course, this is configurable.




[jira] [Commented] (KYLIN-2999) One click migrate cube in web

2017-12-06 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280123#comment-16280123
 ] 

kangkaisen commented on KYLIN-2999:
---

The Kylin Admin could enable this feature project by project (or even at cube level), 
according to their users' familiarity.

> One click migrate cube in web
> -
>
> Key: KYLIN-2999
> URL: https://issues.apache.org/jira/browse/KYLIN-2999
> Project: Kylin
>  Issue Type: New Feature
>  Components: Tools, Build and Test, Web 
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: KYLIN-2999.patch
>
>
> Currently, cube migration must be done by the Kylin Admin, which wastes 
> a lot of the Admin's time. So, we should allow users to migrate a cube with one 
> click in the web UI. Of course, this is configurable.




[jira] [Commented] (KYLIN-2995) Set SparkContext.hadoopConfiguration to HadoopUtil in Spark Cubing

2017-12-06 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281254#comment-16281254
 ] 

kangkaisen commented on KYLIN-2995:
---

This is not about performance; it's a bug.

Consider the method {{bindCurrentConfiguration}} in {{KylinMapper}} and 
{{KylinReducer}}: every MR job must call this method first, because we must 
ensure we use {{context.getConfiguration()}} for HDFS, not the default 
Configuration. The same applies in Spark.

For example, suppose the following config exists in the Kylin server's mountTable.xml 
but not in the DN node's mountTable.xml. When a Kylin Spark job visits 
hdfs:///kylin, a {{FileNotFoundException}} will be thrown.

{code:java}
<property>
  <name>fs.viewfs.mounttable..link./kylin</name>
  <value>hdfs:///kylin</value>
</property>
{code}


> Set SparkContext.hadoopConfiguration to HadoopUtil in Spark Cubing
> --
>
> Key: KYLIN-2995
> URL: https://issues.apache.org/jira/browse/KYLIN-2995
> Project: Kylin
>  Issue Type: Bug
>  Components: Spark Engine
>Affects Versions: v2.1.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: KYLIN-2995.patch
>
>
> Currently, we load metadata from HDFS in 
> SparkCubing:{{AbstractHadoopJob.loadKylinConfigFromHdfs}}, but HadoopUtil 
> uses a new Configuration; we should use SparkContext.hadoopConfiguration instead.




[jira] [Commented] (KYLIN-3087) DistributedLock in GlobalDictionaryBuilder may not release

2017-12-08 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283535#comment-16283535
 ] 

kangkaisen commented on KYLIN-3087:
---

Hi Fangyuan, thank you. This patch looks good to me, but it doesn't have your 
author info. Please re-submit a new patch with your author info by following 
the guide here: https://kylin.apache.org/development/howto_contribute.html. I 
will then merge your patch to the master branch. Thank you.



> DistributedLock in GlobalDictionaryBuilder may not release
> --
>
> Key: KYLIN-3087
> URL: https://issues.apache.org/jira/browse/KYLIN-3087
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v2.2.0
>Reporter: Fangyuan Deng
>Assignee: Fangyuan Deng
> Attachments: KYLIN-3087.patch
>
>
> In GlobalDictionaryBuilder.init(),
> this.builder = new AppendTrieDictionaryBuilder(baseDir, maxEntriesPerSlice, 
> true);
> If this line throws an exception, the DistributedLock will not be released, and other 
> jobs cannot run.
> So, I added a try-catch:
> try {
> this.builder = new AppendTrieDictionaryBuilder(baseDir, 
> maxEntriesPerSlice, true);
> } catch (Throwable e) {
> lock.unlock(getLockPath(sourceColumn));
> throw new RuntimeException(String.format("Failed to create global 
> dictionary on %s ", sourceColumn), e);
> }
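
The release-on-failure pattern described in the issue can be tried out in isolation with the following sketch. `FakeLock` and `FakeBuilder` are hypothetical stand-ins for the distributed lock and `AppendTrieDictionaryBuilder`; they only simulate the behaviour and are not Kylin classes.

```java
import java.util.HashSet;
import java.util.Set;

class LockReleaseSketch {
    // Stand-in for the distributed lock; it only tracks which paths are held.
    static class FakeLock {
        final Set<String> held = new HashSet<>();
        void lock(String path) { held.add(path); }
        void unlock(String path) { held.remove(path); }
    }

    // Stand-in for a builder whose constructor may throw.
    static class FakeBuilder {
        FakeBuilder(boolean fail) {
            if (fail) {
                throw new IllegalStateException("simulated constructor failure");
            }
        }
    }

    // Acquire the lock, then create the builder; release the lock if creation fails
    // so that other jobs are not blocked by a lock nobody will ever release.
    static FakeBuilder init(FakeLock lock, String lockPath, boolean fail) {
        lock.lock(lockPath);
        try {
            return new FakeBuilder(fail);
        } catch (Throwable e) {
            lock.unlock(lockPath);
            throw new RuntimeException(
                    String.format("Failed to create global dictionary on %s", lockPath), e);
        }
    }

    public static void main(String[] args) {
        FakeLock lock = new FakeLock();
        try {
            init(lock, "/dict/SELLER_ID", true);
        } catch (RuntimeException e) {
            System.out.println("init failed, lock still held: " + lock.held.contains("/dict/SELLER_ID"));
        }
    }
}
```

On the success path the lock stays held, since the real builder presumably releases it when the build finishes; only the failure path releases it early.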




[jira] [Commented] (KYLIN-2997) Allow change engineType even if there are segments in cube

2017-12-12 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287299#comment-16287299
 ] 

kangkaisen commented on KYLIN-2997:
---

I think this patch won't.

For  {{checkSignature}} method:

{code:java}
if (!kylinVersion.isCompatibleWith(cubeVersion)) {
    logger.info("checkSignature on {} is skipped as the its version {} is different from kylin version {}", getName(), cubeVersion, kylinVersion);
    return true;
}
{code}


For {{consistentWith}} method, {{calculateSignature}} won't include engineType 
any more.
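
To illustrate why leaving engineType out of the signature keeps existing cubes consistent, here is a simplified sketch. The field set, the MD5-plus-Base64 hashing, and the engine type values are assumptions chosen for the example; Kylin's real {{calculateSignature}} covers more of the cube descriptor.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class CubeSignatureSketch {
    // A drastically reduced cube descriptor, for illustration only.
    static class Desc {
        final String name;
        final String modelName;
        final String dimensions;
        final int engineType;

        Desc(String name, String modelName, String dimensions, int engineType) {
            this.name = name;
            this.modelName = modelName;
            this.dimensions = dimensions;
            this.engineType = engineType;
        }
    }

    // Hash the fields that define the cube's data layout.
    // engineType is deliberately left out, so changing it does not change the signature.
    static String calculateSignature(Desc desc) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            String input = String.join("|", desc.name, desc.modelName, desc.dimensions);
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // A descriptor is consistent with a stored signature if recomputing gives the same value.
    static boolean consistentWith(Desc desc, String storedSignature) {
        return calculateSignature(desc).equals(storedSignature);
    }

    public static void main(String[] args) {
        Desc before = new Desc("test", "kylin_sales_model", "PART_DT,LEAF_CATEG_ID", 2);
        Desc after = new Desc("test", "kylin_sales_model", "PART_DT,LEAF_CATEG_ID", 4);
        String stored = calculateSignature(before);
        System.out.println("still consistent after engine switch: " + consistentWith(after, stored));
    }
}
```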


> Allow change engineType even if there are segments in cube
> --
>
> Key: KYLIN-2997
> URL: https://issues.apache.org/jira/browse/KYLIN-2997
> Project: Kylin
>  Issue Type: Bug
>  Components: Metadata, Web 
>Affects Versions: v2.1.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: KYLIN-2997.patch
>
>
> Currently, the cube signature contains engineType, so if users want to switch 
> engines, they must purge the cube first. I think this is unreasonable 
> because the engine doesn't affect queries or existing segments.




[jira] [Commented] (KYLIN-3089) Query exception on SortedIteratorMergerWithLimit

2017-12-12 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288656#comment-16288656
 ] 

kangkaisen commented on KYLIN-3089:
---

Hi Yang Hao, thank you for reporting this issue.

I think the root cause is that, for fixed-length strings, the Comparator in 
SortMergedPartitionResultIterator is different from the Comparator in 
SortedIteratorMergerWithLimit.

I think we could fix this bug by disabling limit push-down for fixed-length 
strings.

Please go ahead. Thank you.
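
The effect of two disagreeing comparators can be reproduced with a small stand-alone sketch. The two orderings below are hypothetical and chosen only to make the mismatch visible; they are not the comparators Kylin actually uses for fixed-length strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ComparatorMismatchSketch {
    // Hypothetical ordering used when the rows were produced: shorter strings first.
    static final Comparator<String> PRODUCER_ORDER =
            Comparator.<String>comparingInt(String::length)
                    .thenComparing(Comparator.<String>naturalOrder());

    // Hypothetical ordering used by the merger: plain lexicographic.
    static final Comparator<String> MERGER_ORDER = Comparator.<String>naturalOrder();

    // Same kind of sanity check as the one quoted in the issue:
    // each fetched row must not sort before the previous one.
    static void checkSorted(List<String> rows, Comparator<String> comparator) {
        String last = null;
        for (String fetched : rows) {
            if (last != null && comparator.compare(last, fetched) > 0) {
                throw new IllegalStateException("Not sorted! last: " + last + " fetched: " + fetched);
            }
            last = fetched;
        }
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>(Arrays.asList("ab", "b", "aa"));
        rows.sort(PRODUCER_ORDER);          // sorted according to the producer
        checkSorted(rows, PRODUCER_ORDER);  // passes: same comparator
        try {
            checkSorted(rows, MERGER_ORDER); // fails: the merger disagrees about the order
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The data is correctly sorted by the comparator that produced it, yet the check fails as soon as a different comparator is used to validate it, which matches the root cause suggested above.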

> Query exception on SortedIteratorMergerWithLimit
> 
>
> Key: KYLIN-3089
> URL: https://issues.apache.org/jira/browse/KYLIN-3089
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.1.0
>Reporter: Yang Hao
>
> The execution error only occurs in some special cases. I have a simple SQL query 
> that is routed to SortedIteratorMergerWithLimit. When iterating over the 
> data, it triggers this error:
> {code:java}
>//TODO: remove this check when validated
> if (last != null) {
> if (comparator.compare(last, fetched) > 0)
> throw new IllegalStateException("Not sorted! last: " + 
> last + " fetched: " + fetched);
> }
> {code}
> The SQL is as follows:
> {code:java}
> select "DATE",appid,dim_1,dim_2, sum(uv) as uv
> from table_1
> where appid =  and "DATE" = 2017  
> group by "DATE",appid,dim_1,dim_2
> limit 5
> {code}




[jira] [Commented] (KYLIN-2996) DeployCoprocessorCLI Log failed tables info

2017-12-13 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290467#comment-16290467
 ] 

kangkaisen commented on KYLIN-2996:
---

OK, Thanks Liyang!

> DeployCoprocessorCLI Log failed tables info
> ---
>
> Key: KYLIN-2996
> URL: https://issues.apache.org/jira/browse/KYLIN-2996
> Project: Kylin
>  Issue Type: Improvement
>  Components: Storage - HBase
>Affects Versions: v2.1.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>Priority: Trivial
> Fix For: v2.3.0
>
> Attachments: KYLIN-2996.patch
>
>
> Currently, updating the coprocessor can occasionally fail; we should report the 
> failed tables to the user in the final output.




[jira] [Commented] (KYLIN-2994) Handle NPE when load dict in DictionaryManager

2017-12-13 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290469#comment-16290469
 ] 

kangkaisen commented on KYLIN-2994:
---

OK, Thanks Liyang!

> Handle NPE when load dict in DictionaryManager
> --
>
> Key: KYLIN-2994
> URL: https://issues.apache.org/jira/browse/KYLIN-2994
> Project: Kylin
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: v2.1.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>Priority: Minor
> Fix For: v2.3.0
>
> Attachments: KYLIN-2994.patch
>
>
> Currently, the argument {{resourcePath}} in 
> {{DictionaryManager.getDictionaryInfo}} could be NULL




[jira] [Commented] (KYLIN-2993) Add special mr config for base cuboid step

2017-12-13 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290471#comment-16290471
 ] 

kangkaisen commented on KYLIN-2993:
---

OK, Thanks Liyang!

> Add special mr config for base cuboid step
> --
>
> Key: KYLIN-2993
> URL: https://issues.apache.org/jira/browse/KYLIN-2993
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.1.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.3.0
>
> Attachments: KYLIN-2993.patch
>
>
> Refer to http://kylin.apache.org/blog/2016/08/01/count-distinct-in-kylin/. 
> Currently, if users want to enlarge MR memory for global dict, they must use 
> kylin.engine.mr.config-override., which enlarges the memory of all MR 
> jobs. In fact, we only need to enlarge the memory for "Build Base Cuboid", so 
> we could add a special MR config for the base cuboid step.




[jira] [Assigned] (KYLIN-3091) A problem about retention rate analyze

2017-12-14 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen reassigned KYLIN-3091:
-

Assignee: kangkaisen  (was: Yerui Sun)

> A problem about retention rate analyze
> --
>
> Key: KYLIN-3091
> URL: https://issues.apache.org/jira/browse/KYLIN-3091
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
> Environment: hbase 0.98.8-hadoop2
>Reporter: WangSheng
>Assignee: kangkaisen
>
> I found that Kylin supports a retention rate analysis function, so I ran some 
> tests of it. The following SQL executed successfully:
> {code:java}
> select city, version,
> intersect_count(uuid, dt, array['20161014', '20161015']) as retention_oneday,
> intersect_count(uuid, dt, array['20161014', '20161015', '20161016']) as retention_twoday
> from visit_log
> where dt in ('20161014', '20161015', '20161016')
> group by city, version
> {code}
> However, other SQL statements fail, for example:
> {code:java}
> select city,
> intersect_count(uuid, dt, array['20161014', '20161015']) as retention_oneday
> from visit_log
> where dt in ('20161014', '20161015')
> group by city, version
>
> select city, version,
> intersect_count(uuid, dt, array['20161014', '20161015', '20161016']) as retention_twoday
> from visit_log
> where dt in ('20161014', '20161015', '20161016')
> group by city, version
> {code}
> This means I cannot use just one intersect_count UDAF in a SQL statement; at least 
> two intersect_count calls are needed. My Kylin version is 2.0.0 with HBase 0.98.8, 
> and here is the error log:
> {code:java}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
>     at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>     at java.util.ArrayList.get(ArrayList.java:411)
>     at org.apache.kylin.query.relnode.ColumnRowType.getColumnByIndex(ColumnRowType.java:49)
>     at org.apache.kylin.query.relnode.OLAPAggregateRel.fillbackOptimizedColumn(OLAPAggregateRel.java:396)
>     at org.apache.kylin.query.relnode.OLAPAggregateRel.buildRewriteFieldsAndMetricsColumns(OLAPAggregateRel.java:347)
>     at org.apache.kylin.query.relnode.OLAPAggregateRel.implementRewrite(OLAPAggregateRel.java:283)
>     at org.apache.kylin.query.relnode.OLAPRel$RewriteImplementor.visitChild(OLAPRel.java:158)
>     at org.apache.kylin.query.relnode.OLAPLimitRel.implementRewrite(OLAPLimitRel.java:107)
>     at org.apache.kylin.query.relnode.OLAPRel$RewriteImplementor.visitChild(OLAPRel.java:158)
>     at org.apache.kylin.query.relnode.OLAPToEnumerableConverter.implement(OLAPToEnumerableConverter.java:100)
>     at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.implementRoot(EnumerableRelImplementor.java:108)
>     at org.apache.calcite.adapter.enumerable.EnumerableInterpretable.toBindable(EnumerableInterpretable.java:92)
>     at org.apache.calcite.prepare.CalcitePrepareImpl$CalcitePreparingStmt.implement(CalcitePrepareImpl.java:1248)
>     at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:306)
>     at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:203)
>     at org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:776)
>     at org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:632)
>     at org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:602)
>     at org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:214)
>     at org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:595)
>     at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:615)
>     at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:148)
> {code}
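
For readers unfamiliar with the function in the report, what intersect_count(uuid, dt, array[...]) computes can be sketched roughly as follows: the number of distinct uuid values that appear under every one of the listed dt values. This is a simplified model using plain hash sets; the class and method names are hypothetical, and it does not reflect how Kylin stores or evaluates the measure internally.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RetentionSketch {
    // Counts the ids that are present under every one of the given dates.
    // idsByDate maps a date string to the set of ids seen on that date.
    static int intersectCount(Map<String, Set<String>> idsByDate, List<String> dates) {
        Set<String> common = null;
        for (String date : dates) {
            Set<String> ids = idsByDate.getOrDefault(date, Collections.emptySet());
            if (common == null) {
                common = new HashSet<>(ids);   // start from the first date's ids
            } else {
                common.retainAll(ids);         // keep only ids also seen on this date
            }
        }
        return common == null ? 0 : common.size();
    }
}
```

Under this model, the one-day retention in the report is the count of users seen on both 20161014 and 20161015, and the two-day retention additionally requires 20161016.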




[jira] [Commented] (KYLIN-3087) DistributedLock in GlobalDictionaryBuilder may not release

2017-12-14 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292142#comment-16292142
 ] 

kangkaisen commented on KYLIN-3087:
---

This is the commit: 
https://github.com/apache/kylin/commit/60431f46494aaa1297d8da87bbf49bc78312fcb4.
 Thanks Fangyuan.

> DistributedLock in GlobalDictionaryBuilder may not release
> --
>
> Key: KYLIN-3087
> URL: https://issues.apache.org/jira/browse/KYLIN-3087
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v2.2.0
>Reporter: Fangyuan Deng
>Assignee: Fangyuan Deng
> Fix For: v2.3.0
>
> Attachments: KYLIN-3087.1.patch, KYLIN-3087.patch
>
>
> In GlobalDictionaryBuilder.init(),
> this.builder = new AppendTrieDictionaryBuilder(baseDir, maxEntriesPerSlice, true);
> If this line throws an exception, the DistributedLock is not released, and other 
> jobs cannot run.
> So I added a try-catch:
> try {
>     this.builder = new AppendTrieDictionaryBuilder(baseDir, maxEntriesPerSlice, true);
> } catch (Throwable e) {
>     lock.unlock(getLockPath(sourceColumn));
>     throw new RuntimeException(String.format("Failed to create global dictionary on %s ", sourceColumn), e);
> }
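
The pattern described in this issue, releasing a lock when initialization fails after the lock has been taken, can be shown in a self-contained form. The sketch below substitutes a local `java.util.concurrent.locks.ReentrantLock` for Kylin's DistributedLock and a generic `Supplier` for the builder constructor; both substitutions, and the class and method names, are assumptions made for illustration.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

final class LockReleaseSketch {
    // Stand-in for the distributed lock; a local ReentrantLock is an assumption of this sketch.
    static final ReentrantLock LOCK = new ReentrantLock();

    // Takes the lock, then builds the resource. If construction fails, the lock is
    // released before the exception propagates. On success the lock stays held,
    // and the caller is expected to release it when the build is finished.
    static <T> T initWithLock(Supplier<T> factory, String column) {
        LOCK.lock();
        try {
            return factory.get();
        } catch (Throwable e) {
            LOCK.unlock();
            throw new RuntimeException(
                    String.format("Failed to create global dictionary on %s", column), e);
        }
    }
}
```

Without the catch block, a failing factory would leave the lock held indefinitely, which is the situation the issue reports as blocking other jobs.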




[jira] [Resolved] (KYLIN-3087) DistributedLock in GlobalDictionaryBuilder may not release

2017-12-14 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen resolved KYLIN-3087.
---
Resolution: Fixed

> DistributedLock in GlobalDictionaryBuilder may not release
> --
>
> Key: KYLIN-3087
> URL: https://issues.apache.org/jira/browse/KYLIN-3087
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v2.2.0
>Reporter: Fangyuan Deng
>Assignee: Fangyuan Deng
> Fix For: v2.3.0
>
> Attachments: KYLIN-3087.1.patch, KYLIN-3087.patch
>
>




[jira] [Created] (KYLIN-3113) Editing Measure supports fuzzy search in web

2017-12-15 Thread kangkaisen (JIRA)

kangkaisen created KYLIN-3113:
-

 Summary: Editing Measure supports fuzzy search in web
 Key: KYLIN-3113
 URL: https://issues.apache.org/jira/browse/KYLIN-3113
 Project: Kylin
  Issue Type: Improvement
  Components: Web 
Affects Versions: v2.2.0
Reporter: kangkaisen
Assignee: kangkaisen


Since Kylin 2.0, the column shown in the web UI contains both the table name and 
the column name, so prefix search is useless, which is a bad user experience. 
We should therefore support fuzzy search when editing a measure.
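
To illustrate why prefix matching breaks once names are shown as TABLE.COLUMN, here is a small sketch comparing prefix search with a case-insensitive substring search. Treating "fuzzy" as substring matching is an assumption of this sketch; the patch may match differently, and the class, method, and column names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ColumnSearchSketch {
    // Prefix search: matches only when the qualified name starts with the query.
    static List<String> prefixSearch(List<String> columns, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String column : columns) {
            if (column.toLowerCase(Locale.ROOT).startsWith(q)) {
                hits.add(column);
            }
        }
        return hits;
    }

    // Substring search, ignoring case: matches the query anywhere in the qualified name.
    static List<String> fuzzySearch(List<String> columns, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String column : columns) {
            if (column.toLowerCase(Locale.ROOT).contains(q)) {
                hits.add(column);
            }
        }
        return hits;
    }
}
```

For a list such as SALES.PRICE and SALES.SELLER_ID, typing "price" finds nothing with prefix search because every entry starts with the table name, whereas the substring variant finds SALES.PRICE.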




[jira] [Updated] (KYLIN-3113) Editing Measure supports fuzzy search in web

2017-12-15 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-3113:
--
Attachment: KYLIN-3113.patch

This is the patch.

> Editing Measure supports fuzzy search in web
> 
>
> Key: KYLIN-3113
> URL: https://issues.apache.org/jira/browse/KYLIN-3113
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v2.2.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: KYLIN-3113.patch
>
>




[jira] [Commented] (KYLIN-3091) A problem about retention rate analyze

2017-12-15 Thread kangkaisen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292431#comment-16292431
 ] 

kangkaisen commented on KYLIN-3091:
---

This commit has fixed this bug: 
https://github.com/apache/kylin/commit/6b4f70d257e1eb363a7b792cde8f6f59821094a6

I added a test case for this bug:  
https://github.com/apache/kylin/commit/f0e5e376d6466891873514f76c7b34c73c0ea28f

> A problem about retention rate analyze
> --
>
> Key: KYLIN-3091
> URL: https://issues.apache.org/jira/browse/KYLIN-3091
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
> Environment: hbase 0.98.8-hadoop2
>Reporter: WangSheng
>Assignee: kangkaisen
>




[jira] [Resolved] (KYLIN-3091) A problem about retention rate analyze

2017-12-15 Thread kangkaisen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen resolved KYLIN-3091.
---
   Resolution: Resolved
Fix Version/s: v2.2.0

> A problem about retention rate analyze
> --
>
> Key: KYLIN-3091
> URL: https://issues.apache.org/jira/browse/KYLIN-3091
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
> Environment: hbase 0.98.8-hadoop2
>Reporter: WangSheng
>Assignee: kangkaisen
> Fix For: v2.2.0
>
>


