
[jira] [Commented] (KYLIN-2312) Display Server Config/Environment by order in system tab

2018-07-16 Thread hongbin ma (JIRA)



[ 
https://issues.apache.org/jira/browse/KYLIN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544969#comment-16544969
 ] 

hongbin ma commented on KYLIN-2312:
---

After KYLIN-2659 this no longer works, and we haven't received many 
complaints since, so the requirement here may not be very strong.

> Display Server Config/Environment by order in system tab 
> -
>
> Key: KYLIN-2312
> URL: https://issues.apache.org/jira/browse/KYLIN-2312
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v1.6.0
>Reporter: Billy Liu
>Assignee: Billy Liu
>Priority: Minor
> Fix For: v2.0.0
>
>
> The system tab page shows Server Config and Environment, which is useful for 
> debugging, but the item order is currently undetermined. The Config should 
> be shown in the same order as the properties file. The Environment items 
> should be sorted by name. 
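
A minimal sketch of the two orderings requested above, in plain Java. It is not Kylin's actual system-tab code: the class name, method names and sample keys are hypothetical, and the parser is deliberately simplified (no escapes, no line continuations).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Kylin.
class OrderedDisplaySketch {
    // Keeps keys in the order they appear in the properties text.
    // Simplified parser: ignores escapes and line continuations.
    static Map<String, String> inFileOrder(String propertiesText) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String line : propertiesText.split("\\R")) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            int eq = t.indexOf('=');
            if (eq < 0) {
                continue; // not a key=value line
            }
            out.put(t.substring(0, eq).trim(), t.substring(eq + 1).trim());
        }
        return out;
    }

    // Returns a copy of the given entries sorted by name.
    static Map<String, String> sortedByName(Map<String, String> env) {
        return new TreeMap<>(env);
    }

    public static void main(String[] args) {
        String text = "# sample file\nz.key=2\na.key=1\n";
        System.out.println(inFileOrder(text).keySet());               // [z.key, a.key]
        System.out.println(sortedByName(inFileOrder(text)).keySet()); // [a.key, z.key]
    }
}
```

java.util.Properties is hash-based and does not keep file order, which is why the sketch parses into a LinkedHashMap instead.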



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (KYLIN-3379) timestampadd test coverage is not enough

2018-05-12 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-3379:
--
Description: 
Complex cases like 

timestampadd(MONTH,23,test_kylin_fact.cal_dt) or 

timestampadd(MONTH,-23,test_kylin_fact.cal_dt) are not covered. 

And my tests show that queries of this kind fail IT. 
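
For reference, the month arithmetic these two cases exercise can be sketched with java.time. This only illustrates the calendar semantics a test would need to pin down; it is not Kylin's or Calcite's implementation, and the class and method names are hypothetical. SQL engines may handle end-of-month dates differently from java.time.

```java
import java.time.LocalDate;

// Hypothetical helper mirroring the intent of timestampadd(MONTH, n, date).
class TimestampAddSketch {
    static LocalDate addMonths(LocalDate date, int months) {
        // plusMonths accepts negative values and clamps the day-of-month
        // to the last valid day of the target month.
        return date.plusMonths(months);
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2012, 1, 31);
        System.out.println(addMonths(d, 23));  // 2013-12-31
        System.out.println(addMonths(d, -23)); // 2010-02-28
    }
}
```

Offsets larger than 12 in either direction cross year boundaries, and a day-31 input forces end-of-month clamping, so both deserve explicit test cases.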

> timestampadd test coverage is not enough
> 
>
> Key: KYLIN-3379
> URL: https://issues.apache.org/jira/browse/KYLIN-3379
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.3.1
>Reporter: hongbin ma
>Priority: Major
> Fix For: v2.4.0
>
>
> Complex cases like 
> timestampadd(MONTH,23,test_kylin_fact.cal_dt) or 
> timestampadd(MONTH,-23,test_kylin_fact.cal_dt) are not covered. 
> And my tests show that queries of this kind fail IT. 




[jira] [Assigned] (KYLIN-3379) timestampadd test coverage is not enough

2018-05-12 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma reassigned KYLIN-3379:
-

Assignee: hongbin ma

> timestampadd test coverage is not enough
> 
>
> Key: KYLIN-3379
> URL: https://issues.apache.org/jira/browse/KYLIN-3379
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.3.1
>Reporter: hongbin ma
>Assignee: hongbin ma
>Priority: Major
> Fix For: v2.4.0
>
>
> Complex cases like 
> timestampadd(MONTH,23,test_kylin_fact.cal_dt) or 
> timestampadd(MONTH,-23,test_kylin_fact.cal_dt) are not covered. 
> And my tests show that queries of this kind fail IT. 




[jira] [Updated] (KYLIN-3379) timestampadd test coverage is not enough

2018-05-12 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-3379:
--
Fix Version/s: v2.4.0

> timestampadd test coverage is not enough
> 
>
> Key: KYLIN-3379
> URL: https://issues.apache.org/jira/browse/KYLIN-3379
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.3.1
>Reporter: hongbin ma
>Assignee: hongbin ma
>Priority: Major
> Fix For: v2.4.0
>
>
> Complex cases like 
> timestampadd(MONTH,23,test_kylin_fact.cal_dt) or 
> timestampadd(MONTH,-23,test_kylin_fact.cal_dt) are not covered. 
> And my tests show that queries of this kind fail IT. 




[jira] [Created] (KYLIN-3379) timestampadd test coverage is not enough

2018-05-12 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-3379:
-

 Summary: timestampadd test coverage is not enough
 Key: KYLIN-3379
 URL: https://issues.apache.org/jira/browse/KYLIN-3379
 Project: Kylin
  Issue Type: Bug
Affects Versions: v2.3.1
Reporter: hongbin ma







[jira] [Updated] (KYLIN-3149) Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected

2018-01-03 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-3149:
--
Attachment: dump.txt

> Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected
> 
>
> Key: KYLIN-3149
> URL: https://issues.apache.org/jira/browse/KYLIN-3149
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.2.0
>Reporter: hongbin ma
> Attachments: dump.txt
>
>
> for queries like:
> {code:sql}
> select TRANS_ID from kylin_sales group by cast (case 
> WHEN  '1030101' = '1030101' then substring(COALESCE(OPS_USER_ID, 
> ''), 1, 1)
> when  '1030101' = '1030102' then substring(COALESCE(OPS_REGION, 
> ''), 1, 1)  
> when  '1030101' = '1030103' then substring(COALESCE(LSTG_FORMAT_NAME, 
> ''), 1, 1)
> when  '1030101' = '1030104' then substring(COALESCE(LSTG_FORMAT_NAME, 
> ''), 1, 1)
> end as varchar(256)), TRANS_ID;
> {code}
> the expected logical plan after volcano is:
> {code}
> EXECUTION PLAN BEFORE REWRITE
> OLAPToEnumerableConverter
>   OLAPProjectRel(TRANS_ID=[$1], ctx=[])
> OLAPLimitRel(ctx=[], fetch=[5])
>   OLAPAggregateRel(group=[{0, 1}], ctx=[])
> OLAPProjectRel($f0=[SUBSTRING(CASE(IS NOT NULL($9), $9, 
> ''), 1, 1)], TRANS_ID=[$0], ctx=[])
>   OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], ctx=[], fields=[[0, 
> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
> {code}
> however the actual is:
> {code}
> EXECUTION PLAN BEFORE REWRITE
> OLAPToEnumerableConverter
>   OLAPLimitRel(ctx=[], fetch=[5])
> OLAPProjectRel(TRANS_ID=[$1], ctx=[])
>   OLAPAggregateRel(group=[{0, 1}], ctx=[])
> OLAPProjectRel($f0=[CAST(CASE(=('1030101', '1030101'), 
> SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1), =('1030101', 
> '1030102'), SUBSTRING(CASE(IS NOT NULL($10), $10, ''), 1, 1), 
> =('1030101', '1030103'), SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 
> 1, 1), =('1030101', '1030104'), SUBSTRING(CASE(IS NOT NULL($2), $2, 
> ''), 1, 1), null)):VARCHAR(256) CHARACTER SET "UTF-16LE" COLLATE 
> "UTF-16LE$en_US$primary"], TRANS_ID=[$0], ctx=[])
>   OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], ctx=[], fields=[[0, 
> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
> {code}
> It looks like Calcite's ReduceExpressionsRule.PROJECT_INSTANCE is not working as 
> expected. If we dump the internal state of this VolcanoPlanner 
> (org.apache.calcite.plan.volcano.VolcanoPlanner#dump), lines 19-21 of the 
> complete dump (attached as dump.txt) are:
> {code}
>   rel#337:Subset#1.OLAP.[], best=rel#339, importance=0.6561
>   
> rel#339:OLAPProjectRel.OLAP.[](input=rel#303:Subset#0.OLAP.[],$f0=CAST(CASE(=('1030101',
>  '1030101'), SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1), 
> =('1030101', '1030102'), SUBSTRING(CASE(IS NOT NULL($10), $10, 
> ''), 1, 1), =('1030101', '1030103'), SUBSTRING(CASE(IS NOT 
> NULL($2), $2, ''), 1, 1), =('1030101', '1030104'), 
> SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 1, 1), 
> null)):VARCHAR(256) CHARACTER SET "UTF-16LE" COLLATE 
> "UTF-16LE$en_US$primary",TRANS_ID=$0,ctx=), rowcount=100.0, cumulative 
> cost={15.0 rows, 25.05 cpu, 0.0 io}
>   
> rel#348:OLAPProjectRel.OLAP.[](input=rel#303:Subset#0.OLAP.[],$f0=SUBSTRING(CASE(IS
>  NOT NULL($9), $9, ''), 1, 1),TRANS_ID=$0,ctx=), rowcount=100.0, 
> cumulative cost={15.0 rows, 25.05 cpu, 0.0 io}
> {code}
> We see two rels with the same cost, #339 and #348. #339 is created from 
> LogicalProject =(OLAPProjectRule)=> OLAPProject, and #348 is created from 
> LogicalProject =(ReduceExpressionsRule)=> Reduced LogicalProject 
> =(OLAPProjectRule)=> Reduced OLAPProject. Since ReduceExpressionsRule 
> requires a LogicalProject rather than an OLAPProject, #339 is never reduced.
> Worse, the costs of #339 and #348 are the same. With the current Volcano 
> planner algorithm the first rel encountered is chosen, so the unexpected rel 
> is chosen.
> A simple fix is to refine the rel-choosing algorithm: when 
> two rels are equal in cost, choose the "simpler" one. Since we don't have a 
> perfect measure of "simple", we simply choose the rel with the shorter 
> toString() length.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Created] (KYLIN-3149) Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected

2018-01-03 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-3149:
-

 Summary: Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not 
working as expected
 Key: KYLIN-3149
 URL: https://issues.apache.org/jira/browse/KYLIN-3149
 Project: Kylin
  Issue Type: Bug
Affects Versions: v2.2.0
Reporter: hongbin ma


for queries like:

{code:sql}
select TRANS_ID from kylin_sales group by cast (case
    when '1030101' = '1030101' then substring(COALESCE(OPS_USER_ID, ''), 1, 1)
    when '1030101' = '1030102' then substring(COALESCE(OPS_REGION, ''), 1, 1)
    when '1030101' = '1030103' then substring(COALESCE(LSTG_FORMAT_NAME, ''), 1, 1)
    when '1030101' = '1030104' then substring(COALESCE(LSTG_FORMAT_NAME, ''), 1, 1)
end as varchar(256)), TRANS_ID;
{code}

the expected logical plan after volcano is:

{code}
EXECUTION PLAN BEFORE REWRITE
OLAPToEnumerableConverter
  OLAPProjectRel(TRANS_ID=[$1], ctx=[])
    OLAPLimitRel(ctx=[], fetch=[5])
      OLAPAggregateRel(group=[{0, 1}], ctx=[])
        OLAPProjectRel($f0=[SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1)], TRANS_ID=[$0], ctx=[])
          OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], ctx=[], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
{code}

however the actual is:

{code}
EXECUTION PLAN BEFORE REWRITE
OLAPToEnumerableConverter
  OLAPLimitRel(ctx=[], fetch=[5])
    OLAPProjectRel(TRANS_ID=[$1], ctx=[])
      OLAPAggregateRel(group=[{0, 1}], ctx=[])
        OLAPProjectRel($f0=[CAST(CASE(=('1030101', '1030101'), SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1), =('1030101', '1030102'), SUBSTRING(CASE(IS NOT NULL($10), $10, ''), 1, 1), =('1030101', '1030103'), SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 1, 1), =('1030101', '1030104'), SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 1, 1), null)):VARCHAR(256) CHARACTER SET "UTF-16LE" COLLATE "UTF-16LE$en_US$primary"], TRANS_ID=[$0], ctx=[])
          OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], ctx=[], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
{code}

It looks like Calcite's ReduceExpressionsRule.PROJECT_INSTANCE is not working as 
expected. If we dump the internal state of this VolcanoPlanner 
(org.apache.calcite.plan.volcano.VolcanoPlanner#dump), lines 19-21 of the 
complete dump (attached) are:

{code}
rel#337:Subset#1.OLAP.[], best=rel#339, importance=0.6561
  rel#339:OLAPProjectRel.OLAP.[](input=rel#303:Subset#0.OLAP.[],$f0=CAST(CASE(=('1030101', '1030101'), SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1), =('1030101', '1030102'), SUBSTRING(CASE(IS NOT NULL($10), $10, ''), 1, 1), =('1030101', '1030103'), SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 1, 1), =('1030101', '1030104'), SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 1, 1), null)):VARCHAR(256) CHARACTER SET "UTF-16LE" COLLATE "UTF-16LE$en_US$primary",TRANS_ID=$0,ctx=), rowcount=100.0, cumulative cost={15.0 rows, 25.05 cpu, 0.0 io}
  rel#348:OLAPProjectRel.OLAP.[](input=rel#303:Subset#0.OLAP.[],$f0=SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1),TRANS_ID=$0,ctx=), rowcount=100.0, cumulative cost={15.0 rows, 25.05 cpu, 0.0 io}
{code}

We see two rels with the same cost, #339 and #348. #339 is created from 
LogicalProject =(OLAPProjectRule)=> OLAPProject, and #348 is created from 
LogicalProject =(ReduceExpressionsRule)=> Reduced LogicalProject 
=(OLAPProjectRule)=> Reduced OLAPProject. Since ReduceExpressionsRule requires 
a LogicalProject rather than an OLAPProject, #339 is never reduced.

Worse, the costs of #339 and #348 are the same. With the current Volcano 
planner algorithm the first rel encountered is chosen, so the unexpected rel is 
chosen.

A simple fix is to refine the rel-choosing algorithm: when two 
rels are equal in cost, choose the "simpler" one. Since we don't have a perfect 
measure of "simple", we simply choose the rel with the shorter toString() length.
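
The tie-breaking idea can be sketched in plain Java as follows. This is a standalone illustration, not VolcanoPlanner code: the Rel class, its scalar cost and the chooseBest method are hypothetical stand-ins for the planner's rel nodes and cost comparison.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TieBreakSketch {
    // Hypothetical stand-in for a planner rel: a digest plus a scalar cost.
    static final class Rel {
        final String digest;
        final double cost;

        Rel(String digest, double cost) {
            this.digest = digest;
            this.cost = cost;
        }

        @Override
        public String toString() {
            return digest;
        }
    }

    // Picks the cheapest rel; on equal cost prefers the shorter toString().
    static Rel chooseBest(List<Rel> candidates) {
        return candidates.stream()
            .min(Comparator.comparingDouble((Rel r) -> r.cost)
                .thenComparingInt(r -> r.toString().length()))
            .orElseThrow(IllegalArgumentException::new);
    }

    public static void main(String[] args) {
        Rel unreduced = new Rel("Project($f0=CAST(CASE(=('1030101', '1030101'), ...)))", 25.05);
        Rel reduced = new Rel("Project($f0=SUBSTRING(...))", 25.05);
        // The unreduced rel is met first, but the shorter digest wins the tie.
        System.out.println(chooseBest(Arrays.asList(unreduced, reduced)));
    }
}
```

String length is only a rough proxy for simplicity, as the description itself notes, so the rule is applied strictly as a tie-breaker and never overrides a real cost difference.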




[jira] [Updated] (KYLIN-3106) DefaultScheduler.shutdown should use ExecutorService.shutdownNow instead of ExecutorService.shutdown

2017-12-13 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-3106:
--
Summary: DefaultScheduler.shutdown should use ExecutorService.shutdownNow 
instead of ExecutorService.shutdown  (was: DefaultScheduler#shutdown should use 
shutdownNow instead of shutdown)

> DefaultScheduler.shutdown should use ExecutorService.shutdownNow instead of 
> ExecutorService.shutdown
> 
>
> Key: KYLIN-3106
> URL: https://issues.apache.org/jira/browse/KYLIN-3106
> Project: Kylin
>  Issue Type: Bug
>Reporter: hongbin ma
> Fix For: v2.3.0
>
>
> java.util.concurrent.ExecutorService#shutdownNow will interrupt running 
> worker threads, while java.util.concurrent.ExecutorService#shutdown will not.
> If an interrupt signal is sent, a worker thread can become aware of it and 
> abort itself in time. 




[jira] [Created] (KYLIN-3106) DefaultScheduler#shutdown should use shutdownNow instead of shutdown

2017-12-13 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-3106:
-

 Summary: DefaultScheduler#shutdown should use shutdownNow instead 
of shutdown
 Key: KYLIN-3106
 URL: https://issues.apache.org/jira/browse/KYLIN-3106
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma


java.util.concurrent.ExecutorService#shutdownNow will interrupt running worker 
threads, while java.util.concurrent.ExecutorService#shutdown will not.

If an interrupt signal is sent, a worker thread can become aware of it and 
abort itself in time. 
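
A small standalone sketch of the difference, using only java.util.concurrent. It is not DefaultScheduler code; the class and method names are hypothetical and the sleeping task stands in for a long-running job.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ShutdownNowSketch {
    // Returns true if the worker terminated within 5 seconds of shutdownNow().
    static boolean workerStopsAfterShutdownNow() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        pool.submit(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000); // simulate a long-running job
            } catch (InterruptedException e) {
                // The interrupt from shutdownNow() lands here: restore the flag and exit.
                Thread.currentThread().interrupt();
            }
        });
        try {
            started.await();    // make sure the job is actually running
            pool.shutdownNow(); // interrupts the running worker thread
            return pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(workerStopsAfterShutdownNow()); // true
    }
}
```

With shutdown() in place of shutdownNow(), the same task would keep sleeping for the full minute, because shutdown() only stops new submissions. The interrupt helps only if the job checks for it or blocks in an interruptible call.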




[jira] [Updated] (KYLIN-3106) DefaultScheduler#shutdown should use shutdownNow instead of shutdown

2017-12-13 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-3106:
--
Fix Version/s: v2.3.0

> DefaultScheduler#shutdown should use shutdownNow instead of shutdown
> 
>
> Key: KYLIN-3106
> URL: https://issues.apache.org/jira/browse/KYLIN-3106
> Project: Kylin
>  Issue Type: Bug
>Reporter: hongbin ma
> Fix For: v2.3.0
>
>
> java.util.concurrent.ExecutorService#shutdownNow will interrupt running 
> worker threads, while java.util.concurrent.ExecutorService#shutdown will not.
> If an interrupt signal is sent, a worker thread can become aware of it and 
> abort itself in time. 




[jira] [Resolved] (KYLIN-2982) Avoid upgrade column in OLAPTable

2017-11-02 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma resolved KYLIN-2982.
---
Resolution: Fixed
  Assignee: hongbin ma

> Avoid upgrade column in OLAPTable
> -
>
> Key: KYLIN-2982
> URL: https://issues.apache.org/jira/browse/KYLIN-2982
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
>Priority: Normal
> Fix For: v2.3.0
>
>
> Before CALCITE-845, to keep sum(integer_typed_col) from overflowing, we worked 
> around the problem by upgrading all integer columns that appear in a sum 
> measure to bigint. The workaround changes the column's type without notifying 
> users, and easily leads to messy code. 
> Now that CALCITE-845 is ready, we can use it to provide a cleaner implementation.




[jira] [Updated] (KYLIN-2985) Cache temp json file created by each Calcite Connection

2017-11-01 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2985:
--
Fix Version/s: v2.3.0

> Cache temp json file created by each Calcite Connection
> ---
>
> Key: KYLIN-2985
> URL: https://issues.apache.org/jira/browse/KYLIN-2985
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Priority: Normal
> Fix For: v2.3.0
>
>
> In org.apache.kylin.query.schema.OLAPSchemaFactory, each Calcite connection 
> holds a temp file in the JVM. The total number of temp files can grow 
> very large. A simple cache could address the problem.




[jira] [Created] (KYLIN-2985) Cache temp json file created by each Calcite Connection

2017-11-01 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2985:
-

 Summary: Cache temp json file created by each Calcite Connection
 Key: KYLIN-2985
 URL: https://issues.apache.org/jira/browse/KYLIN-2985
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Priority: Normal


In org.apache.kylin.query.schema.OLAPSchemaFactory, each Calcite connection 
holds a temp file in the JVM. The total number of temp files can grow 
very large. A simple cache could address the problem.
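
One way such a cache could look, as a minimal sketch. It is not the OLAPSchemaFactory code: the class name, the string cache key and the file prefix are assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical cache: one temp file per key instead of one per connection.
class SchemaFileCacheSketch {
    private static final ConcurrentMap<String, File> CACHE = new ConcurrentHashMap<>();

    // key is an assumed identifier (for example a project name);
    // json is the schema content to write on first use.
    static File getOrCreate(String key, String json) {
        return CACHE.computeIfAbsent(key, k -> {
            try {
                File f = File.createTempFile("olap_model_", ".json");
                f.deleteOnExit();
                Files.write(f.toPath(), json.getBytes(StandardCharsets.UTF_8));
                return f;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    public static void main(String[] args) {
        File first = getOrCreate("demo", "{\"version\": \"1.0\"}");
        File second = getOrCreate("demo", "{\"version\": \"1.0\"}");
        System.out.println(first == second); // true: the second call reuses the file
    }
}
```

A real cache would also need to invalidate an entry when the underlying schema changes, which this sketch leaves out.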




[jira] [Updated] (KYLIN-2982) Avoid upgrade column in OLAPTable

2017-11-01 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2982:
--
Fix Version/s: v2.3.0

> Avoid upgrade column in OLAPTable
> -
>
> Key: KYLIN-2982
> URL: https://issues.apache.org/jira/browse/KYLIN-2982
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Priority: Normal
> Fix For: v2.3.0
>
>
> Before CALCITE-845, to keep sum(integer_typed_col) from overflowing, we worked 
> around the problem by upgrading all integer columns that appear in a sum 
> measure to bigint. The workaround changes the column's type without notifying 
> users, and easily leads to messy code. 
> Now that CALCITE-845 is ready, we can use it to provide a cleaner implementation.




[jira] [Created] (KYLIN-2982) Avoid upgrade column in OLAPTable

2017-11-01 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2982:
-

 Summary: Avoid upgrade column in OLAPTable
 Key: KYLIN-2982
 URL: https://issues.apache.org/jira/browse/KYLIN-2982
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Priority: Normal


Before CALCITE-845, to keep sum(integer_typed_col) from overflowing, we worked 
around the problem by upgrading all integer columns that appear in a sum 
measure to bigint. The workaround changes the column's type without notifying 
users, and easily leads to messy code. 

Now that CALCITE-845 is ready, we can use it to provide a cleaner implementation.
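
The overflow that motivated the workaround is easy to reproduce in plain Java. The sketch below only illustrates why the result type of a sum over integer values has to be wider than the column type; it does not use any Calcite or Kylin API, and the names are hypothetical.

```java
class SumOverflowSketch {
    // Accumulates in int: wraps around silently on overflow.
    static int sumAsInt(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Accumulates in long (the bigint equivalent): the inputs stay int,
    // only the result type is widened.
    static long sumAsLong(int[] values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] values = {Integer.MAX_VALUE, 1};
        System.out.println(sumAsInt(values));  // -2147483648
        System.out.println(sumAsLong(values)); // 2147483648
    }
}
```

Widening only the aggregate's result type, as sumAsLong does, avoids changing the declared type of the column itself.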




[jira] [Updated] (KYLIN-2823) Trim TupleFilter after dictionary-based filter optimization

2017-08-30 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2823:
--
Fix Version/s: v2.2.0

> Trim TupleFilter after dictionary-based filter optimization
> ---
>
> Key: KYLIN-2823
> URL: https://issues.apache.org/jira/browse/KYLIN-2823
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
> Fix For: v2.2.0
>
>
> With the cube's dictionary, Kylin will optimize filters like:
> (a = 'value_in_dict' OR a = 'value_not_in_dict')  =>  
> (a = 'value_in_dict' OR ConstantTupleFilter.FALSE)
> We need to further trim the filter to (a = 'value_in_dict') to avoid too many 
> children after the flatten-filter step.




[jira] [Created] (KYLIN-2823) Trim TupleFilter after dictionary-based filter optimization

2017-08-30 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2823:
-

 Summary: Trim TupleFilter after dictionary-based filter 
optimization
 Key: KYLIN-2823
 URL: https://issues.apache.org/jira/browse/KYLIN-2823
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma


With the cube's dictionary, Kylin will optimize filters like:

(a = 'value_in_dict' OR a = 'value_not_in_dict')  =>  
(a = 'value_in_dict' OR ConstantTupleFilter.FALSE)

We need to further trim the filter to (a = 'value_in_dict') to avoid too many 
children after the flatten-filter step.
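
The trimming step can be sketched on a toy filter tree. The Filter, Eq and Or types below are hypothetical stand-ins and do not mirror Kylin's TupleFilter classes; only the rule itself (drop constant-FALSE children of an OR, then collapse) follows the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilterTrimSketch {
    interface Filter {}

    // Stand-in for a constant FALSE filter.
    static final Filter FALSE = new Filter() {
        @Override
        public String toString() {
            return "FALSE";
        }
    };

    static final class Eq implements Filter {
        final String column;
        final String value;

        Eq(String column, String value) {
            this.column = column;
            this.value = value;
        }

        @Override
        public String toString() {
            return column + " = '" + value + "'";
        }
    }

    static final class Or implements Filter {
        final List<Filter> children;

        Or(List<Filter> children) {
            this.children = children;
        }

        @Override
        public String toString() {
            return "OR" + children;
        }
    }

    // Removes FALSE children from OR nodes; an OR with one remaining child
    // collapses to that child, an OR with none collapses to FALSE.
    static Filter trim(Filter f) {
        if (!(f instanceof Or)) {
            return f;
        }
        List<Filter> kept = new ArrayList<>();
        for (Filter child : ((Or) f).children) {
            Filter trimmed = trim(child);
            if (trimmed != FALSE) {
                kept.add(trimmed);
            }
        }
        if (kept.isEmpty()) {
            return FALSE;
        }
        return kept.size() == 1 ? kept.get(0) : new Or(kept);
    }

    public static void main(String[] args) {
        Filter optimized = new Or(Arrays.asList(new Eq("a", "value_in_dict"), FALSE));
        System.out.println(trim(optimized)); // a = 'value_in_dict'
    }
}
```

An AND node would need the mirror-image rule (a FALSE child makes the whole AND FALSE), which is omitted here.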





[jira] [Created] (KYLIN-2801) Make default precision and scale in DataType (for hive) configurable

2017-08-22 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2801:
-

 Summary: Make default precision and scale in DataType (for hive) 
configurable
 Key: KYLIN-2801
 URL: https://issues.apache.org/jira/browse/KYLIN-2801
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma


currently these values are hard coded:

{code:java}
// FIXME 256 for unknown string precision
if ((name.equals("char") || name.equals("varchar")) && precision == -1) {
    precision = 256; // to save memory at frontend, e.g. tableau will
                     // allocate memory according to this
    if (name.equals("char")) {
        // at most 255 according to
        // https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-CharcharChar
        precision -= 1;
    }
}

// FIXME (19,4) for unknown decimal precision
if ((name.equals("decimal") || name.equals("numeric")) && precision == -1) {
    precision = 19;
    scale = 4;
}

{code}
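
A possible shape for the configurable version, as a hedged sketch. The property keys below are invented for this example and are not actual Kylin configuration names; the fallbacks are the hard-coded values from the snippet above.

```java
import java.util.Arrays;
import java.util.Properties;

class DataTypeDefaultsSketch {
    // Hypothetical property keys, invented for this sketch only.
    static final String VARCHAR_KEY = "example.default-varchar-precision";
    static final String CHAR_KEY = "example.default-char-precision";
    static final String DECIMAL_PRECISION_KEY = "example.default-decimal-precision";
    static final String DECIMAL_SCALE_KEY = "example.default-decimal-scale";

    static int intProp(Properties conf, String key, int fallback) {
        String value = conf.getProperty(key);
        return value == null ? fallback : Integer.parseInt(value.trim());
    }

    // Returns {precision, scale}, applying configurable defaults only when
    // the declared precision is unknown (-1).
    static int[] resolve(String name, int precision, int scale, Properties conf) {
        if (precision == -1) {
            if (name.equals("varchar")) {
                precision = intProp(conf, VARCHAR_KEY, 256);
            } else if (name.equals("char")) {
                precision = intProp(conf, CHAR_KEY, 255);
            } else if (name.equals("decimal") || name.equals("numeric")) {
                precision = intProp(conf, DECIMAL_PRECISION_KEY, 19);
                scale = intProp(conf, DECIMAL_SCALE_KEY, 4);
            }
        }
        return new int[] {precision, scale};
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(VARCHAR_KEY, "4096");
        System.out.println(Arrays.toString(resolve("varchar", -1, -1, conf))); // [4096, -1]
        System.out.println(Arrays.toString(resolve("decimal", -1, -1, conf))); // [19, 4]
    }
}
```

Keeping the old constants as fallbacks means an installation that sets none of the properties behaves exactly as before.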





[jira] [Updated] (KYLIN-2782) Replace DailyRollingFileAppender with RollingFileAppender to allow log retention

2017-08-09 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2782:
--
Fix Version/s: v2.2.0

> Replace DailyRollingFileAppender with RollingFileAppender to allow log 
> retention
> 
>
> Key: KYLIN-2782
> URL: https://issues.apache.org/jira/browse/KYLIN-2782
> Project: Kylin
>  Issue Type: Task
>Reporter: hongbin ma
> Fix For: v2.2.0
>
>





[jira] [Created] (KYLIN-2782) Replace DailyRollingFileAppender with RollingFileAppender to allow log retention

2017-08-09 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2782:
-

 Summary: Replace DailyRollingFileAppender with RollingFileAppender 
to allow log retention
 Key: KYLIN-2782
 URL: https://issues.apache.org/jira/browse/KYLIN-2782
 Project: Kylin
  Issue Type: Task
Reporter: hongbin ma







[jira] [Resolved] (KYLIN-1143) cache on partition column's hierarchy parents

2017-08-08 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma resolved KYLIN-1143.
---
Resolution: Won't Fix

> cache on partition column's hierarchy parents
> -
>
> Key: KYLIN-1143
> URL: https://issues.apache.org/jira/browse/KYLIN-1143
> Project: Kylin
>  Issue Type: Sub-task
>Reporter: hongbin ma
>Assignee: hongbin ma
>
> Currently the dynamic cache enforces group by on the partition column. In many 
> cases the partition column has a hierarchy; queries on the hierarchy should be 
> optimized for the dynamic cache, too.




[jira] [Resolved] (KYLIN-1146) reduce the number of objects put into ehcache

2017-08-08 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma resolved KYLIN-1146.
---
Resolution: Won't Fix

> reduce the number of  objects put into ehcache
> --
>
> Key: KYLIN-1146
> URL: https://issues.apache.org/jira/browse/KYLIN-1146
> Project: Kylin
>  Issue Type: Sub-task
>Reporter: hongbin ma
>Assignee: hongbin ma
>
> Ehcache gives a warning if the K/V contains lots of objects; maybe we should 
> compact the K/V into just 2 objects before putting it in.




[jira] [Issue Comment Deleted] (KYLIN-1143) cache on partition column's hierarchy parents

2017-08-08 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-1143:
--
Comment: was deleted

(was: will fix it in 2.1 release)

> cache on partition column's hierarchy parents
> -
>
> Key: KYLIN-1143
> URL: https://issues.apache.org/jira/browse/KYLIN-1143
> Project: Kylin
>  Issue Type: Sub-task
>Reporter: hongbin ma
>Assignee: hongbin ma
>
> Currently the dynamic cache enforces group by on the partition column. In many 
> cases the partition column has a hierarchy; queries on the hierarchy should be 
> optimized for the dynamic cache, too.




[jira] [Comment Edited] (KYLIN-2703) kylin supports managing access rights for project and cube through apache ranger.

2017-08-08 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118096#comment-16118096
 ] 

hongbin ma edited comment on KYLIN-2703 at 8/8/17 9:24 AM:
---

hi [~peng.jianhua]

I have some questions before merging the patch:

1. About org.apache.kylin.rest.controller.AccessController#getAccessEntities: 
Before your patch, this method was simple: it returned the access entry list of a 
requested domain object. After your patch, why is it necessary for the API 
caller to provide a "name" parameter (is it mandatory?) and an "owner" parameter 
(why should the API caller provide the owner)? 
2. On the Kylin side, what configuration do users need to make for this to take 
effect? Is there a manual or doc?



was (Author: mahongbin):
hi [~peng.jianhua]

I have some questions before merging the patch:

1. About org.apache.kylin.rest.controller.AccessController#getAccessEntities: 
Before your patch, this method was simple: it returned the access entry list of a 
requested domain object. After your patch, why is it necessary for the API 
caller to provide a "name" parameter (is it mandatory?) and an "owner" parameter 
(why should the API caller provide the owner)? 
2. What configuration do users need to make to use Ranger? Is there a manual or 
doc?


> kylin supports managing access rights for project and cube through apache 
> ranger.
> -
>
> Key: KYLIN-2703
> URL: https://issues.apache.org/jira/browse/KYLIN-2703
> Project: Kylin
>  Issue Type: New Feature
>  Components: General
>Reporter: peng.jianhua
>Assignee: peng.jianhua
>  Labels: newbie, patch
> Attachments: 
> 0001-KYLIN-2703-kylin-supports-managing-access-rights-for.patch, 
> KylinAuditLog.jpg, KylinPlugins.jpg, KylinPolicies.jpg, 
> KylinServiceEntry.jpg, NewKylinPolicy.jpg, NewKylinService.jpg, 
> Ranger-PMS-hope.png
>
>
> Ranger is a framework to enable, monitor and manage comprehensive data 
> security across the Hadoop platform. Apache Ranger has the following goals:
> 1. Centralized security administration to manage all security related tasks 
> in a central UI or using REST APIs.
> 2. Fine grained authorization to do a specific action and/or operation with 
> Hadoop component/tool and managed through a central administration tool
> 3. Standardize authorization method across all Hadoop components.
> 4. Enhanced support for different authorization methods - Role based access 
> control, attribute based access control etc.
> 5. Centralize auditing of user access and administrative actions (security 
> related) within all the components of Hadoop.
> Ranger already supports enabling, monitoring and managing the following 
> components:
> 1. HDFS
> 2. HIVE
> 3. HBASE
> 4. KNOX
> 5. YARN
> 6. STORM
> 7. SOLR
> 8. KAFKA
> 9. ATLAS
> To improve the flexibility of Kylin's privilege control and enhance 
> Kylin's value in the Apache Hadoop ecosystem, Kylin should, like HDFS, YARN, 
> Hive and HBase, also support using Ranger to control access rights for 
> projects and cubes. 
> The specific implementation plan is as follows:
> On the Ranger website, administrators can configure policies to control user 
> access to projects and cubes.
> Kylin provides an abstract class and authorization interfaces for use by the 
> Ranger plugin. Kylin instantiates the Ranger plugin's implementation class at 
> startup (this class extends the abstract class provided by Kylin).
> The Ranger plugin periodically polls Ranger admin, updates its local copy of 
> the policy, and updates project and cube access rights based on the policy 
> information.
> On the Kylin side:
> 1. Kylin provides an abstract class for the Ranger plugin's 
> implementation class to extend.
> 2. Add configuration items: 1) a Ranger authorization switch, 2) the Ranger 
> plugin implementation class's name.
> 3. Instantiate the Ranger plugin implementation class when starting Kylin.
> 4. Kylin provides authorization interfaces for the Ranger plugin to call.
> 5. Depending on the Ranger authorization configuration item, hide Kylin's 
> authorization management page.
> 6. Using Ranger to manage Kylin's access rights does not affect Kylin's 
> existing permission functions and logic.
> On the Ranger side:
> 1. The Ranger plugin periodically polls Ranger admin and updates its local 
> copy of the policy.
> 2. The Ranger plugin invokes the authorization interfaces provided by Kylin 
> to update the project and cube access rights based on the policy information.
> reference link:https://issues.apache.org/jira/browse/RANGER-1672
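
The plugin mechanism described in the plan above (an abstract class provided by Kylin, with the implementation class named in configuration and instantiated at startup) could be sketched like this. All names are hypothetical: ExternalAuthorizer, AdminOnlyAuthorizer and load are illustrative and are not the classes introduced by the patch.

```java
class PluginLoaderSketch {
    // Hypothetical abstract class that the host application exposes to plugins.
    static abstract class ExternalAuthorizer {
        abstract boolean isAllowed(String user, String entity, String action);
    }

    // Hypothetical plugin implementation, standing in for a Ranger plugin class.
    static class AdminOnlyAuthorizer extends ExternalAuthorizer {
        @Override
        boolean isAllowed(String user, String entity, String action) {
            return "ADMIN".equals(user);
        }
    }

    // Instantiates the implementation class named in configuration.
    static ExternalAuthorizer load(String className) {
        try {
            Class<? extends ExternalAuthorizer> clazz =
                Class.forName(className).asSubclass(ExternalAuthorizer.class);
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load authorizer " + className, e);
        }
    }

    public static void main(String[] args) {
        // In a real setup the class name would come from a configuration item.
        ExternalAuthorizer auth = load(AdminOnlyAuthorizer.class.getName());
        System.out.println(auth.isAllowed("ADMIN", "project1", "QUERY")); // true
        System.out.println(auth.isAllowed("guest", "project1", "QUERY")); // false
    }
}
```

Loading by class name keeps the host free of any compile-time dependency on the plugin, which matches the switch-plus-class-name configuration listed in item 2.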




[jira] [Commented] (KYLIN-2703) kylin supports managing access rights for project and cube through apache ranger.

2017-08-08 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118096#comment-16118096
 ] 

hongbin ma commented on KYLIN-2703:
---

hi [~peng.jianhua]

I have some questions before merging the patch:

1. About org.apache.kylin.rest.controller.AccessController#getAccessEntities: 
Before your patch, this method was simple: it returned the access entry list of a 
requested domain object. After your patch, why is it necessary for the API 
caller to provide a "name" parameter (is it mandatory?) and an "owner" parameter 
(why should the API caller provide the owner)? 
2. What configuration do users need to make to use Ranger? Is there a manual or 
doc?


> kylin supports managing access rights for project and cube through apache 
> ranger.
> -
>
> Key: KYLIN-2703
> URL: https://issues.apache.org/jira/browse/KYLIN-2703
> Project: Kylin
>  Issue Type: New Feature
>  Components: General
>Reporter: peng.jianhua
>Assignee: peng.jianhua
>  Labels: newbie, patch
> Attachments: 
> 0001-KYLIN-2703-kylin-supports-managing-access-rights-for.patch, 
> KylinAuditLog.jpg, KylinPlugins.jpg, KylinPolicies.jpg, 
> KylinServiceEntry.jpg, NewKylinPolicy.jpg, NewKylinService.jpg, 
> Ranger-PMS-hope.png
>
>
> Ranger is a framework to enable, monitor and manage comprehensive data 
> security across the Hadoop platform. Apache Ranger has the following goals:
> 1. Centralized security administration to manage all security related tasks 
> in a central UI or using REST APIs.
> 2. Fine grained authorization to do a specific action and/or operation with 
> Hadoop component/tool and managed through a central administration tool
> 3. Standardize authorization method across all Hadoop components.
> 4. Enhanced support for different authorization methods - Role based access 
> control, attribute based access control etc.
> 5. Centralize auditing of user access and administrative actions (security 
> related) within all the components of Hadoop.
> Ranger already supports enabling, monitoring and managing the following components:
> 1. HDFS
> 2. HIVE
> 3. HBASE
> 4. KNOX
> 5. YARN
> 6. STORM
> 7. SOLR
> 8. KAFKA
> 9. ATLAS
> To make Kylin privilege control more flexible and to increase Kylin's value 
> in the Apache Hadoop ecosystem, Kylin should, like HDFS, YARN, Hive and HBase, 
> also support using Ranger to control access rights for projects and cubes. 
> The specific implementation plan is as follows:
> On the Ranger website, administrators can configure policies to control user 
> access to projects and cubes.
> Kylin provides an abstract class and authorization interfaces for use by the 
> Ranger plugin. Kylin instantiates the Ranger plugin's implementation class 
> (which extends the abstract class provided by Kylin) at startup.
> The Ranger plugin periodically polls Ranger admin, updates its local copy of 
> the policies, and updates project and cube access rights based on the policy 
> information.
> On the Kylin side:
> 1. Kylin provides an abstract class for the Ranger plugin's implementation 
> class to extend.
> 2. Add configuration items: 1) a Ranger authorization switch, 2) the Ranger 
> plugin implementation class's name.
> 3. Instantiate the Ranger plugin implementation class when Kylin starts.
> 4. Kylin provides authorization interfaces for the Ranger plugin to call.
> 5. Depending on the Ranger authorization configuration item, hide Kylin's 
> authorization management page.
> 6. Using Ranger to manage Kylin access rights does not affect Kylin's 
> existing permission functions and logic.
> On the Ranger side:
> 1. The Ranger plugin periodically polls Ranger admin and updates its local 
> copy of the policies.
> 2. The Ranger plugin invokes the authorization interfaces provided by Kylin 
> to update project and cube access rights based on the policy information.
> Reference link: https://issues.apache.org/jira/browse/RANGER-1672
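
The plugin mechanism in the issue description (an abstract class provided by Kylin, an implementation class named in configuration, instantiation at startup) might look roughly like the plain-Java sketch below. Every name in it (ExternalAuthorizerSketch, AllowAdminOnlySketch, checkPermission, load) is hypothetical and chosen for illustration only; none of it is taken from the patch or from Ranger's API.

```java
/** Hypothetical stand-in for the abstract class Kylin would provide; not the patch's real API. */
abstract class ExternalAuthorizerSketch {

    /** Returns true if the user may perform the action on the named project or cube. */
    abstract boolean checkPermission(String user, String entityName, String action);

    /** Instantiates the implementation class whose name would come from configuration. */
    static ExternalAuthorizerSketch load(String implClassName) {
        try {
            return Class.forName(implClassName)
                    .asSubclass(ExternalAuthorizerSketch.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load authorizer " + implClassName, e);
        }
    }

    public static void main(String[] args) {
        ExternalAuthorizerSketch authorizer = load(AllowAdminOnlySketch.class.getName());
        System.out.println(authorizer.checkPermission("ADMIN", "some_cube", "READ"));
    }
}

/** Toy implementation standing in for a plugin class; it only lets the user "ADMIN" through. */
class AllowAdminOnlySketch extends ExternalAuthorizerSketch {
    @Override
    boolean checkPermission(String user, String entityName, String action) {
        return "ADMIN".equals(user);
    }
}
```

Loading the class by name keeps the core free of a compile-time dependency on the plugin, which is presumably why the description asks for the class name as a configuration item.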



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (KYLIN-2706) Fix the bug for the comparator in SortedIteratorMergerWithLimit

2017-08-07 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116291#comment-16116291
 ] 

hongbin ma commented on KYLIN-2706:
---

reviewed

> Fix the bug for the comparator in SortedIteratorMergerWithLimit
> ---
>
> Key: KYLIN-2706
> URL: https://issues.apache.org/jira/browse/KYLIN-2706
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: KYLIN-2706.patch
>
>
> Storage limit push-down should be disabled for the SQL below, because it 
> returns more than one record from the HBase tables while 
> SortedIteratorMergerWithLimit returns only one record, which produces a wrong 
> result.
> {code:sql}
> SELECT sum(A) 
> FROM TABLE 
> WHERE date_id >= 20170624 and date_id <= 20170626 
> limit 1
> {code}
> We should disable Storage limit push down when singleValuesD doesn't 
> containsAll othersD




[jira] [Commented] (KYLIN-2606) Only return counter for precise count_distinct if query is exactAggregate

2017-08-07 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116289#comment-16116289
 ] 

hongbin ma commented on KYLIN-2606:
---

patch reviewed

> Only return counter for precise count_distinct if query is exactAggregate
> -
>
> Key: KYLIN-2606
> URL: https://issues.apache.org/jira/browse/KYLIN-2606
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>
> If the query is exactAggregation and has some memory-hungry measures, we 
> could return the final result directly to speed up the query and reduce the 
> RPC data size and memory usage in queryServer.




[jira] [Updated] (KYLIN-2776) Using dropwizard as default metric framework

2017-08-03 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2776:
--
Description: 
With https://issues.apache.org/jira/browse/KYLIN-2721, we plan to release a 
new metric framework. 
The new metric framework differs from Hadoop metrics and is based on 
dropwizard, which has the following advantages:
* Well-defined metric model for frequently-needed metrics (e.g. JVM metrics)
* Well-defined measurements for all metrics (max, mean, stddev, 
mean_rate, etc.)
* Built-in pluggable reporting frameworks like JMX, Console, Log, JSON 

We refactored QueryMetric to use the new metrics; note that the exposed JMX 
MBeans have changed a little.

A new tool called perflog is also introduced. Perflog traces call duration 
and currently active calls by recording them in the metric system.

Some snapshots of the new JMX MBeans can be seen in the attachments.

  was:
With https://issues.apache.org/jira/browse/KYLIN-2721.We are plan to release a 
new metric framework. 
New metric is different hadoop metric  and based on dropwizard . which has the 
following advantage:
* Well-defined metric model for frequently-needed metrics (ie JVM metrics)
* Well-defined measurements for all metrics (ie max, mean, stddev, 
mean_rate, etc),
* Built-in pluggable reporting frameworks like JMX, Console, Log, JSON 

We refactor QueryMetric with new metris. 
New metric  add perflog. Perflog  trace calls duration time  and current active 
calls by recording them to metric system.
 Attachment is  the difference between the two metric system .


> Using dropwizard as default metric framework
> 
>
> Key: KYLIN-2776
> URL: https://issues.apache.org/jira/browse/KYLIN-2776
> Project: Kylin
>  Issue Type: New Feature
>Affects Versions: v2.0.0
>Reporter: yiming.xu
>Assignee: yiming.xu
> Attachments: active_calls.png, calls.png, KYLIN-2776.patch, 
> metric_structure.png, query_count.png, query_duration.png, 
> query_result_rowcount.png, report.json
>
>
> With https://issues.apache.org/jira/browse/KYLIN-2721, we plan to release 
> a new metric framework. 
> The new metric framework differs from Hadoop metrics and is based on 
> dropwizard, which has the following advantages:
> * Well-defined metric model for frequently-needed metrics (e.g. JVM metrics)
> * Well-defined measurements for all metrics (max, mean, stddev, 
> mean_rate, etc.)
> * Built-in pluggable reporting frameworks like JMX, Console, Log, JSON 
> We refactored QueryMetric to use the new metrics; note that the exposed JMX 
> MBeans have changed a little.
> A new tool called perflog is also introduced. Perflog traces call duration 
> and currently active calls by recording them in the metric system.
> Some snapshots of the new JMX MBeans can be seen in the attachments.




[jira] [Updated] (KYLIN-2776) Using dropwizard as default metric framework

2017-08-03 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2776:
--
Summary: Using dropwizard as default metric framework  (was: New metric 
framework with kylin)

> Using dropwizard as default metric framework
> 
>
> Key: KYLIN-2776
> URL: https://issues.apache.org/jira/browse/KYLIN-2776
> Project: Kylin
>  Issue Type: New Feature
>Affects Versions: v2.0.0
>Reporter: yiming.xu
>Assignee: yiming.xu
> Attachments: active_calls.png, calls.png, metric_structure.png, 
> query_count.png, query_duration.png, query_result_rowcount.png, report.json
>
>
> With https://issues.apache.org/jira/browse/KYLIN-2721.We are plan to release 
> a new metric framework. 
> New metric is different hadoop metric  and based on dropwizard . which has 
> the following advantage:
> * Well-defined metric model for frequently-needed metrics (ie JVM metrics)
> * Well-defined measurements for all metrics (ie max, mean, stddev, 
> mean_rate, etc),
> * Built-in pluggable reporting frameworks like JMX, Console, Log, JSON 
> We refactor QueryMetric with new metris. 
> New metric  add perflog. Perflog  trace calls duration time  and current 
> active calls by recording them to metric system.
>  Attachment is  the difference between the two metric system .




[jira] [Updated] (KYLIN-2776) New metric framework with kylin

2017-08-03 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2776:
--
Description: 
With https://issues.apache.org/jira/browse/KYLIN-2721.We are plan to release a 
new metric framework. 
New metric is different hadoop metric  and based on dropwizard . which has the 
following advantage:
* Well-defined metric model for frequently-needed metrics (ie JVM metrics)
* Well-defined measurements for all metrics (ie max, mean, stddev, 
mean_rate, etc),
* Built-in pluggable reporting frameworks like JMX, Console, Log, JSON 

We refactor QueryMetric with new metris. 
New metric  add perflog. Perflog  trace calls duration time  and current active 
calls by recording them to metric system.
 Attachment is  the difference between the two metric system .

  was:
With https://issues.apache.org/jira/browse/KYLIN-2721.We are plan to release a 
new metric framework. 
New metric is different hadoop metric  and based on dropwizard . which has the 
following advantage:
* Well-defined metric model for frequently-needed metrics (ie JVM metrics)
* Well-defined measurements for all metrics (ie max, mean, stddev, 
mean_rate, etc),
* Built-in pluggable reporting frameworks like JMX, Console, Log, JSON 

We refactor QueryMetric with new metris. 
New metric  add perflog. Perflog  trace calls duration time  and current active 
calls record to metric system.
 Attachment is  the difference between the two metric system .


> New metric framework with kylin
> ---
>
> Key: KYLIN-2776
> URL: https://issues.apache.org/jira/browse/KYLIN-2776
> Project: Kylin
>  Issue Type: New Feature
>Affects Versions: v2.0.0
>Reporter: yiming.xu
>Assignee: yiming.xu
> Attachments: active_calls.png, calls.png, metric_structure.png, 
> query_count.png, query_duration.png, query_result_rowcount.png, report.json
>
>
> With https://issues.apache.org/jira/browse/KYLIN-2721.We are plan to release 
> a new metric framework. 
> New metric is different hadoop metric  and based on dropwizard . which has 
> the following advantage:
> * Well-defined metric model for frequently-needed metrics (ie JVM metrics)
> * Well-defined measurements for all metrics (ie max, mean, stddev, 
> mean_rate, etc),
> * Built-in pluggable reporting frameworks like JMX, Console, Log, JSON 
> We refactor QueryMetric with new metris. 
> New metric  add perflog. Perflog  trace calls duration time  and current 
> active calls by recording them to metric system.
>  Attachment is  the difference between the two metric system .




[jira] [Updated] (KYLIN-2653) Spark cubing support HBase cluster with kerberos

2017-08-01 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2653:
--
Fix Version/s: (was: v2.1.0)
   v2.2.0

> Spark cubing support HBase cluster with kerberos
> 
>
> Key: KYLIN-2653
> URL: https://issues.apache.org/jira/browse/KYLIN-2653
> Project: Kylin
>  Issue Type: Bug
>  Components: Spark Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.2.0
>
>
> Currently, Spark cubing doesn't support HBase clusters with Kerberos.
> As a temporary measure, we could support HBase clusters with Kerberos in YARN 
> client mode, because that is easy.
> In the long term, we should avoid accessing HBase in Spark cubing.




[jira] [Reopened] (KYLIN-2720) Should not allow user to access to all tables' metadata of a project

2017-07-31 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma reopened KYLIN-2720:
---

> Should not allow user to access to all tables' metadata of a project
> 
>
> Key: KYLIN-2720
> URL: https://issues.apache.org/jira/browse/KYLIN-2720
> Project: Kylin
>  Issue Type: Improvement
>Reporter: qiumingming
>Assignee: qiumingming
> Fix For: v2.1.0
>
> Attachments: KYLIN-2720.patch
>
>
> Currently, a user can access the metadata of all tables and columns of a 
> project as long as he can access the project, which is not reasonable. 
> A user should only be allowed to access the tables that his own cubes depend 
> on. However, in the current version a user can see other tables in the web UI.




[jira] [Commented] (KYLIN-2720) Should not allow user to access to all tables' metadata of a project

2017-07-31 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108291#comment-16108291
 ] 

hongbin ma commented on KYLIN-2720:
---

Hi [~qmm],

I'm afraid that with https://issues.apache.org/jira/browse/KYLIN-2515 and 
https://issues.apache.org/jira/browse/KYLIN-2646 being added to Kylin 2.1, we 
have to revert KYLIN-2720, as it conflicts with those issues. We are currently 
refining the authorization process. Discussions will continue on the mailing 
list or in JIRA, so please stay informed.

> Should not allow user to access to all tables' metadata of a project
> 
>
> Key: KYLIN-2720
> URL: https://issues.apache.org/jira/browse/KYLIN-2720
> Project: Kylin
>  Issue Type: Improvement
>Reporter: qiumingming
>Assignee: qiumingming
> Fix For: v2.1.0
>
> Attachments: KYLIN-2720.patch
>
>
> Currently, a user can access the metadata of all tables and columns of a 
> project as long as he can access the project, which is not reasonable. 
> A user should only be allowed to access the tables that his own cubes depend 
> on. However, in the current version a user can see other tables in the web UI.




[jira] [Resolved] (KYLIN-2646) Project level query authorization

2017-07-31 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma resolved KYLIN-2646.
---
   Resolution: Fixed
Fix Version/s: v2.1.0

> Project level query authorization
> -
>
> Key: KYLIN-2646
> URL: https://issues.apache.org/jira/browse/KYLIN-2646
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
> Fix For: v2.1.0
>
>
> As we introduced ad-hoc queries in 
> https://issues.apache.org/jira/browse/KYLIN-2515, we'll need to adjust query 
> authorization as follows:
>  Query authorization is encouraged to be set at the project level. If someone 
> is assigned READ permission on a project, he can query all tables in the 
> project, whether the query is answered ad hoc or by cubes.
>  If a user has READ permission on cubes but no READ permission on the project, 
> he can issue a query only if it can be satisfied by the cubes on which he has 
> READ permission.




[jira] [Commented] (KYLIN-2755) Kylin support hive and hbase authenticated with Kerberos

2017-07-27 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104310#comment-16104310
 ] 

hongbin ma commented on KYLIN-2755:
---

Hi [~wuyingjun],

The implementation has some problems:
1. Hive Kerberos and HBase Kerberos may race on "private static 
UserGroupInformation loginUser = null;" in UserGroupInformation.java.
2. It cannot isolate Hive/HBase permissions at the project level, so finer 
permission control is impossible.

We're considering introducing a way to allow "impersonation" (something like 
http://blog.bcmeng.com/post/kylin-hadoop-queue.html), which I think is a better 
solution to your problem.

> Kylin support hive and hbase authenticated with Kerberos
> 
>
> Key: KYLIN-2755
> URL: https://issues.apache.org/jira/browse/KYLIN-2755
> Project: Kylin
>  Issue Type: New Feature
>Affects Versions: v2.0.0
>Reporter: wuyingjun
>Assignee: wuyingjun
> Attachments: code modify.png, KYLIN-2755.patch
>
>
> I want to know how to integrate Kylin with a Hive data source and HBase 
> storage secured with Kerberos.
> I have used Hive Beeline and modified the HBase configuration initialization 
> in the source code.
> Can the current Kylin version support a Kerberos environment in a better way 
> for MapReduce cubing?




[jira] [Commented] (KYLIN-2721) Introduce a new metrics framework based on dropwizard metrics

2017-07-26 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102651#comment-16102651
 ] 

hongbin ma commented on KYLIN-2721:
---

[~yaho] please try to keep up with the latest code. Consider the Kylin master 
branch or the 2.1.x branch.

> Introduce a new metrics framework based on dropwizard metrics
> -
>
> Key: KYLIN-2721
> URL: https://issues.apache.org/jira/browse/KYLIN-2721
> Project: Kylin
>  Issue Type: New Feature
>Affects Versions: v2.0.0
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
> Attachments: Metrics Framework.png
>
>





[jira] [Comment Edited] (KYLIN-2721) Introduce a new metrics framework based on dropwizard metrics

2017-07-26 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101592#comment-16101592
 ] 

hongbin ma edited comment on KYLIN-2721 at 7/26/17 12:11 PM:
-

dropwizard, yammer and codahale all refer to the same thing 
(http://ningg.top/yammer-metrics/); let's stop confusing the terms.

[~yaho] I still have three questions.

1. You claimed Hadoop metrics is less stable. Do you have any published 
evidence?
2. You claimed codahale is lightweight, but by my count Hadoop metrics has 
only ~15000 lines of Java code (including tests), while the latest codahale 
project (https://github.com/dropwizard/metrics.git) has ~23000 lines of Java 
code. "Lightweight" does not favor codahale.
3. Although your proposal serves a different purpose from the existing 
QueryMetrics, we still need to avoid having two metrics frameworks. That said, 
if we decide to choose dropwizard, we need to migrate QueryMetrics to 
dropwizard ASAP.



was (Author: mahongbin):
dropwizard, yammer and codahale all refer to the same thing 
(http://ningg.top/yammer-metrics/); let's stop confusing the terms.

[~yaho] I still have two questions.

1. You claimed Hadoop metrics is less stable. Do you have any published 
evidence?
2. Although your proposal serves a different purpose from the existing 
QueryMetrics, we still need to avoid having two metrics frameworks. That said, 
if we decide to choose dropwizard, we need to migrate QueryMetrics to 
dropwizard ASAP.


> Introduce a new metrics framework based on dropwizard metrics
> -
>
> Key: KYLIN-2721
> URL: https://issues.apache.org/jira/browse/KYLIN-2721
> Project: Kylin
>  Issue Type: New Feature
>Affects Versions: v2.0.0
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
> Attachments: Metrics Framework.png
>
>





[jira] [Commented] (KYLIN-2721) Introduce a new metrics framework based on dropwizard metrics

2017-07-26 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101592#comment-16101592
 ] 

hongbin ma commented on KYLIN-2721:
---

dropwizard, yammer and codahale all refer to the same thing 
(http://ningg.top/yammer-metrics/); let's stop confusing the terms.

[~yaho] I still have two questions.

1. You claimed Hadoop metrics is less stable. Do you have any published 
evidence?
2. Although your proposal serves a different purpose from the existing 
QueryMetrics, we still need to avoid having two metrics frameworks. That said, 
if we decide to choose dropwizard, we need to migrate QueryMetrics to 
dropwizard ASAP.


> Introduce a new metrics framework based on dropwizard metrics
> -
>
> Key: KYLIN-2721
> URL: https://issues.apache.org/jira/browse/KYLIN-2721
> Project: Kylin
>  Issue Type: New Feature
>Affects Versions: v2.0.0
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
> Attachments: Metrics Framework.png
>
>





[jira] [Commented] (KYLIN-2671) Speed up prepared query execution

2017-06-19 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053570#comment-16053570
 ] 

hongbin ma commented on KYLIN-2671:
---

Please check org.apache.kylin.jdbc.KylinConnection#mockPreparedSignature. 
Currently the result of Kylin JDBC's 
java.sql.Connection#prepareStatement(java.lang.String) method is mocked, which 
means you get mocked statement metadata from prepareStatement before you 
call java.sql.PreparedStatement#executeQuery(). It's okay if you use JDBC in 
the following way:

{code:java}
// Bind the parameter and execute; the mocked metadata is never read.
PreparedStatement statement = conn.prepareStatement(
        "select LSTG_FORMAT_NAME, sum(price) as GMV, count(1) as TRANS_CNT from test_kylin_fact "
                + "where LSTG_FORMAT_NAME = ? group by LSTG_FORMAT_NAME");
statement.setString(1, "FP-GTC");
ResultSet rs = statement.executeQuery();
{code}

However, it's not okay to do this:

{code:java}
PreparedStatement statement = conn.prepareStatement(
        "select LSTG_FORMAT_NAME, sum(price) as GMV, count(1) as TRANS_CNT from test_kylin_fact "
                + "where LSTG_FORMAT_NAME = ? group by LSTG_FORMAT_NAME");
ResultSetMetaData metaData = statement.getMetaData();
// Doing anything with metaData here is problematic because
// metaData is merely a mock.
statement.setString(1, "FP-GTC");
ResultSet rs = statement.executeQuery();
{code}

> Speed up prepared query execution
> -
>
> Key: KYLIN-2671
> URL: https://issues.apache.org/jira/browse/KYLIN-2671
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>
> BI tools use prepared queries for function probing; Kylin should not execute 
> such queries in the standard way because it is too costly.
> It's worth mentioning that the standard "prepare-bindparameter-execute" way of 
> using PreparedStatement is still not supported. For now Kylin only supports 
> prepared statements WITHOUT parameters.




[jira] [Commented] (KYLIN-2670) CASE WHEN supporting problem in kylin2.0

2017-06-17 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052736#comment-16052736
 ] 

hongbin ma commented on KYLIN-2670:
---

Can you try to reproduce this issue with the sample cube 
(http://kylin.apache.org/docs20/tutorial/kylin_sample.html)? 

> CASE WHEN supporting problem in kylin2.0
> 
>
> Key: KYLIN-2670
> URL: https://issues.apache.org/jira/browse/KYLIN-2670
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: zhou degao
>Assignee: liyang
>
> Following query failed in kylin 2.0 but succeeded in kylin 1.6
> select "fact_pv_data_alias"."PRODUCT_NAME" as "c0", 
> "fact_pv_data_alias"."PLATFORM" as "c1" from "CSDNBI"."FACT_PV_DATA" as 
> "fact_pv_data_alias" group by "fact_pv_data_alias"."PRODUCT_NAME", 
> "fact_pv_data_alias"."PLATFORM" order by CASE WHEN 
> "fact_pv_data_alias"."PRODUCT_NAME" IS NULL THEN 1 ELSE 0 END, 
> "fact_pv_data_alias"."PRODUCT_NAME" ASC, CASE WHEN 
> "fact_pv_data_alias"."PLATFORM" IS NULL THEN 1 ELSE 0 END, 
> "fact_pv_data_alias"."PLATFORM" ASC
> Reported error in kylin 2.0:
> Error while executing SQL "select "fact_pv_data_alias"."PRODUCT_NAME" as 
> "c0", "fact_pv_data_alias"."PLATFORM" as "c1" from "CSDNBI"."FACT_PV_DATA" as 
> "fact_pv_data_alias" group by "fact_pv_data_alias"."PRODUCT_NAME", 
> "fact_pv_data_alias"."PLATFORM" order by CASE WHEN 
> "fact_pv_data_alias"."PRODUCT_NAME" IS NULL THEN 1 ELSE 0 END, 
> "fact_pv_data_alias"."PRODUCT_NAME" ASC, CASE WHEN 
> "fact_pv_data_alias"."PLATFORM" IS NULL THEN 1 ELSE 0 END, 
> "fact_pv_data_alias"."PLATFORM" ASC LIMIT 5": index (2) must be less than 
> size (2) 




[jira] [Commented] (KYLIN-2673) Should allow user to change fact table as long as the cube is disable

2017-06-17 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052719#comment-16052719
 ] 

hongbin ma commented on KYLIN-2673:
---

hi kaisen

Will disabled cubes with segments be an issue? 

> Should allow user to change fact table as long as the cube is disable
> -
>
> Key: KYLIN-2673
> URL: https://issues.apache.org/jira/browse/KYLIN-2673
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.1.0
>
>
> Currently, a user cannot change the fact table even when the cube is 
> disabled, which isn't reasonable. We should allow users to change the fact 
> table as long as the cube is disabled.




[jira] [Created] (KYLIN-2671) Speed up prepared query execution

2017-06-15 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2671:
-

 Summary: Speed up prepared query execution
 Key: KYLIN-2671
 URL: https://issues.apache.org/jira/browse/KYLIN-2671
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma


BI tools use prepared queries for function probing; Kylin should not execute 
such queries in the standard way because it is too costly.

It's worth mentioning that the standard "prepare-bindparameter-execute" way of 
using PreparedStatement is still not supported. For now Kylin only supports 
prepared statements WITHOUT parameters.





[jira] [Created] (KYLIN-2667) Ignore whitespace when caching query

2017-06-11 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2667:
-

 Summary: Ignore whitespace when caching query
 Key: KYLIN-2667
 URL: https://issues.apache.org/jira/browse/KYLIN-2667
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2659) Refactor KylinConfig so that all the default configurations are hidden in kylin-defaults.properties

2017-06-06 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2659:
-

 Summary: Refactor KylinConfig so that all the default 
configurations are hidden in kylin-defaults.properties
 Key: KYLIN-2659
 URL: https://issues.apache.org/jira/browse/KYLIN-2659
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


Currently we ship a conf/kylin.properties file with a lot of configuration 
overrides. This is not a standard approach compared with other projects like 
hadoop or spark.

It's better to have a kylin-defaults.properties file that holds all the default 
configurations, so that users only have to override the necessary 
configurations in a blank kylin.properties.

After the refactor, a config may be overridden in the following order of 
precedence:

1. KV in kylin.properties.override, which is more of a "secret feature" and is 
never documented.
2. KV in kylin.properties; users are advised to override configs here.
3. KV in kylin-defaults.properties, read-only to users.
4. KV in KylinConfigBase, read-only to users.

The refactor will be backward compatible
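
A layered lookup of this kind can be sketched with plain java.util.Properties. This is a minimal sketch, not KylinConfig's actual code: the class and method names are hypothetical, the key "kylin.example.key" is made up, and it assumes the list above is ordered from highest to lowest precedence.

```java
import java.util.Properties;

/** Hypothetical sketch of a layered config lookup; not KylinConfig's real implementation. */
final class ConfigPrecedenceSketch {

    /** Returns the value from the first layer that defines the key, or null if none does. */
    static String resolve(String key, Properties... layersHighestFirst) {
        for (Properties layer : layersHighestFirst) {
            String value = layer.getProperty(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Properties override = new Properties();   // stands in for kylin.properties.override
        Properties user = new Properties();       // stands in for kylin.properties
        Properties defaults = new Properties();   // stands in for kylin-defaults.properties
        defaults.setProperty("kylin.example.key", "from-defaults");
        user.setProperty("kylin.example.key", "from-user");
        System.out.println(resolve("kylin.example.key", override, user, defaults));
    }
}
```

With an otherwise blank user file the lookup falls through to the defaults layer, which is the behaviour the description asks for.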




[jira] [Created] (KYLIN-2646) Project level query authorization

2017-05-25 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2646:
-

 Summary: Project level query authorization
 Key: KYLIN-2646
 URL: https://issues.apache.org/jira/browse/KYLIN-2646
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


As we introduced ad-hoc queries in 
https://issues.apache.org/jira/browse/KYLIN-2515, we'll need to adjust query 
authorization as follows:

 Query authorization is encouraged to be set at the project level. If someone is 
assigned READ permission on a project, he can query all tables in the project, 
whether the query is answered ad hoc or by cubes.

 If a user has READ permission on cubes but no READ permission on the project, 
he can issue a query only if it can be satisfied by the cubes on which he has 
READ permission.
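
The two rules can be written down as a small decision function. This is only a sketch of the rule as stated here, with hypothetical names and cubes identified by plain strings; it is not Kylin's actual access-control code.

```java
import java.util.Set;

/** Hypothetical sketch of the project-level / cube-level query authorization rule. */
final class QueryAuthSketch {

    /**
     * @param hasProjectRead   whether the user holds READ on the project
     * @param readableCubes    cubes on which the user holds READ
     * @param satisfyingCubes  cubes that are able to answer the query
     */
    static boolean canQuery(boolean hasProjectRead, Set<String> readableCubes, Set<String> satisfyingCubes) {
        if (hasProjectRead) {
            // Project-level READ covers every table, ad hoc or via cubes.
            return true;
        }
        // Cube-level READ only: some cube the user may read must be able to answer the query.
        for (String cube : satisfyingCubes) {
            if (readableCubes.contains(cube)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> readable = java.util.Collections.singleton("cube_a");
        Set<String> satisfying = java.util.Collections.singleton("cube_b");
        System.out.println(canQuery(false, readable, satisfying));
    }
}
```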




[jira] [Created] (KYLIN-2636) optimize case when in group by

2017-05-22 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2636:
-

 Summary: optimize case when in group by 
 Key: KYLIN-2636
 URL: https://issues.apache.org/jira/browse/KYLIN-2636
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


Similar to KYLIN-2635, for clauses like:

{code}
group by case when 1 = 1 then x when 1 = 2 then y else z end
{code}

Kylin only needs to pick x as the group-by column.

Again, like KYLIN-2635, we'll fix it in Kylin first rather than in Calcite.




[jira] [Updated] (KYLIN-2635) optimize determined case when filters

2017-05-22 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2635:
--
Issue Type: Improvement  (was: Bug)

> optimize determined case when filters 
> --
>
> Key: KYLIN-2635
> URL: https://issues.apache.org/jira/browse/KYLIN-2635
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
>
> Currently Calcite does not simplify filters whose outcome is already 
> determined, such as:
> 1. where 1 = 1 => where true
> 2. where ( 1= 1 or x = 2)  => where true
> 3. where case when 'a' = 'a' then x > 1 else x < 1 end => where x > 1
> The first two cases have been handled in KYLIN-2539; the third case is not 
> handled yet. This JIRA is to track the third case.
> In theory, this JIRA, together with KYLIN-2539 and KYLIN-2597, should be 
> solved in Calcite rather than in Kylin. However, the demand is urgent, so 
> we'll fix it in Kylin first.




[jira] [Created] (KYLIN-2635) optimize determined case when filters

2017-05-22 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2635:
-

 Summary: optimize determined case when filters 
 Key: KYLIN-2635
 URL: https://issues.apache.org/jira/browse/KYLIN-2635
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


Currently Calcite does not simplify filters whose outcome is already 
determined, such as:

1. where 1 = 1 => where true
2. where ( 1= 1 or x = 2)  => where true
3. where case when 'a' = 'a' then x > 1 else x < 1 end => where x > 1

The first two cases have been handled in KYLIN-2539; the third case is not 
handled yet. This JIRA is to track the third case.

In theory, this JIRA, together with KYLIN-2539 and KYLIN-2597, should be solved 
in Calcite rather than in Kylin. However, the demand is urgent, so we'll fix it 
in Kylin first.
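
To make the third case concrete, here is a toy constant-folding sketch. It assumes a heavily simplified model in which every WHEN condition compares two literals, so the CASE can be replaced by a single branch result at planning time. The classes are hypothetical and unrelated to Calcite's RexNode types or to the eventual Kylin fix.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model of a CASE expression; not Calcite's or Kylin's classes. */
final class CaseWhenFoldSketch {

    /** One "WHEN left = right THEN result" branch whose condition compares two literals. */
    static final class Branch {
        final String left;
        final String right;
        final String result;

        Branch(String left, String right, String result) {
            this.left = left;
            this.right = right;
            this.result = result;
        }
    }

    /** Returns the result of the first branch whose literals are equal, else the ELSE result. */
    static String fold(List<Branch> branches, String elseResult) {
        for (Branch branch : branches) {
            if (branch.left.equals(branch.right)) {
                return branch.result;
            }
        }
        return elseResult;
    }

    public static void main(String[] args) {
        // case when 'a' = 'a' then x > 1 else x < 1 end
        System.out.println(fold(Arrays.asList(new Branch("'a'", "'a'", "x > 1")), "x < 1"));
    }
}
```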




[jira] [Created] (KYLIN-2631) Seek to next model when no cube in current model satisfies query

2017-05-19 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2631:
-

 Summary: Seek to next model when no cube in current model 
satisfies query
 Key: KYLIN-2631
 URL: https://issues.apache.org/jira/browse/KYLIN-2631
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


ModelChooser was introduced in 2.0 to match the JoinTree in a query against 
the JoinTree in a model. 

Currently, we first use ModelChooser to decide the model, then choose a cube 
from the selected model. The cubes in other models are never considered. 
However, the selected model may fail to provide a capable cube while a 
non-selected model can, so it's still necessary to go through all models.
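
A minimal sketch of the proposed fallback, assuming the candidate models 
arrive in ModelChooser's preference order. The function name, the data shapes 
and the can_answer callback are hypothetical and only illustrate the control 
flow, not Kylin's real classes.

```python
def choose_cube(models, can_answer):
    """Return (model, cube) for the first capable cube, or None.

    models: list of (model_name, [cube_name, ...]) in preference order.
    can_answer: callable telling whether a cube can satisfy the query.
    """
    for model_name, cubes in models:
        for cube in cubes:
            if can_answer(cube):
                return model_name, cube
        # No capable cube in this model: seek to the next model
        # instead of giving up here.
    return None
```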




[jira] [Created] (KYLIN-2625) not null filter clause should be evaluable in storage

2017-05-16 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2625:
-

 Summary: not null filter clause should be evaluable in storage
 Key: KYLIN-2625
 URL: https://issues.apache.org/jira/browse/KYLIN-2625
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


Currently, limit push down is not enabled for queries like:

{code:sql}
select * from (
  select * from test_kylin_fact
  where lstg_format_name is not null
) limit 20
{code}

because "not null" is treated as non-evaluable.




[jira] [Comment Edited] (KYLIN-2599) select * in subquery fail due to bug in hackSelectStar

2017-05-09 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16004116#comment-16004116
 ] 

hongbin ma edited comment on KYLIN-2599 at 5/10/17 5:49 AM:


A similar exception is thrown when I test a query like:

{code:sql}
select lstg_format_name from test_kylin_fact
order by case when 1=1 then cal_dt else seller_id end
{code}

To fix the issue more completely, we'll:

1. Check whether rootProj has any field starting with "_KY_". If none is 
found, the current logical plan does not require hacking, and 
org.apache.calcite.sql2rel.SqlToRelConverter#hackSelectStar should abort by 
returning root.
2. The field list of rootProj may be longer than that of root (as in 
kylin-it/src/test/resources/query/sql_verifyCount/query10.sql), so when 
constructing validatedRowType we'll skip the longer tail of rootProj.
3. The sort rel's RelCollation (if any) may become stale after the "_KY_" 
fields are removed, so its fieldIndex needs to be fixed.
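
Steps 1 and 3 can be illustrated with a small sketch. It works on plain lists 
of field names and sort indexes rather than on Calcite's RelNode and 
RelCollation objects, so every name below is hypothetical. Dropping a sort key 
that points at a removed field is an assumption of the sketch, not something 
stated in this comment.

```python
def strip_internal_fields(field_names, sort_indexes, prefix="_KY_"):
    """Drop fields starting with `prefix` and re-point sort indexes."""
    # Step 1: if no internal field is present, nothing needs hacking.
    if not any(name.startswith(prefix) for name in field_names):
        return list(field_names), list(sort_indexes)
    kept = [i for i, name in enumerate(field_names)
            if not name.startswith(prefix)]
    # Step 3: old index -> new index, so the collation does not go stale.
    new_index = {old: new for new, old in enumerate(kept)}
    remapped = [new_index[i] for i in sort_indexes if i in new_index]
    return [field_names[i] for i in kept], remapped
```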


was (Author: mahongbin):
similar exception is thrown when I test query like:

{code:sql}
 select lstg_format_name from test_kylin_fact order by case
  
   when 1=1  then
  
 cal_dt
  
  ELSE
  
 seller_id
  
 end 
{code}

To fix the issue more completely, we'll:

1. check if rootProj has any field starting with "_KY_", if none is found, then 
current logical plan does not require hacking, 
org.apache.calcite.sql2rel.SqlToRelConverter#hackSelectStar should abort by 
returning root.
2. the field list of rootProj may be longer than that of root (in case of 
kylin-it/src/test/resources/query/sql_verifyCount/query10.sql), so when 
constructing validatedRowType we'll skip the longer tail on rootProj
3.  sort rel's RelCollation (if any) may become stale after removing the "_KY_" 
fields, need to fix its fieldIndex 

> select * in subquery fail due to bug in hackSelectStar 
> ---
>
> Key: KYLIN-2599
> URL: https://issues.apache.org/jira/browse/KYLIN-2599
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>
> {code:sql}
> select fact.lstg_format_name from
>  (select * from test_kylin_fact where cal_dt > date'2010-01-01') as fact
>  group by fact.lstg_format_name
>  order by CASE WHEN fact.lstg_format_name IS NULL THEN 'sdf' ELSE
>  fact.lstg_format_name END
> {code}
> will generate a logical plan like:
> {code}
> LogicalSort(sort0=[$1], dir0=[ASC])
>   LogicalProject(LSTG_FORMAT_NAME=[$0], EXPR$1=[CASE(IS NULL($0), 'sdf', $0)])
>     LogicalAggregate(group=[{0}])
>       LogicalProject(LSTG_FORMAT_NAME=[$3])
>         LogicalProject(TRANS_ID=[$0], ORDER_ID=[$1], CAL_DT=[$2],
>             LSTG_FORMAT_NAME=[$3], LEAF_CATEG_ID=[$4], LSTG_SITE_ID=[$5],
>             SLR_SEGMENT_CD=[$6], SELLER_ID=[$7], PRICE=[$8], ITEM_COUNT=[$9],
>             TEST_COUNT_DISTINCT_BITMAP=[$10], DEAL_AMOUNT=[$11], DEAL_YEAR=[$12],
>             _KY_COUNT__=[$13], _KY_MIN_TEST_KYLIN_FACT_PRICE_=[$14],
>             _KY_MAX_TEST_KYLIN_FACT_PRICE_=[$15],
>             _KY_COUNT_DISTINCT_TEST_KYLIN_FACT_SELLER_ID_=[$16],
>             _KY_COUNT_DISTINCT_TEST_KYLIN_FACT_LSTG_FORMAT_NAME_TEST_KYLIN_FACT_SELLER_ID_=[$17],
>             _KY_COUNT_DISTINCT_TEST_KYLIN_FACT_TEST_COUNT_DISTINCT_BITMAP_=[$18],
>             _KY_PERCENTILE_TEST_KYLIN_FACT_PRICE_=[$19])
>           LogicalFilter(condition=[>($2, 2010-01-01)])
>             OLAPTableScan(table=[[DEFAULT, TEST_KYLIN_FACT]], fields=[[0, 1,
>                 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
> {code}
> org.apache.calcite.sql2rel.SqlToRelConverter#hackSelectStar mistakenly 
> treats it as a normal case, which leads to an exception being thrown.




[jira] [Created] (KYLIN-2599) select * in subquery fail due to bug in hackSelectStar

2017-05-09 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2599:
-

 Summary: select * in subquery fail due to bug in hackSelectStar 
 Key: KYLIN-2599
 URL: https://issues.apache.org/jira/browse/KYLIN-2599
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma


{code:sql}
select fact.lstg_format_name from
 (select * from test_kylin_fact where cal_dt > date'2010-01-01') as fact
 group by fact.lstg_format_name
 order by CASE WHEN fact.lstg_format_name IS NULL THEN 'sdf' ELSE
 fact.lstg_format_name END
{code}

will generate a logical plan like:

{code}
LogicalSort(sort0=[$1], dir0=[ASC])
  LogicalProject(LSTG_FORMAT_NAME=[$0], EXPR$1=[CASE(IS NULL($0), 'sdf', $0)])
    LogicalAggregate(group=[{0}])
      LogicalProject(LSTG_FORMAT_NAME=[$3])
        LogicalProject(TRANS_ID=[$0], ORDER_ID=[$1], CAL_DT=[$2],
            LSTG_FORMAT_NAME=[$3], LEAF_CATEG_ID=[$4], LSTG_SITE_ID=[$5],
            SLR_SEGMENT_CD=[$6], SELLER_ID=[$7], PRICE=[$8], ITEM_COUNT=[$9],
            TEST_COUNT_DISTINCT_BITMAP=[$10], DEAL_AMOUNT=[$11], DEAL_YEAR=[$12],
            _KY_COUNT__=[$13], _KY_MIN_TEST_KYLIN_FACT_PRICE_=[$14],
            _KY_MAX_TEST_KYLIN_FACT_PRICE_=[$15],
            _KY_COUNT_DISTINCT_TEST_KYLIN_FACT_SELLER_ID_=[$16],
            _KY_COUNT_DISTINCT_TEST_KYLIN_FACT_LSTG_FORMAT_NAME_TEST_KYLIN_FACT_SELLER_ID_=[$17],
            _KY_COUNT_DISTINCT_TEST_KYLIN_FACT_TEST_COUNT_DISTINCT_BITMAP_=[$18],
            _KY_PERCENTILE_TEST_KYLIN_FACT_PRICE_=[$19])
          LogicalFilter(condition=[>($2, 2010-01-01)])
            OLAPTableScan(table=[[DEFAULT, TEST_KYLIN_FACT]], fields=[[0, 1, 2,
                3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
{code}

org.apache.calcite.sql2rel.SqlToRelConverter#hackSelectStar mistakenly 
treats it as a normal case, which leads to an exception being thrown.






[jira] [Created] (KYLIN-2598) Should not translate filter to a in-clause filter with too many elements

2017-05-09 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2598:
-

 Summary: Should not translate filter to a in-clause filter with 
too many elements
 Key: KYLIN-2598
 URL: https://issues.apache.org/jira/browse/KYLIN-2598
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


In 
org.apache.kylin.dict.BuiltInFunctionTransformer#translateFunctionTupleFilter 
we translate built-in functions like upper, lower, and like into in-clause 
filters (KYLIN-993).

The problem with this approach is that an in-clause filter quickly becomes 
inefficient when too many elements accumulate in the in-clause. I suggest 
setting a threshold so that when there are more elements than the threshold, 
the translation aborts.
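
The suggested threshold could look roughly like the sketch below. It is 
written in Python for brevity and does not mirror BuiltInFunctionTransformer; 
the function name, its parameters and the idea of scanning a list of 
dictionary values are assumptions made for illustration.

```python
def translate_to_in_clause(dictionary_values, predicate, max_elements):
    """Collect the values matching `predicate` for an IN clause.

    Returns None as soon as more than `max_elements` values match, which
    signals that the original function filter should be kept as it is.
    """
    matches = []
    for value in dictionary_values:
        if predicate(value):
            matches.append(value)
            if len(matches) > max_elements:
                return None  # abort the translation
    return matches
```

Aborting inside the loop means the scan stops as soon as the threshold is 
exceeded, rather than after the full candidate list has been built.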




[jira] [Created] (KYLIN-2597) Deal with trivial expression in filters like x = 1 + 2

2017-05-09 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2597:
-

 Summary: Deal with trivial expression in filters like x = 1 + 2
 Key: KYLIN-2597
 URL: https://issues.apache.org/jira/browse/KYLIN-2597
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


BI tools generate trivial expressions in filters, e.g. "x = 1 + 2". Such 
expressions cause Kylin to treat the filter as "non-evaluable", which in turn 
blocks optimizations like limit push down, or forces Kylin to choose a cuboid 
with more dimensions, etc.
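
As an illustration of what dealing with such an expression means, the sketch 
below folds a literal-only arithmetic tree into a single literal, so that 
"x = 1 + 2" becomes "x = 3". The tuple representation and the function names 
are hypothetical and unrelated to Kylin's TupleFilter classes.

```python
import operator

# Only a few operators are covered; this is a sketch, not a full evaluator.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr):
    """Fold ("op", left, right) trees of numeric literals into one literal.

    Strings stand for column names and are never folded.
    """
    if not isinstance(expr, tuple):
        return expr  # a numeric literal or a column name
    op, left, right = expr[0], fold(expr[1]), fold(expr[2])
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)
    return (op, left, right)


def fold_comparison(column, op, rhs):
    """Rewrite `column op rhs` with the right-hand side folded."""
    return (column, op, fold(rhs))
```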




[jira] [Commented] (KYLIN-2589) Errors in WebUI Authentication

2017-05-06 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999441#comment-15999441
 ] 

hongbin ma commented on KYLIN-2589:
---

Hi [~wooya], by "cleaned up all the info in hbase" do you mean cleaning the 
xxx_acl and xxx_user htables as well? Can you also describe your environment 
(Hadoop version, HDP or Cloudera)?

> Errors in WebUI Authentication
> --
>
> Key: KYLIN-2589
> URL: https://issues.apache.org/jira/browse/KYLIN-2589
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Affects Versions: v2.0.0
> Environment: EMR
>Reporter: Young Wu
> Attachments: 2921494001551_.pic_hd.jpg, Screenshot 2017-05-06 
> 12.29.34.png
>
>
> There seem to be bugs in the authentication part of Kylin's web server. After 
> Kylin runs for several hours, users fail to log in with username/password. 
> The error reported in the log is "Encoded password cannot be null or empty". 
> Details are attached. The only workaround is to restart Kylin periodically. 
> A restart suppresses the issue for several hours, and then the error suddenly 
> comes back. The issue is also described here: 
> http://apache-kylin.74782.x6.nabble.com/Re-Encoded-password-cannot-be-null-or-empty-when-login-into-kylin-s-web-UI-td7879.html#a7887
> It is not caused by upgrading from 2.0.0-BETA to 2.0.0, since I've already 
> cleaned up all the info in hbase and spun up a brand new kylin-2.0.0, but the 
> issue is still there.
> Another bug occurs rarely, but it also looks related to authentication. 
> It happens when Kylin is under a heavy load of query requests. Details are 
> also attached.




[jira] [Created] (KYLIN-2586) use random port for CacheServiceTest as fixed port 7777 might have been occupied

2017-05-04 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2586:
-

 Summary: use random port for CacheServiceTest as fixed port 7777 
might have been occupied
 Key: KYLIN-2586
 URL: https://issues.apache.org/jira/browse/KYLIN-2586
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


https://builds.apache.org/job/Kylin-Master-JDK-1.7/442/

2017-05-04 02:24:45,913 WARN  [main AbstractLifeCycle:212]: FAILED 
ServerConnector@29065a9f{HTTP/1.1}{0.0.0.0:}: java.net.BindException: 
Address already in use
java.net.BindException: Address already in use
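
A common way to avoid this kind of clash is to bind to port 0 and let the 
operating system pick an unused port. The helper below is a generic Python 
sketch of that idea, not the change made to CacheServiceTest; the function 
name is made up for the example.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]
```

Note that the port is released again when the helper returns, so another 
process could grab it before the test server starts. Where possible, letting 
the server itself bind to port 0 and then reading back the assigned port 
avoids that race.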




[jira] [Resolved] (KYLIN-2580) Improvement on subqueries: allow grouping by columns from subquery

2017-05-01 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma resolved KYLIN-2580.
---
   Resolution: Fixed
Fix Version/s: v2.1.0

> Improvement on subqueries: allow grouping by columns from subquery
> --
>
> Key: KYLIN-2580
> URL: https://issues.apache.org/jira/browse/KYLIN-2580
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
> Fix For: v2.1.0
>
>
> {code:sql}
> select test_kylin_fact.lstg_format_name, xxx.week_beg_dt , 
> sum(test_kylin_fact.price) as GMV 
>  , count(*) as TRANS_CNT 
>  from  
>  test_kylin_fact
>  inner JOIN test_category_groupings
>  ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND 
> test_kylin_fact.lstg_site_id = test_category_groupings.site_id 
>  inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt  where 
> week_beg_dt >= DATE '2010-02-10'  ) xxx
>  ON test_kylin_fact.cal_dt = xxx.cal_dt 
>  where test_category_groupings.meta_categ_name  <> 'Baby'
>  group by test_kylin_fact.lstg_format_name, xxx.week_beg_dt 
> {code}
> will fail because of the group by on xxx.week_beg_dt: week_beg_dt does not 
> necessarily appear in the cube.




[jira] [Commented] (KYLIN-2427) Auto adjust join order to make query executable

2017-05-01 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990768#comment-15990768
 ] 

hongbin ma commented on KYLIN-2427:
---

this issue should have been fixed by KYLIN-2579, but I have not verified it yet.

> Auto adjust join order to make query executable
> ---
>
> Key: KYLIN-2427
> URL: https://issues.apache.org/jira/browse/KYLIN-2427
> Project: Kylin
>  Issue Type: Bug
>Reporter:  Kaige Liu
>Assignee:  Kaige Liu
>
> KYLIN-2406 reports an issue: The order of joins will affect the result of 
> query. For example, below query leads to "No model found"
> Below query triggers NPE
> {code}
> with tmp3 as (
> select l_partkey, 0.5 * sum(l_quantity) as sum_quantity, l_suppkey
> from v_lineitem
> inner join supplier on l_suppkey = s_suppkey
> inner join nation on s_nationkey = n_nationkey
> inner join part on l_partkey = p_partkey
> where l_shipdate >= '1992-01-01' and l_shipdate <= '1995-01-01'
> and n_name = 'CANADA'
> and p_name like 'forest%'
> group by l_partkey, l_suppkey
> )
> select
> s_name,
> s_address
> from
> v_partsupp
> inner join tmp3 on ps_partkey = l_partkey and ps_suppkey = l_suppkey
> inner join supplier on ps_suppkey = s_suppkey
> where
> ps_availqty > sum_quantity
> group by
> s_name, s_address
> order by
> s_name
> {code}
> While below query is OK. Only difference being the order of "inner join tmp3" 
> and "inner join supplier"
> {code}
> with tmp3 as (
> select l_partkey, 0.5 * sum(l_quantity) as sum_quantity, l_suppkey
> from v_lineitem
> inner join supplier on l_suppkey = s_suppkey
> inner join nation on s_nationkey = n_nationkey
> inner join part on l_partkey = p_partkey
> where l_shipdate >= '1992-01-01' and l_shipdate <= '1995-01-01'
> and n_name = 'CANADA'
> and p_name like 'forest%'
> group by l_partkey, l_suppkey
> )
> select
> s_name,
> s_address
> from
> v_partsupp
> inner join supplier on ps_suppkey = s_suppkey
> inner join tmp3 on ps_partkey = l_partkey and ps_suppkey = l_suppkey
> where
> ps_availqty > sum_quantity
> group by
> s_name, s_address
> order by
> s_name
> {code}




[jira] [Resolved] (KYLIN-2579) Improvement on subqueries: reorder subqueries joins with RelOptRule

2017-05-01 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma resolved KYLIN-2579.
---
   Resolution: Fixed
Fix Version/s: v2.1.0

> Improvement on subqueries: reorder subqueries joins with RelOptRule
> ---
>
> Key: KYLIN-2579
> URL: https://issues.apache.org/jira/browse/KYLIN-2579
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
> Fix For: v2.1.0
>
>
> Current support for subqueries has some limitations. For example, we require 
> JOINs on tables to precede JOINs on all subqueries. The following query:
> {code:sql}
> select test_kylin_fact.lstg_format_name,sum(test_kylin_fact.price) as GMV 
>  , count(*) as TRANS_CNT
>  from  
>  test_kylin_fact
>  inner JOIN test_category_groupings
>  ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND 
> test_kylin_fact.lstg_site_id = test_category_groupings.site_id 
>  inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt  where 
> week_beg_dt >= DATE '2010-02-10'  ) xxx
>  ON test_kylin_fact.cal_dt = xxx.cal_dt 
>  
>  
>  where test_category_groupings.meta_categ_name  <> 'Baby'
>  group by test_kylin_fact.lstg_format_name
> {code}
> works but 
> {code:sql}
> select test_kylin_fact.lstg_format_name,sum(test_kylin_fact.price) as GMV 
>  , count(*) as TRANS_CNT
>  from  
>  test_kylin_fact
>  inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt  where 
> week_beg_dt >= DATE '2010-02-10'  ) xxx
>  ON test_kylin_fact.cal_dt = xxx.cal_dt 
>  
>  inner JOIN test_category_groupings
>  ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND 
> test_kylin_fact.lstg_site_id = test_category_groupings.site_id 
>  
>  
>  where test_category_groupings.meta_categ_name  <> 'Baby'
>  group by test_kylin_fact.lstg_format_name
> {code}
> won't work. In this JIRA we'll reorder subquery joins with RelOptRule.




[jira] [Created] (KYLIN-2580) Improvement on subqueries: allow grouping by columns from subquery

2017-05-01 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2580:
-

 Summary: Improvement on subqueries: allow grouping by columns from 
subquery
 Key: KYLIN-2580
 URL: https://issues.apache.org/jira/browse/KYLIN-2580
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


{code:sql}

select test_kylin_fact.lstg_format_name, xxx.week_beg_dt , 
sum(test_kylin_fact.price) as GMV 
 , count(*) as TRANS_CNT 
 from  

 test_kylin_fact

 inner JOIN test_category_groupings
 ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND 
test_kylin_fact.lstg_site_id = test_category_groupings.site_id 


 inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt  where week_beg_dt 
>= DATE '2010-02-10'  ) xxx
 ON test_kylin_fact.cal_dt = xxx.cal_dt 


 where test_category_groupings.meta_categ_name  <> 'Baby'
 group by test_kylin_fact.lstg_format_name, xxx.week_beg_dt 
{code}

will fail because of the group by on xxx.week_beg_dt: week_beg_dt does not 
necessarily appear in the cube.




[jira] [Updated] (KYLIN-2579) Improvement on subqueries: reorder subqueries joins with RelOptRule

2017-05-01 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2579:
--
Summary: Improvement on subqueries: reorder subqueries joins with 
RelOptRule  (was: Improvement on subqueries: reroder subqueries joins with 
RelOptRule)

> Improvement on subqueries: reorder subqueries joins with RelOptRule
> ---
>
> Key: KYLIN-2579
> URL: https://issues.apache.org/jira/browse/KYLIN-2579
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
>
> Current support for subqueries has some limitations. For example, we require 
> JOINs on tables to precede JOINs on all subqueries. The following query:
> {code:sql}
> select test_kylin_fact.lstg_format_name,sum(test_kylin_fact.price) as GMV 
>  , count(*) as TRANS_CNT
>  from  
>  test_kylin_fact
>  inner JOIN test_category_groupings
>  ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND 
> test_kylin_fact.lstg_site_id = test_category_groupings.site_id 
>  inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt  where 
> week_beg_dt >= DATE '2010-02-10'  ) xxx
>  ON test_kylin_fact.cal_dt = xxx.cal_dt 
>  
>  
>  where test_category_groupings.meta_categ_name  <> 'Baby'
>  group by test_kylin_fact.lstg_format_name
> {code}
> works but 
> {code:sql}
> select test_kylin_fact.lstg_format_name,sum(test_kylin_fact.price) as GMV 
>  , count(*) as TRANS_CNT
>  from  
>  test_kylin_fact
>  inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt  where 
> week_beg_dt >= DATE '2010-02-10'  ) xxx
>  ON test_kylin_fact.cal_dt = xxx.cal_dt 
>  
>  inner JOIN test_category_groupings
>  ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND 
> test_kylin_fact.lstg_site_id = test_category_groupings.site_id 
>  
>  
>  where test_category_groupings.meta_categ_name  <> 'Baby'
>  group by test_kylin_fact.lstg_format_name
> {code}
> won't work. In this JIRA we'll reorder subquery joins with RelOptRule.




[jira] [Created] (KYLIN-2579) Improvement on subqueries: reroder subqueries joins with RelOptRule

2017-05-01 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2579:
-

 Summary: Improvement on subqueries: reroder subqueries joins with 
RelOptRule
 Key: KYLIN-2579
 URL: https://issues.apache.org/jira/browse/KYLIN-2579
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


Current support for subqueries has some limitations. For example, we require 
JOINs on tables to precede JOINs on all subqueries. The following query:

{code:sql}

select test_kylin_fact.lstg_format_name,sum(test_kylin_fact.price) as GMV 
 , count(*) as TRANS_CNT
 from  

 test_kylin_fact

 inner JOIN test_category_groupings
 ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND 
test_kylin_fact.lstg_site_id = test_category_groupings.site_id 


 inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt  where week_beg_dt 
>= DATE '2010-02-10'  ) xxx
 ON test_kylin_fact.cal_dt = xxx.cal_dt 
 
 
 where test_category_groupings.meta_categ_name  <> 'Baby'
 group by test_kylin_fact.lstg_format_name

{code}

works but 

{code:sql}

select test_kylin_fact.lstg_format_name,sum(test_kylin_fact.price) as GMV 
 , count(*) as TRANS_CNT
 from  

 test_kylin_fact

 inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt  where week_beg_dt 
>= DATE '2010-02-10'  ) xxx
 ON test_kylin_fact.cal_dt = xxx.cal_dt 
 
 inner JOIN test_category_groupings
 ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND 
test_kylin_fact.lstg_site_id = test_category_groupings.site_id 
 
 
 where test_category_groupings.meta_categ_name  <> 'Baby'
 group by test_kylin_fact.lstg_format_name

{code}

won't work. In this JIRA we'll reorder subquery joins with RelOptRule.
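
The ordering requirement itself is easy to state in code. The sketch below 
only shows the target order (plain tables first, subqueries after) for a list 
of inner joins against the fact table. It is not a Calcite RelOptRule, and the 
pair representation is an assumption made for the example.

```python
def tables_before_subqueries(joins):
    """Reorder joins so that plain tables precede subqueries.

    joins: list of (right_side, is_subquery) pairs in query order.
    sorted() is stable, so tables keep their relative order and so do
    subqueries.
    """
    return sorted(joins, key=lambda join: join[1])
```

Applied to the second query in this description, the join on xxx would move 
after the join on test_category_groupings, giving the order used by the first 
query.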




[jira] [Created] (KYLIN-2575) Experimental feature: Computed Column

2017-04-28 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2575:
-

 Summary: Experimental feature: Computed Column
 Key: KYLIN-2575
 URL: https://issues.apache.org/jira/browse/KYLIN-2575
 Project: Kylin
  Issue Type: New Feature
Reporter: hongbin ma
Assignee: hongbin ma


Computed column is a virtual column that is calculated from an expression of 
existing columns. For example, TAX is computed from PRICE * TAX_RATE; TX_YEAR 
is from EXTRACT(year from TX_DATE).

Currently users have to create a view to add these computed columns, then 
feed the view to the cube. This has two inconveniences:

1. Creating a view is not easy.
2. The query has to be rewritten to use the view instead of the original table.

Letting Kylin/KAP directly support computed columns will be a big step forward.
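
To make the idea concrete, the sketch below treats a computed column as a 
named expression over the columns of a row, using the TAX and TX_YEAR 
examples from this description. The dictionary of lambdas and the function 
name are purely illustrative and say nothing about how Kylin defines or 
stores computed columns.

```python
from datetime import date

# Hypothetical registry: computed column name -> expression over a row.
COMPUTED_COLUMNS = {
    "TAX": lambda row: row["PRICE"] * row["TAX_RATE"],
    "TX_YEAR": lambda row: row["TX_DATE"].year,
}


def with_computed_columns(row):
    """Return a copy of `row` extended with every computed column."""
    enriched = dict(row)
    for name, expression in COMPUTED_COLUMNS.items():
        enriched[name] = expression(row)
    return enriched
```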




[jira] [Updated] (KYLIN-2574) RawQueryLastHacker should group by all possible dimensions

2017-04-28 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2574:
--
Request participants:   (was: )
 Description: 
Currently RawQueryLastHacker makes the raw query group by the columns that 
exist in the query (if (tupleInfo.hasColumn(col))). This approach fails to 
leverage limit push down if the existing columns are not a "prefix" of the row 
keys (org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase#enableStorageLimitIfPossible).

On the other hand, a large portion of the raw queries are random queries like 
"select * from fact" or "select * from fact inner join lookup where year = 
2000". Keeping these queries fast is important to impress users.

  was:currently RawQueryLastHacker make the raw query group by columns existing 
in query (if (tupleInfo.hasColumn(col))). The approach would fail to leverage 
limit push down if the existing columns are not a "prefix" of row 
keys.(org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase#enableStorageLimitIfPossible)


> RawQueryLastHacker should group by all possible dimensions
> --
>
> Key: KYLIN-2574
> URL: https://issues.apache.org/jira/browse/KYLIN-2574
> Project: Kylin
>  Issue Type: Bug
>Reporter: hongbin ma
>Assignee: hongbin ma
>
> Currently RawQueryLastHacker makes the raw query group by the columns that 
> exist in the query (if (tupleInfo.hasColumn(col))). This approach fails to 
> leverage limit push down if the existing columns are not a "prefix" of the 
> row keys (org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase#enableStorageLimitIfPossible).
> On the other hand, a large portion of the raw queries are random queries like 
> "select * from fact" or "select * from fact inner join lookup where year = 
> 2000". Keeping these queries fast is important to impress users.




[jira] [Updated] (KYLIN-2574) RawQueryLastHacker should group by all possible dimensions

2017-04-28 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2574:
--
Request participants:   (was: )
 Description: Currently RawQueryLastHacker makes the raw query group 
by the columns that exist in the query (if (tupleInfo.hasColumn(col))). This 
approach fails to leverage limit push down if the existing columns are not a 
"prefix" of the row keys 
(org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase#enableStorageLimitIfPossible).
  (was: currently RawQueryLastHacker make the raw query group by columns 
existing in query (if (tupleInfo.hasColumn(col))). The approach would fail to 
leverage limit push down if the existing columns are not a "prefix" of row 
keys.)

> RawQueryLastHacker should group by all possible dimensions
> --
>
> Key: KYLIN-2574
> URL: https://issues.apache.org/jira/browse/KYLIN-2574
> Project: Kylin
>  Issue Type: Bug
>Reporter: hongbin ma
>Assignee: hongbin ma
>
> Currently RawQueryLastHacker makes the raw query group by the columns that 
> exist in the query (if (tupleInfo.hasColumn(col))). This approach fails to 
> leverage limit push down if the existing columns are not a "prefix" of the 
> row keys (org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase#enableStorageLimitIfPossible).




[jira] [Created] (KYLIN-2574) RawQueryLastHacker should group by all possible dimensions

2017-04-28 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2574:
-

 Summary: RawQueryLastHacker should group by all possible dimensions
 Key: KYLIN-2574
 URL: https://issues.apache.org/jira/browse/KYLIN-2574
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


Currently RawQueryLastHacker makes the raw query group by the columns that 
exist in the query (if (tupleInfo.hasColumn(col))). This approach fails to 
leverage limit push down if the existing columns are not a "prefix" of the 
row keys.
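
One reading of the "prefix" condition is sketched below, under the assumption 
that the grouped columns must be exactly the leading columns of the row key, 
in any order. The function and the column names are hypothetical, and the real 
check in GTCubeStorageQueryBase may differ.

```python
def is_rowkey_prefix(group_by_columns, rowkey_columns):
    """True when the grouped columns are exactly the first N row key columns."""
    grouped = set(group_by_columns)
    return grouped == set(rowkey_columns[:len(grouped)])
```

Under this reading, grouping by all dimensions always satisfies the condition, 
which is the motivation given in the issue title.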




[jira] [Resolved] (KYLIN-2564) Got "UsernameNotFoundException: User XXX does not exist" in new Kylin instance

2017-04-25 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma resolved KYLIN-2564.
---
   Resolution: Fixed
Fix Version/s: v2.0.0

> Got "UsernameNotFoundException: User XXX does not exist" in new Kylin instance
> --
>
> Key: KYLIN-2564
> URL: https://issues.apache.org/jira/browse/KYLIN-2564
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.0.0
>Reporter: liyang
>Assignee: hongbin ma
> Fix For: v2.0.0
>
>
> In a new Kylin instance with fresh metadata, we hit the following exception 
> when creating the very first project.
> {code}
> 2017-04-25 14:04:51,997 ERROR [http-bio-7070-exec-10 ProjectController:218]: 
> Failed to deal with the request.
> org.springframework.security.core.userdetails.UsernameNotFoundException: User 
> ADMIN does not exist. Please make sure the user has logged in before
>   at 
> org.apache.kylin.rest.service.AclService.updateAcl(AclService.java:308)
>   at 
> org.apache.kylin.rest.service.AccessService.grant(AccessService.java:119)
>   at 
> org.apache.kylin.rest.service.AccessService.init(AccessService.java:81)
>   at 
> org.apache.kylin.rest.service.AccessService$$FastClassBySpringCGLIB$$91550c7f.invoke()
>   at 
> org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
>   at 
> org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:629)
>   at 
> org.apache.kylin.rest.service.AccessService$$EnhancerBySpringCGLIB$$594ff853.init()
>   at 
> org.apache.kylin.rest.service.ProjectService.createProject(ProjectService.java:64)
> {code}




[jira] [Resolved] (KYLIN-2555) minor issues about acl and granted autority

2017-04-19 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma resolved KYLIN-2555.
---
   Resolution: Fixed
Fix Version/s: v2.0.0

> minor issues about acl and granted autority
> ---
>
> Key: KYLIN-2555
> URL: https://issues.apache.org/jira/browse/KYLIN-2555
> Project: Kylin
>  Issue Type: Bug
>Reporter: XIE FAN
>Assignee: XIE FAN
> Fix For: v2.0.0
>
>
> 1. When we use AclService to manage authorities of a Kylin project, authorities 
> may be granted to non-existent users, which should not be allowed.
> 2. Implicitly give ADMIN=ADMIN+MODELER+ANALYST and MODELER=MODELER+ANALYST.
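
Both points can be illustrated with a short sketch. The role names come from 
the description, while the dictionaries and function names are hypothetical 
and do not reflect AclService's API.

```python
# Implied authorities, as stated in point 2 of the description.
IMPLIED = {
    "ADMIN": {"ADMIN", "MODELER", "ANALYST"},
    "MODELER": {"MODELER", "ANALYST"},
    "ANALYST": {"ANALYST"},
}


def effective_authorities(granted):
    """Expand explicitly granted roles into everything they imply."""
    result = set()
    for role in granted:
        result |= IMPLIED.get(role, {role})
    return result


def grant(acl, known_users, user, role):
    """Record a grant, refusing users that do not exist (point 1)."""
    if user not in known_users:
        raise ValueError("user does not exist: " + user)
    acl.setdefault(user, set()).add(role)
```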



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (KYLIN-2549) Modify tools that related to Acl

2017-04-18 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2549:
--
Description: Many tools, such as MigrationTool and StorageCleanUpJob, need to 
read acl records, and they need to be modified to use the new Resource store 
API instead of the HBase API  (was: Many tools, such as MigrationTooll, 
StorageCleanUpJob need to read acl records, and they need to be modified.)

> Modify tools that related to Acl
> 
>
> Key: KYLIN-2549
> URL: https://issues.apache.org/jira/browse/KYLIN-2549
> Project: Kylin
>  Issue Type: Sub-task
>Reporter: XIE FAN
>Assignee: XIE FAN
>
> Many tools, such as MigrationTool and StorageCleanUpJob, need to read acl 
> records, and they need to be modified to use the new Resource store API 
> instead of the HBase API




[jira] [Created] (KYLIN-2551) separate table desc by each project

2017-04-17 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2551:
-

 Summary: separate table desc by each project
 Key: KYLIN-2551
 URL: https://issues.apache.org/jira/browse/KYLIN-2551
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


For historical reasons, different projects share the same table desc. This 
forces project admins to worry about affecting cubes in other projects.

This JIRA aims to separate table desc by project, while maintaining backward 
compatibility so that users won't have to manually "upgrade".




[jira] [Resolved] (KYLIN-2539) Useless filter dimension will impact cuboid selection.

2017-04-10 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma resolved KYLIN-2539.
---
   Resolution: Fixed
 Assignee: hongbin ma
Fix Version/s: v2.0.0

> Useless filter dimension will impact cuboid selection.
> --
>
> Key: KYLIN-2539
> URL: https://issues.apache.org/jira/browse/KYLIN-2539
> Project: Kylin
>  Issue Type: Bug
>Reporter: Yifan Zhang
>Assignee: hongbin ma
> Fix For: v2.0.0
>
>
> Query1: select count(*) from test_kylin_fact where (cal_dt > 
> DATE'2012-01-01') and (seller_id is null or 1 = 1)
> Query2: select count(*) from test_kylin_fact where (cal_dt > DATE'2012-01-01')
> Q1 and Q2 return identical result but hit different cuboid: 43051 and 
> 1310735, and result in different query performance.




[jira] [Commented] (KYLIN-2539) Useless filter dimension will impact cuboid selection.

2017-04-10 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962833#comment-15962833
 ] 

hongbin ma commented on KYLIN-2539:
---

I added an org.apache.kylin.metadata.filter.FilterOptimizeTransformer to detect 
patterns like (x = ? or 1 = 1) and replace them with 
ConstantTupleFilter.TRUE.
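
The rewrite described in this comment can be sketched roughly as follows. This is only an illustration, not the actual FilterOptimizeTransformer code: the `Filter`, `Leaf`, `Logical` and `FilterSimplifier` classes are hypothetical stand-ins for Kylin's filter tree, and the sketch assumes that `1 = 1` has already been folded into the constant `Filter.TRUE`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical miniature filter tree; Kylin's real filter classes are richer.
abstract class Filter {
    // Stand-in for an always-true constant filter.
    static final Filter TRUE = new Filter() {
        @Override public String toString() { return "TRUE"; }
    };
}

final class Leaf extends Filter {
    final String expr; // e.g. "seller_id is null"
    Leaf(String expr) { this.expr = expr; }
    @Override public String toString() { return expr; }
}

final class Logical extends Filter {
    final boolean isOr; // true = OR, false = AND
    final List<Filter> children;
    Logical(boolean isOr, Filter... children) {
        this.isOr = isOr;
        this.children = Arrays.asList(children);
    }
    @Override public String toString() {
        List<String> parts = new ArrayList<>();
        for (Filter c : children) parts.add(c.toString());
        return "(" + String.join(isOr ? " OR " : " AND ", parts) + ")";
    }
}

final class FilterSimplifier {
    // Bottom-up rewrite: an OR with an always-true child becomes TRUE,
    // and TRUE children are dropped from an AND.
    static Filter simplify(Filter f) {
        if (!(f instanceof Logical)) return f;
        Logical node = (Logical) f;
        List<Filter> kept = new ArrayList<>();
        for (Filter child : node.children) {
            Filter s = simplify(child);
            if (s == Filter.TRUE) {
                if (node.isOr) return Filter.TRUE; // x OR TRUE  == TRUE
                continue;                          // x AND TRUE == x
            }
            kept.add(s);
        }
        // An AND whose children were all TRUE is itself TRUE.
        if (kept.isEmpty()) return node.isOr ? node : Filter.TRUE;
        if (kept.size() == 1) return kept.get(0);
        return new Logical(node.isOr, kept.toArray(new Filter[0]));
    }
}
```

With this sketch, the filter of Query1 collapses to the single `cal_dt` condition of Query2, so the `seller_id` dimension no longer takes part in cuboid selection.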

> Useless filter dimension will impact cuboid selection.
> --
>
> Key: KYLIN-2539
> URL: https://issues.apache.org/jira/browse/KYLIN-2539
> Project: Kylin
>  Issue Type: Bug
>Reporter: Yifan Zhang
>
> Query1: select count(*) from test_kylin_fact where (cal_dt > 
> DATE'2012-01-01') and (seller_id is null or 1 = 1)
> Query2: select count(*) from test_kylin_fact where (cal_dt > DATE'2012-01-01')
> Q1 and Q2 return identical result but hit different cuboid: 43051 and 
> 1310735, and result in different query performance.




[jira] [Created] (KYLIN-2527) Speedup LookupStringTable, use HashMap instead of ConcurrentHashMap

2017-03-31 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2527:
-

 Summary:  Speedup LookupStringTable, use HashMap instead of 
ConcurrentHashMap
 Key: KYLIN-2527
 URL: https://issues.apache.org/jira/browse/KYLIN-2527
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


A concurrent hash map is overkill here; it should be faster to initialize a 
plain hash map. The next step might be to cache the LookupStringTable.
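
A minimal sketch of the idea, assuming the table is filled once by a single thread and only read afterwards. `LookupTableSketch` and its methods are hypothetical names, not the real LookupStringTable API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a lookup table keyed by one string column.
final class LookupTableSketch {
    private final Map<String, String[]> rows;

    // The map is filled by a single thread inside the constructor, so a plain
    // HashMap is sufficient; assigning it to a final field publishes it safely
    // to reader threads once construction has finished.
    LookupTableSketch(List<String[]> data, int keyColumn) {
        Map<String, String[]> m = new HashMap<>(Math.max(16, data.size() * 2));
        for (String[] row : data) {
            m.put(row[keyColumn], row);
        }
        this.rows = Collections.unmodifiableMap(m);
    }

    // Read-only access; no locking is needed because the map never changes.
    String[] getRow(String key) {
        return rows.get(key);
    }
}
```

The design choice is that thread safety comes from immutability after construction rather than from a concurrent collection, which avoids the extra bookkeeping of ConcurrentHashMap during the bulk load.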




[jira] [Commented] (KYLIN-2506) Refactor Global Dictionary

2017-03-31 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950415#comment-15950415
 ] 

hongbin ma commented on KYLIN-2506:
---

Today we use trie dict as the default encoding for precise distinct count; 
however, global dictionary seems to be a better default choice. I do understand 
that model designers need trie dict in some cases (where the global dict may 
grow too large), but we can hide it in advanced settings. There might be a 
little more work to keep backward compatibility, but I think it's manageable. 

> Refactor Global Dictionary
> --
>
> Key: KYLIN-2506
> URL: https://issues.apache.org/jira/browse/KYLIN-2506
> Project: Kylin
>  Issue Type: Improvement
>  Components: General
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.0.0
>
>
> The main points of this refactor:
> 1 Fix the bug that the RemoveListener of LoadingCache swallowed any 
> exceptions when building the GlobalDict.
> 2 Fix the bug that the HDFS filename of DictSliceKey had Illegal characters.
> 3 Fix the bug that the HDFS filename of DictSliceKey maybe longer than 255.
> 4 Fix the bug that DictNode split failed if value length greater than 255 
> bytes.
> 5 Decouple the build and query of GlobalDict: 
> Abstract the builder of AppendTrieDictionary to AppendTrieDictionaryBuilder; 
> Add LoadingCache to AppendTrieDictionary and make AppendTrieDictionary is 
> only readable.
> 6 Remove dependence of LoadingCache when building the GlobalDict.
> 7 Abstract the HDFS operations to GlobalDictStore.
> 8 Abstract the metadata of GlobalDict to GlobalDictMetadata.
> 9 Delete CachedTreeMap.
> 10 Remove the support of multithreading concurrent build and I will add 
> distributed lock for GlobalDict later.




[jira] [Created] (KYLIN-2521) upgrade to calcite 1.12.0

2017-03-27 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2521:
-

 Summary: upgrade to calcite 1.12.0 
 Key: KYLIN-2521
 URL: https://issues.apache.org/jira/browse/KYLIN-2521
 Project: Kylin
  Issue Type: Task
Reporter: hongbin ma
Assignee: hongbin ma







[jira] [Reopened] (KYLIN-2361) Upgrade to Tomcat 8.X

2017-03-12 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma reopened KYLIN-2361:
---

> Upgrade to Tomcat 8.X
> -
>
> Key: KYLIN-2361
> URL: https://issues.apache.org/jira/browse/KYLIN-2361
> Project: Kylin
>  Issue Type: Task
>  Components: Web 
>Affects Versions: v1.6.0
>Reporter: Billy Liu
>Assignee: Billy Liu
>Priority: Minor
> Fix For: v2.0.0
>
>
> Apache Tomcat 8.5.x supports the same Servlet, JSP, EL, and WebSocket 
> Specification versions as Apache Tomcat 8.0.x. In addition to that, it also 
> implements the JASPIC 1.1 specification. There are significant changes in 
> many areas under the hood, resulting in improved performance, stability, and 
> total cost of ownership. Please refer to the Apache Tomcat 8.5 Changelog for 
> details.




[jira] [Comment Edited] (KYLIN-2361) Upgrade to Tomcat 8.X

2017-03-12 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906467#comment-15906467
 ] 

hongbin ma edited comment on KYLIN-2361 at 3/12/17 9:25 AM:


It seems the ordered class loader 
https://github.com/openwide-java/tomcat-classloader-ordered is not 
guaranteed to load SqlToRelConverter in AtopCalcite prior to SqlToRelConverter 
in calcite. I have to revert this. The change is in 
7976b5fc714f5e73734b3037c05fc2601ea17662


was (Author: mahongbin):
It seems the ordered class loader 
https://github.com/openwide-java/tomcat-classloader-ordered is not 
guaranteed to load SqlToRelConverter in AtopCalcite prior to SqlToRelConverter 
in calcite. I have to revert the change

> Upgrade to Tomcat 8.X
> -
>
> Key: KYLIN-2361
> URL: https://issues.apache.org/jira/browse/KYLIN-2361
> Project: Kylin
>  Issue Type: Task
>  Components: Web 
>Affects Versions: v1.6.0
>Reporter: Billy Liu
>Assignee: Billy Liu
>Priority: Minor
> Fix For: v2.0.0
>
>
> Apache Tomcat 8.5.x supports the same Servlet, JSP, EL, and WebSocket 
> Specification versions as Apache Tomcat 8.0.x. In addition to that, it also 
> implements the JASPIC 1.1 specification. There are significant changes in 
> many areas under the hood, resulting in improved performance, stability, and 
> total cost of ownership. Please refer to the Apache Tomcat 8.5 Changelog for 
> details.




[jira] [Commented] (KYLIN-2361) Upgrade to Tomcat 8.X

2017-03-12 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906467#comment-15906467
 ] 

hongbin ma commented on KYLIN-2361:
---

It seems the ordered class loader 
https://github.com/openwide-java/tomcat-classloader-ordered is not 
guaranteed to load SqlToRelConverter in AtopCalcite prior to SqlToRelConverter 
in calcite. I have to revert the change

> Upgrade to Tomcat 8.X
> -
>
> Key: KYLIN-2361
> URL: https://issues.apache.org/jira/browse/KYLIN-2361
> Project: Kylin
>  Issue Type: Task
>  Components: Web 
>Affects Versions: v1.6.0
>Reporter: Billy Liu
>Assignee: Billy Liu
>Priority: Minor
> Fix For: v2.0.0
>
>
> Apache Tomcat 8.5.x supports the same Servlet, JSP, EL, and WebSocket 
> Specification versions as Apache Tomcat 8.0.x. In addition to that, it also 
> implements the JASPIC 1.1 specification. There are significant changes in 
> many areas under the hood, resulting in improved performance, stability, and 
> total cost of ownership. Please refer to the Apache Tomcat 8.5 Changelog for 
> details.




[jira] [Comment Edited] (KYLIN-2495) query exception when integer column encoded as date/time encoding

2017-03-10 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904662#comment-15904662
 ] 

hongbin ma edited comment on KYLIN-2495 at 3/10/17 9:14 AM:


Updated KYLIN-2222 so that the string type no longer supports date/time encoding.

test data:

{code}
create table fact0310(intdate int, realdate date, realtime timestamp, longtime 
bigint);

19980302 1998-03-02 2015-06-01 00:00:00 143311680
19920403 1992-04-03 2015-05-15 17:00:00 143170920
19920403 1992-04-03 2016-01-15 12:00:00 145285920

{code}

Make sure that both intdate and realdate can use date encoding, and that 
realtime and longtime can use time encoding, with the following query:

{code:sql}
select intdate,realdate,realtime,longtime,count(*) from fact0310
group by intdate,realdate,realtime,longtime

{code}


was (Author: mahongbin):
updated KYLIN-2222 so that string type no longer support date/time encoding,

test data:

{code}
create table fact0310(intdate int, realdate date, realtime timestamp, longtime 
bigint);

19980302 1998-03-02 2015-06-01 00:00:00 143311680
19920403 1992-04-03 2015-05-15 17:00:00 143170920
19920403 1992-04-03 2016-01-15 12:00:00 145285920

{code}

make sure both intdate and realdate can use date encoding, and realtime and 
longtime can use time encoding

> query exception when integer column encoded as date/time encoding 
> --
>
> Key: KYLIN-2495
> URL: https://issues.apache.org/jira/browse/KYLIN-2495
> Project: Kylin
>  Issue Type: Bug
>Reporter: hongbin ma
>Assignee: hongbin ma
>
> In KYLIN-2222, we claimed that an integer column can use date/time encoding. 
> However, when I tried to query such a cube, an exception was thrown:
> {code}
> java.sql.SQLException: Error while executing SQL "select * from fact0309
> LIMIT 5": For input string: "70225920"
> {code}
> the fact table desc is: 
> {code}
> hive> desc fact0309
> > ;
> OK
> tdate int 
> country   string  
> price decimal(10,0) 
> {code}
> and the sample data is:
> {code}
> 19980302  US  100
> 19920403  CN  100
> 19920403  US  33
> {code}




[jira] [Commented] (KYLIN-2495) query exception when integer column encoded as date/time encoding

2017-03-10 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904662#comment-15904662
 ] 

hongbin ma commented on KYLIN-2495:
---

Updated KYLIN-2222 so that the string type no longer supports date/time encoding.

test data:

{code}
create table fact0310(intdate int, realdate date, realtime timestamp, longtime 
bigint);

19980302 1998-03-02 2015-06-01 00:00:00 143311680
19920403 1992-04-03 2015-05-15 17:00:00 143170920
19920403 1992-04-03 2016-01-15 12:00:00 145285920

{code}

Make sure that both intdate and realdate can use date encoding, and that 
realtime and longtime can use time encoding.
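
To make the integer case concrete, here is a small sketch of how an integer column value could be mapped to a point in time. It assumes that an integer such as 19980302 is read as a yyyyMMdd calendar date and that UTC is used; neither detail is stated in this comment, and `IntDateSketch` is a hypothetical helper, not Kylin's encoder.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

final class IntDateSketch {
    // Interprets an integer such as 19980302 as yyyyMMdd (an assumption)
    // and returns the epoch milliseconds of that day's start in UTC.
    static long toEpochMillis(int yyyymmdd) {
        int year = yyyymmdd / 10000;
        int month = (yyyymmdd / 100) % 100;
        int day = yyyymmdd % 100;
        return LocalDate.of(year, month, day)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }
}
```

`LocalDate.of` rejects impossible dates such as month 13, so malformed integers fail loudly instead of being silently encoded.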

> query exception when integer column encoded as date/time encoding 
> --
>
> Key: KYLIN-2495
> URL: https://issues.apache.org/jira/browse/KYLIN-2495
> Project: Kylin
>  Issue Type: Bug
>Reporter: hongbin ma
>Assignee: hongbin ma
>
> In KYLIN-2222, we claimed that an integer column can use date/time encoding. 
> However, when I tried to query such a cube, an exception was thrown:
> {code}
> java.sql.SQLException: Error while executing SQL "select * from fact0309
> LIMIT 5": For input string: "70225920"
> {code}
> the fact table desc is: 
> {code}
> hive> desc fact0309
> > ;
> OK
> tdate int 
> country   string  
> price decimal(10,0) 
> {code}
> and the sample data is:
> {code}
> 19980302  US  100
> 19920403  CN  100
> 19920403  US  33
> {code}




[jira] [Commented] (KYLIN-2222) web ui uses rest api to decide which dim encoding is valid for different typed columns

2017-03-09 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904654#comment-15904654
 ] 

hongbin ma commented on KYLIN-2222:
---

refined dimension-encoding to column-type matrix:

|| ||Float Numbers|| Integer Numbers|| Time || String ||
|boolean encoding| N | Y| N|Y|
|date encoding| N | Y| Y|N|
|time encoding| N | Y| Y|N|
|dict encoding| Y| Y| Y|Y|
|fixed_length encoding| N | N| N|Y|
|fixed_length_hex encoding| N | N| N|Y|
|integer encoding| N | Y| N|Y|
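
The refined matrix above can also be expressed as a lookup table, which is roughly what a REST endpoint would need to serve to the web UI. This is a sketch only: `EncodingMatrixSketch`, `ColumnKind` and `isValid` are hypothetical names, and the encoding keys are simply the row labels of the table.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class EncodingMatrixSketch {
    // Column categories, taken from the table header.
    enum ColumnKind { FLOAT, INTEGER, TIME, STRING }

    // One entry per table row: encoding name -> column kinds marked "Y".
    private static final Map<String, Set<ColumnKind>> VALID = new LinkedHashMap<>();
    static {
        VALID.put("boolean", EnumSet.of(ColumnKind.INTEGER, ColumnKind.STRING));
        VALID.put("date", EnumSet.of(ColumnKind.INTEGER, ColumnKind.TIME));
        VALID.put("time", EnumSet.of(ColumnKind.INTEGER, ColumnKind.TIME));
        VALID.put("dict", EnumSet.allOf(ColumnKind.class));
        VALID.put("fixed_length", EnumSet.of(ColumnKind.STRING));
        VALID.put("fixed_length_hex", EnumSet.of(ColumnKind.STRING));
        VALID.put("integer", EnumSet.of(ColumnKind.INTEGER, ColumnKind.STRING));
    }

    // Returns true if the matrix marks the combination as valid.
    static boolean isValid(String encoding, ColumnKind kind) {
        Set<ColumnKind> kinds = VALID.get(encoding);
        return kinds != null && kinds.contains(kind);
    }
}
```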


> web ui uses rest api to decide which dim encoding is valid for different 
> typed columns
> --
>
> Key: KYLIN-2222
> URL: https://issues.apache.org/jira/browse/KYLIN-2222
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
> Fix For: v2.0.0
>
>





[jira] [Updated] (KYLIN-2491) Cube with error job can be dropped

2017-03-07 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2491:
--
Affects Version/s: (was: v1.5.4.1)
   v1.6.0

> Cube with error job can be dropped
> --
>
> Key: KYLIN-2491
> URL: https://issues.apache.org/jira/browse/KYLIN-2491
> Project: Kylin
>  Issue Type: Bug
>  Components: REST Service
>Affects Versions: v1.6.0
>Reporter: nichunen
>Assignee: nichunen
> Fix For: v2.0.0
>
> Attachments: KYLIN-2491.patch
>
>
> If a cube build fails, the cube can still be dropped, which leaves behind an 
> error job that can be resumed.




[jira] [Resolved] (KYLIN-2491) Cube with error job can be dropped

2017-03-07 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma resolved KYLIN-2491.
---
Resolution: Fixed

> Cube with error job can be dropped
> --
>
> Key: KYLIN-2491
> URL: https://issues.apache.org/jira/browse/KYLIN-2491
> Project: Kylin
>  Issue Type: Bug
>  Components: REST Service
>Affects Versions: v1.6.0
>Reporter: nichunen
>Assignee: nichunen
> Fix For: v2.0.0
>
> Attachments: KYLIN-2491.patch
>
>
> If a cube build fails, the cube can still be dropped, which leaves behind an 
> error job that can be resumed.




[jira] [Commented] (KYLIN-2483) SortedIteratorMergerWithLimit could be slower when number of total merge rows is small

2017-03-05 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896844#comment-15896844
 ] 

hongbin ma commented on KYLIN-2483:
---

Just gave it another thought: disabling SortedIteratorMergerWithLimit with 
limit push down enabled will return wrong results. Since it's always tempting 
to enable limit push down, we'll have to accept the cost of 
SortedIteratorMergerWithLimit.
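
For context, the general technique behind such a merger is a k-way merge over sorted partition results that stops at the limit. The sketch below shows that technique only and is not Kylin's SortedIteratorMergerWithLimit; `MergeWithLimitSketch` and `Head` are hypothetical names. When each partition returns only its own first rows under limit push down, concatenating partitions without this merge could return rows that are not the globally smallest ones.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class MergeWithLimitSketch {
    // Heap entry: the current head of one sorted partition plus the rest of it.
    private static final class Head<T> {
        final T value;
        final Iterator<T> rest;
        Head(T value, Iterator<T> rest) { this.value = value; this.rest = rest; }
    }

    // Merges already-sorted partitions in order and stops after `limit` rows.
    static <T> List<T> merge(List<Iterator<T>> partitions, Comparator<T> order, int limit) {
        PriorityQueue<Head<T>> heap =
                new PriorityQueue<>((a, b) -> order.compare(a.value, b.value));
        for (Iterator<T> it : partitions) {
            if (it.hasNext()) heap.add(new Head<>(it.next(), it));
        }
        List<T> out = new ArrayList<>();
        while (out.size() < limit && !heap.isEmpty()) {
            Head<T> smallest = heap.poll();
            out.add(smallest.value);
            // Refill the heap from the partition that just produced a row.
            if (smallest.rest.hasNext()) {
                heap.add(new Head<>(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }
}
```

The heap bookkeeping is the fixed overhead the issue refers to: it is paid even when very few rows are merged.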

> SortedIteratorMergerWithLimit could be slower when number of total merge rows 
> is small
> --
>
> Key: KYLIN-2483
> URL: https://issues.apache.org/jira/browse/KYLIN-2483
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
>
> if the pushed down limit is small enough (say less than 100), 
> SortedIteratorMergerWithLimit will bring RELATIVELY significant costs. I'm 
> adding a new configuration entry called 
> kylin.query.merge-sort-partition-results.min-limit (default 100) to fix this 
> issue



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Closed] (KYLIN-2483) SortedIteratorMergerWithLimit could be slower when number of total merge rows is small

2017-03-05 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma closed KYLIN-2483.
-
Resolution: Not A Problem

> SortedIteratorMergerWithLimit could be slower when number of total merge rows 
> is small
> --
>
> Key: KYLIN-2483
> URL: https://issues.apache.org/jira/browse/KYLIN-2483
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
>
> if the pushed down limit is small enough (say less than 100), 
> SortedIteratorMergerWithLimit will bring RELATIVELY significant costs. I'm 
> adding a new configuration entry called 
> kylin.query.merge-sort-partition-results.min-limit (default 100) to fix this 
> issue




[jira] [Created] (KYLIN-2483) SortedIteratorMergerWithLimit could be slower when number of total merge rows is small

2017-03-05 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2483:
-

 Summary: SortedIteratorMergerWithLimit could be slower when number 
of total merge rows is small
 Key: KYLIN-2483
 URL: https://issues.apache.org/jira/browse/KYLIN-2483
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


if the pushed down limit is small enough (say less than 100), 
SortedIteratorMergerWithLimit will bring RELATIVELY significant costs. I'm 
adding a new configuration entry called 
kylin.query.merge-sort-partition-results.min-limit (default 100) to fix this 
issue




[jira] [Created] (KYLIN-2471) queries with parenthesized sub-clause in JOIN will fail

2017-02-25 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2471:
-

 Summary: queries with parenthesized sub-clause in JOIN will fail
 Key: KYLIN-2471
 URL: https://issues.apache.org/jira/browse/KYLIN-2471
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


Cognos generates queries with a parenthesized sub-clause in JOIN, for example:

{code}
SELECT "TABLE1"."DIM1_1" "DIM1_1"
   ,"TABLE2"."DIM2_1" "DIM2_1"
   ,SUM("FACT"."M1") "M1"
   ,SUM("FACT"."M2") "M2"
  FROM ("COGNOS"."FACT" "FACT" LEFT OUTER JOIN "COGNOS"."TABLE1"
"TABLE1" ON "FACT"."FK_1" = "TABLE1"."PK_1")
  LEFT OUTER JOIN "COGNOS"."TABLE2" "TABLE2"
ON "FACT"."FK_2" = "TABLE2"."PK_2"
 GROUP BY "TABLE2"."DIM2_1"
  ,"TABLE1"."DIM1_1";
{code}

As mentioned in https://issues.apache.org/jira/browse/CALCITE-35, such queries 
are difficult to handle in Calcite. We'll leverage IQueryTransformer to remove 
the unnecessary parentheses.
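
A rough sketch of what such a text-level transformation could look like. It is not the eventual IQueryTransformer implementation: `JoinParenSketch` and `unwrapFirstJoin` are hypothetical names, and the sketch naively assumes that the first `FROM (` in the statement opens the join clause and that no parentheses appear inside quoted identifiers or string literals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JoinParenSketch {
    private static final Pattern FROM_PAREN =
            Pattern.compile("\\bFROM\\s*\\(", Pattern.CASE_INSENSITIVE);

    // Removes the parentheses that wrap the first join in a FROM clause, e.g.
    // "FROM (a JOIN b ON ...) JOIN c ..." -> "FROM a JOIN b ON ... JOIN c ...".
    // Real subqueries such as "FROM (SELECT ...)" are left untouched.
    static String unwrapFirstJoin(String sql) {
        Matcher m = FROM_PAREN.matcher(sql);
        if (!m.find()) return sql;
        int open = m.end() - 1; // index of the '(' after FROM

        // Find the matching closing parenthesis.
        int depth = 0;
        int close = -1;
        for (int i = open; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth == 0) {
                close = i;
                break;
            }
        }
        if (close < 0) return sql; // unbalanced input: do nothing

        String inner = sql.substring(open + 1, close);
        if (inner.trim().regionMatches(true, 0, "SELECT", 0, 6)) return sql;
        return sql.substring(0, open) + inner + sql.substring(close + 1);
    }
}
```

Dropping the parentheses is safe in this shape because joins associate left to right, so wrapping the leftmost join does not change the result.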




[jira] [Updated] (KYLIN-2436) add a configuration knob to disable spilling of aggregation cache

2017-02-14 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2436:
--
Description: 
Kylin's aggregation operator can spill intermediate results to disk when its 
estimated memory usage exceeds some threshold (kylin.query.coprocessor.mem.gb 
to be specific). While it's a useful feature in general to prevent RegionServer 
from OOM, there are times when aborting this kind of memory-hungry query 
immediately is a more suitable choice to users.

To accommodate this requirement, I suggest adding a new configuration named 
-*kylin.storage.hbase.coprocessor-spill-enabled*- 
+*kylin.storage.partition.aggr-spill-enabled*+. The default value would be 
true, which will keep the same behavior as before. If changed to false, query 
that uses more aggregation memory than threshold will fail immediately.

  was:
Kylin's aggregation operator can spill intermediate results to disk when its 
estimated memory usage exceeds some threshold (kylin.query.coprocessor.mem.gb 
to be specific). While it's a useful feature in general to prevent RegionServer 
from OOM, there are times when aborting this kind of memory-hungry query 
immediately is a more suitable choice to users.

To accommodate this requirement, I suggest adding a new configuration named 
*kylin.storage.hbase.coprocessor-spill-enabled*. The default value would be 
true, which will keep the same behavior as before. If changed to false, query 
that uses more aggregation memory than threshold will fail immediately.


> add a configuration knob to disable spilling of aggregation cache
> -
>
> Key: KYLIN-2436
> URL: https://issues.apache.org/jira/browse/KYLIN-2436
> Project: Kylin
>  Issue Type: Improvement
>  Components: Storage - HBase
>Affects Versions: v1.6.0
>Reporter: Dayue Gao
>Assignee: Dayue Gao
> Fix For: v2.0.0
>
>
> Kylin's aggregation operator can spill intermediate results to disk when its 
> estimated memory usage exceeds some threshold (kylin.query.coprocessor.mem.gb 
> to be specific). While it's a useful feature in general to prevent 
> RegionServer from OOM, there are times when aborting this kind of 
> memory-hungry query immediately is a more suitable choice to users.
> To accommodate this requirement, I suggest adding a new configuration named 
> -*kylin.storage.hbase.coprocessor-spill-enabled*- 
> +*kylin.storage.partition.aggr-spill-enabled*+. The default value would be 
> true, which will keep the same behavior as before. If changed to false, query 
> that uses more aggregation memory than threshold will fail immediately.
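
The proposed behaviour can be sketched as a small decision inside the aggregation cache. This is an illustration under assumptions, not Kylin's coprocessor code: `AggregationCacheSketch`, its fields and the exception type are hypothetical, and the spill itself is reduced to a counter.

```java
final class AggregationCacheSketch {
    private final long thresholdBytes;  // memory threshold for the cache
    private final boolean spillEnabled; // the proposed knob; default would be true
    private long estimatedBytes;
    private int spillCount;

    AggregationCacheSketch(long thresholdBytes, boolean spillEnabled) {
        this.thresholdBytes = thresholdBytes;
        this.spillEnabled = spillEnabled;
    }

    // Called as rows are aggregated; rowBytes is the estimated memory added.
    void add(long rowBytes) {
        estimatedBytes += rowBytes;
        if (estimatedBytes <= thresholdBytes) return;
        if (!spillEnabled) {
            // Knob set to false: fail the memory-hungry query immediately.
            throw new IllegalStateException("aggregation memory " + estimatedBytes
                    + " bytes exceeds threshold " + thresholdBytes
                    + " and spilling is disabled");
        }
        spillCount++;       // stand-in for writing the in-memory buffer to disk
        estimatedBytes = 0; // the buffer is empty again after a spill
    }

    int spillCount() { return spillCount; }
}
```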




[jira] [Updated] (KYLIN-2438) replace scan threshold with max scan bytes

2017-02-14 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2438:
--
Description: 
In order to guard against bad queries that can consume lots of memory and 
potentially crash the Kylin / HBase server, Kylin limits the maximum number of 
rows a query can scan. The maximum value is chosen based on two configs:
# *kylin.query.scan.threshold* is used if the query doesn't contain 
memory-hungry metrics
# *kylin.query.mem.budget* / estimated_row_size is used otherwise as the per 
region maximum.

This approach, however, has several deficiencies:
* It doesn't work very well with complex, varlen metrics. The estimated 
threshold could be either too small or too large. If it's too small, good 
queries are killed. If it's too large, bad queries are not banned.
* Row count doesn't correspond to memory consumption, so it's difficult to 
determine how large the scan threshold should be.
* kylin.query.scan.threshold can't be overridden at cube level.

In this JIRA, I propose to replace the current row count based threshold with a 
more intuitive size based threshold:
* KYLIN-2437 will collect the number of bytes scanned at both region and query 
level
* A new configuration *kylin.query.max-scan-bytes* will be added to limit the 
maximum number of bytes a query can scan
* *kylin.query.mem.budget* will be renamed to 
-*kylin.storage.hbase.coprocessor-max-scan-bytes*- 
+*kylin.storage.partition.max-scan-bytes*+, which limits at region level. No 
need to rely on estimations about row size any more.
* The above two configs can be overridden at cube level
* The old *kylin.query.scan.threshold* will be deprecated

  was:
In order to guard against bad queries that can consume lots of memory and 
potentially crash kylin / hbase server, kylin limits the maximum number of rows 
query can scan. The maximum value is chosen based on two configs
# *kylin.query.scan.threshold* is used if the query doesn't contain 
memory-hungry metrics
# *kylin.query.mem.budget* / estimated_row_size is used otherwise as the per 
region maximum.

This approach however has several deficiencies:
* It doesn't work with complex, varlen metrics very well. The estimated 
threshold could be either too small or too large. If it's too small, good 
queries are killed. If it's too large, bad queries are not banned.
* Row count doesn't correspond to memory consumption, thus it's difficult to 
determine how large scan threshold should be set to.
* kylin.query.scan.threshold can't be override at cube level.

In this JIRA, I propose to replace the current row count based threshold with a 
more intuitive size based threshold
* KYLIN-2437 will collect the number of bytes scanned at both region and query 
level
* A new configuration *kylin.query.max-scan-bytes* will be added to limits the 
maximum number of bytes query can scan
* *kylin.query.mem.budget* will be renamed to 
*kylin.storage.hbase.coprocessor-max-scan-bytes*, which limits at region level. 
No need to rely on estimations about row size any more.
* The above two configs scan be override at cube level
* the old *kylin.query.scan.threshold* will be deprecated


> replace scan threshold with max scan bytes
> --
>
> Key: KYLIN-2438
> URL: https://issues.apache.org/jira/browse/KYLIN-2438
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine, Storage - HBase
>Affects Versions: v1.6.0
>Reporter: Dayue Gao
>Assignee: Dayue Gao
> Fix For: v2.0.0
>
>
> In order to guard against bad queries that can consume lots of memory and 
> potentially crash kylin / hbase server, kylin limits the maximum number of 
> rows query can scan. The maximum value is chosen based on two configs
> # *kylin.query.scan.threshold* is used if the query doesn't contain 
> memory-hungry metrics
> # *kylin.query.mem.budget* / estimated_row_size is used otherwise as the per 
> region maximum.
> This approach however has several deficiencies:
> * It doesn't work with complex, varlen metrics very well. The estimated 
> threshold could be either too small or too large. If it's too small, good 
> queries are killed. If it's too large, bad queries are not banned.
> * Row count doesn't correspond to memory consumption, thus it's difficult to 
> determine how large scan threshold should be set to.
> * kylin.query.scan.threshold can't be override at cube level.
> In this JIRA, I propose to replace the current row count based threshold with 
> a more intuitive size based threshold
> * KYLIN-2437 will collect the number of bytes scanned at both region and 
> query level
> * A new configuration *kylin.query.max-scan-bytes* will be added to limits 
> the maximum number of bytes query can scan
> * *kylin.query.mem.budget* will be renamed to 
>

[jira] [Updated] (KYLIN-2441) protocol for REST API result format

2017-02-09 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2441:
--
Request participants:   (was: )
 Description: currently there's no standard for REST API's result 
format, so the frontend has to deal with all kinds of formats. This issue is an 
attempt to unify the format

> protocol for REST API result format
> ---
>
> Key: KYLIN-2441
> URL: https://issues.apache.org/jira/browse/KYLIN-2441
> Project: Kylin
>  Issue Type: Bug
>Reporter: hongbin ma
>Assignee: hongbin ma
>
> currently there's no standard for REST API's result format, so the frontend 
> has to deal with all kinds of formats. This issue is an attempt to unify the 
> format




[jira] [Created] (KYLIN-2441) protocol for REST API result format

2017-02-09 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2441:
-

 Summary: protocol for REST API result format
 Key: KYLIN-2441
 URL: https://issues.apache.org/jira/browse/KYLIN-2441
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma







[jira] [Commented] (KYLIN-2436) add a configuration knob to disable spilling of aggregation cache

2017-02-08 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858057#comment-15858057
 ] 

hongbin ma commented on KYLIN-2436:
---

+1

> add a configuration knob to disable spilling of aggregation cache
> -
>
> Key: KYLIN-2436
> URL: https://issues.apache.org/jira/browse/KYLIN-2436
> Project: Kylin
>  Issue Type: Improvement
>  Components: Storage - HBase
>Affects Versions: v1.6.0
>Reporter: Dayue Gao
>Assignee: Dayue Gao
>
> Kylin's aggregation operator can spill intermediate results to disk when its 
> estimated memory usage exceeds some threshold (kylin.query.coprocessor.mem.gb 
> to be specific). While it's a useful feature in general to prevent 
> RegionServer from OOM, there are times when aborting this kind of 
> memory-hungry query immediately is a more suitable choice to users.
> To accommodate this requirement, I suggest adding a new configuration named 
> *kylin.storage.hbase.coprocessor-spill-enabled*. The default value would be 
> true, which will keep the same behavior as before. If changed to false, query 
> that uses more aggregation memory than threshold will fail immediately.




[jira] [Commented] (KYLIN-2437) collect number of bytes scanned to query metrics

2017-02-08 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858055#comment-15858055
 ] 

hongbin ma commented on KYLIN-2437:
---

+1

> collect number of bytes scanned to query metrics
> 
>
> Key: KYLIN-2437
> URL: https://issues.apache.org/jira/browse/KYLIN-2437
> Project: Kylin
>  Issue Type: Improvement
>  Components: Storage - HBase
>Affects Versions: v1.6.0
>Reporter: Dayue Gao
>Assignee: Dayue Gao
>
> Besides scanned row count, it's useful to know how many bytes are scanned 
> from HBase to fulfil a query. It is perhaps a better indicator than row count 
> that shows how much pressure a query puts on HBase. 




[jira] [Commented] (KYLIN-2438) replace scan threshold with max scan bytes

2017-02-08 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858054#comment-15858054
 ] 

hongbin ma commented on KYLIN-2438:
---

+1 
happy to remove dependency on row size estimation

> replace scan threshold with max scan bytes
> --
>
> Key: KYLIN-2438
> URL: https://issues.apache.org/jira/browse/KYLIN-2438
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine, Storage - HBase
>Affects Versions: v1.6.0
>Reporter: Dayue Gao
>Assignee: Dayue Gao
>
> In order to guard against bad queries that can consume lots of memory and 
> potentially crash kylin / hbase server, kylin limits the maximum number of 
> rows query can scan. The maximum value is chosen based on two configs
> # *kylin.query.scan.threshold* is used if the query doesn't contain 
> memory-hungry metrics
> # *kylin.query.mem.budget* / estimated_row_size is used otherwise as the per 
> region maximum.
> This approach however has several deficiencies:
> * It doesn't work with complex, varlen metrics very well. The estimated 
> threshold could be either too small or too large. If it's too small, good 
> queries are killed. If it's too large, bad queries are not banned.
> * Row count doesn't correspond to memory consumption, thus it's difficult to 
> determine how large scan threshold should be set to.
> * kylin.query.scan.threshold can't be override at cube level.
> In this JIRA, I propose to replace the current row count based threshold with 
> a more intuitive size based threshold
> * KYLIN-2437 will collect the number of bytes scanned at both region and 
> query level
> * A new configuration *kylin.query.max-scan-bytes* will be added to limits 
> the maximum number of bytes query can scan
> * *kylin.query.mem.budget* will be renamed to 
> *kylin.storage.hbase.coprocessor-max-scan-bytes*, which limits at region 
> level. No need to rely on estimations about row size any more.
> * The above two configs scan be override at cube level
> * the old *kylin.query.scan.threshold* will be deprecated
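
A size-based guard of the kind proposed here could be as simple as the sketch below. It assumes the storage layer can report how many bytes each scanned row or batch occupies (the subject of KYLIN-2437); `ScanBudgetSketch` and its methods are hypothetical names, and the limit would come from a setting such as the proposed max-scan-bytes configuration.

```java
final class ScanBudgetSketch {
    private final long maxScanBytes; // configured byte limit for one scan
    private long scannedBytes;

    ScanBudgetSketch(long maxScanBytes) {
        this.maxScanBytes = maxScanBytes;
    }

    // Called for every row (or batch) returned by the storage layer.
    void onBytesScanned(long bytes) {
        scannedBytes += bytes;
        if (scannedBytes > maxScanBytes) {
            throw new IllegalStateException("scanned " + scannedBytes
                    + " bytes, exceeding the limit of " + maxScanBytes);
        }
    }

    long scannedBytes() {
        return scannedBytes;
    }
}
```

Because the guard counts actual bytes, it needs no estimate of row size, which is the main weakness of the row-count threshold described above.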



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Resolved] (KYLIN-2222) web ui uses rest api to decide which dim encoding is valid for different typed columns

2017-02-08 Thread hongbin ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma resolved KYLIN-.
---
   Resolution: Fixed
Fix Version/s: v2.0.0

> web ui uses rest api to decide which dim encoding is valid for different 
> typed columns
> --
>
> Key: KYLIN-
> URL: https://issues.apache.org/jira/browse/KYLIN-
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
> Fix For: v2.0.0
>
>





[jira] [Commented] (KYLIN-2222) web ui uses rest api to decide which dim encoding is valid for different typed columns

2017-02-08 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857669#comment-15857669
 ] 

hongbin ma commented on KYLIN-:
---

The dimension-encoding to column-type matrix:

||Encoding||Float Numbers||Integer Numbers||Time||String||
|boolean encoding|N|Y|N|Y|
|date encoding|N|Y|Y|Y|
|time encoding|N|Y|Y|Y|
|dict encoding|Y|Y|Y|Y|
|fixed_length encoding|N|N|N|Y|
|fixed_length_hex encoding|N|N|N|Y|
|integer encoding|N|Y|N|Y|
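
As a rough illustration, the matrix above can be written as a lookup table. The sketch below is an assumption about how a client might consume it, not the actual REST API: the names VALID_ENCODINGS and valid_encodings and the lowercase column-class labels are hypothetical, while the Y/N values are copied from the matrix.

```python
# Encoding -> column classes marked "Y" in the matrix above.
VALID_ENCODINGS = {
    "boolean": {"integer", "string"},
    "date": {"integer", "time", "string"},
    "time": {"integer", "time", "string"},
    "dict": {"float", "integer", "time", "string"},
    "fixed_length": {"string"},
    "fixed_length_hex": {"string"},
    "integer": {"integer", "string"},
}


def valid_encodings(column_class):
    """Return the encodings allowed for a column class, sorted by name."""
    return sorted(
        encoding
        for encoding, classes in VALID_ENCODINGS.items()
        if column_class in classes
    )
```

For example, valid_encodings("float") yields only the dict encoding, which matches the first column of the matrix.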


> web ui uses rest api to decide which dim encoding is valid for different 
> typed columns
> --
>
> Key: KYLIN-
> URL: https://issues.apache.org/jira/browse/KYLIN-
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
>





[jira] [Created] (KYLIN-2435) two EXTRACT on a column will fail if there exists NULL values for the column

2017-02-07 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2435:
-

 Summary: two EXTRACT on a column will fail if there exists NULL 
values for the column
 Key: KYLIN-2435
 URL: https://issues.apache.org/jira/browse/KYLIN-2435
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


Sample data (\N is a NULL starttime):

2000-01-01 19:12:33,US,android,10.22
2001-01-01 9:12:33,US,windows,9.12
2002-05-02 20:12:03,CN,windows,3.33
\N,CN,windows,3.32

Table definition:

create table testtable (starttime TIMESTAMP,country STRING, client STRING, 
price DECIMAL(18,4)) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

The following query succeeds:

{code}
select sum(price),extract (year from starttime) from testtable group by extract 
(year from starttime)
{code}

but the following will fail:

{code}
select sum(price) from testtable group by extract (year from starttime), 
extract (month from starttime)
{code}
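
For reference, here is a small sketch of the result the failing query would be expected to return under standard SQL semantics, where EXTRACT of a NULL gives NULL and NULL keys form one group. It only models the four sample rows above in plain Python and does not touch Kylin; the helper names extract and sum_price_by_year_month are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

ROWS = [
    ("2000-01-01 19:12:33", "US", "android", "10.22"),
    ("2001-01-01 9:12:33", "US", "windows", "9.12"),
    ("2002-05-02 20:12:03", "CN", "windows", "3.33"),
    (None, "CN", "windows", "3.32"),  # \N in the data file, i.e. NULL
]


def extract(field, starttime):
    """Mimic SQL EXTRACT for year/month: a NULL input gives a NULL output."""
    if starttime is None:
        return None
    year, month, _day = starttime.split(" ")[0].split("-")
    if field == "year":
        return int(year)
    if field == "month":
        return int(month)
    raise ValueError(f"unsupported field: {field}")


def sum_price_by_year_month(rows):
    """Group by (year, month) of starttime and sum price per group."""
    totals = defaultdict(Decimal)
    for starttime, _country, _client, price in rows:
        key = (extract("year", starttime), extract("month", starttime))
        totals[key] += Decimal(price)
    return dict(totals)
```

The row with the NULL starttime ends up in its own (None, None) group rather than causing an error.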




[jira] [Comment Edited] (KYLIN-2424) Optimize the integration test's performance

2017-02-06 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853970#comment-15853970
 ] 

hongbin ma edited comment on KYLIN-2424 at 2/6/17 1:20 PM:
---

[~Shaofengshi] great work! 
[~yimingliu] it looks like an abbreviation for "true". If so, it could be 
confusing. Why not just use "true"?


was (Author: mahongbin):
[~Shaofengshi] great work! I can close KYLIN-2015 safely now
[~yimingliu] it looks like an abbreviation for "true". If so, it could be 
confusing. Why not just use "true"?

> Optimize the integration test's performance
> ---
>
> Key: KYLIN-2424
> URL: https://issues.apache.org/jira/browse/KYLIN-2424
> Project: Kylin
>  Issue Type: Improvement
>  Components: Tools, Build and Test
>Reporter: Shaofeng SHI
>Assignee: Shaofeng SHI
> Fix For: v2.0.0
>
>
> Kylin's integration test is slow, especially the ITCombinationTest. Most of 
> the time is spent in H2 executing the test queries. In a recent integration 
> test run, this test case took 90 minutes to finish.
> After checking H2's documentation, I think the main problem is the absence of 
> indexes on the tables, and indexes are very important for a relational 
> database's query performance. So when Kylin creates the tables in H2, it should 
> create indexes on the columns that will be used in the queries, such as the 
> pk/fk columns and the filtering columns.
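
The effect of adding an index on a filtering column can be sketched as follows. Note the substitution: this uses SQLite through Python's sqlite3 module, not H2, and the table fact, its columns and the index name idx_fact_seller are made up for illustration. It only shows how a query plan changes once an index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (id INTEGER, seller_id INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO fact VALUES (?, ?, ?)",
    [(i, i % 100, i * 0.5) for i in range(1000)],
)


def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT price FROM fact WHERE seller_id = 42"

# Without an index the filter requires a full table scan.
plan_before = plan(query)

# Index the filtering column, then ask for the plan again.
conn.execute("CREATE INDEX idx_fact_seller ON fact (seller_id)")
plan_after = plan(query)
```

After the index is created, the plan refers to idx_fact_seller instead of scanning the whole table.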




[jira] [Commented] (KYLIN-2015) replace h2 with alternatives like sqllite or mysql

2017-02-06 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853972#comment-15853972
 ] 

hongbin ma commented on KYLIN-2015:
---

The performance issue of H2 is solved in KYLIN-2424 without replacing H2

> replace h2 with alternatives like sqllite or mysql
> --
>
> Key: KYLIN-2015
> URL: https://issues.apache.org/jira/browse/KYLIN-2015
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
>
> In IT we compare Kylin's results with H2's results to ensure query correctness.
> However, H2 supports only part of the SQL syntax. For example, it cannot 
> support functions like timestampadd, or (DATE'2013-01-02' + interval '3' 
> day). What's more, subqueries are observed to be very slow on H2.
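
The result comparison described above can be pictured with a short sketch. It assumes both engines return their rows as sequences of values and that row order is not significant; the function name same_result is hypothetical and this is not how Kylin's IT code is written.

```python
from collections import Counter


def same_result(rows_a, rows_b):
    """Compare two result sets as multisets of rows, ignoring row order."""
    return Counter(map(tuple, rows_a)) == Counter(map(tuple, rows_b))
```

Using a multiset rather than a set keeps duplicate rows significant, so a result with an extra repeated row does not count as equal.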




[jira] [Commented] (KYLIN-2424) Optimize the integration test's performance

2017-02-06 Thread hongbin ma (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853970#comment-15853970
 ] 

hongbin ma commented on KYLIN-2424:
---

[~Shaofengshi] great work! I can close KYLIN-2015 safely now
[~yimingliu] it looks like an abbreviation for "true". If so, it could be 
confusing. Why not just use "true"?

> Optimize the integration test's performance
> ---
>
> Key: KYLIN-2424
> URL: https://issues.apache.org/jira/browse/KYLIN-2424
> Project: Kylin
>  Issue Type: Improvement
>  Components: Tools, Build and Test
>Reporter: Shaofeng SHI
>Assignee: Shaofeng SHI
> Fix For: v2.0.0
>
>
> Kylin's integration test is slow, especially the ITCombinationTest. Most of 
> the time is spent in H2 executing the test queries. In a recent integration 
> test run, this test case took 90 minutes to finish.
> After checking H2's documentation, I think the main problem is the absence of 
> indexes on the tables, and indexes are very important for a relational 
> database's query performance. So when Kylin creates the tables in H2, it should 
> create indexes on the columns that will be used in the queries, such as the 
> pk/fk columns and the filtering columns.



