[jira] [Commented] (KYLIN-2312) Display Server Config/Environment by order in system tab
[ https://issues.apache.org/jira/browse/KYLIN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544969#comment-16544969 ]
hongbin ma commented on KYLIN-2312:
After KYLIN-2659 this is no longer working, and we haven't received many complaints since then, so maybe the requirement here is not very strong.

> Display Server Config/Environment by order in system tab
> Key: KYLIN-2312
> URL: https://issues.apache.org/jira/browse/KYLIN-2312
> Project: Kylin
> Issue Type: Improvement
> Components: Web
> Affects Versions: v1.6.0
> Reporter: Billy Liu
> Assignee: Billy Liu
> Priority: Minor
> Fix For: v2.0.0
>
> The system tab page shows Server Config and Environment. This is useful for debugging, but the order of the items is currently undetermined. Config should be shown in the same order as in the properties file, and Environment items should be sorted by name.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
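The ordering requested in KYLIN-2312 can be sketched in plain Java. This is a minimal illustration, not Kylin's system-tab code: the class `OrderedConfigView` and its methods are hypothetical, and the parser assumes simple `key=value` lines, since `java.util.Properties` extends `Hashtable` and does not remember the order of the file.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class OrderedConfigView {

    // Keeps properties in the order they appear in the file (hypothetical helper).
    // Simplified parser: "key=value" lines only; comments (#, !) and blank lines are
    // skipped, and continuation lines or ':' separators are not handled.
    static Map<String, String> loadInFileOrder(String fileContent) {
        Map<String, String> ordered = new LinkedHashMap<>();
        for (String line : fileContent.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#") || trimmed.startsWith("!")) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                continue;
            }
            // A repeated key updates the value but keeps its first position.
            ordered.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return ordered;
    }

    // Environment variables have no meaningful order, so present them sorted by name.
    static Map<String, String> environmentByName(Map<String, String> env) {
        return new TreeMap<>(env);
    }

    public static void main(String[] args) {
        String file = "zeta.key=1\n# comment\nalpha.key=2\n";
        System.out.println(loadInFileOrder(file).keySet()); // [zeta.key, alpha.key]
        System.out.println(environmentByName(System.getenv()).keySet());
    }
}
```

A `LinkedHashMap` preserves insertion order and a `TreeMap` sorts by key, which matches the two orderings the issue asks for.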
[jira] [Updated] (KYLIN-3379) timestampadd test coverage is not enough
[ https://issues.apache.org/jira/browse/KYLIN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
hongbin ma updated KYLIN-3379:
Description:
Complex cases like timestampadd(MONTH,23,test_kylin_fact.cal_dt) or timestampadd(MONTH,-23,test_kylin_fact.cal_dt) are not covered, and my tests show that this kind of query fails the integration tests (IT).

> timestampadd test coverage is not enough
> Key: KYLIN-3379
> URL: https://issues.apache.org/jira/browse/KYLIN-3379
> Project: Kylin
> Issue Type: Bug
> Affects Versions: v2.3.1
> Reporter: hongbin ma
> Priority: Major
> Fix For: v2.4.0
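One way to extend the coverage is to compute the expected value of each new test query independently. The sketch below is a hypothetical test oracle (`TimestampAddOracle` is not part of Kylin) and assumes that TIMESTAMPADD(MONTH, n, d) on a DATE clamps to the last valid day of the target month, as `LocalDate.plusMonths` does; that assumption should be checked against Calcite's behaviour before the values are used in an IT.

```java
import java.time.LocalDate;

class TimestampAddOracle {

    // Expected result of TIMESTAMPADD(MONTH, months, d) for a DATE value, assuming
    // end-of-month clamping (e.g. 2012-01-31 plus 1 month gives 2012-02-29).
    // LocalDate.plusMonths behaves this way; verify that the engine does the same.
    static LocalDate addMonths(LocalDate date, int months) {
        return date.plusMonths(months);
    }

    public static void main(String[] args) {
        LocalDate calDt = LocalDate.of(2012, 1, 31);
        // Large positive and negative offsets, as in the queries from the description
        for (int months : new int[] {23, -23, 1, -1}) {
            System.out.println("TIMESTAMPADD(MONTH, " + months + ", " + calDt + ") = "
                    + addMonths(calDt, months));
        }
    }
}
```

Starting from a month-end date in a leap year exercises both the sign of the offset and the clamping rule in one place.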
[jira] [Assigned] (KYLIN-3379) timestampadd test coverage is not enough
[ https://issues.apache.org/jira/browse/KYLIN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma reassigned KYLIN-3379: Assignee: hongbin ma
[jira] [Updated] (KYLIN-3379) timestampadd test coverage is not enough
[ https://issues.apache.org/jira/browse/KYLIN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma updated KYLIN-3379: Fix Version/s: v2.4.0
[jira] [Created] (KYLIN-3379) timestampadd test coverage is not enough
hongbin ma created KYLIN-3379: Summary: timestampadd test coverage is not enough Key: KYLIN-3379 URL: https://issues.apache.org/jira/browse/KYLIN-3379 Project: Kylin Issue Type: Bug Affects Versions: v2.3.1 Reporter: hongbin ma
[jira] [Updated] (KYLIN-3149) Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected
[ https://issues.apache.org/jira/browse/KYLIN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
hongbin ma updated KYLIN-3149:
Attachment: dump.txt

> Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected
> Key: KYLIN-3149
> URL: https://issues.apache.org/jira/browse/KYLIN-3149
> Project: Kylin
> Issue Type: Bug
> Affects Versions: v2.2.0
> Reporter: hongbin ma
> Attachments: dump.txt
>
> For queries like:
> {code:sql}
> select TRANS_ID from kylin_sales group by cast (case
>   WHEN '1030101' = '1030101' then substring(COALESCE(OPS_USER_ID, ''), 1, 1)
>   when '1030101' = '1030102' then substring(COALESCE(OPS_REGION, ''), 1, 1)
>   when '1030101' = '1030103' then substring(COALESCE(LSTG_FORMAT_NAME, ''), 1, 1)
>   when '1030101' = '1030104' then substring(COALESCE(LSTG_FORMAT_NAME, ''), 1, 1)
> end as varchar(256)), TRANS_ID;
> {code}
> the expected logical plan after the Volcano planner is:
> {code}
> EXECUTION PLAN BEFORE REWRITE
> OLAPToEnumerableConverter
>   OLAPProjectRel(TRANS_ID=[$1], ctx=[])
>     OLAPLimitRel(ctx=[], fetch=[5])
>       OLAPAggregateRel(group=[{0, 1}], ctx=[])
>         OLAPProjectRel($f0=[SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1)], TRANS_ID=[$0], ctx=[])
>           OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], ctx=[], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
> {code}
> However, the actual plan is:
> {code}
> EXECUTION PLAN BEFORE REWRITE
> OLAPToEnumerableConverter
>   OLAPLimitRel(ctx=[], fetch=[5])
>     OLAPProjectRel(TRANS_ID=[$1], ctx=[])
>       OLAPAggregateRel(group=[{0, 1}], ctx=[])
>         OLAPProjectRel($f0=[CAST(CASE(=('1030101', '1030101'), SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1), =('1030101', '1030102'), SUBSTRING(CASE(IS NOT NULL($10), $10, ''), 1, 1), =('1030101', '1030103'), SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 1, 1), =('1030101', '1030104'), SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 1, 1), null)):VARCHAR(256) CHARACTER SET "UTF-16LE" COLLATE "UTF-16LE$en_US$primary"], TRANS_ID=[$0], ctx=[])
>           OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], ctx=[], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
> {code}
> It looks like Calcite's ReduceExpressionsRule.PROJECT_INSTANCE is not working as expected. If we dump the internal state of this VolcanoPlanner (org.apache.calcite.plan.volcano.VolcanoPlanner#dump), lines 19-21 of the complete dump (attached) are:
> {code}
> rel#337:Subset#1.OLAP.[], best=rel#339, importance=0.6561
>     rel#339:OLAPProjectRel.OLAP.[](input=rel#303:Subset#0.OLAP.[],$f0=CAST(CASE(=('1030101', '1030101'), SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1), =('1030101', '1030102'), SUBSTRING(CASE(IS NOT NULL($10), $10, ''), 1, 1), =('1030101', '1030103'), SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 1, 1), =('1030101', '1030104'), SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 1, 1), null)):VARCHAR(256) CHARACTER SET "UTF-16LE" COLLATE "UTF-16LE$en_US$primary",TRANS_ID=$0,ctx=), rowcount=100.0, cumulative cost={15.0 rows, 25.05 cpu, 0.0 io}
>     rel#348:OLAPProjectRel.OLAP.[](input=rel#303:Subset#0.OLAP.[],$f0=SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1),TRANS_ID=$0,ctx=), rowcount=100.0, cumulative cost={15.0 rows, 25.05 cpu, 0.0 io}
> {code}
> We see two rels with the same cost, #339 and #348. #339 is created by LogicalProject =(OLAPProjectRule)=> OLAPProject, while #348 is created by LogicalProject =(ReduceExpressionsRule)=> reduced LogicalProject =(OLAPProjectRule)=> reduced OLAPProject. Since ReduceExpressionsRule requires a LogicalProject rather than an OLAPProject, #339 is never reduced.
> Worse, because #339 and #348 have the same cost and the current Volcano planner algorithm picks the first rel it encounters, the unexpected rel is chosen.
> A simple fix is to refine the rel-choosing algorithm: when two rels have equal cost, choose the "simpler" one. Since we don't have a perfect measure of "simple", we simply choose the rel with the shorter toString().

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
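The proposed tie-break can be illustrated outside the planner. This is a sketch under stated assumptions, not a patch to `VolcanoPlanner`: `Candidate` is a hypothetical stand-in that reduces Calcite's `RelOptCost` to a single double, and the two digests are shortened, illustrative versions of the expressions in rel#339 and rel#348.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TieBreakSketch {

    // Hypothetical stand-in for a planner node: a scalar cost plus its string form.
    static final class Candidate {
        final String id;
        final double cost;
        final String digest;

        Candidate(String id, double cost, String digest) {
            this.id = id;
            this.cost = cost;
            this.digest = digest;
        }
    }

    // Cheapest first; on equal cost, prefer the shorter string form as a rough
    // proxy for the "simpler" (already reduced) expression.
    static final Comparator<Candidate> CHEAPEST_THEN_SHORTEST =
            Comparator.comparingDouble((Candidate c) -> c.cost)
                      .thenComparingInt(c -> c.digest.length());

    static Candidate pickBest(List<Candidate> candidates) {
        return candidates.stream()
                .min(CHEAPEST_THEN_SHORTEST)
                .orElseThrow(() -> new IllegalArgumentException("no candidates"));
    }

    public static void main(String[] args) {
        Candidate unreduced = new Candidate("rel#339", 15.0,
                "CAST(CASE(=('1030101', '1030101'), SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1), null)):VARCHAR(256)");
        Candidate reduced = new Candidate("rel#348", 15.0,
                "SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1)");
        // rel#339 is seen first, but the equal-cost tie is broken in favour of rel#348.
        System.out.println(pickBest(Arrays.asList(unreduced, reduced)).id); // rel#348
    }
}
```

A lower cost still always wins; the string length only matters when the costs are exactly equal, which is the situation shown in the dump.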
[jira] [Created] (KYLIN-3149) Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected
hongbin ma created KYLIN-3149: Summary: Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected Key: KYLIN-3149 URL: https://issues.apache.org/jira/browse/KYLIN-3149 Project: Kylin Issue Type: Bug Affects Versions: v2.2.0 Reporter: hongbin ma
[jira] [Updated] (KYLIN-3106) DefaultScheduler.shutdown should use ExecutorService.shutdownNow instead of ExecutorService.shutdown
[ https://issues.apache.org/jira/browse/KYLIN-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
hongbin ma updated KYLIN-3106:
Summary: DefaultScheduler.shutdown should use ExecutorService.shutdownNow instead of ExecutorService.shutdown (was: DefaultScheduler#shutdown should use shutdownNow instead of shutdown)

> DefaultScheduler.shutdown should use ExecutorService.shutdownNow instead of ExecutorService.shutdown
> Key: KYLIN-3106
> URL: https://issues.apache.org/jira/browse/KYLIN-3106
> Project: Kylin
> Issue Type: Bug
> Reporter: hongbin ma
> Fix For: v2.3.0
>
> java.util.concurrent.ExecutorService#shutdownNow interrupts running worker threads, while java.util.concurrent.ExecutorService#shutdown does not. If an interrupt is sent, a worker thread can become aware of it and abort itself in time.
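The difference between the two methods is easy to observe with a job that checks its interrupt flag. In this sketch the loop is a hypothetical stand-in for a Kylin job, not `DefaultScheduler` code: with `shutdown()` the pool does not terminate within the timeout, while `shutdownNow()` interrupts the worker and the job exits.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ShutdownNowDemo {

    // Submits a cooperative job that runs until its thread is interrupted, then stops
    // the pool with shutdown() or shutdownNow(). Returns whether the pool terminated
    // within the timeout.
    static boolean stopsPromptly(boolean useShutdownNow) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        pool.submit(() -> {
            started.countDown();
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(10); // stand-in for one unit of real work
                } catch (InterruptedException e) {
                    // sleep() clears the interrupt flag when it throws; restore it so the loop exits
                    Thread.currentThread().interrupt();
                }
            }
        });
        try {
            started.await();
            if (useShutdownNow) {
                pool.shutdownNow(); // interrupts the running worker thread
            } else {
                pool.shutdown();    // stops accepting new tasks but lets the running job continue
            }
            return pool.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow(); // always clean up so the demo cannot hang the JVM
        }
    }

    public static void main(String[] args) {
        System.out.println("shutdown():    terminated = " + stopsPromptly(false));
        System.out.println("shutdownNow(): terminated = " + stopsPromptly(true));
    }
}
```

Note that `shutdownNow()` only helps if the job code checks the interrupt flag or calls interruptible methods; a job that swallows `InterruptedException` without restoring the flag would keep running.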
[jira] [Created] (KYLIN-3106) DefaultScheduler#shutdown should use shutdownNow instead of shutdown
hongbin ma created KYLIN-3106: Summary: DefaultScheduler#shutdown should use shutdownNow instead of shutdown Key: KYLIN-3106 URL: https://issues.apache.org/jira/browse/KYLIN-3106 Project: Kylin Issue Type: Bug Reporter: hongbin ma
[jira] [Updated] (KYLIN-3106) DefaultScheduler#shutdown should use shutdownNow instead of shutdown
[ https://issues.apache.org/jira/browse/KYLIN-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma updated KYLIN-3106: Fix Version/s: v2.3.0
[jira] [Resolved] (KYLIN-2982) Avoid upgrade column in OLAPTable
[ https://issues.apache.org/jira/browse/KYLIN-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
hongbin ma resolved KYLIN-2982.
Resolution: Fixed
Assignee: hongbin ma

> Avoid upgrade column in OLAPTable
> Key: KYLIN-2982
> URL: https://issues.apache.org/jira/browse/KYLIN-2982
> Project: Kylin
> Issue Type: Improvement
> Reporter: hongbin ma
> Assignee: hongbin ma
> Priority: Normal
> Fix For: v2.3.0
>
> Before CALCITE-845, to prevent sum(integer_typed_col) from overflowing, we worked around the problem by upgrading every integer column that appears in a SUM measure to the bigint type. This workaround changes the column's type without notifying users and easily leads to messy code.
> Now that CALCITE-845 is ready, we can use it to provide a cleaner implementation.
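The overflow that the old workaround guarded against can be reproduced with plain Java integer arithmetic. The sketch only illustrates the arithmetic; it assumes that the cleaner implementation widens the result type of SUM rather than the column type, and it does not use Calcite's API.

```java
class SumOverflowDemo {

    // Summing int values into an int accumulator silently wraps around on overflow.
    static int sumAsInt(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Widening only the accumulator (the SUM result) avoids the overflow while the
    // column itself keeps its original integer type.
    static long sumAsLong(int[] values) {
        long total = 0L;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] column = {Integer.MAX_VALUE, 1};
        System.out.println(sumAsInt(column));  // -2147483648
        System.out.println(sumAsLong(column)); // 2147483648
    }
}
```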
[jira] [Updated] (KYLIN-2985) Cache temp json file created by each Calcite Connection
[ https://issues.apache.org/jira/browse/KYLIN-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
hongbin ma updated KYLIN-2985:
Fix Version/s: v2.3.0

> Cache temp json file created by each Calcite Connection
> Key: KYLIN-2985
> URL: https://issues.apache.org/jira/browse/KYLIN-2985
> Project: Kylin
> Issue Type: Improvement
> Reporter: hongbin ma
> Priority: Normal
> Fix For: v2.3.0
>
> In org.apache.kylin.query.schema.OLAPSchemaFactory, each Calcite connection holds its own temp file in the JVM, so the total number of temp files can grow very large. A simple cache could address the problem.
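A minimal sketch of such a cache, assuming that connections with identical model JSON can share one file. `SchemaFileCache` is hypothetical, not the `OLAPSchemaFactory` API, and it keeps files for the JVM's lifetime; a production version might key by project and bound the cache size.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class SchemaFileCache {

    // One temp file per distinct model JSON, shared by every connection that needs it,
    // instead of one new temp file per connection.
    private final ConcurrentMap<String, Path> filesByJson = new ConcurrentHashMap<>();

    Path fileFor(String modelJson) {
        return filesByJson.computeIfAbsent(modelJson, json -> {
            try {
                Path file = Files.createTempFile("olap_model_", ".json");
                Files.write(file, json.getBytes(StandardCharsets.UTF_8));
                file.toFile().deleteOnExit(); // removed when the JVM exits
                return file;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    int size() {
        return filesByJson.size();
    }

    public static void main(String[] args) {
        SchemaFileCache cache = new SchemaFileCache();
        String json = "{\"version\":\"1.0\",\"defaultSchema\":\"DEFAULT\"}";
        Path first = cache.fileFor(json);
        Path second = cache.fileFor(json); // a second "connection" with the same model
        System.out.println(first.equals(second) + " " + cache.size()); // true 1
    }
}
```

`computeIfAbsent` on a `ConcurrentHashMap` runs the file-creating function at most once per key, so concurrent connections with the same model do not race to create duplicate files.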
[jira] [Created] (KYLIN-2985) Cache temp json file created by each Calcite Connection
hongbin ma created KYLIN-2985: Summary: Cache temp json file created by each Calcite Connection Key: KYLIN-2985 URL: https://issues.apache.org/jira/browse/KYLIN-2985 Project: Kylin Issue Type: Improvement Reporter: hongbin ma Priority: Normal
[jira] [Updated] (KYLIN-2982) Avoid upgrade column in OLAPTable
[ https://issues.apache.org/jira/browse/KYLIN-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma updated KYLIN-2982: Fix Version/s: v2.3.0
[jira] [Created] (KYLIN-2982) Avoid upgrade column in OLAPTable
hongbin ma created KYLIN-2982: Summary: Avoid upgrade column in OLAPTable Key: KYLIN-2982 URL: https://issues.apache.org/jira/browse/KYLIN-2982 Project: Kylin Issue Type: Improvement Reporter: hongbin ma Priority: Normal
[jira] [Updated] (KYLIN-2823) Trim TupleFilter after dictionary-based filter optimization
[ https://issues.apache.org/jira/browse/KYLIN-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
hongbin ma updated KYLIN-2823:
Fix Version/s: v2.2.0

> Trim TupleFilter after dictionary-based filter optimization
> Key: KYLIN-2823
> URL: https://issues.apache.org/jira/browse/KYLIN-2823
> Project: Kylin
> Issue Type: Improvement
> Reporter: hongbin ma
> Fix For: v2.2.0
>
> Using the cube's dictionary, Kylin optimizes filters like:
> ( a = 'value_in_dict' OR a = 'value_not_in_dict') => (a = 'value_in_dict' OR ConstantTupleFilter.FALSE)
> We need to further trim the filter to (a = 'value_in_dict') to avoid having too many children after the filter-flattening step.
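The trimming step can be sketched on a small filter tree. The classes below are hypothetical and much simpler than Kylin's `TupleFilter` hierarchy; they only show the rule that FALSE children can be dropped from an OR (and TRUE children from an AND), and that a node left with one child collapses to that child.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilterTrimSketch {

    // Minimal hypothetical filter tree, not Kylin's TupleFilter classes.
    interface Filter {}

    static final class Const implements Filter {
        static final Const TRUE = new Const(true);
        static final Const FALSE = new Const(false);
        final boolean value;

        private Const(boolean value) { this.value = value; }

        @Override public String toString() { return value ? "TRUE" : "FALSE"; }
    }

    static final class Eq implements Filter {
        final String column;
        final String literal;

        Eq(String column, String literal) { this.column = column; this.literal = literal; }

        @Override public String toString() { return column + " = '" + literal + "'"; }
    }

    static final class Logical implements Filter {
        final boolean isAnd;
        final List<Filter> children;

        Logical(boolean isAnd, List<Filter> children) { this.isAnd = isAnd; this.children = children; }

        @Override public String toString() {
            List<String> parts = new ArrayList<>();
            for (Filter child : children) {
                parts.add(child.toString());
            }
            return "(" + String.join(isAnd ? " AND " : " OR ", parts) + ")";
        }
    }

    // Removes constant children that cannot change the result:
    //   x OR FALSE -> x, x OR TRUE -> TRUE, x AND TRUE -> x, x AND FALSE -> FALSE.
    // A node left with a single child is replaced by that child.
    static Filter trim(Filter filter) {
        if (!(filter instanceof Logical)) {
            return filter;
        }
        Logical node = (Logical) filter;
        Const neutral = node.isAnd ? Const.TRUE : Const.FALSE;   // can be dropped
        Const absorbing = node.isAnd ? Const.FALSE : Const.TRUE; // decides the result alone
        List<Filter> kept = new ArrayList<>();
        for (Filter child : node.children) {
            Filter trimmed = trim(child);
            if (trimmed == absorbing) {
                return absorbing;
            }
            if (trimmed != neutral) {
                kept.add(trimmed);
            }
        }
        if (kept.isEmpty()) {
            return neutral;
        }
        return kept.size() == 1 ? kept.get(0) : new Logical(node.isAnd, kept);
    }

    public static void main(String[] args) {
        // (a = 'value_in_dict' OR FALSE), i.e. the filter after the dictionary-based rewrite
        Filter rewritten = new Logical(false,
                Arrays.asList(new Eq("a", "value_in_dict"), Const.FALSE));
        System.out.println(trim(rewritten)); // a = 'value_in_dict'
    }
}
```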
[jira] [Created] (KYLIN-2823) Trim TupleFilter after dictionary-based filter optimization
hongbin ma created KYLIN-2823: Summary: Trim TupleFilter after dictionary-based filter optimization Key: KYLIN-2823 URL: https://issues.apache.org/jira/browse/KYLIN-2823 Project: Kylin Issue Type: Improvement Reporter: hongbin ma
[jira] [Created] (KYLIN-2801) Make default precision and scale in DataType (for hive) configurable
hongbin ma created KYLIN-2801:
Summary: Make default precision and scale in DataType (for hive) configurable
Key: KYLIN-2801
URL: https://issues.apache.org/jira/browse/KYLIN-2801
Project: Kylin
Issue Type: Improvement
Reporter: hongbin ma

Currently these values are hard-coded:
{code:java}
// FIXME 256 for unknown string precision
if ((name.equals("char") || name.equals("varchar")) && precision == -1) {
    precision = 256; // to save memory at frontend, e.g. tableau will
                     // allocate memory according to this
    if (name.equals("char")) {
        precision -= 1; //at most 255 according to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-CharcharChar
    }
}

// FIXME (19,4) for unknown decimal precision
if ((name.equals("decimal") || name.equals("numeric")) && precision == -1) {
    precision = 19;
    scale = 4;
}
{code}
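One possible shape for the configurable version keeps the current values as fallbacks. The property names and the `DataTypeDefaults` class are hypothetical, invented for this sketch; they are not existing Kylin configuration keys.

```java
import java.util.Properties;

class DataTypeDefaults {

    // Hypothetical property names, invented for this sketch.
    static final String VARCHAR_PRECISION = "example.hive.default-varchar-precision";
    static final String CHAR_PRECISION = "example.hive.default-char-precision";
    static final String DECIMAL_PRECISION = "example.hive.default-decimal-precision";
    static final String DECIMAL_SCALE = "example.hive.default-decimal-scale";

    private final Properties config;

    DataTypeDefaults(Properties config) {
        this.config = config;
    }

    // Falls back to the currently hard-coded value when the key is not set.
    private int intOrDefault(String key, int fallback) {
        String raw = config.getProperty(key);
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }

    // Returns {precision, scale}; only fills in values when the precision is unknown (-1),
    // mirroring the hard-coded branches quoted in the issue.
    int[] resolve(String name, int precision, int scale) {
        if (precision != -1) {
            return new int[] {precision, scale};
        }
        switch (name) {
            case "varchar":
                return new int[] {intOrDefault(VARCHAR_PRECISION, 256), scale};
            case "char":
                // Hive CHAR allows at most 255 characters, so cap any configured value.
                return new int[] {Math.min(intOrDefault(CHAR_PRECISION, 255), 255), scale};
            case "decimal":
            case "numeric":
                return new int[] {intOrDefault(DECIMAL_PRECISION, 19), intOrDefault(DECIMAL_SCALE, 4)};
            default:
                return new int[] {precision, scale};
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(VARCHAR_PRECISION, "4096");
        DataTypeDefaults defaults = new DataTypeDefaults(config);
        int[] varchar = defaults.resolve("varchar", -1, -1);
        int[] decimal = defaults.resolve("decimal", -1, -1);
        System.out.println(varchar[0] + " " + decimal[0] + "," + decimal[1]); // 4096 19,4
    }
}
```

Keeping the old numbers as fallbacks means existing deployments behave the same unless they opt in to new values.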
[jira] [Updated] (KYLIN-2782) Replace DailyRollingFileAppender with RollingFileAppender to allow log retention
[ https://issues.apache.org/jira/browse/KYLIN-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
hongbin ma updated KYLIN-2782:
Fix Version/s: v2.2.0

> Replace DailyRollingFileAppender with RollingFileAppender to allow log retention
> Key: KYLIN-2782
> URL: https://issues.apache.org/jira/browse/KYLIN-2782
> Project: Kylin
> Issue Type: Task
> Reporter: hongbin ma
> Fix For: v2.2.0
[jira] [Created] (KYLIN-2782) Replace DailyRollingFileAppender with RollingFileAppender to allow log retention
hongbin ma created KYLIN-2782: Summary: Replace DailyRollingFileAppender with RollingFileAppender to allow log retention Key: KYLIN-2782 URL: https://issues.apache.org/jira/browse/KYLIN-2782 Project: Kylin Issue Type: Task Reporter: hongbin ma
[jira] [Resolved] (KYLIN-1143) cache on partition column's hierarchy parents
[ https://issues.apache.org/jira/browse/KYLIN-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
hongbin ma resolved KYLIN-1143.
Resolution: Won't Fix

> cache on partition column's hierarchy parents
> Key: KYLIN-1143
> URL: https://issues.apache.org/jira/browse/KYLIN-1143
> Project: Kylin
> Issue Type: Sub-task
> Reporter: hongbin ma
> Assignee: hongbin ma
>
> Currently the dynamic cache enforces a GROUP BY on the partition column. In many cases the partition column has a hierarchy, and queries on that hierarchy should be optimized for the dynamic cache, too.
[jira] [Resolved] (KYLIN-1146) reduce the number of objects put into ehcache
[ https://issues.apache.org/jira/browse/KYLIN-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
hongbin ma resolved KYLIN-1146.
Resolution: Won't Fix

> reduce the number of objects put into ehcache
> Key: KYLIN-1146
> URL: https://issues.apache.org/jira/browse/KYLIN-1146
> Project: Kylin
> Issue Type: Sub-task
> Reporter: hongbin ma
> Assignee: hongbin ma
>
> Ehcache gives a warning if a key/value entry references a large number of objects. Maybe we should compact each K/V into just two objects before putting it into the cache.
[jira] [Issue Comment Deleted] (KYLIN-1143) cache on partition column's hierarchy parents
[ https://issues.apache.org/jira/browse/KYLIN-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma updated KYLIN-1143: Comment: was deleted (was: will fix it in 2.1 release)
[jira] [Comment Edited] (KYLIN-2703) kylin supports managing access rights for project and cube through apache ranger.
[ https://issues.apache.org/jira/browse/KYLIN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118096#comment-16118096 ]
hongbin ma edited comment on KYLIN-2703 at 8/8/17 9:24 AM:
Hi [~peng.jianhua], I have some questions before merging the patch:
1. About org.apache.kylin.rest.controller.AccessController#getAccessEntities: before your patch, this method was simple: it returned the access entry list of a requested domain object. After your patch, why is it necessary for the API caller to provide the "name" (is it required?) and "owner" (why should the API caller provide the owner?) parameters?
2. On the Kylin side, what configuration must users set for this to take effect? Is there a manual or doc?

was (Author: mahongbin):
Hi [~peng.jianhua], I have some questions before merging the patch:
1. About org.apache.kylin.rest.controller.AccessController#getAccessEntities: before your patch, this method was simple: it returned the access entry list of a requested domain object. After your patch, why is it necessary for the API caller to provide the "name" (is it required?) and "owner" (why should the API caller provide the owner?) parameters?
2. What configurations should users make to use Ranger? Is there a manual or doc?

> kylin supports managing access rights for project and cube through apache ranger.
> Key: KYLIN-2703
> URL: https://issues.apache.org/jira/browse/KYLIN-2703
> Project: Kylin
> Issue Type: New Feature
> Components: General
> Reporter: peng.jianhua
> Assignee: peng.jianhua
> Labels: newbie, patch
> Attachments: 0001-KYLIN-2703-kylin-supports-managing-access-rights-for.patch, KylinAuditLog.jpg, KylinPlugins.jpg, KylinPolicies.jpg, KylinServiceEntry.jpg, NewKylinPolicy.jpg, NewKylinService.jpg, Ranger-PMS-hope.png
>
> Ranger is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform. Apache Ranger has the following goals:
> 1. Centralized security administration to manage all security-related tasks in a central UI or using REST APIs.
> 2. Fine-grained authorization to perform a specific action and/or operation with a Hadoop component/tool, managed through a central administration tool.
> 3. Standardize the authorization method across all Hadoop components.
> 4. Enhanced support for different authorization methods: role-based access control, attribute-based access control, etc.
> 5. Centralize auditing of user access and administrative actions (security related) within all the components of Hadoop.
> Ranger supports enabling, monitoring and managing the following components:
> 1. HDFS
> 2. HIVE
> 3. HBASE
> 4. KNOX
> 5. YARN
> 6. STORM
> 7. SOLR
> 8. KAFKA
> 9. ATLAS
> To improve the flexibility of Kylin's privilege control and enhance Kylin's value in the Apache Hadoop ecosystem, Kylin should, like HDFS, YARN, Hive and HBase, also support using Ranger to control access rights for projects and cubes.
> The specific implementation plan is as follows:
> In the Ranger web UI, administrators can configure policies that control user access to projects and cubes.
> Kylin provides an abstract class and authorization interfaces for use by the Ranger plugin. When starting, Kylin instantiates the Ranger plugin's implementation class, which extends the abstract class provided by Kylin.
> The Ranger plugin periodically polls Ranger admin, updates its local copy of the policies, and updates project and cube access rights based on the policy information.
> On the Kylin side:
> 1. Kylin provides an abstract class that the Ranger plugin's implementation class extends.
> 2. Add configuration items: 1) a Ranger authorization switch, 2) the name of the Ranger plugin implementation class.
> 3. Instantiate the Ranger plugin implementation class when Kylin starts.
> 4. Kylin provides authorization interfaces for the Ranger plugin to call.
> 5. Depending on the Ranger authorization configuration item, hide Kylin's authorization management page.
> 6. Using Ranger to manage Kylin's access rights does not affect Kylin's existing permission functions and logic.
> On the Ranger side:
> 1. The Ranger plugin periodically polls Ranger admin and updates its local copy of the policies.
> 2. The Ranger plugin invokes the authorization interfaces provided by Kylin to update project and cube access rights based on the policy information.
> Reference link: https://issues.apache.org/jira/browse/RANGER-1672
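Kylin-side steps 1 to 4 and 6 of the plan can be sketched as an abstract class loaded by reflection. All names here (`AuthorizationPlugin`, `AclUpdater`, both configuration keys, and the toy `FixedPolicyPlugin`) are hypothetical and not taken from the attached patch; a real plugin would poll Ranger admin instead of applying one fixed policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class AuthorizationPluginLoader {

    // Hypothetical Kylin-side interface that the plugin calls to apply policies (step 4).
    interface AclUpdater {
        void grant(String user, String project, String cube, String permission);
    }

    // Hypothetical abstract class that a Ranger plugin would extend (step 1).
    abstract static class AuthorizationPlugin {
        // Called once at startup; a real plugin would start polling Ranger admin here
        // and call the updater whenever the policies change.
        abstract void start(AclUpdater updater);
    }

    // Hypothetical configuration keys for step 2: an on/off switch and the class name.
    static final String ENABLED_KEY = "example.acl.external.enabled";
    static final String CLASS_KEY = "example.acl.external.plugin-class";

    // Step 3: instantiate the configured class at startup. Returns null when the switch
    // is off, so the built-in ACL handling stays in charge (step 6).
    static AuthorizationPlugin load(Properties config) throws ReflectiveOperationException {
        if (!Boolean.parseBoolean(config.getProperty(ENABLED_KEY, "false"))) {
            return null;
        }
        String className = config.getProperty(CLASS_KEY);
        if (className == null || className.trim().isEmpty()) {
            throw new IllegalStateException(CLASS_KEY + " is required when " + ENABLED_KEY + " is true");
        }
        return Class.forName(className.trim())
                .asSubclass(AuthorizationPlugin.class)
                .getDeclaredConstructor()
                .newInstance();
    }

    // Toy plugin standing in for a Ranger-backed implementation: it applies one fixed policy.
    static class FixedPolicyPlugin extends AuthorizationPlugin {
        @Override
        void start(AclUpdater updater) {
            updater.grant("analyst", "learn_kylin", "kylin_sales_cube", "READ");
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Properties config = new Properties();
        config.setProperty(ENABLED_KEY, "true");
        config.setProperty(CLASS_KEY, FixedPolicyPlugin.class.getName());
        List<String> grants = new ArrayList<>();
        AuthorizationPlugin plugin = load(config);
        plugin.start((user, project, cube, permission) ->
                grants.add(user + " " + permission + " " + project + "/" + cube));
        System.out.println(grants); // [analyst READ learn_kylin/kylin_sales_cube]
    }
}
```

Loading the class by name keeps Kylin free of a compile-time dependency on Ranger; only the plugin jar needs to be on the classpath when the switch is on.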
[jira] [Commented] (KYLIN-2703) kylin supports managing access rights for project and cube through apache ranger.
[ https://issues.apache.org/jira/browse/KYLIN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118096#comment-16118096 ]
hongbin ma commented on KYLIN-2703:
Hi [~peng.jianhua], I have some questions before merging the patch:
1. About org.apache.kylin.rest.controller.AccessController#getAccessEntities: before your patch, this method was simple: it returned the access entry list of a requested domain object. After your patch, why is it necessary for the API caller to provide the "name" (is it required?) and "owner" (why should the API caller provide the owner?) parameters?
2. What configurations should users make to use Ranger? Is there a manual or doc?
[jira] [Commented] (KYLIN-2706) Fix the bug for the comparator in SortedIteratorMergerWithLimit
[ https://issues.apache.org/jira/browse/KYLIN-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116291#comment-16116291 ] hongbin ma commented on KYLIN-2706: --- reviewed > Fix the bug for the comparator in SortedIteratorMergerWithLimit > --- > > Key: KYLIN-2706 > URL: https://issues.apache.org/jira/browse/KYLIN-2706 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.0.0 >Reporter: kangkaisen >Assignee: kangkaisen > Attachments: KYLIN-2706.patch > > > For the SQL below, storage limit push-down should be disabled: the query will > return more than one record from HBase tables, but > SortedIteratorMergerWithLimit returns only one record, which leads to a wrong > result. > {code:sql} > SELECT sum(A) > FROM TABLE > WHERE date_id >= 20170624 and date_id <= 20170626 > limit 1 > {code} > We should disable storage limit push-down when singleValuesD doesn't > contain all of othersD. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
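The push-down rule stated above reduces to a set-containment check. The sketch below assumes `othersD` means the dimensions the storage scan returns beyond the query's own GROUP BY (such as `date_id` in the example, which only appears in the filter); that reading and the class name `LimitPushDownGuard` are assumptions, not Kylin's actual code.

```java
import java.util.Set;

// Sketch of the guard suggested in KYLIN-2706. singleValuesD and othersD follow
// the wording of the issue; othersD is assumed to be the dimensions the storage
// scan groups by beyond the query's own GROUP BY. The class name is hypothetical.
class LimitPushDownGuard {
    static boolean canPushDownLimit(Set<String> singleValuesD, Set<String> othersD) {
        // If every extra dimension is pinned to one value by the filter, each
        // storage row is already a final result row and LIMIT can be pushed down.
        // Otherwise several storage rows (one per DATE_ID in the example) must
        // still be aggregated, so truncating them early gives a wrong SUM.
        return singleValuesD.containsAll(othersD);
    }
}
```

For the example query, `date_id` spans three days, so it is not single-valued and the limit must stay above the storage layer.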
[jira] [Commented] (KYLIN-2606) Only return counter for precise count_distinct if query is exactAggregate
[ https://issues.apache.org/jira/browse/KYLIN-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116289#comment-16116289 ] hongbin ma commented on KYLIN-2606: --- patch reviewed > Only return counter for precise count_distinct if query is exactAggregate > - > > Key: KYLIN-2606 > URL: https://issues.apache.org/jira/browse/KYLIN-2606 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Affects Versions: v2.0.0 >Reporter: kangkaisen >Assignee: kangkaisen > > If the query is an exactAggregation and has some memory-hungry measures, we > could directly return the final result to speed up the query and reduce the RPC > data size and memory usage in queryServer. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-2776) Using dropwizard as default metric framework
[ https://issues.apache.org/jira/browse/KYLIN-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma updated KYLIN-2776: -- Description: With https://issues.apache.org/jira/browse/KYLIN-2721, we plan to release a new metric framework. Unlike Hadoop metrics, the new framework is based on Dropwizard, which has the following advantages: * Well-defined metric model for frequently-needed metrics (e.g. JVM metrics) * Well-defined measurements for all metrics (e.g. max, mean, stddev, mean_rate, etc.) * Built-in pluggable reporting frameworks like JMX, Console, Log, JSON We refactored QueryMetric with the new metrics; note that the exposed JMX MBeans have changed slightly. A new tool called perflog is also introduced. Perflog traces call duration and current active calls by recording them in the metric system. Some snapshots of the new JMX MBeans can be found in the attachments. was: With https://issues.apache.org/jira/browse/KYLIN-2721.We are plan to release a new metric framework. New metric is different hadoop metric and based on dropwizard . which has the following advantage: * Well-defined metric model for frequently-needed metrics (ie JVM metrics) * Well-defined measurements for all metrics (ie max, mean, stddev, mean_rate, etc), * Built-in pluggable reporting frameworks like JMX, Console, Log, JSON We refactor QueryMetric with new metris. New metric add perflog. Perflog trace calls duration time and current active calls by recording them to metric system. Attachment is the difference between the two metric system . 
> Using dropwizard as default metric framework > > > Key: KYLIN-2776 > URL: https://issues.apache.org/jira/browse/KYLIN-2776 > Project: Kylin > Issue Type: New Feature >Affects Versions: v2.0.0 >Reporter: yiming.xu >Assignee: yiming.xu > Attachments: active_calls.png, calls.png, KYLIN-2776.patch, > metric_structure.png, query_count.png, query_duration.png, > query_result_rowcount.png, report.json > > > With https://issues.apache.org/jira/browse/KYLIN-2721, we plan to release > a new metric framework. > Unlike Hadoop metrics, the new framework is based on Dropwizard, which has > the following advantages: > * Well-defined metric model for frequently-needed metrics (e.g. JVM metrics) > * Well-defined measurements for all metrics (e.g. max, mean, stddev, > mean_rate, etc.) > * Built-in pluggable reporting frameworks like JMX, Console, Log, JSON > We refactored QueryMetric with the new metrics; note that the exposed JMX MBeans > have changed slightly. > A new tool called perflog is also introduced. Perflog traces call duration > and current active calls by recording them in the metric system. > Some snapshots of the new JMX MBeans can be found in the attachments. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
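Perflog is described as recording two things: how many calls are currently active and how long each call takes. The stdlib-only sketch below shows that bookkeeping; it deliberately avoids the Dropwizard library (where a `Counter` and a `Timer` from a `MetricRegistry` would typically play these roles), and `PerfLogSketch` with its method names is hypothetical rather than the actual perflog API.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Stdlib-only stand-in for what perflog is described as recording: the number
// of calls currently in flight and how long each call takes. The class and its
// method names are hypothetical.
class PerfLogSketch {
    private final AtomicLong activeCalls = new AtomicLong();
    private final LongAdder completedCalls = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    <T> T trace(Supplier<T> call) {
        activeCalls.incrementAndGet();
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            // Record duration and release the active slot even if the call throws.
            totalNanos.add(System.nanoTime() - start);
            completedCalls.increment();
            activeCalls.decrementAndGet();
        }
    }

    long activeCalls() { return activeCalls.get(); }

    long completedCalls() { return completedCalls.sum(); }

    double meanMillis() {
        long n = completedCalls.sum();
        return n == 0 ? 0.0 : totalNanos.sum() / 1e6 / n;
    }
}
```

Wrapping the call in `try/finally` keeps the active-call gauge accurate when a query fails, which matters for a metric meant to show concurrency.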
[jira] [Updated] (KYLIN-2776) Using dropwizard as default metric framework
[ https://issues.apache.org/jira/browse/KYLIN-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma updated KYLIN-2776: -- Summary: Using dropwizard as default metric framework (was: New metric framework with kylin) > Using dropwizard as default metric framework > > > Key: KYLIN-2776 > URL: https://issues.apache.org/jira/browse/KYLIN-2776 > Project: Kylin > Issue Type: New Feature >Affects Versions: v2.0.0 >Reporter: yiming.xu >Assignee: yiming.xu > Attachments: active_calls.png, calls.png, metric_structure.png, > query_count.png, query_duration.png, query_result_rowcount.png, report.json > > > With https://issues.apache.org/jira/browse/KYLIN-2721, we plan to release > a new metric framework. > Unlike Hadoop metrics, the new framework is based on Dropwizard, which has > the following advantages: > * Well-defined metric model for frequently-needed metrics (e.g. JVM metrics) > * Well-defined measurements for all metrics (e.g. max, mean, stddev, > mean_rate, etc.) > * Built-in pluggable reporting frameworks like JMX, Console, Log, JSON > We refactored QueryMetric with the new metrics. > The new metrics add perflog. Perflog traces call duration and current > active calls by recording them in the metric system. > The attachments show the difference between the two metric systems. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-2776) New metric framework with kylin
[ https://issues.apache.org/jira/browse/KYLIN-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma updated KYLIN-2776: -- Description: With https://issues.apache.org/jira/browse/KYLIN-2721, we plan to release a new metric framework. Unlike Hadoop metrics, the new framework is based on Dropwizard, which has the following advantages: * Well-defined metric model for frequently-needed metrics (e.g. JVM metrics) * Well-defined measurements for all metrics (e.g. max, mean, stddev, mean_rate, etc.) * Built-in pluggable reporting frameworks like JMX, Console, Log, JSON We refactored QueryMetric with the new metrics. The new metrics add perflog. Perflog traces call duration and current active calls by recording them in the metric system. The attachments show the difference between the two metric systems. was: With https://issues.apache.org/jira/browse/KYLIN-2721.We are plan to release a new metric framework. New metric is different hadoop metric and based on dropwizard . which has the following advantage: * Well-defined metric model for frequently-needed metrics (ie JVM metrics) * Well-defined measurements for all metrics (ie max, mean, stddev, mean_rate, etc), * Built-in pluggable reporting frameworks like JMX, Console, Log, JSON We refactor QueryMetric with new metris. New metric add perflog. Perflog trace calls duration time and current active calls record to metric system. Attachment is the difference between the two metric system . > New metric framework with kylin > --- > > Key: KYLIN-2776 > URL: https://issues.apache.org/jira/browse/KYLIN-2776 > Project: Kylin > Issue Type: New Feature >Affects Versions: v2.0.0 >Reporter: yiming.xu >Assignee: yiming.xu > Attachments: active_calls.png, calls.png, metric_structure.png, > query_count.png, query_duration.png, query_result_rowcount.png, report.json > > > With https://issues.apache.org/jira/browse/KYLIN-2721, we plan to release > a new metric framework.
> Unlike Hadoop metrics, the new framework is based on Dropwizard, which has > the following advantages: > * Well-defined metric model for frequently-needed metrics (e.g. JVM metrics) > * Well-defined measurements for all metrics (e.g. max, mean, stddev, > mean_rate, etc.) > * Built-in pluggable reporting frameworks like JMX, Console, Log, JSON > We refactored QueryMetric with the new metrics. > The new metrics add perflog. Perflog traces call duration and current > active calls by recording them in the metric system. > The attachments show the difference between the two metric systems. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-2653) Spark cubing support HBase cluster with kerberos
[ https://issues.apache.org/jira/browse/KYLIN-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma updated KYLIN-2653: -- Fix Version/s: (was: v2.1.0) v2.2.0 > Spark cubing support HBase cluster with kerberos > > > Key: KYLIN-2653 > URL: https://issues.apache.org/jira/browse/KYLIN-2653 > Project: Kylin > Issue Type: Bug > Components: Spark Engine >Affects Versions: v2.0.0 >Reporter: kangkaisen >Assignee: kangkaisen > Fix For: v2.2.0 > > > Currently, Spark cubing doesn't support HBase clusters with Kerberos. > As a temporary measure, we could support HBase clusters with Kerberos in YARN client mode, > because that is easy. > In the long term, we should avoid accessing HBase in Spark cubing. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Reopened] (KYLIN-2720) Should not allow user to access to all tables' metadata of a project
[ https://issues.apache.org/jira/browse/KYLIN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma reopened KYLIN-2720: --- > Should not allow user to access to all tables' metadata of a project > > > Key: KYLIN-2720 > URL: https://issues.apache.org/jira/browse/KYLIN-2720 > Project: Kylin > Issue Type: Improvement >Reporter: qiumingming >Assignee: qiumingming > Fix For: v2.1.0 > > Attachments: KYLIN-2720.patch > > > Currently, a user can access the metadata of all tables and columns in a specific > project as long as he can access the project, which is not reasonable. > A user should only be allowed to access the tables that the cubes he owns depend on. > However, in the current version the user can see other tables in the web UI. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KYLIN-2720) Should not allow user to access to all tables' metadata of a project
[ https://issues.apache.org/jira/browse/KYLIN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108291#comment-16108291 ] hongbin ma commented on KYLIN-2720: --- Hi [~qmm] I'm afraid that with https://issues.apache.org/jira/browse/KYLIN-2515 and https://issues.apache.org/jira/browse/KYLIN-2646 being added into Kylin 2.1, we have to revert KYLIN-2720 as it conflicts with the above issues. We're currently refining the authorization process. Discussions will continue on the mailing list or JIRA; please stay tuned. > Should not allow user to access to all tables' metadata of a project > > > Key: KYLIN-2720 > URL: https://issues.apache.org/jira/browse/KYLIN-2720 > Project: Kylin > Issue Type: Improvement >Reporter: qiumingming >Assignee: qiumingming > Fix For: v2.1.0 > > Attachments: KYLIN-2720.patch > > > Currently, a user can access the metadata of all tables and columns in a specific > project as long as he can access the project, which is not reasonable. > A user should only be allowed to access the tables that the cubes he owns depend on. > However, in the current version the user can see other tables in the web UI. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (KYLIN-2646) Project level query authorization
[ https://issues.apache.org/jira/browse/KYLIN-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma resolved KYLIN-2646. --- Resolution: Fixed Fix Version/s: v2.1.0 > Project level query authorization > - > > Key: KYLIN-2646 > URL: https://issues.apache.org/jira/browse/KYLIN-2646 > Project: Kylin > Issue Type: Improvement >Reporter: hongbin ma >Assignee: hongbin ma > Fix For: v2.1.0 > > > As we introduced ad-hoc queries in > https://issues.apache.org/jira/browse/KYLIN-2515, we'll need to adjust query > authorization as follows: > Query authorization is encouraged to be set at the project level. If someone is > assigned READ permission on a project, then he can query all tables > in the project, whether through ad-hoc queries or cubes. > If a user has READ permission on some cubes but no READ permission on the project, he > can only issue queries that can be satisfied by the cubes he > has READ permission on. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
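The two authorization rules above can be written as a single decision function. This is an illustrative sketch only: `QueryAuthorizer.mayQuery` and its parameters are hypothetical, and computing which cubes are "capable" of answering a query is assumed to happen elsewhere.

```java
import java.util.Set;

// Sketch of the KYLIN-2646 rule. The class, method and parameter names are
// hypothetical; Kylin's real check lives in its query and ACL services.
class QueryAuthorizer {
    /**
     * @param hasProjectRead user holds READ on the project
     * @param readableCubes  cubes the user holds READ on
     * @param capableCubes   cubes able to answer this query (empty if only ad-hoc can)
     */
    static boolean mayQuery(boolean hasProjectRead, Set<String> readableCubes, Set<String> capableCubes) {
        if (hasProjectRead) {
            // Project-level READ covers every table, via cubes or ad-hoc queries.
            return true;
        }
        // Otherwise the query must be answerable by at least one readable cube.
        for (String cube : capableCubes) {
            if (readableCubes.contains(cube)) {
                return true;
            }
        }
        return false;
    }
}
```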
[jira] [Commented] (KYLIN-2755) Kylin support hive and hbase authenticated with Kerberos
[ https://issues.apache.org/jira/browse/KYLIN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104310#comment-16104310 ] hongbin ma commented on KYLIN-2755: --- Hi [~wuyingjun] the implementation has some problems: 1. The Hive Kerberos and HBase Kerberos logins may race on "private static UserGroupInformation loginUser = null;" in UserGroupInformation.java 2. It cannot isolate Hive/HBase permissions at the project level, so finer permission control is impossible. We're considering introducing a way to allow "impersonation" (something like http://blog.bcmeng.com/post/kylin-hadoop-queue.html), which I think is a better solution to your problem. > Kylin support hive and hbase authenticated with Kerberos > > > Key: KYLIN-2755 > URL: https://issues.apache.org/jira/browse/KYLIN-2755 > Project: Kylin > Issue Type: New Feature >Affects Versions: v2.0.0 >Reporter: wuyingjun >Assignee: wuyingjun > Attachments: code modify.png, KYLIN-2755.patch > > > I want to know how to integrate Kylin with a Hive data source and HBase > storage secured by Kerberos. > I have used Hive Beeline and modified the HBase configuration initialization > in the source code. > Can the current Kylin version support a Kerberos environment in a better way in > MapReduce cubing? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KYLIN-2721) Introduce a new metrics framework based on dropwizard metrics
[ https://issues.apache.org/jira/browse/KYLIN-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102651#comment-16102651 ] hongbin ma commented on KYLIN-2721: --- [~yaho] please try to keep up with the latest code; consider the Kylin master branch or the 2.1.x branch. > Introduce a new metrics framework based on dropwizard metrics > - > > Key: KYLIN-2721 > URL: https://issues.apache.org/jira/browse/KYLIN-2721 > Project: Kylin > Issue Type: New Feature >Affects Versions: v2.0.0 >Reporter: Zhong Yanghong >Assignee: Zhong Yanghong > Attachments: Metrics Framework.png > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (KYLIN-2721) Introduce a new metrics framework based on dropwizard metrics
[ https://issues.apache.org/jira/browse/KYLIN-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101592#comment-16101592 ] hongbin ma edited comment on KYLIN-2721 at 7/26/17 12:11 PM: - Dropwizard, Yammer and Codahale metrics all refer to the same thing (http://ningg.top/yammer-metrics/), so let's stop mixing up the terms. [~yaho] I still have three questions. 1. You claimed Hadoop metrics is less stable; do you have any published evidence? 2. You claimed Codahale is lightweight, but by my count Hadoop metrics has only ~15000 lines of Java code (including tests), while the latest Codahale project (https://github.com/dropwizard/metrics.git) has ~23000 lines of Java code. "Lightweight" does not favor Codahale. 3. Although your proposal serves a different purpose from the existing QueryMetrics, we still need to avoid having two metrics frameworks. That said, if we decide to choose Dropwizard, we need to migrate QueryMetrics to Dropwizard ASAP. was (Author: mahongbin): Dropwizard, Yammer and Codahale metrics all refer to the same thing (http://ningg.top/yammer-metrics/), so let's stop mixing up the terms. [~yaho] I still have two questions. 1. You claimed Hadoop metrics is less stable; do you have any published evidence? 2. Although your proposal serves a different purpose from the existing QueryMetrics, we still need to avoid having two metrics frameworks. That said, if we decide to choose Dropwizard, we need to migrate QueryMetrics to Dropwizard ASAP. > Introduce a new metrics framework based on dropwizard metrics > - > > Key: KYLIN-2721 > URL: https://issues.apache.org/jira/browse/KYLIN-2721 > Project: Kylin > Issue Type: New Feature >Affects Versions: v2.0.0 >Reporter: Zhong Yanghong >Assignee: Zhong Yanghong > Attachments: Metrics Framework.png > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KYLIN-2721) Introduce a new metrics framework based on dropwizard metrics
[ https://issues.apache.org/jira/browse/KYLIN-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101592#comment-16101592 ] hongbin ma commented on KYLIN-2721: --- Dropwizard, Yammer and Codahale metrics all refer to the same thing (http://ningg.top/yammer-metrics/), so let's stop mixing up the terms. [~yaho] I still have two questions. 1. You claimed Hadoop metrics is less stable; do you have any published evidence? 2. Although your proposal serves a different purpose from the existing QueryMetrics, we still need to avoid having two metrics frameworks. That said, if we decide to choose Dropwizard, we need to migrate QueryMetrics to Dropwizard ASAP. > Introduce a new metrics framework based on dropwizard metrics > - > > Key: KYLIN-2721 > URL: https://issues.apache.org/jira/browse/KYLIN-2721 > Project: Kylin > Issue Type: New Feature >Affects Versions: v2.0.0 >Reporter: Zhong Yanghong >Assignee: Zhong Yanghong > Attachments: Metrics Framework.png > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KYLIN-2671) Speed up prepared query execution
[ https://issues.apache.org/jira/browse/KYLIN-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053570#comment-16053570 ] hongbin ma commented on KYLIN-2671: --- Please check org.apache.kylin.jdbc.KylinConnection#mockPreparedSignature. Currently the result of Kylin JDBC's java.sql.Connection#prepareStatement(java.lang.String) method is mocked, which means you get mocked statement metadata from prepareStatement before you call java.sql.PreparedStatement#executeQuery(). It's okay if you use JDBC in the following way: {code:java} PreparedStatement statement = conn.prepareStatement("select LSTG_FORMAT_NAME, sum(price) as GMV, count(1) as TRANS_CNT from test_kylin_fact " + "where LSTG_FORMAT_NAME = ? group by LSTG_FORMAT_NAME"); statement.setString(1, "FP-GTC"); ResultSet rs = statement.executeQuery(); {code} However, it's not okay to do this: {code:java} PreparedStatement statement = conn.prepareStatement("select LSTG_FORMAT_NAME, sum(price) as GMV, count(1) as TRANS_CNT from test_kylin_fact " + "where LSTG_FORMAT_NAME = ? group by LSTG_FORMAT_NAME"); ResultSetMetaData metaData = statement.getMetaData(); /* using metaData here is problematic because metaData is merely a mock */ statement.setString(1, "FP-GTC"); ResultSet rs = statement.executeQuery(); {code} > Speed up prepared query execution > - > > Key: KYLIN-2671 > URL: https://issues.apache.org/jira/browse/KYLIN-2671 > Project: Kylin > Issue Type: Improvement >Reporter: hongbin ma > > BI tools use prepared queries for function probing; Kylin should not execute > such queries in the standard way because it is too costly. > It's worth mentioning that the standard "prepare-bindparameter-execute" usage of > PreparedStatement is still not supported. For now Kylin only supports prepared > statements WITHOUT parameters. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KYLIN-2670) CASE WHEN supporting problem in kylin2.0
[ https://issues.apache.org/jira/browse/KYLIN-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052736#comment-16052736 ] hongbin ma commented on KYLIN-2670: --- Can you try to reproduce this issue with the sample cube (http://kylin.apache.org/docs20/tutorial/kylin_sample.html)? > CASE WHEN supporting problem in kylin2.0 > > > Key: KYLIN-2670 > URL: https://issues.apache.org/jira/browse/KYLIN-2670 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.0.0 >Reporter: zhou degao >Assignee: liyang > > The following query fails in Kylin 2.0 but succeeds in Kylin 1.6: > select "fact_pv_data_alias"."PRODUCT_NAME" as "c0", > "fact_pv_data_alias"."PLATFORM" as "c1" from "CSDNBI"."FACT_PV_DATA" as > "fact_pv_data_alias" group by "fact_pv_data_alias"."PRODUCT_NAME", > "fact_pv_data_alias"."PLATFORM" order by CASE WHEN > "fact_pv_data_alias"."PRODUCT_NAME" IS NULL THEN 1 ELSE 0 END, > "fact_pv_data_alias"."PRODUCT_NAME" ASC, CASE WHEN > "fact_pv_data_alias"."PLATFORM" IS NULL THEN 1 ELSE 0 END, > "fact_pv_data_alias"."PLATFORM" ASC > Error reported in Kylin 2.0: > Error while executing SQL "select "fact_pv_data_alias"."PRODUCT_NAME" as > "c0", "fact_pv_data_alias"."PLATFORM" as "c1" from "CSDNBI"."FACT_PV_DATA" as > "fact_pv_data_alias" group by "fact_pv_data_alias"."PRODUCT_NAME", > "fact_pv_data_alias"."PLATFORM" order by CASE WHEN > "fact_pv_data_alias"."PRODUCT_NAME" IS NULL THEN 1 ELSE 0 END, > "fact_pv_data_alias"."PRODUCT_NAME" ASC, CASE WHEN > "fact_pv_data_alias"."PLATFORM" IS NULL THEN 1 ELSE 0 END, > "fact_pv_data_alias"."PLATFORM" ASC LIMIT 5": index (2) must be less than > size (2) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KYLIN-2673) Should allow user to change fact table as long as the cube is disable
[ https://issues.apache.org/jira/browse/KYLIN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052719#comment-16052719 ] hongbin ma commented on KYLIN-2673: --- Hi kaisen, will disabled cubes that still have segments be an issue? > Should allow user to change fact table as long as the cube is disable > - > > Key: KYLIN-2673 > URL: https://issues.apache.org/jira/browse/KYLIN-2673 > Project: Kylin > Issue Type: Bug > Components: Web >Affects Versions: v2.0.0 >Reporter: kangkaisen >Assignee: kangkaisen > Fix For: v2.1.0 > > > Currently, a user can't change the fact table even when the cube is disabled, > which isn't reasonable. We should allow users to change the fact table as long as > the cube is disabled. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KYLIN-2671) Speed up prepared query execution
hongbin ma created KYLIN-2671: - Summary: Speed up prepared query execution Key: KYLIN-2671 URL: https://issues.apache.org/jira/browse/KYLIN-2671 Project: Kylin Issue Type: Improvement Reporter: hongbin ma BI tools use prepared queries for function probing; Kylin should not execute such queries in the standard way because it is too costly. It's worth mentioning that the standard "prepare-bindparameter-execute" usage of PreparedStatement is still not supported. For now Kylin only supports prepared statements WITHOUT parameters. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KYLIN-2667) Ignore whitespace when caching query
hongbin ma created KYLIN-2667: - Summary: Ignore whitespace when caching query Key: KYLIN-2667 URL: https://issues.apache.org/jira/browse/KYLIN-2667 Project: Kylin Issue Type: Improvement Reporter: hongbin ma Assignee: hongbin ma -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KYLIN-2659) Refactor KylinConfig so that all the default configurations are hidden in kylin-defaults.properties
hongbin ma created KYLIN-2659: - Summary: Refactor KylinConfig so that all the default configurations are hidden in kylin-defaults.properties Key: KYLIN-2659 URL: https://issues.apache.org/jira/browse/KYLIN-2659 Project: Kylin Issue Type: Improvement Reporter: hongbin ma Assignee: hongbin ma Currently we ship a conf/kylin.properties file with a lot of configuration overrides. This is not a standard approach compared with other projects like Hadoop or Spark. It's better to have a kylin-defaults.properties file that hides all the default configurations, so that users only have to override the necessary configurations in a blank kylin.properties. After the refactor, a config value is resolved with the following precedence (highest first): 1. KV in kylin.properties.override, which is more of a "secret feature", never documented. 2. KV in kylin.properties; users are encouraged to override configs here. 3. KV in kylin-defaults.properties, read-only to users. 4. KV in KylinConfigBase, read-only to users. The refactor will be backward compatible. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
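The four-level precedence can be modelled with `java.util.Properties` default chaining, where a lookup that misses in one layer falls through to the next lower one. The sketch below is an assumption-laden simplification: file loading is omitted, the KylinConfigBase fallbacks (which really live in code) are modelled as a fourth `Properties` layer, and `LayeredConfig` is a hypothetical name.

```java
import java.util.Properties;

// Sketch of the KYLIN-2659 precedence using java.util.Properties default
// chaining. File loading is omitted; each layer is passed in directly, and
// KylinConfigBase's in-code defaults are modelled as a Properties object.
class LayeredConfig {
    static Properties layer(Properties codeDefaults, Properties defaultsFile,
                            Properties userFile, Properties overrideFile) {
        // getProperty() falls back to the "defaults" Properties when a key is
        // missing, so chaining from the lowest priority upwards yields the order.
        Properties l4 = new Properties();
        l4.putAll(codeDefaults);          // 4. KylinConfigBase
        Properties l3 = new Properties(l4);
        l3.putAll(defaultsFile);          // 3. kylin-defaults.properties
        Properties l2 = new Properties(l3);
        l2.putAll(userFile);              // 2. kylin.properties
        Properties l1 = new Properties(l2);
        l1.putAll(overrideFile);          // 1. kylin.properties.override
        return l1;
    }
}
```

Note that the fallback only applies to `getProperty`; `get` on the returned object sees just the override layer.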
[jira] [Created] (KYLIN-2646) Project level query authorization
hongbin ma created KYLIN-2646: - Summary: Project level query authorization Key: KYLIN-2646 URL: https://issues.apache.org/jira/browse/KYLIN-2646 Project: Kylin Issue Type: Improvement Reporter: hongbin ma Assignee: hongbin ma As we introduced ad-hoc queries in https://issues.apache.org/jira/browse/KYLIN-2515, we'll need to adjust query authorization as follows: Query authorization is encouraged to be set at the project level. If someone is assigned READ permission on a project, then he can query all tables in the project, whether through ad-hoc queries or cubes. If a user has READ permission on some cubes but no READ permission on the project, he can only issue queries that can be satisfied by the cubes he has READ permission on. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KYLIN-2636) optimize case when in group by
hongbin ma created KYLIN-2636: - Summary: optimize case when in group by Key: KYLIN-2636 URL: https://issues.apache.org/jira/browse/KYLIN-2636 Project: Kylin Issue Type: Improvement Reporter: hongbin ma Assignee: hongbin ma Similar to KYLIN-2635, for clauses like: {code:sql} group by case when 1 = 1 then x when 1 = 2 then y else z end {code} Kylin only needs to pick up x as the group-by column. Again, like KYLIN-2635, we'll fix it in Kylin first rather than in Calcite. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
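Picking the single group-by column amounts to walking the WHEN branches in order: a constant-true condition decides the result, a constant-false branch is skipped, and a non-constant condition means no simplification is possible. The sketch below uses a hypothetical `CaseBranch` model in place of Calcite's expression trees; it assumes literal comparisons such as `1 = 1` were already evaluated to a `Boolean`.

```java
import java.util.List;

// Hypothetical model of one WHEN branch. constantCondition is TRUE/FALSE when
// the condition compares literals only (e.g. 1 = 1), and null otherwise.
class CaseBranch {
    final Boolean constantCondition;
    final String column;

    CaseBranch(Boolean constantCondition, String column) {
        this.constantCondition = constantCondition;
        this.column = column;
    }
}

class CaseWhenGroupBy {
    // Returns the one column the CASE always evaluates to, or null when it cannot
    // be decided at plan time (then all referenced columns must be grouped on).
    static String resolve(List<CaseBranch> branches, String elseColumn) {
        for (CaseBranch b : branches) {
            if (b.constantCondition == null) {
                return null;              // undecidable condition: no simplification
            }
            if (b.constantCondition) {
                return b.column;          // first constant-TRUE branch always wins
            }
            // constant-FALSE branch can never be taken; skip it
        }
        return elseColumn;                // every WHEN was constant FALSE
    }
}
```

For the example in the issue, the branches are (TRUE, x) and (FALSE, y), so `resolve` returns x.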
[jira] [Updated] (KYLIN-2635) optimize determined case when filters
[ https://issues.apache.org/jira/browse/KYLIN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma updated KYLIN-2635: -- Issue Type: Improvement (was: Bug) > optimize determined case when filters > -- > > Key: KYLIN-2635 > URL: https://issues.apache.org/jira/browse/KYLIN-2635 > Project: Kylin > Issue Type: Improvement >Reporter: hongbin ma >Assignee: hongbin ma > > Currently Calcite does not simplify determined filters like: > 1. where 1 = 1 => where true > 2. where (1 = 1 or x = 2) => where true > 3. where case when 'a' = 'a' then x > 1 else x < 1 end => where x > 1 > The first two cases have been handled in KYLIN-2539; however, the third case > is not handled yet. This JIRA tracks the third case. > In theory, this JIRA, together with KYLIN-2539 and KYLIN-2597, should be solved in > Calcite rather than Kylin. However, since the demand is urgent, we'll fix it in Kylin > first. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KYLIN-2635) optimize determined case when filters
hongbin ma created KYLIN-2635: - Summary: optimize determined case when filters Key: KYLIN-2635 URL: https://issues.apache.org/jira/browse/KYLIN-2635 Project: Kylin Issue Type: Bug Reporter: hongbin ma Assignee: hongbin ma Currently Calcite does not simplify determined filters like: 1. where 1 = 1 => where true 2. where (1 = 1 or x = 2) => where true 3. where case when 'a' = 'a' then x > 1 else x < 1 end => where x > 1 The first two cases have been handled in KYLIN-2539; however, the third case is not handled yet. This JIRA tracks the third case. In theory, this JIRA, together with KYLIN-2539 and KYLIN-2597, should be solved in Calcite rather than Kylin. However, since the demand is urgent, we'll fix it in Kylin first. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
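All three determined filters above can be handled by a small bottom-up folding pass: fold literal comparisons to constants, let a constant TRUE absorb an OR, and replace a CASE whose WHEN is constant with the chosen branch. The sketch below uses a tiny hypothetical expression model instead of Calcite's `RexNode` trees, and it compares literals with `Objects.equals`, which assumes both sides already have the same Java type.

```java
import java.util.Objects;

// Tiny, hypothetical expression model used only to illustrate folding of
// "determined" predicates; Calcite and Kylin work on RexNode trees instead.
interface Expr {}

final class Const implements Expr {
    final boolean value;
    Const(boolean value) { this.value = value; }
    @Override public String toString() { return String.valueOf(value); }
}

// Equality between two literals, e.g. 1 = 1 or 'a' = 'a'. Assumes both sides
// already have the same Java type (1 vs 1L would compare unequal here).
final class LiteralEq implements Expr {
    final Object left, right;
    LiteralEq(Object left, Object right) { this.left = left; this.right = right; }
}

// Any predicate that references a column, e.g. x > 1; never folded here.
final class ColumnPred implements Expr {
    final String text;
    ColumnPred(String text) { this.text = text; }
    @Override public String toString() { return text; }
}

final class Or implements Expr {
    final Expr left, right;
    Or(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " or " + right + ")"; }
}

final class CaseWhen implements Expr {
    final Expr when, then, otherwise;
    CaseWhen(Expr when, Expr then, Expr otherwise) { this.when = when; this.then = then; this.otherwise = otherwise; }
    @Override public String toString() { return "case when " + when + " then " + then + " else " + otherwise + " end"; }
}

class FilterFolder {
    static Expr fold(Expr e) {
        if (e instanceof LiteralEq) {
            LiteralEq eq = (LiteralEq) e;
            return new Const(Objects.equals(eq.left, eq.right));      // case 1: 1 = 1 => true
        }
        if (e instanceof Or) {
            Or or = (Or) e;
            Expr l = fold(or.left);
            Expr r = fold(or.right);
            if (isConst(l, true) || isConst(r, true)) return new Const(true); // case 2
            if (isConst(l, false)) return r;
            if (isConst(r, false)) return l;
            return new Or(l, r);
        }
        if (e instanceof CaseWhen) {
            CaseWhen c = (CaseWhen) e;
            Expr w = fold(c.when);
            if (isConst(w, true)) return fold(c.then);                // case 3: keep THEN branch
            if (isConst(w, false)) return fold(c.otherwise);
            return new CaseWhen(w, fold(c.then), fold(c.otherwise));
        }
        return e;
    }

    private static boolean isConst(Expr e, boolean value) {
        return e instanceof Const && ((Const) e).value == value;
    }
}
```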
[jira] [Created] (KYLIN-2631) Seek to next model when no cube in current model satisfies query
hongbin ma created KYLIN-2631: - Summary: Seek to next model when no cube in current model satisfies query Key: KYLIN-2631 URL: https://issues.apache.org/jira/browse/KYLIN-2631 Project: Kylin Issue Type: Bug Reporter: hongbin ma Assignee: hongbin ma ModelChooser was introduced in 2.0 to match the JoinTree in a query with the JoinTree in a model. Currently, we first use ModelChooser to decide the model, then choose a cube from the selected model; cubes in other models are never considered. However, the selected model may have no capable cube while a non-selected model does, so it's still necessary to go through all models. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
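The proposed change turns a "pick one model, then pick a cube" flow into a loop that falls through to the next ranked model. The sketch below assumes the models are already ranked (for example by ModelChooser) and that cube capability is an opaque predicate; `ModelSeeker` and the `model/cube` result format are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Sketch of KYLIN-2631: instead of stopping at the first matched model, keep
// seeking through ranked models until one of them has a capable cube.
class ModelSeeker {
    // rankedModels: model name -> its cubes, iterated in ranking order.
    // canAnswer: hypothetical capability check of a cube for the current query.
    static Optional<String> chooseCube(LinkedHashMap<String, List<String>> rankedModels,
                                       Predicate<String> canAnswer) {
        for (Map.Entry<String, List<String>> model : rankedModels.entrySet()) {
            for (String cube : model.getValue()) {
                if (canAnswer.test(cube)) {
                    return Optional.of(model.getKey() + "/" + cube);
                }
            }
            // No capable cube in this model: fall through to the next model.
        }
        return Optional.empty();
    }
}
```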
[jira] [Created] (KYLIN-2625) not null filter clause should be evaluable in storage
hongbin ma created KYLIN-2625: - Summary: not null filter clause should be evaluable in storage Key: KYLIN-2625 URL: https://issues.apache.org/jira/browse/KYLIN-2625 Project: Kylin Issue Type: Bug Reporter: hongbin ma Assignee: hongbin ma Currently, limit push-down is not enabled for queries like {code:sql} select * from ( select * from test_kylin_fact where lstg_format_name is not null ) limit 20 {code} because "is not null" is treated as not evaluable in storage. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (KYLIN-2599) select * in subquery fail due to bug in hackSelectStar
[ https://issues.apache.org/jira/browse/KYLIN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16004116#comment-16004116 ] hongbin ma edited comment on KYLIN-2599 at 5/10/17 5:49 AM: similar exception is thrown when I test query like: {code:sql} select lstg_format_name from test_kylin_fact order by case when 1=1 then cal_dt ELSE seller_id end {code} To fix the issue more completely, we'll: 1. check if rootProj has any field starting with "_KY_", if none is found, then current logical plan does not require hacking, org.apache.calcite.sql2rel.SqlToRelConverter#hackSelectStar should abort by returning root. 2. the field list of rootProj may be longer than that of root (in case of kylin-it/src/test/resources/query/sql_verifyCount/query10.sql), so when constructing validatedRowType we'll skip the longer tail on rootProj 3. sort rel's RelCollation (if any) may become stale after removing the "_KY_" fields, need to fix its fieldIndex was (Author: mahongbin): similar exception is thrown when I test query like: {code:sql} select lstg_format_name from test_kylin_fact order by case when 1=1 then cal_dt ELSE seller_id end {code} To fix the issue more completely, we'll: 1. check if rootProj has any field starting with "_KY_", if none is found, then current logical plan does not require hacking, org.apache.calcite.sql2rel.SqlToRelConverter#hackSelectStar should abort by returning root. 2. the field list of rootProj may be longer than that of root (in case of kylin-it/src/test/resources/query/sql_verifyCount/query10.sql), so when constructing validatedRowType we'll skip the longer tail on rootProj 3. 
sort rel's RelCollation (if any) may become stale after removing the "_KY_" fields, need to fix its fieldIndex > select * in subquery fail due to bug in hackSelectStar > --- > > Key: KYLIN-2599 > URL: https://issues.apache.org/jira/browse/KYLIN-2599 > Project: Kylin > Issue Type: Improvement >Reporter: hongbin ma > > {code:sql} > select fact.lstg_format_name from > > (select * from test_kylin_fact where cal_dt > date'2010-01-01' ) as fact > > group by fact.lstg_format_name > > order by CASE WHEN fact.lstg_format_name IS NULL THEN 'sdf' ELSE > fact.lstg_format_name END > > {code} > will generate logical plan like: > {code} > LogicalSort(sort0=[$1], dir0=[ASC]) > LogicalProject(LSTG_FORMAT_NAME=[$0], EXPR$1=[CASE(IS NULL($0), 'sdf', $0)]) > LogicalAggregate(group=[{0}]) > LogicalProject(LSTG_FORMAT_NAME=[$3]) > LogicalProject(TRANS_ID=[$0], ORDER_ID=[$1], CAL_DT=[$2], > LSTG_FORMAT_NAME=[$3], LEAF_CATEG_ID=[$4], LSTG_SITE_ID=[$5], > SLR_SEGMENT_CD=[$6], SELLER_ID=[$7], PRICE=[$8], ITEM_COUNT=[$9], > TEST_COUNT_DISTINCT_BITMAP=[$10], DEAL_AMOUNT=[$11], DEAL_YEAR=[$12], > _KY_COUNT__=[$13], _KY_MIN_TEST_KYLIN_FACT_PRICE_=[$14], > _KY_MAX_TEST_KYLIN_FACT_PRICE_=[$15], > _KY_COUNT_DISTINCT_TEST_KYLIN_FACT_SELLER_ID_=[$16], > _KY_COUNT_DISTINCT_TEST_KYLIN_FACT_LSTG_FORMAT_NAME_TEST_KYLIN_FACT_SELLER_ID_=[$17], > _KY_COUNT_DISTINCT_TEST_KYLIN_FACT_TEST_COUNT_DISTINCT_BITMAP_=[$18], > _KY_PERCENTILE_TEST_KYLIN_FACT_PRICE_=[$19]) > LogicalFilter(condition=[>($2, 2010-01-01)]) > OLAPTableScan(table=[[DEFAULT, TEST_KYLIN_FACT]], fields=[[0, 1, > 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]) > {code} > org.apache.calcite.sql2rel.SqlToRelConverter#hackSelectStar mistakenly > treats it as a normal case, which leads to an exception being thrown. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
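Steps 1 and 3 of the proposed fix can be illustrated on plain field-name lists: decide whether any hidden `_KY_` field is present, and if so compute where each surviving field moves so sort keys can be re-pointed. This sketch does not touch Calcite's `RelNode` or `RelCollation` classes; `SelectStarFix` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of steps 1 and 3 from the comment above, on plain field-name lists.
// Calcite's RelNode/RelCollation classes are not modelled; names are hypothetical.
class SelectStarFix {
    static final String HIDDEN_PREFIX = "_KY_";

    // Step 1: hacking is needed only if some field of rootProj is a hidden _KY_ field.
    static boolean needsHack(List<String> rootProjFields) {
        for (String f : rootProjFields) {
            if (f.startsWith(HIDDEN_PREFIX)) {
                return true;
            }
        }
        return false;
    }

    // Step 3 (part 1): old field index -> new index after hidden fields are
    // removed, with -1 marking a removed field.
    static int[] remapIndexes(List<String> fields) {
        int[] map = new int[fields.size()];
        int next = 0;
        for (int i = 0; i < fields.size(); i++) {
            map[i] = fields.get(i).startsWith(HIDDEN_PREFIX) ? -1 : next++;
        }
        return map;
    }

    // Step 3 (part 2): re-point sort keys so the collation is no longer stale.
    static List<Integer> remapSortKeys(List<Integer> sortKeys, int[] map) {
        List<Integer> out = new ArrayList<>();
        for (int k : sortKeys) {
            if (map[k] < 0) {
                throw new IllegalStateException("sort key refers to a removed field: " + k);
            }
            out.add(map[k]);
        }
        return out;
    }
}
```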
[jira] [Created] (KYLIN-2599) select * in subquery fail due to bug in hackSelectStar
hongbin ma created KYLIN-2599: - Summary: select * in subquery fail due to bug in hackSelectStar Key: KYLIN-2599 URL: https://issues.apache.org/jira/browse/KYLIN-2599 Project: Kylin Issue Type: Improvement Reporter: hongbin ma
{code:sql}
select fact.lstg_format_name from
(select * from test_kylin_fact where cal_dt > date'2010-01-01' ) as fact
group by fact.lstg_format_name
order by CASE WHEN fact.lstg_format_name IS NULL THEN 'sdf' ELSE fact.lstg_format_name END
{code}
will generate a logical plan like:
{code}
LogicalSort(sort0=[$1], dir0=[ASC])
  LogicalProject(LSTG_FORMAT_NAME=[$0], EXPR$1=[CASE(IS NULL($0), 'sdf', $0)])
    LogicalAggregate(group=[{0}])
      LogicalProject(LSTG_FORMAT_NAME=[$3])
        LogicalProject(TRANS_ID=[$0], ORDER_ID=[$1], CAL_DT=[$2], LSTG_FORMAT_NAME=[$3], LEAF_CATEG_ID=[$4], LSTG_SITE_ID=[$5], SLR_SEGMENT_CD=[$6], SELLER_ID=[$7], PRICE=[$8], ITEM_COUNT=[$9], TEST_COUNT_DISTINCT_BITMAP=[$10], DEAL_AMOUNT=[$11], DEAL_YEAR=[$12], _KY_COUNT__=[$13], _KY_MIN_TEST_KYLIN_FACT_PRICE_=[$14], _KY_MAX_TEST_KYLIN_FACT_PRICE_=[$15], _KY_COUNT_DISTINCT_TEST_KYLIN_FACT_SELLER_ID_=[$16], _KY_COUNT_DISTINCT_TEST_KYLIN_FACT_LSTG_FORMAT_NAME_TEST_KYLIN_FACT_SELLER_ID_=[$17], _KY_COUNT_DISTINCT_TEST_KYLIN_FACT_TEST_COUNT_DISTINCT_BITMAP_=[$18], _KY_PERCENTILE_TEST_KYLIN_FACT_PRICE_=[$19])
          LogicalFilter(condition=[>($2, 2010-01-01)])
            OLAPTableScan(table=[[DEFAULT, TEST_KYLIN_FACT]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
{code}
org.apache.calcite.sql2rel.SqlToRelConverter#hackSelectStar will mistakenly treat it as a normal case and throw an exception. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KYLIN-2598) Should not translate filter to a in-clause filter with too many elements
hongbin ma created KYLIN-2598: - Summary: Should not translate filter to a in-clause filter with too many elements Key: KYLIN-2598 URL: https://issues.apache.org/jira/browse/KYLIN-2598 Project: Kylin Issue Type: Improvement Reporter: hongbin ma Assignee: hongbin ma In org.apache.kylin.dict.BuiltInFunctionTransformer#translateFunctionTupleFilter we translate built-in functions like upper, lower, and like into in-clause filters (KYLIN-993). The drawback of this approach is that an in-clause filter soon becomes inefficient when too many elements accumulate in it. We suggest setting a threshold so that when there are more elements than this threshold, the translation is aborted. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
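A minimal sketch of the proposed threshold, assuming the translation works by testing each dictionary value against the function and collecting the matches into an in-clause. The class name, the method signature, and the use of null to mean "do not translate" are all hypothetical; this is not BuiltInFunctionTransformer's actual code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

final class InClauseTranslator {
    // Returns the dictionary values matching the function, or null when more than
    // maxInClauseSize values match, meaning "keep the original function filter".
    static List<String> translate(List<String> dictionaryValues,
                                  Predicate<String> function,
                                  int maxInClauseSize) {
        List<String> matches = new ArrayList<>();
        for (String value : dictionaryValues) {
            if (function.test(value)) {
                matches.add(value);
                if (matches.size() > maxInClauseSize) {
                    return null; // abort early: the in-clause would be too large
                }
            }
        }
        return Collections.unmodifiableList(matches);
    }
}
```

For example, translate(values, v -> v.toUpperCase(Locale.ROOT).equals("ABC"), 1000) gives up as soon as more than 1000 values match, so the caller can fall back to evaluating upper(col) = 'ABC' directly.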
[jira] [Created] (KYLIN-2597) Deal with trivial expression in filters like x = 1 + 2
hongbin ma created KYLIN-2597: - Summary: Deal with trivial expression in filters like x = 1 + 2 Key: KYLIN-2597 URL: https://issues.apache.org/jira/browse/KYLIN-2597 Project: Kylin Issue Type: Improvement Reporter: hongbin ma Assignee: hongbin ma BI tools often generate trivial expressions in filters, e.g. "x = 1 + 2". Kylin treats such expressions as "non-evaluatable", which in turn blocks optimizations like limit push-down, or forces it to choose a cuboid with more dimensions, etc. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
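One common way to handle such expressions is constant folding before filter analysis. The sketch below uses a hypothetical three-node expression tree (not Kylin's TupleFilter classes or Calcite's RexNode) to show how x = 1 + 2 can be reduced to x = 3, which is then a plain column-versus-constant comparison.

```java
// Hypothetical expression tree; not Kylin's TupleFilter or Calcite's RexNode.
final class ConstantFolder {
    interface Expr {}

    static final class Const implements Expr {
        final long value;
        Const(long value) { this.value = value; }
        @Override public String toString() { return Long.toString(value); }
    }

    static final class Column implements Expr {
        final String name;
        Column(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    static final class Plus implements Expr {
        final Expr left;
        final Expr right;
        Plus(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " + " + right + ")"; }
    }

    // Folds bottom-up: a Plus whose operands both fold to constants becomes one Const.
    static Expr fold(Expr e) {
        if (e instanceof Plus) {
            Plus p = (Plus) e;
            Expr l = fold(p.left);
            Expr r = fold(p.right);
            if (l instanceof Const && r instanceof Const) {
                return new Const(((Const) l).value + ((Const) r).value);
            }
            return new Plus(l, r);
        }
        return e; // constants and columns are already in simplest form
    }
}
```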
[jira] [Commented] (KYLIN-2589) Errors in WebUI Authentication
[ https://issues.apache.org/jira/browse/KYLIN-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999441#comment-15999441 ] hongbin ma commented on KYLIN-2589: --- Hi [~wooya], by "cleaned up all the info in hbase", do you mean you also cleaned the xxx_acl and xxx_user HTables? Can you also describe your environment (Hadoop version, HDP or Cloudera)? > Errors in WebUI Authentication > -- > > Key: KYLIN-2589 > URL: https://issues.apache.org/jira/browse/KYLIN-2589 > Project: Kylin > Issue Type: Bug > Components: General >Affects Versions: v2.0.0 > Environment: EMR >Reporter: Young Wu > Attachments: 2921494001551_.pic_hd.jpg, Screenshot 2017-05-06 > 12.29.34.png > > > There seem to be bugs in the authentication part of Kylin's web server. After > Kylin runs for several hours, users fail to log in with username/password. The > error reported in the log is "Encoded password cannot be null or empty". > Details are attached. The only workaround is to restart Kylin periodically. A restart > suppresses the issue for several hours, and then the error suddenly comes back > again. The issue is also discussed here: > http://apache-kylin.74782.x6.nabble.com/Re-Encoded-password-cannot-be-null-or-empty-when-login-into-kylin-s-web-UI-td7879.html#a7887 > It is not caused by the upgrade from 2.0.0-BETA to 2.0.0, since I had already cleaned > up all the info in hbase and spun up a brand new kylin-2.0.0, but the issue > is still there. > Another bug occurs rarely, but it also looks related to authentication. > It happens when Kylin is under a heavy load of query requests. Details are also > attached. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KYLIN-2586) use random port for CacheServiceTest as fixed port 7777 might have been occupied
hongbin ma created KYLIN-2586: - Summary: use random port for CacheServiceTest as fixed port 7777 might have been occupied Key: KYLIN-2586 URL: https://issues.apache.org/jira/browse/KYLIN-2586 Project: Kylin Issue Type: Improvement Reporter: hongbin ma Assignee: hongbin ma https://builds.apache.org/job/Kylin-Master-JDK-1.7/442/ 2017-05-04 02:24:45,913 WARN [main AbstractLifeCycle:212]: FAILED ServerConnector@29065a9f{HTTP/1.1}{0.0.0.0:}: java.net.BindException: Address already in use java.net.BindException: Address already in use -- This message was sent by Atlassian JIRA (v6.3.15#6346)
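A minimal sketch of picking a random free port with only the JDK: binding a ServerSocket to port 0 lets the OS choose an ephemeral port. The class name is hypothetical, and there is a small race window between closing this socket and the test server binding the same port, so it reduces rather than eliminates collisions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

final class FreePort {
    // Port 0 asks the OS for any free ephemeral port; the socket is closed
    // right away so the test server can bind that port afterwards.
    static int find() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The test would then start its server on FreePort.find() instead of a fixed port; how CacheServiceTest wires in its port is not shown here.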
[jira] [Resolved] (KYLIN-2580) Improvement on subqueries: allow grouping by columns from subquery
[ https://issues.apache.org/jira/browse/KYLIN-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma resolved KYLIN-2580. --- Resolution: Fixed Fix Version/s: v2.1.0 > Improvement on subqueries: allow grouping by columns from subquery > -- > > Key: KYLIN-2580 > URL: https://issues.apache.org/jira/browse/KYLIN-2580 > Project: Kylin > Issue Type: Improvement >Reporter: hongbin ma >Assignee: hongbin ma > Fix For: v2.1.0 > > > {code:sql} > select test_kylin_fact.lstg_format_name, xxx.week_beg_dt , > sum(test_kylin_fact.price) as GMV > , count(*) as TRANS_CNT > from > test_kylin_fact > inner JOIN test_category_groupings > ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND > test_kylin_fact.lstg_site_id = test_category_groupings.site_id > inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt where > week_beg_dt >= DATE '2010-02-10' ) xxx > ON test_kylin_fact.cal_dt = xxx.cal_dt > where test_category_groupings.meta_categ_name <> 'Baby' > group by test_kylin_fact.lstg_format_name, xxx.week_beg_dt > {code} > will fail because of the group by on xxx.week_beg_dt, since week_beg_dt does not > necessarily appear in the cube -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KYLIN-2427) Auto adjust join order to make query executable
[ https://issues.apache.org/jira/browse/KYLIN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990768#comment-15990768 ] hongbin ma commented on KYLIN-2427: --- this issue should have been fixed by KYLIN-2579, but I have not verified it yet. > Auto adjust join order to make query executable > --- > > Key: KYLIN-2427 > URL: https://issues.apache.org/jira/browse/KYLIN-2427 > Project: Kylin > Issue Type: Bug >Reporter: Kaige Liu >Assignee: Kaige Liu > > KYLIN-2406 reports an issue: The order of joins will affect the result of > query. For example, below query leads to "No model found" > Below query triggers NPE > {code} > with tmp3 as ( > select l_partkey, 0.5 * sum(l_quantity) as sum_quantity, l_suppkey > from v_lineitem > inner join supplier on l_suppkey = s_suppkey > inner join nation on s_nationkey = n_nationkey > inner join part on l_partkey = p_partkey > where l_shipdate >= '1992-01-01' and l_shipdate <= '1995-01-01' > and n_name = 'CANADA' > and p_name like 'forest%' > group by l_partkey, l_suppkey > ) > select > s_name, > s_address > from > v_partsupp > inner join tmp3 on ps_partkey = l_partkey and ps_suppkey = l_suppkey > inner join supplier on ps_suppkey = s_suppkey > where > ps_availqty > sum_quantity > group by > s_name, s_address > order by > s_name > {code} > While below query is OK. 
Only difference being the order of "inner join tmp3" > and "inner join supplier" > {code} > with tmp3 as ( > select l_partkey, 0.5 * sum(l_quantity) as sum_quantity, l_suppkey > from v_lineitem > inner join supplier on l_suppkey = s_suppkey > inner join nation on s_nationkey = n_nationkey > inner join part on l_partkey = p_partkey > where l_shipdate >= '1992-01-01' and l_shipdate <= '1995-01-01' > and n_name = 'CANADA' > and p_name like 'forest%' > group by l_partkey, l_suppkey > ) > select > s_name, > s_address > from > v_partsupp > inner join supplier on ps_suppkey = s_suppkey > inner join tmp3 on ps_partkey = l_partkey and ps_suppkey = l_suppkey > where > ps_availqty > sum_quantity > group by > s_name, s_address > order by > s_name > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (KYLIN-2579) Improvement on subqueries: reorder subqueries joins with RelOptRule
[ https://issues.apache.org/jira/browse/KYLIN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma resolved KYLIN-2579. --- Resolution: Fixed Fix Version/s: v2.1.0 > Improvement on subqueries: reorder subqueries joins with RelOptRule > --- > > Key: KYLIN-2579 > URL: https://issues.apache.org/jira/browse/KYLIN-2579 > Project: Kylin > Issue Type: Improvement >Reporter: hongbin ma >Assignee: hongbin ma > Fix For: v2.1.0 > > > Current support for subqueries has some limitations. For example, we require > JOINs on tables to precede JOINs on all subqueries. The following query: > {code:sql} > select test_kylin_fact.lstg_format_name,sum(test_kylin_fact.price) as GMV > , count(*) as TRANS_CNT > from > test_kylin_fact > inner JOIN test_category_groupings > ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND > test_kylin_fact.lstg_site_id = test_category_groupings.site_id > inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt where > week_beg_dt >= DATE '2010-02-10' ) xxx > ON test_kylin_fact.cal_dt = xxx.cal_dt > > > where test_category_groupings.meta_categ_name <> 'Baby' > group by test_kylin_fact.lstg_format_name > {code} > works, but > {code:sql} > select test_kylin_fact.lstg_format_name,sum(test_kylin_fact.price) as GMV > , count(*) as TRANS_CNT > from > test_kylin_fact > inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt where > week_beg_dt >= DATE '2010-02-10' ) xxx > ON test_kylin_fact.cal_dt = xxx.cal_dt > > inner JOIN test_category_groupings > ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND > test_kylin_fact.lstg_site_id = test_category_groupings.site_id > > > where test_category_groupings.meta_categ_name <> 'Baby' > group by test_kylin_fact.lstg_format_name > {code} > won't work. In this JIRA we'll reorder subquery joins with RelOptRule. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KYLIN-2580) Improvement on subqueries: allow grouping by columns from subquery
hongbin ma created KYLIN-2580: - Summary: Improvement on subqueries: allow grouping by columns from subquery Key: KYLIN-2580 URL: https://issues.apache.org/jira/browse/KYLIN-2580 Project: Kylin Issue Type: Improvement Reporter: hongbin ma Assignee: hongbin ma
{code:sql}
select test_kylin_fact.lstg_format_name, xxx.week_beg_dt , sum(test_kylin_fact.price) as GMV
, count(*) as TRANS_CNT
from
test_kylin_fact
inner JOIN test_category_groupings
ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND test_kylin_fact.lstg_site_id = test_category_groupings.site_id
inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt where week_beg_dt >= DATE '2010-02-10' ) xxx
ON test_kylin_fact.cal_dt = xxx.cal_dt
where test_category_groupings.meta_categ_name <> 'Baby'
group by test_kylin_fact.lstg_format_name, xxx.week_beg_dt
{code}
will fail because of the group by on xxx.week_beg_dt, since week_beg_dt does not necessarily appear in the cube. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KYLIN-2579) Improvement on subqueries: reorder subqueries joins with RelOptRule
[ https://issues.apache.org/jira/browse/KYLIN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma updated KYLIN-2579: -- Summary: Improvement on subqueries: reorder subqueries joins with RelOptRule (was: Improvement on subqueries: reroder subqueries joins with RelOptRule) > Improvement on subqueries: reorder subqueries joins with RelOptRule > --- > > Key: KYLIN-2579 > URL: https://issues.apache.org/jira/browse/KYLIN-2579 > Project: Kylin > Issue Type: Improvement >Reporter: hongbin ma >Assignee: hongbin ma > > Current support for subqueries has some limitations. For example, we require > JOINs on tables to precede JOINs on all subqueries. The following query: > {code:sql} > select test_kylin_fact.lstg_format_name,sum(test_kylin_fact.price) as GMV > , count(*) as TRANS_CNT > from > test_kylin_fact > inner JOIN test_category_groupings > ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND > test_kylin_fact.lstg_site_id = test_category_groupings.site_id > inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt where > week_beg_dt >= DATE '2010-02-10' ) xxx > ON test_kylin_fact.cal_dt = xxx.cal_dt > > > where test_category_groupings.meta_categ_name <> 'Baby' > group by test_kylin_fact.lstg_format_name > {code} > works, but > {code:sql} > select test_kylin_fact.lstg_format_name,sum(test_kylin_fact.price) as GMV > , count(*) as TRANS_CNT > from > test_kylin_fact > inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt where > week_beg_dt >= DATE '2010-02-10' ) xxx > ON test_kylin_fact.cal_dt = xxx.cal_dt > > inner JOIN test_category_groupings > ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND > test_kylin_fact.lstg_site_id = test_category_groupings.site_id > > > where test_category_groupings.meta_categ_name <> 'Baby' > group by test_kylin_fact.lstg_format_name > {code} > won't work.
In this JIRA we'll reorder subquery joins with RelOptRule. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KYLIN-2579) Improvement on subqueries: reroder subqueries joins with RelOptRule
hongbin ma created KYLIN-2579: - Summary: Improvement on subqueries: reroder subqueries joins with RelOptRule Key: KYLIN-2579 URL: https://issues.apache.org/jira/browse/KYLIN-2579 Project: Kylin Issue Type: Improvement Reporter: hongbin ma Assignee: hongbin ma Current support for subqueries has some limitations. For example, we require JOINs on tables to precede JOINs on all subqueries. The following query:
{code:sql}
select test_kylin_fact.lstg_format_name,sum(test_kylin_fact.price) as GMV
, count(*) as TRANS_CNT
from
test_kylin_fact
inner JOIN test_category_groupings
ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND test_kylin_fact.lstg_site_id = test_category_groupings.site_id
inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt where week_beg_dt >= DATE '2010-02-10' ) xxx
ON test_kylin_fact.cal_dt = xxx.cal_dt
where test_category_groupings.meta_categ_name <> 'Baby'
group by test_kylin_fact.lstg_format_name
{code}
works, but
{code:sql}
select test_kylin_fact.lstg_format_name,sum(test_kylin_fact.price) as GMV
, count(*) as TRANS_CNT
from
test_kylin_fact
inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt where week_beg_dt >= DATE '2010-02-10' ) xxx
ON test_kylin_fact.cal_dt = xxx.cal_dt
inner JOIN test_category_groupings
ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND test_kylin_fact.lstg_site_id = test_category_groupings.site_id
where test_category_groupings.meta_categ_name <> 'Baby'
group by test_kylin_fact.lstg_format_name
{code}
won't work. In this JIRA we'll reorder subquery joins with RelOptRule. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
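The target join order can be sketched as a stable partition of the join inputs: plain tables first, subqueries after, each group keeping its original order. This is an illustration only, not Calcite's RelOptRule API. It assumes every join is an inner join whose condition references only the fact table and the joined input (as in the two queries above), so moving a join does not change the result. JoinInput and reorder are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the desired order only; no Calcite classes are used.
final class JoinReorderSketch {
    static final class JoinInput {
        final String name;
        final boolean isSubquery;

        JoinInput(String name, boolean isSubquery) {
            this.name = name;
            this.isSubquery = isSubquery;
        }
    }

    // Stable partition: tables keep their relative order, then subqueries
    // follow in their relative order.
    static List<String> reorder(List<JoinInput> inputs) {
        List<String> tables = new ArrayList<>();
        List<String> subqueries = new ArrayList<>();
        for (JoinInput input : inputs) {
            if (input.isSubquery) {
                subqueries.add(input.name);
            } else {
                tables.add(input.name);
            }
        }
        tables.addAll(subqueries);
        return tables;
    }
}
```

Applied to the second query, [TEST_KYLIN_FACT, xxx, TEST_CATEGORY_GROUPINGS] becomes [TEST_KYLIN_FACT, TEST_CATEGORY_GROUPINGS, xxx], which is the join order of the first (working) query.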
[jira] [Created] (KYLIN-2575) Experimental feature: Computed Column
hongbin ma created KYLIN-2575: - Summary: Experimental feature: Computed Column Key: KYLIN-2575 URL: https://issues.apache.org/jira/browse/KYLIN-2575 Project: Kylin Issue Type: New Feature Reporter: hongbin ma Assignee: hongbin ma A computed column is a virtual column that is calculated from an expression over existing columns. For example, TAX is computed from PRICE * TAX_RATE, and TX_YEAR from EXTRACT(year from TX_DATE). Currently users have to create a view that adds these computed columns and then feed the view to the cube. This has two inconveniences: creating a view is not easy, and queries have to be rewritten to use the view instead of the original table. Letting Kylin/KAP support computed columns directly would be a big step forward. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
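A minimal sketch of the idea, assuming rows are modeled as column-name-to-value maps: a computed column is registered as an expression over existing columns and evaluated when read, so a query can reference TAX or TX_YEAR without a view. ComputedColumnSketch and its methods are hypothetical and do not reflect how Kylin/KAP implements the feature.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class ComputedColumnSketch {
    private final Map<String, Function<Map<String, Object>, Object>> computed = new HashMap<>();

    // Registers a virtual column defined by an expression over existing columns.
    void define(String name, Function<Map<String, Object>, Object> expression) {
        computed.put(name, expression);
    }

    // Reads a physical column if present, otherwise evaluates a computed column.
    Object get(Map<String, Object> row, String column) {
        if (row.containsKey(column)) {
            return row.get(column);
        }
        Function<Map<String, Object>, Object> expression = computed.get(column);
        if (expression == null) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
        return expression.apply(row);
    }
}
```

The two examples from the description would be define("TAX", r -> ((BigDecimal) r.get("PRICE")).multiply((BigDecimal) r.get("TAX_RATE"))) and define("TX_YEAR", r -> ((LocalDate) r.get("TX_DATE")).getYear()).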
[jira] [Updated] (KYLIN-2574) RawQueryLastHacker should group by all possible dimensions
[ https://issues.apache.org/jira/browse/KYLIN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma updated KYLIN-2574: -- Request participants: (was: ) Description: Currently RawQueryLastHacker makes the raw query group by the columns that exist in the query (if (tupleInfo.hasColumn(col))). This approach fails to leverage limit push-down if those columns are not a "prefix" of the row key (org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase#enableStorageLimitIfPossible). On the other hand, a large portion of raw queries are ad hoc queries like "select * from fact" or "select * from fact inner join lookup where year = 2000". Keeping these queries fast is important to impress users. was: Currently RawQueryLastHacker makes the raw query group by the columns that exist in the query (if (tupleInfo.hasColumn(col))). This approach fails to leverage limit push-down if those columns are not a "prefix" of the row key (org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase#enableStorageLimitIfPossible). > RawQueryLastHacker should group by all possible dimensions > -- > > Key: KYLIN-2574 > URL: https://issues.apache.org/jira/browse/KYLIN-2574 > Project: Kylin > Issue Type: Bug >Reporter: hongbin ma >Assignee: hongbin ma > > Currently RawQueryLastHacker makes the raw query group by the columns that exist in > the query (if (tupleInfo.hasColumn(col))). This approach fails to leverage > limit push-down if those columns are not a "prefix" of the row > key (org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase#enableStorageLimitIfPossible). > On the other hand, a large portion of raw queries are ad hoc queries like > "select * from fact" or "select * from fact inner join lookup where year > = 2000". Keeping these queries fast is important to impress users. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
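The "prefix" condition can be sketched as below. This is a hypothetical simplification, not the logic of GTCubeStorageQueryBase#enableStorageLimitIfPossible: it only checks whether the group-by columns are exactly the first N rowkey columns. Grouping by all dimensions makes the group-by set equal to the whole rowkey, which always passes this check.

```java
import java.util.List;
import java.util.Set;

final class RowkeyPrefixCheck {
    // True when groupByColumns is exactly the first groupByColumns.size()
    // rowkey columns, in any order. Because the first N rowkey columns are
    // distinct, finding all of them in a set of size N means the set equals them.
    static boolean isRowkeyPrefix(List<String> rowkeyColumns, Set<String> groupByColumns) {
        if (groupByColumns.size() > rowkeyColumns.size()) {
            return false;
        }
        for (int i = 0; i < groupByColumns.size(); i++) {
            if (!groupByColumns.contains(rowkeyColumns.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```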
[jira] [Updated] (KYLIN-2574) RawQueryLastHacker should group by all possible dimensions
[ https://issues.apache.org/jira/browse/KYLIN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma updated KYLIN-2574: -- Request participants: (was: ) Description: Currently RawQueryLastHacker makes the raw query group by the columns that exist in the query (if (tupleInfo.hasColumn(col))). This approach fails to leverage limit push-down if those columns are not a "prefix" of the row key (org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase#enableStorageLimitIfPossible). (was: Currently RawQueryLastHacker makes the raw query group by the columns that exist in the query (if (tupleInfo.hasColumn(col))). This approach fails to leverage limit push-down if those columns are not a "prefix" of the row key.) > RawQueryLastHacker should group by all possible dimensions > -- > > Key: KYLIN-2574 > URL: https://issues.apache.org/jira/browse/KYLIN-2574 > Project: Kylin > Issue Type: Bug >Reporter: hongbin ma >Assignee: hongbin ma > > Currently RawQueryLastHacker makes the raw query group by the columns that exist in > the query (if (tupleInfo.hasColumn(col))). This approach fails to leverage > limit push-down if those columns are not a "prefix" of the row > key (org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase#enableStorageLimitIfPossible). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KYLIN-2574) RawQueryLastHacker should group by all possible dimensions
hongbin ma created KYLIN-2574: - Summary: RawQueryLastHacker should group by all possible dimensions Key: KYLIN-2574 URL: https://issues.apache.org/jira/browse/KYLIN-2574 Project: Kylin Issue Type: Bug Reporter: hongbin ma Assignee: hongbin ma Currently RawQueryLastHacker makes the raw query group by the columns that exist in the query (if (tupleInfo.hasColumn(col))). This approach fails to leverage limit push-down if those columns are not a "prefix" of the row key. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (KYLIN-2564) Got "UsernameNotFoundException: User XXX does not exist" in new Kylin instance
[ https://issues.apache.org/jira/browse/KYLIN-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma resolved KYLIN-2564. --- Resolution: Fixed Fix Version/s: v2.0.0 > Got "UsernameNotFoundException: User XXX does not exist" in new Kylin instance > -- > > Key: KYLIN-2564 > URL: https://issues.apache.org/jira/browse/KYLIN-2564 > Project: Kylin > Issue Type: Bug >Affects Versions: v2.0.0 >Reporter: liyang >Assignee: hongbin ma > Fix For: v2.0.0 > > > In a new Kylin instance, new metadata, we met following exception when > creating the very first project. > {code} > 2017-04-25 14:04:51,997 ERROR [http-bio-7070-exec-10 ProjectController:218]: > Failed to deal with the request. > org.springframework.security.core.userdetails.UsernameNotFoundException: User > ADMIN does not exist. Please make sure the user has logged in before > at > org.apache.kylin.rest.service.AclService.updateAcl(AclService.java:308) > at > org.apache.kylin.rest.service.AccessService.grant(AccessService.java:119) > at > org.apache.kylin.rest.service.AccessService.init(AccessService.java:81) > at > org.apache.kylin.rest.service.AccessService$$FastClassBySpringCGLIB$$91550c7f.invoke() > at > org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) > at > org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:629) > at > org.apache.kylin.rest.service.AccessService$$EnhancerBySpringCGLIB$$594ff853.init() > at > org.apache.kylin.rest.service.ProjectService.createProject(ProjectService.java:64) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (KYLIN-2555) minor issues about acl and granted autority
[ https://issues.apache.org/jira/browse/KYLIN-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma resolved KYLIN-2555. --- Resolution: Fixed Fix Version/s: v2.0.0 > minor issues about acl and granted autority > --- > > Key: KYLIN-2555 > URL: https://issues.apache.org/jira/browse/KYLIN-2555 > Project: Kylin > Issue Type: Bug >Reporter: XIE FAN >Assignee: XIE FAN > Fix For: v2.0.0 > > > 1. When we use AclService to manage the authorities of a Kylin project, authorities > may be granted to non-existent users, which should not be allowed. > 2. Implicitly give ADMIN=ADMIN+MODELER+ANALYST and MODELER=MODELER+ANALYST. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
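A minimal sketch of both points, with hypothetical names: an enum-based role hierarchy where ADMIN implies MODELER and ANALYST and MODELER implies ANALYST, plus a grant check that rejects unknown users. It does not use Kylin's AclService or Spring Security classes.

```java
import java.util.EnumSet;
import java.util.Set;

final class RoleSketch {
    enum Role { ADMIN, MODELER, ANALYST }

    // ADMIN = ADMIN + MODELER + ANALYST, MODELER = MODELER + ANALYST.
    static Set<Role> effectiveRoles(Role granted) {
        switch (granted) {
            case ADMIN:
                return EnumSet.of(Role.ADMIN, Role.MODELER, Role.ANALYST);
            case MODELER:
                return EnumSet.of(Role.MODELER, Role.ANALYST);
            default:
                return EnumSet.of(Role.ANALYST);
        }
    }

    // Rejects grants to users that do not exist; otherwise returns the
    // roles the user effectively gets.
    static Set<Role> grant(Set<String> existingUsers, String user, Role role) {
        if (!existingUsers.contains(user)) {
            throw new IllegalArgumentException("User " + user + " does not exist");
        }
        return effectiveRoles(role);
    }
}
```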
[jira] [Updated] (KYLIN-2549) Modify tools that related to Acl
[ https://issues.apache.org/jira/browse/KYLIN-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma updated KYLIN-2549: -- Description: Many tools, such as MigrationTool and StorageCleanUpJob, need to read ACL records, and they need to be modified to use the new resource store API instead of the HBase API (was: Many tools, such as MigrationTool and StorageCleanUpJob, need to read ACL records, and they need to be modified.) > Modify tools that related to Acl > > > Key: KYLIN-2549 > URL: https://issues.apache.org/jira/browse/KYLIN-2549 > Project: Kylin > Issue Type: Sub-task >Reporter: XIE FAN >Assignee: XIE FAN > > Many tools, such as MigrationTool and StorageCleanUpJob, need to read ACL > records, and they need to be modified to use the new resource store API > instead of the HBase API -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KYLIN-2551) separate table desc by each project
hongbin ma created KYLIN-2551: - Summary: separate table desc by each project Key: KYLIN-2551 URL: https://issues.apache.org/jira/browse/KYLIN-2551 Project: Kylin Issue Type: Improvement Reporter: hongbin ma Assignee: hongbin ma For historical reasons, different projects share the same table desc. This forces project admins to worry about affecting cubes in other projects. This JIRA aims to separate table descs by project while maintaining backward compatibility, so that users won't have to manually "upgrade". -- This message was sent by Atlassian JIRA (v6.3.15#6346)
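A minimal sketch of project-scoped lookup with a fallback, assuming descs are stored under a project-qualified key while legacy shared descs stay under the bare table name. The fallback is what would let existing metadata keep working without a manual upgrade. The registry class, the string-valued descs, and the "project/table" key format are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class TableDescRegistry {
    private final Map<String, String> descByKey = new HashMap<>();

    // Assumes table names never contain "/", so the two key spaces cannot collide.
    private static String projectKey(String project, String table) {
        return project + "/" + table;
    }

    void putLegacy(String table, String desc) {
        descByKey.put(table, desc);
    }

    void putForProject(String project, String table, String desc) {
        descByKey.put(projectKey(project, table), desc);
    }

    // Prefer the project-specific desc; fall back to the legacy shared one.
    String get(String project, String table) {
        String desc = descByKey.get(projectKey(project, table));
        return desc != null ? desc : descByKey.get(table);
    }
}
```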
[jira] [Resolved] (KYLIN-2539) Useless filter dimension will impact cuboid selection.
[ https://issues.apache.org/jira/browse/KYLIN-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma resolved KYLIN-2539. --- Resolution: Fixed Assignee: hongbin ma Fix Version/s: v2.0.0 > Useless filter dimension will impact cuboid selection. > -- > > Key: KYLIN-2539 > URL: https://issues.apache.org/jira/browse/KYLIN-2539 > Project: Kylin > Issue Type: Bug >Reporter: Yifan Zhang >Assignee: hongbin ma > Fix For: v2.0.0 > > > Query1: select count(*) from test_kylin_fact where (cal_dt > > DATE'2012-01-01') and (seller_id is null or 1 = 1) > Query2: select count(*) from test_kylin_fact where (cal_dt > DATE'2012-01-01') > Q1 and Q2 return identical results but hit different cuboids (43051 and > 1310735), resulting in different query performance. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KYLIN-2539) Useless filter dimension will impact cuboid selection.
[ https://issues.apache.org/jira/browse/KYLIN-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962833#comment-15962833 ] hongbin ma commented on KYLIN-2539: --- I added an org.apache.kylin.metadata.filter.FilterOptimizeTransformer to detect patterns like (x = ? or 1 = 1) and replace them with ConstantTupleFilter.TRUE. > Useless filter dimension will impact cuboid selection. > -- > > Key: KYLIN-2539 > URL: https://issues.apache.org/jira/browse/KYLIN-2539 > Project: Kylin > Issue Type: Bug >Reporter: Yifan Zhang > > Query1: select count(*) from test_kylin_fact where (cal_dt > > DATE'2012-01-01') and (seller_id is null or 1 = 1) > Query2: select count(*) from test_kylin_fact where (cal_dt > DATE'2012-01-01') > Q1 and Q2 return identical results but hit different cuboids (43051 and > 1310735), resulting in different query performance. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
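A minimal sketch of this kind of rewrite, using a hypothetical filter tree rather than Kylin's TupleFilter and ConstantTupleFilter classes: a comparison of equal constants such as 1 = 1 becomes TRUE, an OR with a TRUE child becomes TRUE, and TRUE children are dropped from an AND. Applied to Query1, only the cal_dt condition remains, so seller_id no longer influences cuboid selection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical filter tree; not Kylin's TupleFilter / ConstantTupleFilter classes.
final class FilterOptimizeSketch {
    interface Filter {}

    static final class AlwaysTrue implements Filter {
        static final AlwaysTrue INSTANCE = new AlwaysTrue();
        private AlwaysTrue() {}
        @Override public String toString() { return "TRUE"; }
    }

    // An opaque condition on a column, e.g. "seller_id is null".
    static final class Leaf implements Filter {
        final String text;
        Leaf(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    // A comparison of two constants, e.g. 1 = 1.
    static final class ConstEq implements Filter {
        final long left;
        final long right;
        ConstEq(long left, long right) { this.left = left; this.right = right; }
    }

    static final class Or implements Filter {
        final List<Filter> children;
        Or(Filter... children) { this.children = Arrays.asList(children); }
    }

    static final class And implements Filter {
        final List<Filter> children;
        And(Filter... children) { this.children = Arrays.asList(children); }
    }

    static Filter optimize(Filter f) {
        if (f instanceof ConstEq) {
            ConstEq eq = (ConstEq) f;
            return eq.left == eq.right ? AlwaysTrue.INSTANCE : f; // unequal constants left as-is
        }
        if (f instanceof Or) {
            List<Filter> kids = new ArrayList<>();
            for (Filter child : ((Or) f).children) {
                Filter o = optimize(child);
                if (o == AlwaysTrue.INSTANCE) {
                    return AlwaysTrue.INSTANCE; // x OR TRUE is TRUE
                }
                kids.add(o);
            }
            return new Or(kids.toArray(new Filter[0]));
        }
        if (f instanceof And) {
            List<Filter> kids = new ArrayList<>();
            for (Filter child : ((And) f).children) {
                Filter o = optimize(child);
                if (o != AlwaysTrue.INSTANCE) {
                    kids.add(o); // x AND TRUE is x, so TRUE children are dropped
                }
            }
            if (kids.isEmpty()) {
                return AlwaysTrue.INSTANCE;
            }
            return kids.size() == 1 ? kids.get(0) : new And(kids.toArray(new Filter[0]));
        }
        return f;
    }
}
```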
[jira] [Created] (KYLIN-2527) Speedup LookupStringTable, use HashMap instead of ConcurrentHashMap
hongbin ma created KYLIN-2527: - Summary: Speedup LookupStringTable, use HashMap instead of ConcurrentHashMap Key: KYLIN-2527 URL: https://issues.apache.org/jira/browse/KYLIN-2527 Project: Kylin Issue Type: Improvement Reporter: hongbin ma Assignee: hongbin ma A ConcurrentHashMap here is overkill; it should be faster to initialize a plain HashMap. The next step might be to cache the LookupStringTable. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
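A sketch of why a plain HashMap can suffice, under the assumption that the table is fully built once and never modified afterwards: the map is populated in the constructor and published through a final field, which gives concurrent readers a consistent view without ConcurrentHashMap (as long as the constructor does not leak this). The class name and the String[] row layout are hypothetical; no performance numbers are implied.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class LookupStringTableSketch {
    // Built once, never mutated; the final field guarantees that other threads
    // see the fully populated map after construction.
    private final Map<String, String[]> rowsByKey;

    LookupStringTableSketch(Iterable<String[]> rows, int keyColumn) {
        Map<String, String[]> map = new HashMap<>();
        for (String[] row : rows) {
            map.put(row[keyColumn], row);
        }
        this.rowsByKey = Collections.unmodifiableMap(map); // guard against accidental writes
    }

    String[] getRow(String key) {
        return rowsByKey.get(key);
    }
}
```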
[jira] [Commented] (KYLIN-2506) Refactor Global Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950415#comment-15950415 ] hongbin ma commented on KYLIN-2506: --- Today we use the trie dict as the default encoding for precise distinct count; however, the global dictionary seems to be a better default choice. I do understand that model designers need the trie dict in some cases (where the global dict may grow too large), but we can hide it in the advanced settings. There might be a little more work to keep backward compatibility, but I think it's manageable. > Refactor Global Dictionary > -- > > Key: KYLIN-2506 > URL: https://issues.apache.org/jira/browse/KYLIN-2506 > Project: Kylin > Issue Type: Improvement > Components: General >Affects Versions: v2.0.0 >Reporter: kangkaisen >Assignee: kangkaisen > Fix For: v2.0.0 > > > The main points of this refactor: > 1 Fix the bug that the RemoveListener of LoadingCache swallowed all > exceptions when building the GlobalDict. > 2 Fix the bug that the HDFS filename of DictSliceKey had illegal characters. > 3 Fix the bug that the HDFS filename of DictSliceKey might be longer than 255. > 4 Fix the bug that DictNode split failed if the value length was greater than 255 > bytes. > 5 Decouple the build and query of GlobalDict: > Abstract the builder of AppendTrieDictionary into AppendTrieDictionaryBuilder; > Add LoadingCache to AppendTrieDictionary and make AppendTrieDictionary > read-only. > 6 Remove the dependence on LoadingCache when building the GlobalDict. > 7 Abstract the HDFS operations into GlobalDictStore. > 8 Abstract the metadata of GlobalDict into GlobalDictMetadata. > 9 Delete CachedTreeMap. > 10 Remove support for multithreaded concurrent builds; I will add a > distributed lock for GlobalDict later. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KYLIN-2521) upgrade to calcite 1.12.0
hongbin ma created KYLIN-2521: - Summary: upgrade to calcite 1.12.0 Key: KYLIN-2521 URL: https://issues.apache.org/jira/browse/KYLIN-2521 Project: Kylin Issue Type: Task Reporter: hongbin ma Assignee: hongbin ma -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Reopened] (KYLIN-2361) Upgrade to Tomcat 8.X
[ https://issues.apache.org/jira/browse/KYLIN-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma reopened KYLIN-2361: --- > Upgrade to Tomcat 8.X > - > > Key: KYLIN-2361 > URL: https://issues.apache.org/jira/browse/KYLIN-2361 > Project: Kylin > Issue Type: Task > Components: Web >Affects Versions: v1.6.0 >Reporter: Billy Liu >Assignee: Billy Liu >Priority: Minor > Fix For: v2.0.0 > > > Apache Tomcat 8.5.x supports the same Servlet, JSP, EL, and WebSocket > Specification versions as Apache Tomcat 8.0.x. In addition to that, it also > implements the JASPIC 1.1 specification. There are significant changes in > many areas under the hood, resulting in improved performance, stability, and > total cost of ownership. Please refer to the Apache Tomcat 8.5 Changelog for > details. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (KYLIN-2361) Upgrade to Tomcat 8.X
[ https://issues.apache.org/jira/browse/KYLIN-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906467#comment-15906467 ] hongbin ma edited comment on KYLIN-2361 at 3/12/17 9:25 AM: It seems the ordered class loader https://github.com/openwide-java/tomcat-classloader-ordered is not guaranteed to load SqlToRelConverter in AtopCalcite before SqlToRelConverter in Calcite. I have to revert this. The change is in 7976b5fc714f5e73734b3037c05fc2601ea17662 was (Author: mahongbin): It seems the ordered class loader https://github.com/openwide-java/tomcat-classloader-ordered is not guaranteed to load SqlToRelConverter in AtopCalcite before SqlToRelConverter in Calcite. I have to revert the change > Upgrade to Tomcat 8.X > - > > Key: KYLIN-2361 > URL: https://issues.apache.org/jira/browse/KYLIN-2361 > Project: Kylin > Issue Type: Task > Components: Web >Affects Versions: v1.6.0 >Reporter: Billy Liu >Assignee: Billy Liu >Priority: Minor > Fix For: v2.0.0 > > > Apache Tomcat 8.5.x supports the same Servlet, JSP, EL, and WebSocket > Specification versions as Apache Tomcat 8.0.x. In addition to that, it also > implements the JASPIC 1.1 specification. There are significant changes in > many areas under the hood, resulting in improved performance, stability, and > total cost of ownership. Please refer to the Apache Tomcat 8.5 Changelog for > details. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KYLIN-2361) Upgrade to Tomcat 8.X
[ https://issues.apache.org/jira/browse/KYLIN-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906467#comment-15906467 ] hongbin ma commented on KYLIN-2361: --- It seems the ordered class loader https://github.com/openwide-java/tomcat-classloader-ordered is not guaranteed to load SqlToRelConverter in AtopCalcite before SqlToRelConverter in Calcite, so I have to revert the change. > Upgrade to Tomcat 8.X > - > > Key: KYLIN-2361 > URL: https://issues.apache.org/jira/browse/KYLIN-2361 > Project: Kylin > Issue Type: Task > Components: Web >Affects Versions: v1.6.0 >Reporter: Billy Liu >Assignee: Billy Liu >Priority: Minor > Fix For: v2.0.0 > > > Apache Tomcat 8.5.x supports the same Servlet, JSP, EL, and WebSocket > Specification versions as Apache Tomcat 8.0.x. In addition to that, it also > implements the JASPIC 1.1 specification. There are significant changes in > many areas under the hood, resulting in improved performance, stability, and > total cost of ownership. Please refer to the Apache Tomcat 8.5 Changelog for > details. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
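The comment implies that AtopCalcite and Calcite both provide a class named SqlToRelConverter, so whichever copy the class loader finds first is the one that gets used. The toy Python model below shows this first-match lookup and why an order that is not guaranteed is enough to break the override. The jar names and the `resolve_class` helper are hypothetical, and real Java class loading also involves parent delegation, which this sketch ignores.

```python
from typing import Dict, List, Optional, Set

def resolve_class(class_name: str, classpath: List[str],
                  jar_contents: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first jar on the classpath that contains class_name (first match wins)."""
    for jar in classpath:
        if class_name in jar_contents.get(jar, set()):
            return jar
    return None

# Hypothetical jar names; both contain a class with the same name.
jars = {
    "kylin-atopcalcite.jar": {"SqlToRelConverter"},
    "calcite-core.jar": {"SqlToRelConverter", "RelNode"},
}

# If AtopCalcite happens to come first, its patched class wins...
print(resolve_class("SqlToRelConverter", ["kylin-atopcalcite.jar", "calcite-core.jar"], jars))
# ...but with the opposite order, Calcite's original class is loaded instead.
print(resolve_class("SqlToRelConverter", ["calcite-core.jar", "kylin-atopcalcite.jar"], jars))
```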
[jira] [Comment Edited] (KYLIN-2495) query exception when integer column encoded as date/time encoding
[ https://issues.apache.org/jira/browse/KYLIN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904662#comment-15904662 ] hongbin ma edited comment on KYLIN-2495 at 3/10/17 9:14 AM: Updated KYLIN- so that string-typed columns no longer support date/time encoding. Test data:
{code}
create table fact0310(intdate int, realdate date, realtime timestamp, longtime bigint);
19980302 1998-03-02 2015-06-01 00:00:00 143311680
19920403 1992-04-03 2015-05-15 17:00:00 143170920
19920403 1992-04-03 2016-01-15 12:00:00 145285920
{code}
Make sure that both intdate and realdate can use date encoding, and that realtime and longtime can use time encoding, by running the following query:
{code:sql}
select intdate,realdate,realtime,longtime,count(*)
from fact0310
group by intdate,realdate,realtime,longtime
{code}
was (Author: mahongbin): Updated KYLIN- so that string-typed columns no longer support date/time encoding. Test data:
{code}
create table fact0310(intdate int, realdate date, realtime timestamp, longtime bigint);
19980302 1998-03-02 2015-06-01 00:00:00 143311680
19920403 1992-04-03 2015-05-15 17:00:00 143170920
19920403 1992-04-03 2016-01-15 12:00:00 145285920
{code}
Make sure that both intdate and realdate can use date encoding, and that realtime and longtime can use time encoding. > query exception when integer column encoded as date/time encoding > -- > > Key: KYLIN-2495 > URL: https://issues.apache.org/jira/browse/KYLIN-2495 > Project: Kylin > Issue Type: Bug >Reporter: hongbin ma >Assignee: hongbin ma > > In KYLIN-, we claimed that integer columns can use date/time encoding.
> However, when I tried to query such a cube, an exception was thrown: > {code} > java.sql.SQLException: Error while executing SQL "select * from fact0309 > LIMIT 5": For input string: "70225920" > {code} > the fact table desc is: > {code} > hive> desc fact0309 > > ; > OK > tdate int > country string > price decimal(10,0) > {code} > and the sample data is: > {code} > 19980302 US 100 > 19920403 CN 100 > 19920403 US 33 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KYLIN-2495) query exception when integer column encoded as date/time encoding
[ https://issues.apache.org/jira/browse/KYLIN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904662#comment-15904662 ] hongbin ma commented on KYLIN-2495: --- Updated KYLIN- so that string-typed columns no longer support date/time encoding. Test data:
{code}
create table fact0310(intdate int, realdate date, realtime timestamp, longtime bigint);
19980302 1998-03-02 2015-06-01 00:00:00 143311680
19920403 1992-04-03 2015-05-15 17:00:00 143170920
19920403 1992-04-03 2016-01-15 12:00:00 145285920
{code}
Make sure that both intdate and realdate can use date encoding, and that realtime and longtime can use time encoding. > query exception when integer column encoded as date/time encoding > -- > > Key: KYLIN-2495 > URL: https://issues.apache.org/jira/browse/KYLIN-2495 > Project: Kylin > Issue Type: Bug >Reporter: hongbin ma >Assignee: hongbin ma > > In KYLIN-, we claimed that integer columns can use date/time encoding. > However, when I tried to query such a cube, an exception was thrown: > {code} > java.sql.SQLException: Error while executing SQL "select * from fact0309 > LIMIT 5": For input string: "70225920" > {code} > the fact table desc is: > {code} > hive> desc fact0309 > > ; > OK > tdate int > country string > price decimal(10,0) > {code} > and the sample data is: > {code} > 19980302 US 100 > 19920403 CN 100 > 19920403 US 33 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
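An integer column such as intdate holds values like 19980302, so date encoding on it has to read the integer as a yyyyMMdd literal. The minimal sketch below shows that interpretation and its inverse, assuming the yyyyMMdd convention seen in the test data. The helper names are hypothetical and are not Kylin's encoding API.

```python
from datetime import date, datetime

def int_to_date(value: int) -> date:
    """Interpret an integer such as 19980302 as the calendar date 1998-03-02."""
    return datetime.strptime(str(value), "%Y%m%d").date()

def date_to_int(d: date) -> int:
    """Inverse mapping, so a decoded value round-trips to the original integer."""
    return int(d.strftime("%Y%m%d"))

for raw in (19980302, 19920403):
    d = int_to_date(raw)
    # The int column and the date column of the test data should agree row by row.
    print(raw, d.isoformat(), date_to_int(d) == raw)
```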
[jira] [Commented] (KYLIN-2222) web ui uses rest api to decide which dim encoding is valid for different typed columns
[ https://issues.apache.org/jira/browse/KYLIN-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904654#comment-15904654 ] hongbin ma commented on KYLIN-: --- refined dimension-encoding to column-type matrix: || ||Float Numbers|| Integer Numbers|| Time || String || |boolean encoding| N | Y| N|Y| |date encoding| N | Y| Y|N| |time encoding| N | Y| Y|N| |dict encoding| Y| Y| Y|Y| |fixed_length encoding| N | N| N|Y| |fixed_length_hex encoding| N | N| N|Y| |integer encoding| N | Y| N|Y| > web ui uses rest api to decide which dim encoding is valid for different > typed columns > -- > > Key: KYLIN- > URL: https://issues.apache.org/jira/browse/KYLIN- > Project: Kylin > Issue Type: Improvement >Reporter: hongbin ma >Assignee: hongbin ma > Fix For: v2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
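The refined matrix can be read as a lookup table that a REST endpoint could consult when the web UI asks which encodings to offer for a column. The minimal Python sketch below encodes the refined rows above. The type-family names and both helper functions are illustrative assumptions, not Kylin's actual REST API.

```python
# Refined matrix: encoding -> column type families it accepts.
# "float", "integer", "time", "string" stand for the table's four column headers.
VALID_ENCODINGS = {
    "boolean":          {"integer", "string"},
    "date":             {"integer", "time"},
    "time":             {"integer", "time"},
    "dict":             {"float", "integer", "time", "string"},
    "fixed_length":     {"string"},
    "fixed_length_hex": {"string"},
    "integer":          {"integer", "string"},
}

def is_valid_encoding(encoding: str, type_family: str) -> bool:
    """Return True if the matrix allows `encoding` on a column of `type_family`."""
    return type_family in VALID_ENCODINGS.get(encoding, set())

def encodings_for(type_family: str) -> list:
    """List (sorted) the encodings a UI could offer for a column of the given family."""
    return sorted(e for e, types in VALID_ENCODINGS.items() if type_family in types)

print(encodings_for("string"))
print(encodings_for("float"))
```

Keeping the matrix in one place like this means the UI never has to hard-code which encoding fits which type; it only renders what the lookup returns.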
[jira] [Updated] (KYLIN-2491) Cube with error job can be dropped
[ https://issues.apache.org/jira/browse/KYLIN-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma updated KYLIN-2491: -- Affects Version/s: (was: v1.5.4.1) v1.6.0 > Cube with error job can be dropped > -- > > Key: KYLIN-2491 > URL: https://issues.apache.org/jira/browse/KYLIN-2491 > Project: Kylin > Issue Type: Bug > Components: REST Service >Affects Versions: v1.6.0 >Reporter: nichunen >Assignee: nichunen > Fix For: v2.0.0 > > Attachments: KYLIN-2491.patch > > > If a cube build fails, the cube can still be dropped, leaving behind an error job that can > be resumed. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (KYLIN-2491) Cube with error job can be dropped
[ https://issues.apache.org/jira/browse/KYLIN-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma resolved KYLIN-2491. --- Resolution: Fixed > Cube with error job can be dropped > -- > > Key: KYLIN-2491 > URL: https://issues.apache.org/jira/browse/KYLIN-2491 > Project: Kylin > Issue Type: Bug > Components: REST Service >Affects Versions: v1.6.0 >Reporter: nichunen >Assignee: nichunen > Fix For: v2.0.0 > > Attachments: KYLIN-2491.patch > > > If a cube build fails, the cube can still be dropped, leaving behind an error job that can > be resumed. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KYLIN-2483) SortedIteratorMergerWithLimit could be slower when number of total merge rows is small
[ https://issues.apache.org/jira/browse/KYLIN-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896844#comment-15896844 ] hongbin ma commented on KYLIN-2483: --- I just gave it another thought: disabling SortedIteratorMergerWithLimit while limit push-down is enabled will return wrong results. Since it's always tempting to enable limit push-down, we'll have to accept the cost of SortedIteratorMergerWithLimit. > SortedIteratorMergerWithLimit could be slower when number of total merge rows > is small > -- > > Key: KYLIN-2483 > URL: https://issues.apache.org/jira/browse/KYLIN-2483 > Project: Kylin > Issue Type: Improvement >Reporter: hongbin ma >Assignee: hongbin ma > > If the pushed-down limit is small enough (say, less than 100), > SortedIteratorMergerWithLimit adds RELATIVELY significant cost. I'm > adding a new configuration entry called > kylin.query.merge-sort-partition-results.min-limit (default 100) to fix this > issue. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
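The reason the merge cannot simply be switched off can be sketched with plain integers standing in for row keys. When each partition applies the pushed-down limit and returns its rows sorted, only a k-way merge yields the globally first N rows. This is an illustration of the general technique using Python's `heapq.merge`. It is not the code of SortedIteratorMergerWithLimit, and it ignores any further aggregation of the merged rows.

```python
import heapq
from itertools import islice
from typing import Iterable, List

def merge_with_limit(partitions: List[Iterable[int]], limit: int) -> List[int]:
    """K-way merge of individually sorted partition results, stopping after `limit` rows."""
    return list(islice(heapq.merge(*partitions), limit))

def naive_concat_with_limit(partitions: List[Iterable[int]], limit: int) -> List[int]:
    """Concatenate partitions in arrival order and truncate: cheaper, but not globally sorted."""
    rows = [row for part in partitions for row in part]
    return rows[:limit]

# Each partition has already applied the pushed-down limit of 3 and returns sorted rows.
parts = [[5, 7, 9], [1, 2, 8], [3, 4, 6]]
print(merge_with_limit(parts, 3))         # [1, 2, 3]: the globally smallest keys
print(naive_concat_with_limit(parts, 3))  # [5, 7, 9]: just the first partition's rows
```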
[jira] [Closed] (KYLIN-2483) SortedIteratorMergerWithLimit could be slower when number of total merge rows is small
[ https://issues.apache.org/jira/browse/KYLIN-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma closed KYLIN-2483. - Resolution: Not A Problem > SortedIteratorMergerWithLimit could be slower when number of total merge rows > is small > -- > > Key: KYLIN-2483 > URL: https://issues.apache.org/jira/browse/KYLIN-2483 > Project: Kylin > Issue Type: Improvement >Reporter: hongbin ma >Assignee: hongbin ma > > If the pushed-down limit is small enough (say, less than 100), > SortedIteratorMergerWithLimit adds RELATIVELY significant cost. I'm > adding a new configuration entry called > kylin.query.merge-sort-partition-results.min-limit (default 100) to fix this > issue. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KYLIN-2483) SortedIteratorMergerWithLimit could be slower when number of total merge rows is small
hongbin ma created KYLIN-2483: - Summary: SortedIteratorMergerWithLimit could be slower when number of total merge rows is small Key: KYLIN-2483 URL: https://issues.apache.org/jira/browse/KYLIN-2483 Project: Kylin Issue Type: Improvement Reporter: hongbin ma Assignee: hongbin ma If the pushed-down limit is small enough (say, less than 100), SortedIteratorMergerWithLimit adds RELATIVELY significant cost. I'm adding a new configuration entry called kylin.query.merge-sort-partition-results.min-limit (default 100) to fix this issue. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KYLIN-2471) queries with parenthesized sub-clause in JOIN will fail
hongbin ma created KYLIN-2471: - Summary: queries with parenthesized sub-clause in JOIN will fail Key: KYLIN-2471 URL: https://issues.apache.org/jira/browse/KYLIN-2471 Project: Kylin Issue Type: Bug Reporter: hongbin ma Assignee: hongbin ma Cognos generates queries with parenthesized sub-clauses in JOIN, for example:
{code}
SELECT "TABLE1"."DIM1_1" "DIM1_1"
  ,"TABLE2"."DIM2_1" "DIM2_1"
  ,SUM("FACT"."M1") "M1"
  ,SUM("FACT"."M2") "M2"
FROM ("COGNOS"."FACT" "FACT"
  LEFT OUTER JOIN "COGNOS"."TABLE1" "TABLE1" ON "FACT"."FK_1" = "TABLE1"."PK_1")
  LEFT OUTER JOIN "COGNOS"."TABLE2" "TABLE2" ON "FACT"."FK_2" = "TABLE2"."PK_2"
GROUP BY "TABLE2"."DIM2_1"
  ,"TABLE1"."DIM1_1";
{code}
As mentioned in https://issues.apache.org/jira/browse/CALCITE-35, such queries are difficult to handle in Calcite. We'll leverage IQueryTransformer to remove the unnecessary parentheses. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
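The idea of stripping the redundant parentheses before the SQL reaches Calcite can be sketched as a string rewrite. This is a deliberately simplified approach (a regex plus a depth counter) written for illustration. It is not Kylin's IQueryTransformer implementation, and a robust version would have to respect string literals and quoted identifiers that contain parentheses.

```python
import re

def remove_join_parentheses(sql: str) -> str:
    """Drop a redundant pair of parentheses wrapping a join group right after FROM.

    Simplified sketch: handles only `FROM ( ... JOIN ... ) ...`, leaves derived
    tables (`FROM (SELECT ...`) alone, and does not understand string literals or
    quoted identifiers that contain parentheses.
    """
    match = re.search(r"\bFROM\s*\(", sql, flags=re.IGNORECASE)
    if match is None:
        return sql
    open_idx = match.end() - 1  # position of the "(" right after FROM
    if re.match(r"\s*SELECT\b", sql[open_idx + 1:], flags=re.IGNORECASE):
        return sql  # a sub-query needs its parentheses
    depth = 0
    for i in range(open_idx, len(sql)):
        if sql[i] == "(":
            depth += 1
        elif sql[i] == ")":
            depth -= 1
            if depth == 0:
                inner = sql[open_idx + 1:i]
                if not re.search(r"\bJOIN\b", inner, flags=re.IGNORECASE):
                    return sql  # not a join group; keep the parentheses
                # Joins associate left to right, so unwrapping a leading join group keeps the meaning.
                return sql[:open_idx] + inner + sql[i + 1:]
    return sql  # unbalanced input: leave it untouched

query = ('SELECT "TABLE1"."DIM1_1", SUM("FACT"."M1") '
         'FROM ("COGNOS"."FACT" "FACT" LEFT OUTER JOIN "COGNOS"."TABLE1" "TABLE1" '
         'ON "FACT"."FK_1" = "TABLE1"."PK_1") '
         'LEFT OUTER JOIN "COGNOS"."TABLE2" "TABLE2" ON "FACT"."FK_2" = "TABLE2"."PK_2" '
         'GROUP BY "TABLE1"."DIM1_1"')
print(remove_join_parentheses(query))
```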
[jira] [Updated] (KYLIN-2436) add a configuration knob to disable spilling of aggregation cache
[ https://issues.apache.org/jira/browse/KYLIN-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma updated KYLIN-2436: -- Description: Kylin's aggregation operator can spill intermediate results to disk when its estimated memory usage exceeds some threshold (kylin.query.coprocessor.mem.gb, to be specific). While it's a useful feature in general to protect the RegionServer from OOM, there are times when aborting this kind of memory-hungry query immediately is a more suitable choice for users. To accommodate this requirement, I suggest adding a new configuration named -*kylin.storage.hbase.coprocessor-spill-enabled*- +*kylin.storage.partition.aggr-spill-enabled*+. The default value would be true, which keeps the same behavior as before. If changed to false, a query that uses more aggregation memory than the threshold will fail immediately. was: Kylin's aggregation operator can spill intermediate results to disk when its estimated memory usage exceeds some threshold (kylin.query.coprocessor.mem.gb, to be specific). While it's a useful feature in general to protect the RegionServer from OOM, there are times when aborting this kind of memory-hungry query immediately is a more suitable choice for users. To accommodate this requirement, I suggest adding a new configuration named *kylin.storage.hbase.coprocessor-spill-enabled*. The default value would be true, which keeps the same behavior as before. If changed to false, a query that uses more aggregation memory than the threshold will fail immediately.
> add a configuration knob to disable spilling of aggregation cache > - > > Key: KYLIN-2436 > URL: https://issues.apache.org/jira/browse/KYLIN-2436 > Project: Kylin > Issue Type: Improvement > Components: Storage - HBase >Affects Versions: v1.6.0 >Reporter: Dayue Gao >Assignee: Dayue Gao > Fix For: v2.0.0 > > > Kylin's aggregation operator can spill intermediate results to disk when its > estimated memory usage exceeds some threshold (kylin.query.coprocessor.mem.gb, > to be specific). While it's a useful feature in general to protect the > RegionServer from OOM, there are times when aborting this kind of > memory-hungry query immediately is a more suitable choice for users. > To accommodate this requirement, I suggest adding a new configuration named > -*kylin.storage.hbase.coprocessor-spill-enabled*- > +*kylin.storage.partition.aggr-spill-enabled*+. The default value would be > true, which keeps the same behavior as before. If changed to false, a query > that uses more aggregation memory than the threshold will fail immediately. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
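The proposed knob only changes what happens when the memory estimate crosses the threshold. The toy Python model below contrasts the two behaviors. The class name, the fixed per-entry size estimate, and the in-memory list standing in for spill files are all assumptions made for illustration. This is not Kylin's coprocessor code; only the knob name kylin.storage.partition.aggr-spill-enabled comes from the issue.

```python
from typing import Dict, List

class AggregationCache:
    """Toy GROUP BY SUM aggregator with a memory budget (sizes are rough estimates).

    spill_enabled mirrors the proposed kylin.storage.partition.aggr-spill-enabled:
      True  -> move the in-memory map to a "spill" list and keep going (old behavior)
      False -> abort as soon as the estimate exceeds the threshold
    """

    def __init__(self, threshold_bytes: int, spill_enabled: bool = True,
                 bytes_per_entry: int = 64):
        self.threshold_bytes = threshold_bytes
        self.spill_enabled = spill_enabled
        self.bytes_per_entry = bytes_per_entry  # hypothetical fixed per-entry estimate
        self.in_memory: Dict[str, int] = {}
        self.spilled: List[Dict[str, int]] = []  # stands in for files on disk

    def estimated_bytes(self) -> int:
        return len(self.in_memory) * self.bytes_per_entry

    def add(self, key: str, value: int) -> None:
        self.in_memory[key] = self.in_memory.get(key, 0) + value
        if self.estimated_bytes() > self.threshold_bytes:
            if not self.spill_enabled:
                raise MemoryError("aggregation exceeded memory threshold; query aborted")
            self.spilled.append(self.in_memory)
            self.in_memory = {}

    def result(self) -> Dict[str, int]:
        """Merge spilled chunks back with whatever is still in memory."""
        merged: Dict[str, int] = {}
        for chunk in self.spilled + [self.in_memory]:
            for key, value in chunk.items():
                merged[key] = merged.get(key, 0) + value
        return merged

rows = [("a", 1), ("b", 2), ("c", 3), ("a", 4)]
spilling = AggregationCache(threshold_bytes=128, spill_enabled=True)
for key, value in rows:
    spilling.add(key, value)
print(spilling.result())  # {'a': 5, 'b': 2, 'c': 3}

strict = AggregationCache(threshold_bytes=128, spill_enabled=False)
try:
    for key, value in rows:
        strict.add(key, value)
except MemoryError as err:
    print(err)
```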
[jira] [Updated] (KYLIN-2438) replace scan threshold with max scan bytes
[ https://issues.apache.org/jira/browse/KYLIN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma updated KYLIN-2438: -- Description: In order to guard against bad queries that can consume lots of memory and potentially crash the Kylin / HBase server, Kylin limits the maximum number of rows a query can scan. The maximum value is chosen based on two configs: # *kylin.query.scan.threshold* is used if the query doesn't contain memory-hungry metrics # *kylin.query.mem.budget* / estimated_row_size is used otherwise, as the per-region maximum. This approach, however, has several deficiencies: * It doesn't work very well with complex, varlen metrics. The estimated threshold could be either too small or too large. If it's too small, good queries are killed. If it's too large, bad queries are not banned. * Row count doesn't correspond to memory consumption, so it's difficult to determine how large the scan threshold should be. * kylin.query.scan.threshold can't be overridden at the cube level. In this JIRA, I propose to replace the current row-count-based threshold with a more intuitive size-based threshold: * KYLIN-2437 will collect the number of bytes scanned at both the region and the query level * A new configuration *kylin.query.max-scan-bytes* will be added to limit the maximum number of bytes a query can scan * *kylin.query.mem.budget* will be renamed to -*kylin.storage.hbase.coprocessor-max-scan-bytes*- +*kylin.storage.partition.max-scan-bytes*+, which applies the limit at the region level. No need to rely on estimations about row size any more. * The above two configs can be overridden at the cube level * the old *kylin.query.scan.threshold* will be deprecated was: In order to guard against bad queries that can consume lots of memory and potentially crash the Kylin / HBase server, Kylin limits the maximum number of rows a query can scan.
The maximum value is chosen based on two configs: # *kylin.query.scan.threshold* is used if the query doesn't contain memory-hungry metrics # *kylin.query.mem.budget* / estimated_row_size is used otherwise, as the per-region maximum. This approach, however, has several deficiencies: * It doesn't work very well with complex, varlen metrics. The estimated threshold could be either too small or too large. If it's too small, good queries are killed. If it's too large, bad queries are not banned. * Row count doesn't correspond to memory consumption, so it's difficult to determine how large the scan threshold should be. * kylin.query.scan.threshold can't be overridden at the cube level. In this JIRA, I propose to replace the current row-count-based threshold with a more intuitive size-based threshold: * KYLIN-2437 will collect the number of bytes scanned at both the region and the query level * A new configuration *kylin.query.max-scan-bytes* will be added to limit the maximum number of bytes a query can scan * *kylin.query.mem.budget* will be renamed to *kylin.storage.hbase.coprocessor-max-scan-bytes*, which applies the limit at the region level. No need to rely on estimations about row size any more. * The above two configs can be overridden at the cube level * the old *kylin.query.scan.threshold* will be deprecated > replace scan threshold with max scan bytes > -- > > Key: KYLIN-2438 > URL: https://issues.apache.org/jira/browse/KYLIN-2438 > Project: Kylin > Issue Type: Improvement > Components: Query Engine, Storage - HBase >Affects Versions: v1.6.0 >Reporter: Dayue Gao >Assignee: Dayue Gao > Fix For: v2.0.0 > > > In order to guard against bad queries that can consume lots of memory and > potentially crash the Kylin / HBase server, Kylin limits the maximum number of > rows a query can scan.
The maximum value is chosen based on two configs: > # *kylin.query.scan.threshold* is used if the query doesn't contain > memory-hungry metrics > # *kylin.query.mem.budget* / estimated_row_size is used otherwise, as the > per-region maximum. > This approach, however, has several deficiencies: > * It doesn't work very well with complex, varlen metrics. The estimated > threshold could be either too small or too large. If it's too small, good > queries are killed. If it's too large, bad queries are not banned. > * Row count doesn't correspond to memory consumption, so it's difficult to > determine how large the scan threshold should be. > * kylin.query.scan.threshold can't be overridden at the cube level. > In this JIRA, I propose to replace the current row-count-based threshold with > a more intuitive size-based threshold: > * KYLIN-2437 will collect the number of bytes scanned at both the region and the > query level > * A new configuration *kylin.query.max-scan-bytes* will be added to limit > the maximum number of bytes a query can scan > * *kylin.query.mem.budget* will be renamed to >
[jira] [Updated] (KYLIN-2441) protocol for REST API result format
[ https://issues.apache.org/jira/browse/KYLIN-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma updated KYLIN-2441: -- Request participants: (was: ) Description: Currently there's no standard for the REST API's result format, so the frontend has to deal with all kinds of formats. This issue is an attempt to unify the format. > protocol for REST API result format > --- > > Key: KYLIN-2441 > URL: https://issues.apache.org/jira/browse/KYLIN-2441 > Project: Kylin > Issue Type: Bug >Reporter: hongbin ma >Assignee: hongbin ma > > Currently there's no standard for the REST API's result format, so the frontend > has to deal with all kinds of formats. This issue is an attempt to unify the > format. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KYLIN-2441) protocol for REST API result format
hongbin ma created KYLIN-2441: - Summary: protocol for REST API result format Key: KYLIN-2441 URL: https://issues.apache.org/jira/browse/KYLIN-2441 Project: Kylin Issue Type: Bug Reporter: hongbin ma Assignee: hongbin ma -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KYLIN-2436) add a configuration knob to disable spilling of aggregation cache
[ https://issues.apache.org/jira/browse/KYLIN-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858057#comment-15858057 ] hongbin ma commented on KYLIN-2436: --- +1 > add a configuration knob to disable spilling of aggregation cache > - > > Key: KYLIN-2436 > URL: https://issues.apache.org/jira/browse/KYLIN-2436 > Project: Kylin > Issue Type: Improvement > Components: Storage - HBase >Affects Versions: v1.6.0 >Reporter: Dayue Gao >Assignee: Dayue Gao > > Kylin's aggregation operator can spill intermediate results to disk when its > estimated memory usage exceeds some threshold (kylin.query.coprocessor.mem.gb, > to be specific). While it's a useful feature in general to protect the > RegionServer from OOM, there are times when aborting this kind of > memory-hungry query immediately is a more suitable choice for users. > To accommodate this requirement, I suggest adding a new configuration named > *kylin.storage.hbase.coprocessor-spill-enabled*. The default value would be > true, which keeps the same behavior as before. If changed to false, a query > that uses more aggregation memory than the threshold will fail immediately. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KYLIN-2437) collect number of bytes scanned to query metrics
[ https://issues.apache.org/jira/browse/KYLIN-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858055#comment-15858055 ] hongbin ma commented on KYLIN-2437: --- +1 > collect number of bytes scanned to query metrics > > > Key: KYLIN-2437 > URL: https://issues.apache.org/jira/browse/KYLIN-2437 > Project: Kylin > Issue Type: Improvement > Components: Storage - HBase >Affects Versions: v1.6.0 >Reporter: Dayue Gao >Assignee: Dayue Gao > > Besides scanned row count, it's useful to know how many bytes are scanned > from HBase to fulfil a query. It is perhaps a better indicator than row count > of how much pressure a query puts on HBase. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
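Collecting the metric amounts to summing the size of every scanned cell per region and then rolling the per-region figures up to the query level. The minimal sketch below works under that assumption. Measuring size as key length plus value length is a simplification, and all class and function names are hypothetical rather than Kylin's metrics code. It also shows why bytes can be more telling than rows: the two rows in region-1 count the same, but differ by 95 bytes.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

Cell = Tuple[bytes, bytes]  # (row key, value) pair returned by a scan

@dataclass
class RegionScanStats:
    region: str
    scanned_rows: int = 0
    scanned_bytes: int = 0

    def record(self, cell: Cell) -> None:
        key, value = cell
        self.scanned_rows += 1
        self.scanned_bytes += len(key) + len(value)

@dataclass
class QueryScanStats:
    regions: List[RegionScanStats] = field(default_factory=list)

    @property
    def total_rows(self) -> int:
        return sum(r.scanned_rows for r in self.regions)

    @property
    def total_bytes(self) -> int:
        return sum(r.scanned_bytes for r in self.regions)

def scan_region(region: str, cells: Iterable[Cell]) -> RegionScanStats:
    """Scan one region (here: iterate over in-memory cells) while recording metrics."""
    stats = RegionScanStats(region)
    for cell in cells:
        stats.record(cell)
    return stats

query = QueryScanStats([
    scan_region("region-1", [(b"k1", b"short"), (b"k2", b"x" * 100)]),
    scan_region("region-2", [(b"k3", b"y" * 10)]),
])
print(query.total_rows, query.total_bytes)  # 3 121
```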
[jira] [Commented] (KYLIN-2438) replace scan threshold with max scan bytes
[ https://issues.apache.org/jira/browse/KYLIN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858054#comment-15858054 ] hongbin ma commented on KYLIN-2438: --- +1, happy to remove the dependency on row size estimation. > replace scan threshold with max scan bytes > -- > > Key: KYLIN-2438 > URL: https://issues.apache.org/jira/browse/KYLIN-2438 > Project: Kylin > Issue Type: Improvement > Components: Query Engine, Storage - HBase >Affects Versions: v1.6.0 >Reporter: Dayue Gao >Assignee: Dayue Gao > > In order to guard against bad queries that can consume lots of memory and > potentially crash the Kylin / HBase server, Kylin limits the maximum number of > rows a query can scan. The maximum value is chosen based on two configs: > # *kylin.query.scan.threshold* is used if the query doesn't contain > memory-hungry metrics > # *kylin.query.mem.budget* / estimated_row_size is used otherwise, as the > per-region maximum. > This approach, however, has several deficiencies: > * It doesn't work very well with complex, varlen metrics. The estimated > threshold could be either too small or too large. If it's too small, good > queries are killed. If it's too large, bad queries are not banned. > * Row count doesn't correspond to memory consumption, so it's difficult to > determine how large the scan threshold should be. > * kylin.query.scan.threshold can't be overridden at the cube level. > In this JIRA, I propose to replace the current row-count-based threshold with > a more intuitive size-based threshold: > * KYLIN-2437 will collect the number of bytes scanned at both the region and the > query level > * A new configuration *kylin.query.max-scan-bytes* will be added to limit > the maximum number of bytes a query can scan > * *kylin.query.mem.budget* will be renamed to > *kylin.storage.hbase.coprocessor-max-scan-bytes*, which applies the limit at the region > level. No need to rely on estimations about row size any more.
> * The above two configs can be overridden at the cube level > * the old *kylin.query.scan.threshold* will be deprecated -- This message was sent by Atlassian JIRA (v6.3.15#6346)
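The two proposed budgets can be modeled as byte counters that are checked while rows stream in, one reset per region and one running across the whole query. The sketch below is sequential for simplicity, whereas a real deployment would typically scan regions concurrently, and it approximates row size by the length of a byte string. The function and exception names are hypothetical. The config names in the docstring are the ones proposed in the issue.

```python
from typing import Iterable, List, Optional

class ScanLimitExceeded(Exception):
    """Raised when a region-level or query-level byte budget is exceeded."""

def scan_with_limits(regions: List[Iterable[bytes]],
                     partition_max_bytes: Optional[int],
                     query_max_bytes: Optional[int]) -> int:
    """Scan every region, enforcing a per-region and a per-query byte budget.

    partition_max_bytes plays the role of kylin.storage.partition.max-scan-bytes,
    query_max_bytes the role of kylin.query.max-scan-bytes; None means unlimited.
    Returns the total number of bytes scanned.
    """
    query_bytes = 0
    for index, rows in enumerate(regions):
        region_bytes = 0
        for row in rows:
            region_bytes += len(row)
            query_bytes += len(row)
            if partition_max_bytes is not None and region_bytes > partition_max_bytes:
                raise ScanLimitExceeded(f"region {index} scanned {region_bytes} bytes")
            if query_max_bytes is not None and query_bytes > query_max_bytes:
                raise ScanLimitExceeded(f"query scanned {query_bytes} bytes")
    return query_bytes

regions = [[b"a" * 40, b"b" * 40], [b"c" * 40]]
print(scan_with_limits(regions, partition_max_bytes=100, query_max_bytes=200))  # 120
try:
    scan_with_limits(regions, partition_max_bytes=100, query_max_bytes=100)
except ScanLimitExceeded as err:
    print(err)  # query scanned 120 bytes
```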
[jira] [Resolved] (KYLIN-2222) web ui uses rest api to decide which dim encoding is valid for different typed columns
[ https://issues.apache.org/jira/browse/KYLIN-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma resolved KYLIN-. --- Resolution: Fixed Fix Version/s: v2.0.0 > web ui uses rest api to decide which dim encoding is valid for different > typed columns > -- > > Key: KYLIN- > URL: https://issues.apache.org/jira/browse/KYLIN- > Project: Kylin > Issue Type: Improvement >Reporter: hongbin ma >Assignee: hongbin ma > Fix For: v2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KYLIN-2222) web ui uses rest api to decide which dim encoding is valid for different typed columns
[ https://issues.apache.org/jira/browse/KYLIN-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857669#comment-15857669 ] hongbin ma commented on KYLIN-: --- the dimension-encoding to column-type matrix: || ||Float Numbers|| Integer Numbers|| Time || String || |boolean encoding| N | Y| N|Y| |date encoding| N | Y| Y|Y| |time encoding| N | Y| Y|Y| |dict encoding| Y| Y| Y|Y| |fixed_length encoding| N | N| N|Y| |fixed_length_hex encoding| N | N| N|Y| |integer encoding| N | Y| N|Y| > web ui uses rest api to decide which dim encoding is valid for different > typed columns > -- > > Key: KYLIN- > URL: https://issues.apache.org/jira/browse/KYLIN- > Project: Kylin > Issue Type: Improvement >Reporter: hongbin ma >Assignee: hongbin ma > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KYLIN-2435) two EXTRACT on a column will fail if there exists NULL values for the column
hongbin ma created KYLIN-2435: - Summary: two EXTRACT on a column will fail if there exists NULL values for the column Key: KYLIN-2435 URL: https://issues.apache.org/jira/browse/KYLIN-2435 Project: Kylin Issue Type: Bug Reporter: hongbin ma Assignee: hongbin ma
{code}
2000-01-01 19:12:33,US,android,10.22
2001-01-01 9:12:33,US,windows,9.12
2002-05-02 20:12:03,CN,windows,3.33
\N,CN,windows,3.32
{code}
{code}
create table testtable (starttime TIMESTAMP,country STRING, client STRING, price DECIMAL(18,4)) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
{code}
the following query will succeed:
{code}
select sum(price),extract (year from starttime) from testtable group by extract (year from starttime)
{code}
but the following will fail:
{code}
select sum(price) from testtable group by extract (year from starttime), extract (month from starttime)
{code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
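When comparing results, it helps to know the answer the failing query should produce, which can be computed by hand from the four test rows. The small Python oracle below does this, assuming standard SQL semantics: EXTRACT of a NULL timestamp is NULL, and GROUP BY collects all NULL keys into a single group. It is not part of Kylin or its test suite.

```python
from datetime import datetime
from decimal import Decimal

# Test rows from the issue; "\N" is Hive's default text encoding of NULL.
RAW = [
    "2000-01-01 19:12:33,US,android,10.22",
    "2001-01-01 9:12:33,US,windows,9.12",
    "2002-05-02 20:12:03,CN,windows,3.33",
    "\\N,CN,windows,3.32",
]

def parse(line):
    start, country, client, price = line.split(",")
    ts = None if start == "\\N" else datetime.strptime(start, "%Y-%m-%d %H:%M:%S")
    return ts, country, client, Decimal(price)

def sum_price_by_year_month(rows):
    """Expected answer of:
    select sum(price) from testtable
    group by extract(year from starttime), extract(month from starttime)
    EXTRACT of NULL is NULL, and GROUP BY puts all NULL keys into one group."""
    groups = {}
    for ts, _country, _client, price in rows:
        key = (None, None) if ts is None else (ts.year, ts.month)
        groups[key] = groups.get(key, Decimal("0")) + price
    return groups

result = sum_price_by_year_month(parse(line) for line in RAW)
for key, total in result.items():
    print(key, total)
```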
[jira] [Comment Edited] (KYLIN-2424) Optimize the integration test's performance
[ https://issues.apache.org/jira/browse/KYLIN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853970#comment-15853970 ] hongbin ma edited comment on KYLIN-2424 at 2/6/17 1:20 PM: --- [~Shaofengshi] great work! [~yimingliu] it looks like it's an abbreviation for "true". If so, it could be confusing; why not just use "true"? was (Author: mahongbin): [~Shaofengshi] great work! I can safely close KYLIN-2015 now. [~yimingliu] it looks like it's an abbreviation for "true". If so, it could be confusing; why not just use "true"? > Optimize the integration test's performance > --- > > Key: KYLIN-2424 > URL: https://issues.apache.org/jira/browse/KYLIN-2424 > Project: Kylin > Issue Type: Improvement > Components: Tools, Build and Test >Reporter: Shaofeng SHI >Assignee: Shaofeng SHI > Fix For: v2.0.0 > > > Kylin's integration test is slow, especially the ITCombinationTest. Most of > the time is spent in H2 executing the test queries. In the latest integration > test, this test case took 90 minutes to finish. > After checking H2's documentation, I think the main problem is the absence of indexes > on the tables, while indexes are very important for a relational database's > query performance. So when Kylin creates the tables in H2, it should create indexes > on the columns that will be used in the queries, like the pk/fk columns, the > filtering columns, etc. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
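The effect of the proposed fix (creating indexes on join and filter columns when the test tables are loaded) shows up directly in any relational engine's query plan. H2 is not available from the Python standard library, so the sketch below substitutes SQLite via `sqlite3` purely to illustrate the general idea. The table and index names are hypothetical, and the exact wording of the plan varies by SQLite version.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN rows into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for a fact table with a foreign-key column.
conn.execute("CREATE TABLE fact (id INTEGER, fk_1 INTEGER, price REAL)")
conn.executemany("INSERT INTO fact VALUES (?, ?, ?)",
                 [(i, i % 100, float(i)) for i in range(10000)])

sql = "SELECT SUM(price) FROM fact WHERE fk_1 = 42"
before = query_plan(conn, sql)  # expected: a full table scan
conn.execute("CREATE INDEX idx_fact_fk_1 ON fact (fk_1)")
after = query_plan(conn, sql)   # expected: a search using idx_fact_fk_1
print(before)
print(after)
```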
[jira] [Commented] (KYLIN-2015) replace h2 with alternatives like sqllite or mysql
[ https://issues.apache.org/jira/browse/KYLIN-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853972#comment-15853972 ] hongbin ma commented on KYLIN-2015: --- The performance issue of H2 is solved in KYLIN-2424 without replacing H2. > replace h2 with alternatives like sqllite or mysql > -- > > Key: KYLIN-2015 > URL: https://issues.apache.org/jira/browse/KYLIN-2015 > Project: Kylin > Issue Type: Improvement >Reporter: hongbin ma >Assignee: hongbin ma > > In IT, we compare Kylin's results with H2's results to ensure query correctness. > However, H2 only supports part of the SQL syntax. For example, it cannot > support functions like timestampadd, or (DATE'2013-01-02' + interval '3' > day). What's more, subqueries are observed to be very slow on H2. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KYLIN-2424) Optimize the integration test's performance
[ https://issues.apache.org/jira/browse/KYLIN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853970#comment-15853970 ] hongbin ma commented on KYLIN-2424: --- [~Shaofengshi] great work! I can safely close KYLIN-2015 now. [~yimingliu] it looks like it's an abbreviation for "true". If so, it could be confusing; why not just use "true"? > Optimize the integration test's performance > --- > > Key: KYLIN-2424 > URL: https://issues.apache.org/jira/browse/KYLIN-2424 > Project: Kylin > Issue Type: Improvement > Components: Tools, Build and Test >Reporter: Shaofeng SHI >Assignee: Shaofeng SHI > Fix For: v2.0.0 > > > Kylin's integration test is slow, especially the ITCombinationTest. Most of > the time is spent in H2 executing the test queries. In the latest integration > test, this test case took 90 minutes to finish. > After checking H2's documentation, I think the main problem is the absence of indexes > on the tables, while indexes are very important for a relational database's > query performance. So when Kylin creates the tables in H2, it should create indexes > on the columns that will be used in the queries, like the pk/fk columns, the > filtering columns, etc. -- This message was sent by Atlassian JIRA (v6.3.15#6346)