[jira] [Commented] (KYLIN-4341) by-level cuboid intermediate files are left behind and not cleaned up after job is complete
[ https://issues.apache.org/jira/browse/KYLIN-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049369#comment-17049369 ] Vsevolod Ostapenko commented on KYLIN-4341:
---
[~wangrupeng] I have to respectfully disagree with this naive explanation of why this is not a bug. I did not request segment merging in the cube configuration, therefore the files are expected to be removed. Removing the files manually is not a constructive proposition: any manual management of intermediate files is an operational burden, and it is not properly documented.

> by-level cuboid intermediate files are left behind and not cleaned up after job is complete
> ---
> Key: KYLIN-4341
> URL: https://issues.apache.org/jira/browse/KYLIN-4341
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Affects Versions: v2.6.4
> Environment: Kylin 2.6.4, CentOS 7.6, HDP 2.6.5
> Reporter: Vsevolod Ostapenko
> Assignee: wangrupeng
> Priority: Major
>
> Setup: MR as the cube build engine with the by-level cube build strategy (auto-picked).
> Upon completion of a cube segment build job, a number of intermediate files are still left behind: the output of the MR jobs that produce the base cuboid and the subsequent level cuboids, as well as rowkey_stats from the hfile creation step.
> The files in question consume about the same amount of space in HDFS as the final hfile.
> This leads to wasted HDFS space that is not released for as long as the corresponding cube segment is online. The only point at which the leaked space is released is when the segment is taken offline and cleaned up as part of segment retention.
> Sample output is as follows.
> {quote}
> $ hadoop fs -ls -R /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/
> drwxr-xr-x - kylin hdfs 0 2020-01-07 04:44 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89
> drwxr-xr-x - kylin hdfs 0 2020-01-07 04:26 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid
> drwxr-xr-x - kylin hdfs 0 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid
> -rw-r--r-- 2 kylin hdfs 0 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/_SUCCESS
> -rw-r--r-- 2 kylin hdfs 51570048 2020-01-07 04:35 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-0
> -rw-r--r-- 2 kylin hdfs 51477377 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-1
> -rw-r--r-- 2 kylin hdfs 51615162 2020-01-07 04:35 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-2
> -rw-r--r-- 2 kylin hdfs 51591031 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-3
> -rw-r--r-- 2 kylin hdfs 51648914 2020-01-07 04:35 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-4
> -rw-r--r-- 2 kylin hdfs 51532761 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-5
> -rw-r--r-- 2 kylin hdfs 51455652 2020-01-07 04:35 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-6
> -rw-r--r-- 2 kylin hdfs 51552752 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-7
> drwxr-xr-x - kylin hdfs 0 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid
> -rw-r--r-- 2 kylin hdfs 0 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/_SUCCESS
> -rw-r--r-- 2 kylin hdfs 16293012 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/part-r-0
> -rw-r--r-- 2 kylin hdfs 16283730 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/part-r-1
> -rw-r--r-- 2 kylin hdfs 16288965 2020-01-07 04:25 /u
[jira] [Created] (KYLIN-4350) Pushdown improperly rewrites the query causing it to fail
Vsevolod Ostapenko created KYLIN-4350:
---
Summary: Pushdown improperly rewrites the query causing it to fail
Key: KYLIN-4350
URL: https://issues.apache.org/jira/browse/KYLIN-4350
Project: Kylin
Issue Type: Bug
Components: Query Engine
Affects Versions: v2.6.4
Environment: HDP 2.6.5, Kylin 2.6.4, CentOS 7.6
Reporter: Vsevolod Ostapenko

A query that uses a WITH clause and is subject to pushdown to Hive (or Impala) for execution is incorrectly rewritten before being submitted to the execution engine. Table aliases are prefixed with the database name, which makes the query invalid. Sample log excerpts are below:

{quote}
2020-01-17 12:12:21,997 INFO [Query e844b846-c589-4729-5a04-483f6d73c834-31163] service.QueryService:404 : The original query:
with t as (
  SELECT ZETTICSDW.A_VL_HOURLY_V.IMSIID "ZETTICSDW_A_VL_HOURLY_V_IMSIID",
         ZETTICSDW.A_VL_HOURLY_V.MEDIA_GAP_CALL_ID "ZETTICSDW_A_VL_HOURLY_V_MEDIA_GAP_CALL_ID",
         count(*) cnt
  FROM ZETTICSDW.A_VL_HOURLY_V
  WHERE ((ZETTICSDW.A_VL_HOURLY_V.THEDATE = '20200117') AND ((ZETTICSDW.A_VL_HOURLY_V.THEHOUR >= '10') AND (ZETTICSDW.A_VL_HOURLY_V.THEHOUR <= '10')))
  GROUP BY ZETTICSDW.A_VL_HOURLY_V.IMSIID, ZETTICSDW.A_VL_HOURLY_V.MEDIA_GAP_CALL_ID
)
select t.ZETTICSDW_A_VL_HOURLY_V_IMSIID, count(*) "vl_aggs_model___CD_MEDIA_GAP_CALL_ID"
*from t*
group by t.ZETTICSDW_A_VL_HOURLY_V_IMSIID
ORDER BY "vl_aggs_model___CD_MEDIA_GAP_CALL_ID" desc
LIMIT 500

2020-01-17 12:12:22,073 INFO [Query e844b846-c589-4729-5a04-483f6d73c834-31163] adhocquery.AbstractPushdownRunner:37 : the query is converted to
with t as (
  SELECT ZETTICSDW.A_VL_HOURLY_V.IMSIID `ZETTICSDW_A_VL_HOURLY_V_IMSIID`,
         ZETTICSDW.A_VL_HOURLY_V.MEDIA_GAP_CALL_ID `ZETTICSDW_A_VL_HOURLY_V_MEDIA_GAP_CALL_ID`,
         count(*) cnt
  FROM ZETTICSDW.A_VL_HOURLY_V
  WHERE ((ZETTICSDW.A_VL_HOURLY_V.THEDATE = '20200117') AND ((ZETTICSDW.A_VL_HOURLY_V.THEHOUR >= '10') AND (ZETTICSDW.A_VL_HOURLY_V.THEHOUR <= '10')))
  GROUP BY ZETTICSDW.A_VL_HOURLY_V.IMSIID, ZETTICSDW.A_VL_HOURLY_V.MEDIA_GAP_CALL_ID
)
select t.ZETTICSDW_A_VL_HOURLY_V_IMSIID, count(*) `vl_aggs_model___CD_MEDIA_GAP_CALL_ID`
*{color:#FF}from ZETTICSDW.t{color}*
group by t.ZETTICSDW_A_VL_HOURLY_V_IMSIID
ORDER BY `vl_aggs_model___CD_MEDIA_GAP_CALL_ID` desc
LIMIT 500
after applying converter org.apache.kylin.source.adhocquery.HivePushDownConverter

2020-01-17 12:12:22,108 ERROR [Query e844b846-c589-4729-5a04-483f6d73c834-31163] service.QueryService:989 : pushdown engine failed current query too
org.apache.hive.service.cli.HiveSQLException: AnalysisException: Could not resolve table reference: '*zetticsdw.t*'
{quote}

The pushdown query should be submitted to the query engine exactly as written by the user. As a best effort, the Kylin pushdown executor should issue "use <database>" over the same JDBC connection right before submitting the query.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
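The proposed fix can be sketched as follows. This is an illustrative helper, not Kylin's actual pushdown API: instead of rewriting identifiers inside the query, the runner would set the session database first and then submit the user's query verbatim over the same connection.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "use <database> first" proposal: build the
// ordered list of statements to execute over a single JDBC connection.
public class PushdownStatements {

    /**
     * Returns the statements to run, in order: a USE statement selecting the
     * default database, then the user's query completely untouched.
     */
    public static List<String> forPushdown(String database, String userQuery) {
        if (database == null || database.isEmpty()) {
            return Arrays.asList(userQuery);
        }
        // Back-quote the identifier so reserved words survive in Hive/Impala.
        return Arrays.asList("USE `" + database + "`", userQuery);
    }
}
```

With this approach the WITH-clause alias `t` resolves naturally on the server side, because the query text is never rewritten.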
[jira] [Created] (KYLIN-4341) by-level cuboid intermediate files are left behind and not cleaned up after job is complete
Vsevolod Ostapenko created KYLIN-4341:
---
Summary: by-level cuboid intermediate files are left behind and not cleaned up after job is complete
Key: KYLIN-4341
URL: https://issues.apache.org/jira/browse/KYLIN-4341
Project: Kylin
Issue Type: Bug
Components: Job Engine
Affects Versions: v2.6.4
Environment: Kylin 2.6.4, CentOS 7.6, HDP 2.6.5
Reporter: Vsevolod Ostapenko

Setup: MR as the cube build engine with the by-level cube build strategy (auto-picked).
Upon completion of a cube segment build job, a number of intermediate files are still left behind: the output of the MR jobs that produce the base cuboid and the subsequent level cuboids, as well as rowkey_stats from the hfile creation step.
The files in question consume about the same amount of space in HDFS as the final hfile.
This leads to wasted HDFS space that is not released for as long as the corresponding cube segment is online. The only point at which the leaked space is released is when the segment is taken offline and cleaned up as part of segment retention.
Sample output is as follows.
{quote}
$ hadoop fs -ls -R /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/
drwxr-xr-x - kylin hdfs 0 2020-01-07 04:44 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89
drwxr-xr-x - kylin hdfs 0 2020-01-07 04:26 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid
drwxr-xr-x - kylin hdfs 0 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid
-rw-r--r-- 2 kylin hdfs 0 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/_SUCCESS
-rw-r--r-- 2 kylin hdfs 51570048 2020-01-07 04:35 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-0
-rw-r--r-- 2 kylin hdfs 51477377 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-1
-rw-r--r-- 2 kylin hdfs 51615162 2020-01-07 04:35 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-2
-rw-r--r-- 2 kylin hdfs 51591031 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-3
-rw-r--r-- 2 kylin hdfs 51648914 2020-01-07 04:35 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-4
-rw-r--r-- 2 kylin hdfs 51532761 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-5
-rw-r--r-- 2 kylin hdfs 51455652 2020-01-07 04:35 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-6
-rw-r--r-- 2 kylin hdfs 51552752 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-7
drwxr-xr-x - kylin hdfs 0 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid
-rw-r--r-- 2 kylin hdfs 0 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/_SUCCESS
-rw-r--r-- 2 kylin hdfs 16293012 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/part-r-0
-rw-r--r-- 2 kylin hdfs 16283730 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/part-r-1
-rw-r--r-- 2 kylin hdfs 16288965 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/part-r-2
-rw-r--r-- 2 kylin hdfs 16270572 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/part-r-3
drwxr-xr-x - kylin hdfs 0 2020-01-07 04:23 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/rowkey_stats
-rw-r--r-- 3 kylin hdfs 155 2020-01-07 04:23 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/rowkey_stats/part-r-0_hfile
{quote}
Removing the job metadata using (metastore.sh clean --jobThreshold Ndays) does not help. Information about the job is removed, but no interm
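A cleanup step along the lines this report asks for would first need to recognize the leftover outputs under a job's working directory. A minimal, hypothetical sketch (plain string matching, not Kylin's actual cleanup code; the directory-name patterns are taken from the listing above):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: given paths under a job working directory, pick out the
// intermediate MR outputs (by-level cuboids and rowkey_stats) that would be
// safe to delete once the final hfile has been bulk-loaded into HBase.
public class IntermediateOutputs {

    public static boolean isIntermediate(String path) {
        return path.contains("/cuboid/level_")   // base and by-level cuboid MR output
            || path.contains("/rowkey_stats/");  // hfile-creation side output
    }

    /** Filters a recursive listing down to the deletable intermediate files. */
    public static List<String> deletable(List<String> paths) {
        List<String> out = new ArrayList<>();
        for (String p : paths) {
            if (isIntermediate(p)) {
                out.add(p);
            }
        }
        return out;
    }
}
```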
[jira] [Commented] (KYLIN-3628) Query with lookup table always use latest snapshot
[ https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944773#comment-16944773 ] Vsevolod Ostapenko commented on KYLIN-3628:
---
On the subject of checking whether the lookup table is snapshotted as part of the cube: the CubeDesc class already has a method findDimensionByTable(String lookupTableName). So the CubeManager.checkContainsSnapshotTable() method can be replaced with a single call to findDimensionByTable (which will reduce code duplication and fix the regression introduced by the prior version of the fix). Still, the try/catch block in findLatestSnapshot() needs to be removed.

> Query with lookup table always use latest snapshot
> ---
> Key: KYLIN-3628
> URL: https://issues.apache.org/jira/browse/KYLIN-3628
> Project: Kylin
> Issue Type: Improvement
> Reporter: Na Zhai
> Assignee: Na Zhai
> Priority: Major
> Fix For: v3.0.0-alpha2, v2.6.4
>
> If a user queries a lookup table, Kylin randomly selects a cube (which has the snapshot of this lookup table) to answer it. This causes uncertainty when there are multiple cubes (sharing the same lookup): some cubes are newly built, some not. If Kylin picks an old cube, the query result is old.
> To remove this uncertainty, for such queries, either always use the latest snapshot or always use the earliest snapshot. We believe the "latest" version is better.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
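The suggested simplification can be sketched with stand-in classes (simplified analogues of Kylin's CubeDesc/CubeManager, not the real signatures): "does this cube snapshot the lookup table" reduces to "does any of its dimensions come from that table".

```java
import java.util.List;

// Simplified stand-in for CubeDesc, illustrating the suggestion above.
public class CubeDescSketch {
    private final List<String> dimensionTables; // tables supplying this cube's dimensions

    public CubeDescSketch(List<String> dimensionTables) {
        this.dimensionTables = dimensionTables;
    }

    // Analogue of CubeDesc.findDimensionByTable(lookupTableName):
    // returns the matching table, or null when no dimension uses it.
    public String findDimensionTable(String lookupTableName) {
        for (String t : dimensionTables) {
            if (t.equalsIgnoreCase(lookupTableName)) {
                return t;
            }
        }
        return null;
    }

    // What CubeManager.checkContainsSnapshotTable() collapses to under this
    // suggestion: one call, no duplicated lookup logic.
    public boolean containsSnapshotTable(String lookupTableName) {
        return findDimensionTable(lookupTableName) != null;
    }
}
```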
[jira] [Reopened] (KYLIN-3628) Query with lookup table always use latest snapshot
[ https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko reopened KYLIN-3628:
---
The most recent change will silently swallow legitimate exceptions in findLatestSnapshot() and may result in an incorrect cube instance being returned, effectively re-introducing the original bug whenever an exception is thrown.

> Query with lookup table always use latest snapshot
> ---
> Key: KYLIN-3628
> URL: https://issues.apache.org/jira/browse/KYLIN-3628
> Project: Kylin
> Issue Type: Improvement
> Reporter: Na Zhai
> Assignee: Na Zhai
> Priority: Major
> Fix For: v3.0.0-alpha2, v2.6.4
>
> If a user queries a lookup table, Kylin randomly selects a cube (which has the snapshot of this lookup table) to answer it. This causes uncertainty when there are multiple cubes (sharing the same lookup): some cubes are newly built, some not. If Kylin picks an old cube, the query result is old.
> To remove this uncertainty, for such queries, either always use the latest snapshot or always use the earliest snapshot. We believe the "latest" version is better.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-3628) Query with lookup table always use latest snapshot
[ https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938797#comment-16938797 ] Vsevolod Ostapenko commented on KYLIN-3628:
---
[~hit_lacus] The latest change effectively reintroduces the issue that was supposed to be fixed by the prior series of changes. I don't see a good reason for wrapping the call in try/catch, silently swallowing _any_ exception, and then returning the current cube even when it does not contain the lookup table snapshot.

> Query with lookup table always use latest snapshot
> ---
> Key: KYLIN-3628
> URL: https://issues.apache.org/jira/browse/KYLIN-3628
> Project: Kylin
> Issue Type: Improvement
> Reporter: Na Zhai
> Assignee: Na Zhai
> Priority: Major
> Fix For: v3.0.0-alpha2, v2.6.4
>
> If a user queries a lookup table, Kylin randomly selects a cube (which has the snapshot of this lookup table) to answer it. This causes uncertainty when there are multiple cubes (sharing the same lookup): some cubes are newly built, some not. If Kylin picks an old cube, the query result is old.
> To remove this uncertainty, for such queries, either always use the latest snapshot or always use the earliest snapshot. We believe the "latest" version is better.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-4107) StorageCleanupJob fails to delete Hive tables with "Argument list too long" error
[ https://issues.apache.org/jira/browse/KYLIN-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895586#comment-16895586 ] Vsevolod Ostapenko commented on KYLIN-4107:
---
[~codingforfun] I see that in your code fix you are batching the Hive DROP TABLE commands to avoid the bash command line becoming too long. That should work; yet, instead of using a hard-coded value of 20 drop statements per batch, it would be a bit cleaner to add a config parameter with 20 as the default value. My 2 cents, Vsevolod.

> StorageCleanupJob fails to delete Hive tables with "Argument list too long" error
> ---
> Key: KYLIN-4107
> URL: https://issues.apache.org/jira/browse/KYLIN-4107
> Project: Kylin
> Issue Type: Bug
> Components: Storage - HBase
> Affects Versions: v2.6.2
> Environment: CentOS 7.6, HDP 2.6.5, Kylin 2.6.3
> Reporter: Vsevolod Ostapenko
> Assignee: weibin0516
> Priority: Major
> Fix For: v3.0.0-beta
>
> On a system with multiple Kylin developers who experiment with cube design and (re)build/drop cube segments often, intermediate Hive tables and leftover HBase tables accumulate very quickly.
> After a certain point storage cleanup cannot be executed using the suggested method:
> {{${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true}}
> Apparently, the storage cleanup job creates a single shell command to drop all Hive tables, which fails to execute because the command line is simply too long.
> For example:
> {quote}
> 2019-07-23 17:47:31,611 ERROR [main] job.StorageCleanupJob:377 : Error during deleting Hive tables
> java.io.IOException: Cannot run program "/bin/bash": error=7, Argument list too long
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
> at org.apache.kylin.common.util.CliCommandExecutor.runNativeCommand(CliCommandExecutor.java:133)
> at org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:89)
> at org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:83)
> at org.apache.kylin.rest.job.StorageCleanupJob.deleteHiveTables(StorageCleanupJob.java:409)
> at org.apache.kylin.rest.job.StorageCleanupJob.cleanUnusedIntermediateHiveTableInternal(StorageCleanupJob.java:375)
> at org.apache.kylin.rest.job.StorageCleanupJob.cleanUnusedIntermediateHiveTable(StorageCleanupJob.java:278)
> at org.apache.kylin.rest.job.StorageCleanupJob.cleanup(StorageCleanupJob.java:151)
> at org.apache.kylin.rest.job.StorageCleanupJob.execute(StorageCleanupJob.java:145)
> at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
> at org.apache.kylin.tool.StorageCleanupJob.main(StorageCleanupJob.java:27)
> Caused by: java.io.IOException: error=7, Argument list too long
> at java.lang.UNIXProcess.forkAndExec(Native Method)
> at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
> at java.lang.ProcessImpl.start(ProcessImpl.java:134)
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
> ... 10 more
> {quote}
> Instead of composing one long command, storage cleanup needs to generate a script and feed it into beeline or the Hive CLI.

-- This message was sent by Atlassian JIRA (v7.6.14#76016)
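The batching approach with a configurable batch size can be sketched like this (the class and the config plumbing are illustrative, not Kylin's actual code; 20 remains the default as in the fix under review):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of batched Hive DROP TABLE commands, keeping each shell
// invocation well below the kernel's argument-length limit.
public class HiveDropBatcher {
    // Default batch size; in a real fix this would come from kylin.properties.
    public static final int DEFAULT_BATCH_SIZE = 20;

    /** Splits DROP statements into commands of at most batchSize drops each. */
    public static List<String> buildDropCommands(List<String> tables, int batchSize) {
        if (batchSize <= 0) {
            batchSize = DEFAULT_BATCH_SIZE;
        }
        List<String> commands = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        int inBatch = 0;
        for (String table : tables) {
            sb.append("DROP TABLE IF EXISTS ").append(table).append("; ");
            if (++inBatch == batchSize) {
                commands.add(sb.toString().trim());
                sb.setLength(0);
                inBatch = 0;
            }
        }
        if (inBatch > 0) {
            commands.add(sb.toString().trim()); // flush the final partial batch
        }
        return commands;
    }
}
```

Each returned command would then be executed as one `hive -e` / beeline invocation, so no single command line grows with the total number of leftover tables.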
[jira] [Created] (KYLIN-4107) StorageCleanupJob fails to delete Hive tables with "Argument list too long" error
Vsevolod Ostapenko created KYLIN-4107:
---
Summary: StorageCleanupJob fails to delete Hive tables with "Argument list too long" error
Key: KYLIN-4107
URL: https://issues.apache.org/jira/browse/KYLIN-4107
Project: Kylin
Issue Type: Bug
Components: Storage - HBase
Affects Versions: v2.6.2
Environment: CentOS 7.6, HDP 2.6.5, Kylin 2.6.3
Reporter: Vsevolod Ostapenko

On a system with multiple Kylin developers who experiment with cube design and (re)build/drop cube segments often, intermediate Hive tables and leftover HBase tables accumulate very quickly.
After a certain point storage cleanup cannot be executed using the suggested method:
{{${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true}}
Apparently, the storage cleanup job creates a single shell command to drop all Hive tables, which fails to execute because the command line is simply too long. For example:
{quote}
2019-07-23 17:47:31,611 ERROR [main] job.StorageCleanupJob:377 : Error during deleting Hive tables
java.io.IOException: Cannot run program "/bin/bash": error=7, Argument list too long
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.kylin.common.util.CliCommandExecutor.runNativeCommand(CliCommandExecutor.java:133)
at org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:89)
at org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:83)
at org.apache.kylin.rest.job.StorageCleanupJob.deleteHiveTables(StorageCleanupJob.java:409)
at org.apache.kylin.rest.job.StorageCleanupJob.cleanUnusedIntermediateHiveTableInternal(StorageCleanupJob.java:375)
at org.apache.kylin.rest.job.StorageCleanupJob.cleanUnusedIntermediateHiveTable(StorageCleanupJob.java:278)
at org.apache.kylin.rest.job.StorageCleanupJob.cleanup(StorageCleanupJob.java:151)
at org.apache.kylin.rest.job.StorageCleanupJob.execute(StorageCleanupJob.java:145)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
at org.apache.kylin.tool.StorageCleanupJob.main(StorageCleanupJob.java:27)
Caused by: java.io.IOException: error=7, Argument list too long
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 10 more
{quote}
Instead of composing one long command, storage cleanup needs to generate a script and feed it into beeline or the Hive CLI.

-- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (KYLIN-3628) Query with lookup table always use latest snapshot
[ https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877286#comment-16877286 ] Vsevolod Ostapenko commented on KYLIN-3628:
---
[~hit_lacus], I built a local 2.6.2 with your changes applied on top and the "derived" column check removed. It seems to work as intended. One suggestion, though: the debug message about overriding the cube selection in findLatestSnapshot() should be logged only when the cube selection was really overridden. For example:
{code:java}
if (!cube.equals(cubeInstance)) {
    logger.debug("Picked cube {} over {} as it provides a more recent snapshot of the lookup table {}",
            cube, cubeInstance, lookupTableName);
}
{code}

> Query with lookup table always use latest snapshot
> ---
> Key: KYLIN-3628
> URL: https://issues.apache.org/jira/browse/KYLIN-3628
> Project: Kylin
> Issue Type: Improvement
> Reporter: Na Zhai
> Assignee: Na Zhai
> Priority: Major
> Fix For: v2.6.0
>
> If a user queries a lookup table, Kylin randomly selects a cube (which has the snapshot of this lookup table) to answer it. This causes uncertainty when there are multiple cubes (sharing the same lookup): some cubes are newly built, some not. If Kylin picks an old cube, the query result is old.
> To remove this uncertainty, for such queries, either always use the latest snapshot or always use the earliest snapshot. We believe the "latest" version is better.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (KYLIN-3628) Query with lookup table always use latest snapshot
[ https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876282#comment-16876282 ] Vsevolod Ostapenko edited comment on KYLIN-3628 at 7/2/19 7:50 PM:
---
Hi XiaoXiang, thank you for the quick turnaround with a fix. I looked at the code change, and the following check seems too restrictive:
{code:java}
if (dimensionDesc.isDerived() && dimensionDesc.getTable().equalsIgnoreCase(lookupTbl)) {
{code}
Per my internal tests, the entire lookup table is snapshotted as part of a cube if any dimension (derived or otherwise) is supplied by that table. So the dimension doesn't have to be derived. In fact, in my test cubes there are no derived dimensions at all (the "derived" properties for all the lookup tables are set to null), and that does not prevent executing "select * from lookupTable" against such a cube on a non-patched 2.6.2 system. To summarize, I believe the "dimensionDesc.isDerived()" call should be removed from the expression above.
Edit: I have to stand corrected: for the lookup table, only the columns that are normal or derived dimensions are snapshotted (not the entire table). Still, this doesn't change the stance that the "derived" check is not needed in the expression above.

was (Author: seva_ostapenko):
Hi XiaoXiang, thank you for a quick turnaround with a fix. I looked at the code change, and the following check seems too restrictive:
{code:java}
if (dimensionDesc.isDerived() && dimensionDesc.getTable().equalsIgnoreCase(lookupTbl)) {
{code}
Per my internal tests, the entire lookup table is being snapshotted as part of a cube, if any dimension (non-derived or otherwise) is supplied by that table. So, dimension doesn't have to be derived. In fact, in my test cubes there is no derived dimensions at all (all "derived" properties for all the lookup tables are set to null), it does not prevent executing "select * from lookupTable" against such cube on a non-patched 2.6.2 system.
To summarize, I believe that the "dimensionDesc.isDerived()" call should be removed from the expression above.

> Query with lookup table always use latest snapshot
> ---
> Key: KYLIN-3628
> URL: https://issues.apache.org/jira/browse/KYLIN-3628
> Project: Kylin
> Issue Type: Improvement
> Reporter: Na Zhai
> Assignee: Na Zhai
> Priority: Major
> Fix For: v2.6.0
>
> If a user queries a lookup table, Kylin randomly selects a cube (which has the snapshot of this lookup table) to answer it. This causes uncertainty when there are multiple cubes (sharing the same lookup): some cubes are newly built, some not. If Kylin picks an old cube, the query result is old.
> To remove this uncertainty, for such queries, either always use the latest snapshot or always use the earliest snapshot. We believe the "latest" version is better.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3628) Query with lookup table always use latest snapshot
[ https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876282#comment-16876282 ] Vsevolod Ostapenko commented on KYLIN-3628:
---
Hi XiaoXiang, thank you for the quick turnaround with a fix. I looked at the code change, and the following check seems too restrictive:
{code:java}
if (dimensionDesc.isDerived() && dimensionDesc.getTable().equalsIgnoreCase(lookupTbl)) {
{code}
Per my internal tests, the entire lookup table is snapshotted as part of a cube if any dimension (derived or otherwise) is supplied by that table. So the dimension doesn't have to be derived. In fact, in my test cubes there are no derived dimensions at all (the "derived" properties for all the lookup tables are set to null), and that does not prevent executing "select * from lookupTable" against such a cube on a non-patched 2.6.2 system. To summarize, I believe the "dimensionDesc.isDerived()" call should be removed from the expression above.

> Query with lookup table always use latest snapshot
> ---
> Key: KYLIN-3628
> URL: https://issues.apache.org/jira/browse/KYLIN-3628
> Project: Kylin
> Issue Type: Improvement
> Reporter: Na Zhai
> Assignee: Na Zhai
> Priority: Major
> Fix For: v2.6.0
>
> If a user queries a lookup table, Kylin randomly selects a cube (which has the snapshot of this lookup table) to answer it. This causes uncertainty when there are multiple cubes (sharing the same lookup): some cubes are newly built, some not. If Kylin picks an old cube, the query result is old.
> To remove this uncertainty, for such queries, either always use the latest snapshot or always use the earliest snapshot. We believe the "latest" version is better.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (KYLIN-3628) Query with lookup table always use latest snapshot
[ https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874997#comment-16874997 ] Vsevolod Ostapenko edited comment on KYLIN-3628 at 6/28/19 7:32 PM:
---
This code change introduces a nasty bug where Kylin will pick a random cube to answer a query that goes against a lookup table. For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and C2 does not. C2 has more recently built segments than C1. A query "select * from L1" will fail with an error stating that C2 does not contain L1.
Code analysis indicates that LookupTableEnumerator overwrites the prior cube choice correctly made by RealizationChooser. The bug is that LookupTableEnumerator finds the latest snapshot across all the realizations of all the cubes in the model, not in the one that was already correctly chosen. That leads to random behavior and unexpected failures.
To address both the original issue (where the lookup table is snapshotted in multiple cubes and a suitable cube is picked without looking at segment build times) and the regression introduced by the change, CubeManager.findLatestSnapshot() needs to check whether the lookup table is actually snapshotted as part of the cube realization. That way, if there is a mix of cubes that capture the lookup table and cubes that don't, only the ones that do capture it are ranked by build time.
The affected file is CubeManager.java. The bug is in this check:
{code:java}
if (realization.getModel().isLookupTable(lookupTableName)) {
{code}
getModel().isLookupTable() operates at the model level and across all the cubes, while the check needs to be scoped to the current cube only.

was (Author: seva_ostapenko):
This code change introduces a nasty bug, where Kylin will pick a random cube to answer the query that goes against a lookup table. For example, model X has two cubes C1 and C2. C1 is using lookup table L1 and C2 does not. C2 has more recent segments built than C1.
A query "select * from L1" will fail with an error stating that C2 does not contain L1. Code analysis indicates, that LookupTableEnumerator overwrites prior cube choice correctly made by RealizationChooser. The bug is that LookupTableEnumerator finds the latest snapshot on all the realizations of all the cubes in the model, not the one that was already correctly chosen. That leads to a random behavior and unexpected failures. To address both the original issue (where lookup table snapshoted in multiple cubes and suitable cube is picked without looking at the segment build times) and the regression introduced by the change, CubeManager.findLatestSnapshot needs to check if lookup table is actually snapshotted as part of the cube realization. So, if there are mix of multiple cubes that do capture lookup table and ones that don't only the ones that do capture lookup table are ranked by build time. Affected file is CubeManager.java. The bug is in this check
{code:java}
if (realization.getModel().isLookupTable(lookupTableName)) {
{code}
getModel operates across all cubes, which the check needs to be scoped to the current cube only.

> Query with lookup table always use latest snapshot
> ---
> Key: KYLIN-3628
> URL: https://issues.apache.org/jira/browse/KYLIN-3628
> Project: Kylin
> Issue Type: Improvement
> Reporter: Na Zhai
> Assignee: Na Zhai
> Priority: Major
> Fix For: v2.6.0
>
> If a user queries a lookup table, Kylin randomly selects a cube (which has the snapshot of this lookup table) to answer it. This causes uncertainty when there are multiple cubes (sharing the same lookup): some cubes are newly built, some not. If Kylin picks an old cube, the query result is old.
> To remove this uncertainty, for such queries, either always use the latest snapshot or always use the earliest snapshot. We believe the "latest" version is better.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
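The scoping fix argued above can be sketched with simplified stand-ins for Kylin's CubeInstance/CubeManager (not the real API): only cubes that actually snapshot the lookup table participate in the "latest build" ranking, so a newer cube without the snapshot can never displace a correct choice.

```java
import java.util.List;

// Hypothetical sketch of a correctly scoped findLatestSnapshot().
public class LatestSnapshotChooser {

    // Simplified stand-in for CubeInstance.
    public static class Cube {
        final String name;
        final List<String> lookupTables; // tables snapshotted by THIS cube only
        final long latestBuildTime;

        public Cube(String name, List<String> lookupTables, long latestBuildTime) {
            this.name = name;
            this.lookupTables = lookupTables;
            this.latestBuildTime = latestBuildTime;
        }

        // Scoped to the cube, unlike model.isLookupTable(), which answers
        // "yes" if ANY cube in the model uses the table.
        boolean containsSnapshotOf(String lookupTable) {
            for (String t : lookupTables) {
                if (t.equalsIgnoreCase(lookupTable)) return true;
            }
            return false;
        }
    }

    /** Among cubes that snapshot the lookup table, returns the most recently built one. */
    public static Cube findLatestSnapshot(List<Cube> cubes, String lookupTable, Cube current) {
        Cube best = current;
        for (Cube c : cubes) {
            if (!c.containsSnapshotOf(lookupTable)) continue; // skip cubes without the snapshot
            if (best == null || !best.containsSnapshotOf(lookupTable)
                    || c.latestBuildTime > best.latestBuildTime) {
                best = c;
            }
        }
        return best;
    }
}
```

In the C1/C2 scenario above, C2's newer build time no longer matters because C2 is filtered out before ranking.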
[jira] [Comment Edited] (KYLIN-3628) Query with lookup table always use latest snapshot
[ https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874997#comment-16874997 ] Vsevolod Ostapenko edited comment on KYLIN-3628 at 6/28/19 7:23 PM: This code change introduces a nasty bug, where Kylin will pick a random cube to answer a query that goes against a lookup table. For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and C2 does not. C2 has more recently built segments than C1. A query "select * from L1" will fail with an error stating that C2 does not contain L1. Code analysis indicates that LookupTableEnumerator overwrites the prior cube choice correctly made by RealizationChooser. The bug is that LookupTableEnumerator finds the latest snapshot across all the realizations of all the cubes in the model, not the one that was already correctly chosen. That leads to random behavior and unexpected failures. To address both the original issue (where a lookup table is snapshotted in multiple cubes and a suitable cube is picked without looking at the segment build times) and the regression introduced by the change, CubeManager.findLatestSnapshot needs to check whether the lookup table is actually snapshotted as part of the cube realization. So, if there is a mix of cubes that do capture the lookup table and ones that don't, only the ones that do capture it should be ranked by build time. The affected file is CubeManager.java. The bug is in this check: {code:java} if (realization.getModel().isLookupTable(lookupTableName)) { {code} getModel() operates across all cubes, while the check needs to be scoped to the current cube only. was (Author: seva_ostapenko): This code change introduces a nasty bug, where Kylin will pick a random cube to answer a query that goes against a lookup table. For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and C2 does not. C2 has more recently built segments than C1. 
A query "select * from L1" will fail with an error stating that C2 does not contain L1. Code analysis indicates that LookupTableEnumerator overwrites the prior cube choice correctly made by RealizationChooser. The bug is that LookupTableEnumerator finds the latest snapshot across all the realizations of all the cubes in the model, not the one that was already correctly chosen. That leads to random behavior and unexpected failures. Affected code is in LookupTableEnumerator.java. {code:java} if (olapContext.realization instanceof CubeInstance) { cube = (CubeInstance) olapContext.realization; ProjectInstance project = cube.getProjectInstance(); List<RealizationEntry> realizationEntries = project.getRealizationEntries(); String lookupTableName = olapContext.firstTableScan.getTableName(); CubeManager cubeMgr = CubeManager.getInstance(cube.getConfig()); cube = cubeMgr.findLatestSnapshot(realizationEntries, lookupTableName); olapContext.realization = cube; } {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (KYLIN-3628) Query with lookup table always use latest snapshot
[ https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874997#comment-16874997 ] Vsevolod Ostapenko edited comment on KYLIN-3628 at 6/28/19 3:32 PM: This code change introduces a nasty bug, where Kylin will pick a random cube to answer a query that goes against a lookup table. For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and C2 does not. C2 has more recently built segments than C1. A query "select * from L1" will fail with an error stating that C2 does not contain L1. Code analysis indicates that LookupTableEnumerator overwrites the prior cube choice correctly made by RealizationChooser. The bug is that LookupTableEnumerator finds the latest snapshot across all the realizations of all the cubes in the model, not the one that was already correctly chosen. That leads to random behavior and unexpected failures. Affected code is in LookupTableEnumerator.java. {code:java} if (olapContext.realization instanceof CubeInstance) { cube = (CubeInstance) olapContext.realization; ProjectInstance project = cube.getProjectInstance(); List<RealizationEntry> realizationEntries = project.getRealizationEntries(); String lookupTableName = olapContext.firstTableScan.getTableName(); CubeManager cubeMgr = CubeManager.getInstance(cube.getConfig()); cube = cubeMgr.findLatestSnapshot(realizationEntries, lookupTableName); olapContext.realization = cube; } {code} was (Author: seva_ostapenko): This code change introduces a nasty bug, where Kylin will pick a random cube to answer a query that goes against a lookup table. For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and C2 does not. C2 has more recently built segments than C1. A query "select * from L1" will fail with an error stating that C2 does not contain L1. Code analysis indicates that LookupTableEnumerator overwrites the prior cube choice correctly made by RealizationChooser. 
The bug is that LookupTableEnumerator finds the latest snapshot across all the realizations of all the cubes in the model, not the one that was already correctly chosen. That leads to random behavior and unexpected failures. Affected code is in LookupTableEnumerator.java. {code:java} if (olapContext.realization instanceof CubeInstance) { cube = (CubeInstance) olapContext.realization; ProjectInstance project = cube.getProjectInstance(); List<RealizationEntry> realizationEntries = project.getRealizationEntries(); String lookupTableName = olapContext.firstTableScan.getTableName(); CubeManager cubeMgr = CubeManager.getInstance(cube.getConfig()); cube = cubeMgr.findLatestSnapshot(realizationEntries, lookupTableName); olapContext.realization = cube; } {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (KYLIN-3628) Query with lookup table always use latest snapshot
[ https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko reopened KYLIN-3628: --- This code change introduces a nasty bug, where Kylin will pick a random cube to answer a query that goes against a lookup table. For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and C2 does not. C2 has more recently built segments than C1. A query "select * from L1" will fail with an error that C2 does not contain L1. Code analysis indicates that LookupTableEnumerator overwrites the prior cube choice correctly made by RealizationChooser. The bug is that LookupTableEnumerator finds the latest snapshot across all the realizations of all the cubes in the model, not the one that was already correctly chosen. That leads to random behavior and unexpected failures. Affected code is in LookupTableEnumerator.java. {code:java} if (olapContext.realization instanceof CubeInstance) { cube = (CubeInstance) olapContext.realization; ProjectInstance project = cube.getProjectInstance(); List<RealizationEntry> realizationEntries = project.getRealizationEntries(); String lookupTableName = olapContext.firstTableScan.getTableName(); CubeManager cubeMgr = CubeManager.getInstance(cube.getConfig()); cube = cubeMgr.findLatestSnapshot(realizationEntries, lookupTableName); olapContext.realization = cube; } {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3842) kylinProperties.js Unable to get the public configuration of the first line in the front end
[ https://issues.apache.org/jira/browse/KYLIN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824564#comment-16824564 ] Vsevolod Ostapenko commented on KYLIN-3842: --- I created a patch (attached) that should address both the original concern and the regression introduced by the prior bug-fix attempt. Please review, and either comment or approve. > kylinProperties.js Unable to get the public configuration of the first line > in the front end > > > Key: KYLIN-3842 > URL: https://issues.apache.org/jira/browse/KYLIN-3842 > Project: Kylin > Issue Type: Bug > Components: Web >Affects Versions: v2.5.2 >Reporter: Yuzhang QIU >Assignee: Yuzhang QIU >Priority: Minor > Fix For: v2.6.2 > > Attachments: KYLIN-3842.master.001.patch > > > Hi dear team: > I'm developing an OLAP platform based on Kylin 2.5.2. During my work, I found > that kylinProperties.js:37 (getProperty(name)) can't get the property of the > first line in '_config', which is initialized through /admin/public_config. > For example, the public config is > 'kylin.restclient.connection.default-max-per-route=20\nkylin.restclient.connection.max-total=200\nkylin.engine.default=2\nkylin.storage.default=2\n > kylin.web.hive-limit=20\nkylin.web.help.length=4\n'. I expected to get 20 > but got '' when getting the config by key > 'kylin.restclient.connection.default-max-per-route'. This problem is caused by > 'var keyIndex = _config.indexOf('\n' + name + '=');' (at > kylinProperties.js:37) returning -1 for names that don't have a > '\n' before them (i.e., on the first line). > Then I debugged AdminService.java and KylinConfig.java, and found that > KylinConfig.java:517 (around this line, in method > exportToString(Collection<String> propertyKeys)) builds the public config > string with a '\n' after each property, which causes the first property > not to have a '\n' before it. > That is what I found; it will cause problems for developers. > What do you think? 
> Best regards, > yuzhang -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KYLIN-3842) kylinProperties.js Unable to get the public configuration of the first line in the front end
[ https://issues.apache.org/jira/browse/KYLIN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3842: -- Attachment: KYLIN-3842.master.001.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (KYLIN-3842) kylinProperties.js Unable to get the public configuration of the first line in the front end
[ https://issues.apache.org/jira/browse/KYLIN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko reopened KYLIN-3842: --- _config is one long string that is searched using a simple indexOf (instead of a regex). The recent changes introduce a regression where partial matches will be falsely picked up. For example, while searching for property XYZ in the following case, an incorrect property assignment will be picked up: {quote}{{# XYZ=foo}} abcXYZ=bar XYZ=expected_value{quote} A trivial fix for the issue with the very first property in the property file that doesn't start with a comment is to prepend "\n" to _config upon initialization, if the first character of _config is not "\n". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
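The anchored lookup and the "\n"-prepend fix discussed in this thread can be illustrated with a small self-contained re-implementation. This is written in Java rather than the original JavaScript, and getProperty here is a simplified stand-in for the kylinProperties.js function, not the actual code.

```java
public class PropertyLookupSketch {

    // A property name matches only at the start of a line, so searching for
    // "\n" + name + "=" rejects partial matches such as "abcXYZ=".
    static String getProperty(String config, String name) {
        // The proposed fix: prepend "\n" when the config does not start with
        // one, so the very first property is also found.
        if (!config.startsWith("\n")) {
            config = "\n" + config;
        }
        int keyIndex = config.indexOf("\n" + name + "=");
        if (keyIndex < 0) {
            return "";
        }
        int valueStart = keyIndex + name.length() + 2; // skip "\n" + name + "="
        int valueEnd = config.indexOf("\n", valueStart);
        return valueEnd < 0 ? config.substring(valueStart)
                            : config.substring(valueStart, valueEnd);
    }

    public static void main(String[] args) {
        // Without the prepend, the first line would be missed; the anchored
        // search also skips the partial match "abcXYZ=bar".
        String config = "XYZ=expected_value\nabcXYZ=bar\n";
        System.out.println(getProperty(config, "XYZ")); // prints expected_value
    }
}
```

Note that commented-out lines such as "# XYZ=foo" are also rejected, because "\n# XYZ=" never matches "\nXYZ=".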
[jira] [Commented] (KYLIN-3322) TopN requires a SUM to work
[ https://issues.apache.org/jira/browse/KYLIN-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775591#comment-16775591 ] Vsevolod Ostapenko commented on KYLIN-3322: --- It's also reported as KYLIN-3687. It's good to have the documentation updated, but it's better to prevent creation of incomplete TopN cube definitions through the UI and via API calls. > TopN requires a SUM to work > --- > > Key: KYLIN-3322 > URL: https://issues.apache.org/jira/browse/KYLIN-3322 > Project: Kylin > Issue Type: Bug > Components: Measure - TopN >Reporter: liyang >Assignee: Na Zhai >Priority: Major > > Currently, if a user creates a measure of TopN seller by sum of price, it is > required that the user also create a measure of SUM(price). Otherwise, an NPE will > be thrown at query time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
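A validation along the lines suggested above could be sketched as follows. The string-based measure representation and the method names are purely illustrative assumptions, not Kylin's metadata API; the point is only that a TopN measure without a matching SUM on the same column should be rejected at definition time rather than fail with an NPE at query time.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TopNValidationSketch {

    // measures: measure name -> "FUNCTION(column)", e.g. "TOPN(price)".
    // Returns one error message per TopN measure lacking a SUM on its column.
    static List<String> validate(Map<String, String> measures) {
        Set<String> summedColumns = new HashSet<>();
        for (String expr : measures.values()) {
            if (expr.startsWith("SUM(")) {
                summedColumns.add(expr.substring(4, expr.length() - 1));
            }
        }
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, String> e : measures.entrySet()) {
            String expr = e.getValue();
            if (expr.startsWith("TOPN(")) {
                String column = expr.substring(5, expr.length() - 1);
                if (!summedColumns.contains(column)) {
                    errors.add("TopN measure '" + e.getKey()
                            + "' requires a SUM(" + column + ") measure");
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String> measures = new LinkedHashMap<>();
        measures.put("top_seller", "TOPN(price)");
        System.out.println(validate(measures)); // one error: missing SUM(price)
    }
}
```

The same check could back both the UI and the REST API, so incomplete definitions are rejected at both entry points.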
[jira] [Commented] (KYLIN-3686) Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the Web UI defaults to ID_HBASE and provides no safeguards against storage type mismatch
[ https://issues.apache.org/jira/browse/KYLIN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709162#comment-16709162 ] Vsevolod Ostapenko commented on KYLIN-3686: --- Hi Chao, the "kylin.storage.default" parameter is not set in kylin.properties in our environment, so it does default to ID_HBASE, as I understand. As far as I can see, the fix for KYLIN-3636 only changes the cube default to ID_SHARDED_HBASE. However, it does not address the misalignment and lack of safety checks between the cube storage type and the implied prerequisites of the Top_N metric. It's still possible to load a cube with ID_HBASE from JSON, define a Top_N metric, and get a failing cube build with no clear explanation of the failure reason. Thanks, Vsevolod. > Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the > Web UI defaults to ID_HBASE and provides no safeguards against storage type > mismatch > - > > Key: KYLIN-3686 > URL: https://issues.apache.org/jira/browse/KYLIN-3686 > Project: Kylin > Issue Type: Improvement > Components: Measure - TopN, Metadata, Web >Affects Versions: v2.5.0 > Environment: HDP 2.5.6, Kylin 2.5 >Reporter: Vsevolod Ostapenko >Assignee: Shaofeng SHI >Priority: Major > Fix For: v2.6.0 > > > When a new cube is defined via the Kylin 2.5 UI, the default cube storage type is > set to 0 (ID_HBASE). > Top_N metric support is currently hard-coded to expect cube storage type 2 > (ID_SHARDED_HBASE), and it *_does not_* check whether the cube storage type is the > "sharded HBASE". 
> The UI provides no safeguards either to prevent a user from defining a cube with > a Top_N metric that would blow up at the cube building stage with a perplexing > stack trace like the following: > {quote}2018-10-22 16:15:50,388 ERROR [main] > org.apache.kylin.engine.mr.KylinMapper: > java.lang.ArrayIndexOutOfBoundsException > at java.lang.System.arraycopy(Native Method) > at > org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106) > at > org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71) > at > org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112) > at > org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:47) > at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > {quote} > Please, either: > – modify the Top_N code to support all cube storage types (not only > ID_SHARDED_HBASE), > or > – modify the Top_N code to perform an explicit check of the cube storage type and > raise a descriptive exception when the cube storage type is not the expected one. Plus, update the UI to prevent the user from creating cube > definitions that are incompatible with the storage type required by the Top_N > measure. > PS: NDCuboidBuilder.java contains the following line: > {quote}int offset = RowConstants.ROWKEY_SHARDID_LEN + > RowConstants.ROWKEY_CUBOIDID_LEN; // skip shard and cuboidId{quote} > If the cube storage type is not ID_SHARDED_HBASE, the offset is calculated > incorrectly, which leads to an ArrayIndexOutOfBounds exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
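The explicit storage-type check requested in the description could be sketched as a fail-fast guard. The constants mirror the storage IDs discussed in the issue, but the method name and exception choice are assumptions, not Kylin's actual code; the goal is a descriptive error instead of an ArrayIndexOutOfBoundsException deep inside the cuboid builder.

```java
public class StorageTypeCheckSketch {

    // Storage type IDs as discussed in the issue.
    static final int ID_HBASE = 0;
    static final int ID_SHARDED_HBASE = 2;

    // Hypothetical pre-flight check: reject a Top_N cube whose storage type
    // is not sharded HBase, with a message naming the actual mismatch.
    static void checkTopNCompatible(int storageType, boolean hasTopN) {
        if (hasTopN && storageType != ID_SHARDED_HBASE) {
            throw new IllegalStateException(
                    "Top_N measures require storage type ID_SHARDED_HBASE ("
                    + ID_SHARDED_HBASE + "), but the cube uses storage type "
                    + storageType);
        }
    }

    public static void main(String[] args) {
        checkTopNCompatible(ID_SHARDED_HBASE, true); // compatible, no error
        try {
            checkTopNCompatible(ID_HBASE, true);     // rejected up front
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Run at cube-definition save time (UI and REST API), such a check would surface the mismatch before any build job is submitted.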
[jira] [Updated] (KYLIN-3686) Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the Web UI defaults to ID_HBASE and provides no safeguards against storage type mismatch
[ https://issues.apache.org/jira/browse/KYLIN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3686: -- Description: When new cube is defined via Kylin 2.5 UI, the default cube storage type is set to 0 (ID_HBASE). Top_N metric support is currently hard coded to expect cube storage type 2 (ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the "sharded HBASE". UI provides no safeguards either to prevent a user from defining a cube with Top_N metric that would blow up on the cube building stage with a perplexing stack trace like the following: {quote}2018-10-22 16:15:50,388 ERROR [main] org.apache.kylin.engine.mr.KylinMapper: java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106) at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:47) at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) {quote} Please, either: – modify Top_N code to support all cube storage types (not only ID_SHARDED_HBASE), or – modify Top_N code to perform explicit check for cube storage type and raise descriptive exception, when cube storage is not the one that is expected. Plus update the UI to prevent the user from creating cube definitions that are incompatible with the storage type compatible with Top_N measure PS: NDCCuboidBuilder,java contains the following line: {quote}int offset = RowConstants.ROWKEY_SHARDID_LEN + RowConstants.ROWKEY_CUBOIDID_LEN; // skip shard and cuboidId{quote} If cube storage type is not ID_SHARDED_HBASE, offset is calculated incorrectly, which leads to ArrayIndexOutOfBounds exception. 
was: When new cube is defined via Kylin 2.5 UI, the default cube storage type is set to 0 (ID_HBASE). Top_N metric support is currently hard coded to expect cube storage type 2 (ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the "sharded HBASE". UI provides no safeguards either to prevent a user from defining a cube with Top_N metric that would blow up on the cube building stage with a perplexing stack trace like the following: {quote}2018-10-22 16:15:50,388 ERROR [main] org.apache.kylin.engine.mr.KylinMapper: java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106) at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:47) at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) {quote} Please, either: -- modify Top_N code to support all cube storage types (not only ID_SHARDED_HBASE), or -- modify Top_N code to perform explicit check for cube storage type and raise descriptive exception, when cube storage is not the one that is expected. 
Plus update the UI to prevent the user from creating cube definitions that are incompatible with the storage type compatible with Top_N measure
[jira] [Updated] (KYLIN-3686) Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the Web UI defaults to ID_HBASE and provides no safeguards against storage type mismatch
[ https://issues.apache.org/jira/browse/KYLIN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3686: -- Description: When new cube is defined via Kylin 2.5 UI, the default cube storage type is set to 0 (ID_HBASE). Top_N metric support is currently hard coded to expect cube storage type 2 (ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the "sharded HBASE". UI provides no safeguards either to prevent a user from defining a cube with Top_N metric that would blow up on the cube building stage with a perplexing stack trace like the following: {quote}2018-10-22 16:15:50,388 ERROR [main] org.apache.kylin.engine.mr.KylinMapper: java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106) at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:47) at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) {quote} Please, either: -- modify Top_N code to support all cube storage types (not only ID_SHARDED_HBASE), or -- modify Top_N code to perform explicit check for cube storage type and raise descriptive exception, when cube storage is not the one that is expected. Plus update the UI to prevent the user from creating cube definitions that are incompatible with the storage type compatible with Top_N measure was: When new cube is defined via Kylin 2.5 UI, the default cube storage type is set to 0 (ID_HBASE). Top_N metric support is currently hard coded to expect cube storage type 2 (ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the "sharded HBASE". 
UI provides no safeguards either to prevent a user from defining a cube with Top_N metric that would blow up on the cube building stage with a perplexing stack trace like the following: {quote}2018-10-22 16:15:50,388 ERROR [main] org.apache.kylin.engine.mr.KylinMapper: java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106) at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:47) at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) {quote} Please, either: ** modify Top_N code to support all cube storage types (not only ID_SHARDED_HBASE), or **modify Top_N code to perform explicit check for cube storage type and raise descriptive exception, when cube storage is not the one that is expected. Plus update the UI to prevent the user from creating cube definitions that are incompatible with the storage type compatible with Top_N measure
[jira] [Updated] (KYLIN-3686) Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the Web UI defaults to ID_HBASE and provides no safeguards against storage type mismatch
[ https://issues.apache.org/jira/browse/KYLIN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3686: -- Description: When a new cube is defined via the Kylin 2.5 UI, the default cube storage type is set to 0 (ID_HBASE). Top_N metric support is currently hard coded to expect cube storage type 2 (ID_SHARDED_HBASE), and it *_does not_* check whether the cube storage type is the "sharded HBASE". The UI provides no safeguards either to prevent a user from defining a cube with a Top_N metric that blows up at the cube build stage with a perplexing stack trace like the following: {quote}2018-10-22 16:15:50,388 ERROR [main] org.apache.kylin.engine.mr.KylinMapper: java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106) at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:47) at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) {quote} Please either: ** modify the Top_N code to support all cube storage types (not only ID_SHARDED_HBASE), or ** modify the Top_N code to perform an explicit check of the cube storage type and raise a descriptive exception when the storage type is not the expected one. Also, update the UI to prevent the user from creating cube definitions whose storage type is incompatible with the Top_N measure. was: When new cube is defined via Kylin 2.5 UI, the default cube storage type is set to 0 (ID_HBASE). Top_N metric support is currently hard coded to expect cube storage type 2 (ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the "sharded HBASE". 
UI provides no safeguards either to prevent a user from defining a cube with Top_N metric that would blow up on the cube building stage with a perplexing stack trace like the following: {quote}2018-10-22 16:15:50,388 ERROR [main] org.apache.kylin.engine.mr.KylinMapper: java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106) at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:47) at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) {quote} Please, either * ** modify Top_N code to support all cube storage types (not only ID_SHARDED_HBASE), or **modify Top_N code to perform explicit check for cube storage type and raise descriptive exception, when cube storage is not the one that is expected. Plus update the UI to prevent the user from creating cube definitions that are incompatible with the storage type compatible with Top_N measure > Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the > Web UI defaults to ID_HBASE and provides no safeguards against storage type > mismatch > - > > Key: KYLIN-3686 > URL: https://issues.apache.org/jira/browse/KYLIN-3686 > Project: Kylin > Issue Type: Improvement > Components: Measure - TopN, Metadata, Web >Affects Versions: v2.5.0 > Environment: HDP 2.5.6, Kylin 2.5 >Reporter: Vsevolod Ostapenko >Priority: Major > > When new cube is defined via Kylin 2.5 UI, the default cube storage type is > set to 0 (ID_HBASE). > Top_N metric support is currently hard coded to expect cube storage type 2 > (ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the > "sharded HBASE". 
> UI provides no safeguards either to prevent a user from defining a cube with > Top_N metric that would blow up on the cube building stage with a perplexing > stack trace like the following: > {quote}2018-10-22 16:15:50,388 ERROR [main] > org.apache.kylin.engine.mr.KylinMapper: > java.lang.ArrayIndexOutOfBoundsException > at java.lang.System.arraycopy(Native Method) > at > org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106) > at > org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71) > at > org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112) > at > org.apache.kylin.engine.mr.steps.NDCuboidMa
[jira] [Updated] (KYLIN-3686) Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the Web UI defaults to ID_HBASE and provides no safeguards against storage type mismatch
[ https://issues.apache.org/jira/browse/KYLIN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3686: -- Description: When a new cube is defined via the Kylin 2.5 UI, the default cube storage type is set to 0 (ID_HBASE). Top_N metric support is currently hard coded to expect cube storage type 2 (ID_SHARDED_HBASE), and it *_does not_* check whether the cube storage type is the "sharded HBASE". The UI provides no safeguards either to prevent a user from defining a cube with a Top_N metric that blows up at the cube build stage with a perplexing stack trace like the following: {quote}2018-10-22 16:15:50,388 ERROR [main] org.apache.kylin.engine.mr.KylinMapper: java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106) at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:47) at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) {quote} Please either: ** modify the Top_N code to support all cube storage types (not only ID_SHARDED_HBASE), or ** modify the Top_N code to perform an explicit check of the cube storage type and raise a descriptive exception when the storage type is not the expected one. Also, update the UI to prevent the user from creating cube definitions whose storage type is incompatible with the Top_N measure. was: When new cube is defined via Kylin 2.5 UI, the default cube storage type is set to 0 (ID_HBASE). Top_N metric support is currently hard coded to expect cube storage type 2 (ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the "sharded HBASE". 
UI provides no safeguards either to prevent a user from defining a cube with Top_N metric that would blow up on the cube building stage with a perplexing stack trace like the following: {quote}2018-11-08 08:35:45,413 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalArgumentException: Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.(MapTask.java:701) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1865) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164) Caused by: java.io.IOException: wrong key class: org.apache.kylin.storage.hbase.steps.RowKeyWritable is not class org.apache.hadoop.hbase.io.ImmutableBytesWritable at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2332) at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2384) at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:306) at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:88) ... 10 more {quote} Please, either ** modify Top_N code to support all cube storage types (not only ID_SHARDED_HBASE), or **modify Top_N code to perform explicit check for cube storage type and raise descriptive exception, when cube storage is not the one that is expected. 
Plus update the UI to prevent the user from creating cube definitions that are incompatible with the storage type compatible with Top_N measure > Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the > Web UI defaults to ID_HBASE and provides no safeguards against storage type > mismatch > - > > Key: KYLIN-3686 > URL: https://issues.apache.org/jira/browse/KYLIN-3686 > Project: Kylin > Issue Type: Improvement > Components: Measure - TopN, Metadata, Web >Affects Versions: v2.5.0 > Environment: HDP 2.5.6, Kylin 2.5 >Reporter: Vsevolod Ostapenko >Priority: Major > > When new cube is defined via Kylin 2.5 UI, the default cube storage type is > set to 0 (ID_HBASE). > Top_N metric support is
[jira] [Created] (KYLIN-3687) Top_N measure requires related SUM() measure to be defined as part of the cube to work, but Web UI allows creation of the cube that has Top_N measure only, resulting in N
Vsevolod Ostapenko created KYLIN-3687: - Summary: Top_N measure requires related SUM() measure to be defined as part of the cube to work, but Web UI allows creation of the cube that has Top_N measure only, resulting in NPE at query time Key: KYLIN-3687 URL: https://issues.apache.org/jira/browse/KYLIN-3687 Project: Kylin Issue Type: Improvement Components: Measure - TopN, Metadata, Web Affects Versions: v2.5.0 Environment: HDP 2.5.6, Kylin 2.5 Reporter: Vsevolod Ostapenko The Web UI allows defining a cube with a Top_N measure without defining the related SUM() measure. E.g. a variation of the kylin_sales_cube can be successfully defined via the UI with just TOP_SELLER, without actually defining the GMV_SUM measure. Such a cube builds just fine, but at query time an NPE is thrown, similar to the following: {quote}Caused by: java.lang.NullPointerException at org.apache.kylin.query.relnode.OLAPAggregateRel.rewriteAggregateCall(OLAPAggregateRel.java:561) at org.apache.kylin.query.relnode.OLAPAggregateRel.implementRewrite(OLAPAggregateRel.java:419) at org.apache.kylin.query.relnode.OLAPRel$RewriteImplementor.visitChild(OLAPRel.java:174) at org.apache.kylin.query.relnode.OLAPSortRel.implementRewrite(OLAPSortRel.java:86) at org.apache.kylin.query.relnode.OLAPRel$RewriteImplementor.visitChild(OLAPRel.java:174) at org.apache.kylin.query.relnode.OLAPLimitRel.implementRewrite(OLAPLimitRel.java:109) at org.apache.kylin.query.relnode.OLAPRel$RewriteImplementor.visitChild(OLAPRel.java:174) at org.apache.kylin.query.relnode.OLAPToEnumerableConverter.implement(OLAPToEnumerableConverter.java:100) at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.implementRoot(EnumerableRelImplementor.java:108) at org.apache.calcite.adapter.enumerable.EnumerableInterpretable.toBindable(EnumerableInterpretable.java:92) at org.apache.calcite.prepare.CalcitePrepareImpl$CalcitePreparingStmt.implement(CalcitePrepareImpl.java:1281) at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:331) at 
org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:228) {quote} There need to be checks in the UI and in the Top_N query processing code to ensure that all required measures are defined (as Top_N actually depends on another measure to function properly) and to inform the user that the Top_N definition is incomplete and the cube definition is invalid. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
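The check the ticket asks for can be sketched as follows. This is an illustrative, simplified model only: `TopNValidator` and `MeasureSpec` are hypothetical names, not Kylin's actual metadata classes, but the rule is the one described above (a TOP_N measure needs a companion SUM measure on the same column to be rewritable at query time).

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class TopNValidator {
    // Minimal stand-in for a cube measure: a function name ("TOP_N", "SUM", ...)
    // and the column it is defined on.
    public static final class MeasureSpec {
        public final String function;
        public final String column;
        public MeasureSpec(String function, String column) {
            this.function = function;
            this.column = column;
        }
    }

    /**
     * Rejects cube definitions that contain a TOP_N measure on a column
     * without a matching SUM measure on the same column, instead of letting
     * the cube build and fail with an NPE at query time.
     */
    public static void validateTopN(List<MeasureSpec> measures) {
        Set<String> sumColumns = measures.stream()
                .filter(m -> m.function.equals("SUM"))
                .map(m -> m.column)
                .collect(Collectors.toSet());
        for (MeasureSpec m : measures) {
            if (m.function.equals("TOP_N") && !sumColumns.contains(m.column)) {
                throw new IllegalStateException(
                        "TOP_N measure on column '" + m.column
                        + "' requires a SUM measure on the same column");
            }
        }
    }
}
```

A validation of this shape could run both in the REST layer (rejecting the cube definition on save) and as a defensive check in the query rewrite path.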
[jira] [Created] (KYLIN-3686) Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the Web UI defaults to ID_HBASE and provides no safeguards against storage type mismatch
Vsevolod Ostapenko created KYLIN-3686: - Summary: Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the Web UI defaults to ID_HBASE and provides no safeguards against storage type mismatch Key: KYLIN-3686 URL: https://issues.apache.org/jira/browse/KYLIN-3686 Project: Kylin Issue Type: Improvement Components: Measure - TopN, Metadata, Web Affects Versions: v2.5.0 Environment: HDP 2.5.6, Kylin 2.5 Reporter: Vsevolod Ostapenko When a new cube is defined via the Kylin 2.5 UI, the default cube storage type is set to 0 (ID_HBASE). Top_N metric support is currently hard coded to expect cube storage type 2 (ID_SHARDED_HBASE), and it *_does not_* check whether the cube storage type is the "sharded HBASE". The UI provides no safeguards either to prevent a user from defining a cube with a Top_N metric that blows up at the cube build stage with a perplexing stack trace like the following: {quote}2018-11-08 08:35:45,413 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalArgumentException: Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.(MapTask.java:701) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1865) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164) Caused by: java.io.IOException: wrong key class: org.apache.kylin.storage.hbase.steps.RowKeyWritable is not class 
org.apache.hadoop.hbase.io.ImmutableBytesWritable at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2332) at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2384) at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:306) at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:88) ... 10 more {quote} Please either: ** modify the Top_N code to support all cube storage types (not only ID_SHARDED_HBASE), or ** modify the Top_N code to perform an explicit check of the cube storage type and raise a descriptive exception when the storage type is not the expected one. Also, update the UI to prevent the user from creating cube definitions whose storage type is incompatible with the Top_N measure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
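The "explicit check plus descriptive exception" option could look roughly like this. It is a sketch only: `StorageTypeGuard` and `checkTopNStorage` are hypothetical names, while the storage-type IDs (0 = ID_HBASE, 2 = ID_SHARDED_HBASE) come straight from the report above.

```java
public class StorageTypeGuard {
    // IDs as described in the report: 0 = ID_HBASE (the UI default),
    // 2 = ID_SHARDED_HBASE (the only type Top_N currently works with).
    public static final int ID_HBASE = 0;
    public static final int ID_SHARDED_HBASE = 2;

    /**
     * Fails fast with a descriptive message instead of letting the build
     * die later with an opaque ArrayIndexOutOfBoundsException.
     */
    public static void checkTopNStorage(String cubeName, int storageType) {
        if (storageType != ID_SHARDED_HBASE) {
            throw new IllegalStateException("Cube '" + cubeName
                    + "' defines a Top_N measure but uses storage type " + storageType
                    + "; Top_N currently requires storage type " + ID_SHARDED_HBASE
                    + " (ID_SHARDED_HBASE)");
        }
    }
}
```

Run at cube save time (and again at build start), this converts the perplexing mapper stack trace into an actionable validation error.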
[jira] [Created] (KYLIN-3670) Misspelled constant DEFAUL_JOB_CONF_SUFFIX
Vsevolod Ostapenko created KYLIN-3670: - Summary: Misspelled constant DEFAUL_JOB_CONF_SUFFIX Key: KYLIN-3670 URL: https://issues.apache.org/jira/browse/KYLIN-3670 Project: Kylin Issue Type: Improvement Components: Job Engine Affects Versions: v2.5.0 Environment: HDP 2.5.6, Kylin 2.5, CentOS 7.2 Reporter: Vsevolod Ostapenko One of the JobEngineConfig constants is misspelled. It's defined as DEFAUL_JOB_CONF_SUFFIX, while it should be DEFAUL*T*_JOB_CONF_SUFFIX. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
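A low-risk way to fix the misspelling without breaking existing callers is to introduce the corrected constant and keep the old name as a deprecated alias. This is a sketch only: `JobEngineConfigConstants` is a hypothetical holder class (the real constant lives in JobEngineConfig), and the suffix value shown is a placeholder, not the actual one.

```java
public class JobEngineConfigConstants {
    // Corrected spelling. The value here is a placeholder for illustration.
    public static final String DEFAULT_JOB_CONF_SUFFIX = "_job_conf.xml";

    /** @deprecated Misspelled; use {@link #DEFAULT_JOB_CONF_SUFFIX} instead. */
    @Deprecated
    public static final String DEFAUL_JOB_CONF_SUFFIX = DEFAULT_JOB_CONF_SUFFIX;
}
```

The alias can be dropped in the next major release once in-tree and downstream references are migrated.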
[jira] [Updated] (KYLIN-3258) No check for duplicate cube name when creating a hybrid cube
[ https://issues.apache.org/jira/browse/KYLIN-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3258: -- Environment: HDP 2.5.6, Kylin 2.2 > No check for duplicate cube name when creating a hybrid cube > > > Key: KYLIN-3258 > URL: https://issues.apache.org/jira/browse/KYLIN-3258 > Project: Kylin > Issue Type: Bug > Components: Metadata >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2 >Reporter: Vsevolod Ostapenko >Priority: Minor > > When loading hybrid cube definitions via REST API, there is no check for > duplicate cube names is the list. If due to a user error or incorrectly > generated list of cubes by an external application/script the same cube name > is listed more than once, new or updated hybrid cube will contain the same > cube listed multiple times. > It does not seem to cause any immediate issues with querying, but it's just > not right. REST API should throw and exception, when the same cube name is > listed multiple times. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KYLIN-3259) When a cube is deleted, remove it from the hybrid cube definition
Vsevolod Ostapenko created KYLIN-3259: - Summary: When a cube is deleted, remove it from the hybrid cube definition Key: KYLIN-3259 URL: https://issues.apache.org/jira/browse/KYLIN-3259 Project: Kylin Issue Type: Improvement Components: Metadata Affects Versions: v2.2.0 Environment: HDP 2.5.6, Kylin 2.2 Reporter: Vsevolod Ostapenko When a cube is deleted, its references are not automatically removed from existing hybrid cube definitions. That can lead to errors down the road if a user (or application) retrieves the list of cubes via a REST API call and later tries to update the hybrid cube. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
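The cleanup being proposed can be sketched over a simplified in-memory model. `HybridCleanup` and `removeCubeFromHybrids` are hypothetical names, and a hybrid definition is modeled here as just a mutable list of member cube names, not Kylin's actual HybridInstance metadata.

```java
import java.util.List;
import java.util.Map;

public class HybridCleanup {
    /**
     * When a cube is deleted, drop its references from every hybrid
     * definition so later hybrid updates do not trip over a dangling
     * cube name. Keys are hybrid names, values are member cube names.
     */
    public static void removeCubeFromHybrids(Map<String, List<String>> hybrids,
                                             String deletedCube) {
        for (List<String> members : hybrids.values()) {
            members.removeIf(deletedCube::equals);
        }
    }
}
```

In the real system this would run inside the cube-drop transaction, alongside the existing cleanup of segments and ACLs.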
[jira] [Created] (KYLIN-3258) No check for duplicate cube name when creating a hybrid cube
Vsevolod Ostapenko created KYLIN-3258: - Summary: No check for duplicate cube name when creating a hybrid cube Key: KYLIN-3258 URL: https://issues.apache.org/jira/browse/KYLIN-3258 Project: Kylin Issue Type: Bug Components: Metadata Affects Versions: v2.2.0 Reporter: Vsevolod Ostapenko When loading hybrid cube definitions via the REST API, there is no check for duplicate cube names in the list. If, due to a user error or an incorrectly generated list of cubes from an external application/script, the same cube name is listed more than once, the new or updated hybrid cube will contain the same cube listed multiple times. It does not seem to cause any immediate issues with querying, but it's just not right. The REST API should throw an exception when the same cube name is listed multiple times. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
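The missing duplicate check is a one-pass set membership test. A sketch, under the assumption that the REST layer has the requested cube names as a list; `HybridRequestValidator` and `checkNoDuplicateCubes` are hypothetical names, not Kylin's actual API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HybridRequestValidator {
    /** Throws if the same cube name appears more than once in the request. */
    public static void checkNoDuplicateCubes(List<String> cubeNames) {
        Set<String> seen = new HashSet<>();
        for (String name : cubeNames) {
            // Set.add returns false when the element is already present.
            if (!seen.add(name)) {
                throw new IllegalArgumentException("Cube '" + name
                        + "' is listed more than once in the hybrid definition");
            }
        }
    }
}
```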
[jira] [Commented] (KYLIN-3256) Filter of dates do not work
[ https://issues.apache.org/jira/browse/KYLIN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367947#comment-16367947 ] Vsevolod Ostapenko commented on KYLIN-3256: --- There is already an open bug for error in generated code, see KYLIN-3126 > Filter of dates do not work > --- > > Key: KYLIN-3256 > URL: https://issues.apache.org/jira/browse/KYLIN-3256 > Project: Kylin > Issue Type: Bug >Affects Versions: v2.2.0 >Reporter: Jean-Luc BELLIER >Priority: Major > > Hello, > > I am wondering how to filter date columns with Kylin. > I am working with the sample cube of the learn_kylin project. I have slightly > modified the cube to add a few more columns, but that is all. > In the advanced section, I put KYLIN_SALES.PART_DT in the 'Rowkeys' section, > defined as 'date' type. > > I would like to add a filter like 'WHERE KYLIN_SALES.DT_PART = '2012-06-24' > but the Kylin interface gives me a mistake : 'error while compiling generated > Java code' > This works fine with hive console. > I also tried with TO_DATE('2012-06-24'). > Using "WHERE KYLIN_SALES.DT_PART BETWEEN '2012-06-24' AND '2012-06-25'", it > works fine. > > Are there limitations or internal transformations on the 'date' type in > Kylin ? > > Thank you for your help. Have a good day. > > Best regards; > Jean-Luc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KYLIN-3253) Enabling DEBUG in kylin-server-log4j.properties results in NPE in Calcite layer during query execution
Vsevolod Ostapenko created KYLIN-3253: - Summary: Enabling DEBUG in kylin-server-log4j.properties results in NPE in Calcite layer during query execution Key: KYLIN-3253 URL: https://issues.apache.org/jira/browse/KYLIN-3253 Project: Kylin Issue Type: Bug Components: Query Engine Affects Versions: v2.2.0 Environment: HDP 2.5.6, Kylin 2.2 Reporter: Vsevolod Ostapenko If the log4j root logger is set to DEBUG level in kylin-server-log4j.properties, any attempt to run a query after that results in a failure with an NPE being triggered in the Calcite layer (see stack trace below). The issue was fixed in Calcite 1.14 as https://issues.apache.org/jira/browse/CALCITE-1859 It's a one-line change to core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java Since Kylin is packaging its own fork of Calcite from [http://repository.kyligence.io|http://repository.kyligence.io/], the fix needs to be ported to 1.13.0-kylin-r-SNAPSHOT.jar by someone who has access to this forked repo. {quote} at org.apache.calcite.avatica.Helper.createException(Helper.java:56) at org.apache.calcite.avatica.Helper.createException(Helper.java:41) at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156) at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:218) at org.apache.kylin.rest.service.QueryService.execute(QueryService.java:834) at org.apache.kylin.rest.service.QueryService.queryWithSqlMassage(QueryService.java:561) at org.apache.kylin.rest.service.QueryService.query(QueryService.java:181) at org.apache.kylin.rest.service.QueryService.doQueryWithCache(QueryService.java:415) at org.apache.kylin.rest.controller.QueryController.query(QueryController.java:78) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:606) at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205) at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:133) at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:97) at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827) at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738) at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85) at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:967) at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901) at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970) at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:872) at javax.servlet.http.HttpServlet.service(HttpServlet.java:650) at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846) at javax.servlet.http.HttpServlet.service(HttpServlet.java:731) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:317) at 
org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:127) at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:91) at org.springframework.security.web.FilterChainP
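Per the report above, the trigger is simply the root logger level, since the offending code path in the forked Calcite's VolcanoPlanner only runs when debug logging is on. An illustrative fragment of the configuration involved follows; the key names are approximate, not a verbatim copy of kylin-server-log4j.properties.

```properties
# Illustrative only -- approximate keys, not the exact shipped file.
# Raising the root logger to DEBUG is what triggers the Calcite NPE
# described above on unpatched 1.13.x-kylin builds:
log4j.rootLogger=DEBUG,file

# A narrower alternative that keeps Calcite at INFO while still getting
# verbose Kylin logs, e.g.:
# log4j.logger.org.apache.kylin=DEBUG
```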
[jira] [Reopened] (KYLIN-3223) Query for the list of hybrid cubes results in NPE
[ https://issues.apache.org/jira/browse/KYLIN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko reopened KYLIN-3223: --- Reopened as revised fix version is available. > Query for the list of hybrid cubes results in NPE > - > > Key: KYLIN-3223 > URL: https://issues.apache.org/jira/browse/KYLIN-3223 > Project: Kylin > Issue Type: Bug > Components: REST Service >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2 >Reporter: Vsevolod Ostapenko >Assignee: Vsevolod Ostapenko >Priority: Major > Fix For: v2.3.0 > > Attachments: > 0001-KYLIN-3223-Query-for-the-list-of-hybrid-cubes-result.patch, > KYLIN-3223.master.001.patch > > > Calling REST API to get the list of hybrid cubes returns stack trace with NPE > exception. > {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json' -d {} > [http://localhost:7070/kylin/api/hybrids] > {quote} > > If a parameter project without a value is specified, call succeeds. E.g. > {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json' -d {} > [http://localhost:7070/kylin/api/hybrids?project] > {quote} > Quick look at the HybridService.java suggests that there is a bug in the > code, where the very first line tries to check ACLs on the project using the > project name, which is NULL, when project parameter is not specified as part > of the URL. > If parameter is specified without a value, ACL check is not performed, so > it's another bug, as the list of projects is retrieved without read > permission checking. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3223) Query for the list of hybrid cubes results in NPE
[ https://issues.apache.org/jira/browse/KYLIN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358628#comment-16358628 ] Vsevolod Ostapenko commented on KYLIN-3223: --- [~yimingliu], I created a revised version of the fix to use updated ACL checking API provided by KYLIN-3239 (Refactor the ACL code about checkPermission and hasPermission). Please review and provide feedback. > Query for the list of hybrid cubes results in NPE > - > > Key: KYLIN-3223 > URL: https://issues.apache.org/jira/browse/KYLIN-3223 > Project: Kylin > Issue Type: Bug > Components: REST Service >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2 >Reporter: Vsevolod Ostapenko >Assignee: Vsevolod Ostapenko >Priority: Major > Fix For: v2.3.0 > > Attachments: > 0001-KYLIN-3223-Query-for-the-list-of-hybrid-cubes-result.patch, > KYLIN-3223.master.001.patch > > > Calling REST API to get the list of hybrid cubes returns stack trace with NPE > exception. > {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json' -d {} > [http://localhost:7070/kylin/api/hybrids] > {quote} > > If a parameter project without a value is specified, call succeeds. E.g. > {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json' -d {} > [http://localhost:7070/kylin/api/hybrids?project] > {quote} > Quick look at the HybridService.java suggests that there is a bug in the > code, where the very first line tries to check ACLs on the project using the > project name, which is NULL, when project parameter is not specified as part > of the URL. > If parameter is specified without a value, ACL check is not performed, so > it's another bug, as the list of projects is retrieved without read > permission checking. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KYLIN-3223) Query for the list of hybrid cubes results in NPE
[ https://issues.apache.org/jira/browse/KYLIN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3223: -- Attachment: KYLIN-3223.master.001.patch > Query for the list of hybrid cubes results in NPE > - > > Key: KYLIN-3223 > URL: https://issues.apache.org/jira/browse/KYLIN-3223 > Project: Kylin > Issue Type: Bug > Components: REST Service >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2 >Reporter: Vsevolod Ostapenko >Assignee: Vsevolod Ostapenko >Priority: Major > Fix For: v2.3.0 > > Attachments: > 0001-KYLIN-3223-Query-for-the-list-of-hybrid-cubes-result.patch, > KYLIN-3223.master.001.patch > > > Calling REST API to get the list of hybrid cubes returns stack trace with NPE > exception. > {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json' -d {} > [http://localhost:7070/kylin/api/hybrids] > {quote} > > If a parameter project without a value is specified, call succeeds. E.g. > {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json' -d {} > [http://localhost:7070/kylin/api/hybrids?project] > {quote} > Quick look at the HybridService.java suggests that there is a bug in the > code, where the very first line tries to check ACLs on the project using the > project name, which is NULL, when project parameter is not specified as part > of the URL. > If parameter is specified without a value, ACL check is not performed, so > it's another bug, as the list of projects is retrieved without read > permission checking. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
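The two bugs described (an NPE when the project parameter is absent, and a skipped ACL check when it is present but empty) both come down to validating the parameter before it reaches the ACL layer. A minimal sketch, assuming the parameter arrives as a possibly-null String; `ProjectParamGuard` and `requireProject` are hypothetical names, not Kylin's actual HybridService API.

```java
public class ProjectParamGuard {
    /**
     * Normalizes and validates the optional ?project= query parameter so the
     * ACL check downstream is neither fed null (NPE) nor bypassed (empty value).
     */
    public static String requireProject(String project) {
        if (project == null || project.trim().isEmpty()) {
            throw new IllegalArgumentException(
                    "A non-empty 'project' parameter is required to list hybrid cubes");
        }
        return project.trim();
    }
}
```

With this in place, the service can unconditionally run its read-permission check on the returned value.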
[jira] [Updated] (KYLIN-3249) Default hybrid cube priority should be the same as of a regular cube
[ https://issues.apache.org/jira/browse/KYLIN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3249: -- Description: Hybrid cubes are assigned a default priority lower than that of regular cubes, which leads to incorrect selection of a hybrid cube while a regular non-hybridized cube with a lower cost is available. For example, a model has a wide cube with a full set of metrics and a narrower cube with top-N entries for a subset of metrics. If the wide cube is hybridized (due to a new metric addition) but the top-N cube remains unchanged and non-hybridized, the top-N cube will no longer be queried, causing query performance degradation. The issue can be traced to query/src/main/java/org/apache/kylin/query/routing/Candidate.java, where hybrid cubes are assigned priority 0, while regular cubes are assigned priority 1. This unconditional priority assignment is incorrect, as it only holds when there is only one cube "flavor" in the model or when all the cubes of various "flavors" are hybridized at the same time. The simplest fix is to make the default hybrid cube priority the same as that of a regular cube. Plus, as an enhancement to the cube selection algorithm, a new rule can be implemented that filters out regular candidate cubes that are included in candidate hybrid cubes. was: Hybrid cubes are assigned default priority lower than regular cubes, which leads to incorrect selection of a hybrid cube while a regular non-hybridized cube with lower cost is available. For example, model has a wide cube with full set of metrics and narrower cube with top-N entries for a subset of metrics. If wide cube is hybridized (due to a new metric addition), but top-N cube remains unchanged and non-hybridized, top-N cube will be no longer queried, causing query performance degradation. 
The issue can be tracked to the query/src/main/java/org/apache/kylin/query/routing/Candidate.java, where hybrid cubes are assigned priority 0, while regular cubes are assigned priority of 1. This unconditional priority assignment is incorrect as it only holds for cases when there is only one cube "type" in the model or when all the cubes are hybridized at the same time. Simplest fix is to have hybrid priority to be the same as of a regular cube. Plus, as an enhancement to the cube selection algorithm a new rule can be implemented that will filter out regular candidate cubes that are included into candidate hybrid cubes. > Default hybrid cube priority should be the same as of a regular cube > > > Key: KYLIN-3249 > URL: https://issues.apache.org/jira/browse/KYLIN-3249 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2 >Reporter: Vsevolod Ostapenko >Priority: Major > > Hybrid cubes are assigned default priority lower than regular cubes, which > leads to incorrect selection of a hybrid cube while a regular non-hybridized > cube with lower cost is available. > For example, model has a wide cube with full set of metrics and narrower cube > with top-N entries for a subset of metrics. > If wide cube is hybridized (due to a new metric addition), but top-N cube > remains unchanged and non-hybridized, top-N cube will be no longer queried, > causing query performance degradation. > The issue can be tracked to the > query/src/main/java/org/apache/kylin/query/routing/Candidate.java, where > hybrid cubes are assigned priority 0, while regular cubes are assigned > priority of 1. > This unconditional priority assignment is incorrect as it only holds for > cases when there is only one cube "flavor" in the model or when all the cubes > of various "flavors" are hybridized at the same time. > Simplest fix is to have hybrid priority to be the same as of a regular cube. 
> Plus, as an enhancement to the cube selection algorithm, a new rule can be > implemented that will filter out regular candidate cubes that are included > into candidate hybrid cubes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
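The routing effect described in KYLIN-3249 can be illustrated with a minimal sketch. This is not the actual Candidate.java comparator; the class and field names below are hypothetical stand-ins. It shows that when the hybrid priority (0) differs from the regular cube priority (1), the comparator never reaches the cost comparison, so a cheaper regular top-N cube always loses to the hybrid; equalizing the priorities lets cost decide, which is the simple fix proposed above.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CandidateSort {
    static class Candidate {
        final String name;
        final int priority; // 0 = hybrid, 1 = regular cube (current defaults)
        final int cost;
        Candidate(String name, int priority, int cost) {
            this.name = name; this.priority = priority; this.cost = cost;
        }
    }

    // Pick the realization the query router would use: lowest priority first,
    // then lowest cost, mirroring the sort order described in the report.
    static String pick(List<Candidate> candidates) {
        return candidates.stream()
                .min(Comparator.<Candidate>comparingInt(c -> c.priority)
                        .thenComparingInt(c -> c.cost))
                .get().name;
    }

    public static void main(String[] args) {
        // Current defaults: hybrid priority 0 wins before cost is considered.
        Candidate hybridWide = new Candidate("wide_hybrid", 0, 100);
        Candidate topN = new Candidate("topn_cube", 1, 10);
        System.out.println(pick(Arrays.asList(hybridWide, topN)));

        // Proposed fix: equal priorities, so the cheaper top-N cube is chosen.
        Candidate hybridEqual = new Candidate("wide_hybrid", 1, 100);
        System.out.println(pick(Arrays.asList(hybridEqual, topN)));
    }
}
```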
[jira] [Created] (KYLIN-3249) Default hybrid cube priority should be the same as of a regular cube
Vsevolod Ostapenko created KYLIN-3249: - Summary: Default hybrid cube priority should be the same as of a regular cube Key: KYLIN-3249 URL: https://issues.apache.org/jira/browse/KYLIN-3249 Project: Kylin Issue Type: Bug Components: Query Engine Affects Versions: v2.2.0 Environment: HDP 2.5.6, Kylin 2.2 Reporter: Vsevolod Ostapenko Hybrid cubes are assigned default priority lower than regular cubes, which leads to incorrect selection of a hybrid cube while a regular non-hybridized cube with lower cost is available. For example, a model has a wide cube with a full set of metrics and a narrower cube with top-N entries for a subset of metrics. If the wide cube is hybridized (due to a new metric addition), but the top-N cube remains unchanged and non-hybridized, the top-N cube will no longer be queried, causing query performance degradation. The issue can be tracked to query/src/main/java/org/apache/kylin/query/routing/Candidate.java, where hybrid cubes are assigned priority 0, while regular cubes are assigned priority of 1. This unconditional priority assignment is incorrect, as it only holds for cases when there is only one cube "type" in the model or when all the cubes are hybridized at the same time. The simplest fix is to make hybrid priority the same as that of a regular cube. Plus, as an enhancement to the cube selection algorithm, a new rule can be implemented that will filter out regular candidate cubes that are included into candidate hybrid cubes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KYLIN-3249) Default hybrid cube priority should be the same as of a regular cube
[ https://issues.apache.org/jira/browse/KYLIN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3249: -- Description: Hybrid cubes are assigned default priority lower than regular cubes, which leads to incorrect selection of a hybrid cube while a regular non-hybridized cube with lower cost is available. For example, model has a wide cube with full set of metrics and narrower cube with top-N entries for a subset of metrics. If wide cube is hybridized (due to a new metric addition), but top-N cube remains unchanged and non-hybridized, top-N cube will be no longer queried, causing query performance degradation. The issue can be tracked to the query/src/main/java/org/apache/kylin/query/routing/Candidate.java, where hybrid cubes are assigned priority 0, while regular cubes are assigned priority of 1. This unconditional priority assignment is incorrect as it only holds for cases when there is only one cube "type" in the model or when all the cubes are hybridized at the same time. Simplest fix is to have hybrid priority to be the same as of a regular cube. Plus, as an enhancement to the cube selection algorithm a new rule can be implemented that will filter out regular candidate cubes that are included into candidate hybrid cubes. was: Hybrid cubes are assigned default priority lower than regular cubes, which leads to incorrect selection of a hybrid cube while a regular non-hybridized cube with lower cost is available. For example, model has a wide cube with full set of metrics and narrower cube with top-N entries for a subset of metrics. If wide cube is hybridized (due to a new metric addition), but top-N cube remains unchanged and non-hybridized, top-N cube will be no longer queried, 'causing query performance degradation. The issue can be tracked to the query/src/main/java/org/apache/kylin/query/routing/Candidate.java, where hybrid cubes are assigned priority 0, while regular cubes are assigned priority of 1. 
This unconditional priority assignment is incorrect as it only holds for cases when there is only one cube "type" in the model or when all the cubes are hybridized at the same time. Simplest fix is to have hybrid priority to be the same as of a regular cube. Plus, as an enhancement to the cube selection algorithm a new rule can be implemented that will filter out regular candidate cubes that are included into candidate hybrid cubes. > Default hybrid cube priority should be the same as of a regular cube > > > Key: KYLIN-3249 > URL: https://issues.apache.org/jira/browse/KYLIN-3249 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2 >Reporter: Vsevolod Ostapenko >Priority: Major > > Hybrid cubes are assigned default priority lower than regular cubes, which > leads to incorrect selection of a hybrid cube while a regular non-hybridized > cube with lower cost is available. > For example, model has a wide cube with full set of metrics and narrower cube > with top-N entries for a subset of metrics. > If wide cube is hybridized (due to a new metric addition), but top-N cube > remains unchanged and non-hybridized, top-N cube will be no longer queried, > causing query performance degradation. > The issue can be tracked to the > query/src/main/java/org/apache/kylin/query/routing/Candidate.java, where > hybrid cubes are assigned priority 0, while regular cubes are assigned > priority of 1. > This unconditional priority assignment is incorrect as it only holds for > cases when there is only one cube "type" in the model or when all the cubes > are hybridized at the same time. > Simplest fix is to have hybrid priority to be the same as of a regular cube. > Plus, as an enhancement to the cube selection algorithm a new rule can be > implemented that will filter out regular candidate cubes that are included > into candidate hybrid cubes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KYLIN-3223) Query for the list of hybrid cubes results in NPE
[ https://issues.apache.org/jira/browse/KYLIN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko reassigned KYLIN-3223: - Assignee: Vsevolod Ostapenko (was: nichunen) > Query for the list of hybrid cubes results in NPE > - > > Key: KYLIN-3223 > URL: https://issues.apache.org/jira/browse/KYLIN-3223 > Project: Kylin > Issue Type: Bug > Components: REST Service >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2 >Reporter: Vsevolod Ostapenko >Assignee: Vsevolod Ostapenko >Priority: Major > Attachments: > 0001-KYLIN-3223-Query-for-the-list-of-hybrid-cubes-result.patch > > > Calling REST API to get the list of hybrid cubes returns stack trace with NPE > exception. > {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json' -d {} > [http://localhost:7070/kylin/api/hybrids] > {quote} > > If a parameter project without a value is specified, call succeeds. E.g. > {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json' -d {} > [http://localhost:7070/kylin/api/hybrids?project] > {quote} > Quick look at the HybridService.java suggests that there is a bug in the > code, where the very first line tries to check ACLs on the project using the > project name, which is NULL, when project parameter is not specified as part > of the URL. > If parameter is specified without a value, ACL check is not performed, so > it's another bug, as the list of projects is retrieved without read > permission checking. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3223) Query for the list of hybrid cubes results in NPE
[ https://issues.apache.org/jira/browse/KYLIN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357639#comment-16357639 ] Vsevolod Ostapenko commented on KYLIN-3223: --- [~yimingliu], I attached the proposed patch for the NPE and the missing read-access check on projects, when the project is either not specified or empty. Please review, or have someone look at the changes and provide feedback. > Query for the list of hybrid cubes results in NPE > - > > Key: KYLIN-3223 > URL: https://issues.apache.org/jira/browse/KYLIN-3223 > Project: Kylin > Issue Type: Bug > Components: REST Service >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2 >Reporter: Vsevolod Ostapenko >Assignee: nichunen >Priority: Major > Attachments: > 0001-KYLIN-3223-Query-for-the-list-of-hybrid-cubes-result.patch > > > Calling REST API to get the list of hybrid cubes returns stack trace with NPE > exception. > {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json' -d {} > [http://localhost:7070/kylin/api/hybrids] > {quote} > > If a parameter project without a value is specified, call succeeds. E.g. > {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json' -d {} > [http://localhost:7070/kylin/api/hybrids?project] > {quote} > Quick look at the HybridService.java suggests that there is a bug in the > code, where the very first line tries to check ACLs on the project using the > project name, which is NULL, when project parameter is not specified as part > of the URL. > If parameter is specified without a value, ACL check is not performed, so > it's another bug, as the list of projects is retrieved without read > permission checking. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KYLIN-3223) Query for the list of hybrid cubes results in NPE
[ https://issues.apache.org/jira/browse/KYLIN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3223: -- Attachment: 0001-KYLIN-3223-Query-for-the-list-of-hybrid-cubes-result.patch > Query for the list of hybrid cubes results in NPE > - > > Key: KYLIN-3223 > URL: https://issues.apache.org/jira/browse/KYLIN-3223 > Project: Kylin > Issue Type: Bug > Components: REST Service >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2 >Reporter: Vsevolod Ostapenko >Assignee: nichunen >Priority: Major > Attachments: > 0001-KYLIN-3223-Query-for-the-list-of-hybrid-cubes-result.patch > > > Calling REST API to get the list of hybrid cubes returns stack trace with NPE > exception. > {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json' -d {} > [http://localhost:7070/kylin/api/hybrids] > {quote} > > If a parameter project without a value is specified, call succeeds. E.g. > {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json' -d {} > [http://localhost:7070/kylin/api/hybrids?project] > {quote} > Quick look at the HybridService.java suggests that there is a bug in the > code, where the very first line tries to check ACLs on the project using the > project name, which is NULL, when project parameter is not specified as part > of the URL. > If parameter is specified without a value, ACL check is not performed, so > it's another bug, as the list of projects is retrieved without read > permission checking. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3139) Failure in map-reduce job due to undefined hdp.version variable when using HDP stack and remote HBase cluster
[ https://issues.apache.org/jira/browse/KYLIN-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357607#comment-16357607 ] Vsevolod Ostapenko commented on KYLIN-3139: --- [~liyang], [~yimingliu] guys, could we make a decision on this one? It's a trivial change, but it has been hanging around for more than a month now. > Failure in map-reduce job due to undefined hdp.version variable when using > HDP stack and remote HBase cluster > - > > Key: KYLIN-3139 > URL: https://issues.apache.org/jira/browse/KYLIN-3139 > Project: Kylin > Issue Type: Bug > Components: Others >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, two cluster setup, Kylin 2.2.0 in a cluster > with Hive only, remote HBase cluster for data storage >Reporter: Vsevolod Ostapenko >Assignee: Vsevolod Ostapenko >Priority: Minor > Labels: hdp > Attachments: KYLIN-3139.master.001.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > When running on top of HDP stack and using a setup where Hive and HBase run > in different clusters cube build/refresh fails on the step "Extract Fact > Table Distinct Columns" with the error > {quote}java.lang.IllegalArgumentException: Unable to parse > '/hdp/apps/$\{hdp.version\}/mapreduce/mapreduce.tar.gz#mr-framework' as a > URI, check the setting for mapreduce.application.framework.path{quote} > Based on existing JIRA discussions in Ambari project, it's responsibility of > a service to set hdp.version Java property. When HBase is not installed as a > service in a cluster where Kylin server is running, hbase launcher (invoked > by kylin.sh) does not set this property (presumably because HBase in that > case is just a client and not a service). > The only suitable workaround found so far is to set property as part of the > conf/setenv.sh script. > In order to avoid hard coding of the HDP version info, suggested change to > setenv.sh will attempt to detect HDP version at run-time. 
It should work for > all released HDP versions from 2.2.x to 2.6.x. > In addition to that, it will also try to locate and set the Java native library > path when running on top of HDP. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
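The failure mode in KYLIN-3139 boils down to placeholder substitution from JVM system properties: Hadoop resolves ${hdp.version} in mapreduce.application.framework.path from a Java property that the HBase launcher never sets in the remote-HBase setup. The sketch below is a simplified stand-in for that resolution (the `resolve` method and the "2.5.6.0-40" version string are illustrative assumptions, not real API or build numbers): if the property is absent, the token survives and the later URI parse fails with the quoted IllegalArgumentException.

```java
public class HdpVersionDemo {
    // Simplified stand-in for Hadoop's property substitution: replace the
    // ${hdp.version} token only when a version value is actually available.
    static String resolve(String path, String hdpVersion) {
        if (hdpVersion == null) {
            // Unresolved placeholder: this is the string Hadoop later fails
            // to parse as a URI, producing the error quoted in the report.
            return path;
        }
        return path.replace("${hdp.version}", hdpVersion);
    }

    public static void main(String[] args) {
        String path = "/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework";
        System.out.println(resolve(path, null));         // token still present
        System.out.println(resolve(path, "2.5.6.0-40")); // usable framework path
    }
}
```

This is why the proposed setenv.sh change detects the HDP version at run time and exports it as -Dhdp.version, instead of hard-coding it.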
[jira] [Updated] (KYLIN-3223) Query for the list of hybrid cubes results in NPE
[ https://issues.apache.org/jira/browse/KYLIN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3223: -- Description: Calling REST API to get the list of hybrid cubes returns stack trace with NPE exception. {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json' -d {} [http://localhost:7070/kylin/api/hybrids] {quote} If a parameter project without a value is specified, call succeeds. E.g. {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json' -d {} [http://localhost:7070/kylin/api/hybrids?project] {quote} Quick look at the HybridService.java suggests that there is a bug in the code, where the very first line tries to check ACLs on the project using the project name, which is NULL, when project parameter is not specified as part of the URL. If parameter is specified without a value, ACL check is not performed, so it's another bug, as the list of projects is retrieved without read permission checking. was: Calling REST API to get the list of hybrid cubes returns stack trace with NPE exception. 
{quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json' -d {} [http://localhost:7070/kylin/api/hybrids] {"code":"999","data":null,"msg":null,"stacktrace":"java.lang.NullPointerException\n\tat java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:778)\n\tat java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1546)\n\tat org.apache.kylin.metadata.cachesync.SingleValueCache.get(SingleValueCache.java:85)\n\tat org.apache.kylin.metadata.project.ProjectManager.getProject(ProjectManager.java:172)\n\tat org.apache.kylin.rest.util.AclEvaluate.getProjectInstance(AclEvaluate.java:39)\n\tat org.apache.kylin.rest.util.AclEvaluate.checkProjectReadPermission(AclEvaluate.java:61)\n\tat org.apache.kylin.rest.service.HybridService.listHybrids(HybridService.java:115)\n\tat org.apache.kylin.rest.controller.HybridController.list(HybridController.java:76)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:497)\n\tat org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)\n\tat org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:133)\n\tat org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:97)\n\tat org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827)\n\tat org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738)\n\tat org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)\n\tat 
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:967)\n\tat org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901)\n\tat org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)\n\tat org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:861)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:624)\n\tat org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:731)\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)\n\tat org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)\n\tat org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:317)\n\tat org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:127)\n\tat org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:91)\n\tat org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)\n\tat org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:114)\n\tat org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)\n\tat
[jira] [Created] (KYLIN-3223) Query for the list of hybrid cubes results in NPE
Vsevolod Ostapenko created KYLIN-3223: - Summary: Query for the list of hybrid cubes results in NPE Key: KYLIN-3223 URL: https://issues.apache.org/jira/browse/KYLIN-3223 Project: Kylin Issue Type: Bug Components: REST Service Affects Versions: v2.2.0 Environment: HDP 2.5.6, Kylin 2.2 Reporter: Vsevolod Ostapenko Assignee: luguosheng Calling REST API to get the list of hybrid cubes returns stack trace with NPE exception. {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json' -d {} [http://localhost:7070/kylin/api/hybrids] {"code":"999","data":null,"msg":null,"stacktrace":"java.lang.NullPointerException\n\tat java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:778)\n\tat java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1546)\n\tat org.apache.kylin.metadata.cachesync.SingleValueCache.get(SingleValueCache.java:85)\n\tat org.apache.kylin.metadata.project.ProjectManager.getProject(ProjectManager.java:172)\n\tat org.apache.kylin.rest.util.AclEvaluate.getProjectInstance(AclEvaluate.java:39)\n\tat org.apache.kylin.rest.util.AclEvaluate.checkProjectReadPermission(AclEvaluate.java:61)\n\tat org.apache.kylin.rest.service.HybridService.listHybrids(HybridService.java:115)\n\tat org.apache.kylin.rest.controller.HybridController.list(HybridController.java:76)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:497)\n\tat org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)\n\tat org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:133)\n\tat 
org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:97)\n\tat org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827)\n\tat org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738)\n\tat org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)\n\tat org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:967)\n\tat org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901)\n\tat org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)\n\tat org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:861)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:624)\n\tat org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:731)\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)\n\tat org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)\n\tat org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:317)\n\tat org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:127)\n\tat org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:91)\n\tat 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)\n\tat org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:114)\n\tat org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)\n\tat org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:137)\n\tat org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)\n\tat org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:111)\n\tat org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)\n\tat org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(Secur
[jira] [Commented] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump con
[ https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344161#comment-16344161 ] Vsevolod Ostapenko commented on KYLIN-3122: --- I think I found one place in the code that is at least partially responsible for the behavior observed. The convertFilterColumnsAndConstants method in GTUtil.java rewrites the statement filter after static values in the WHERE clause are checked against the trie dictionary. There seem to be multiple issues with this approach: 1) Filtering on the partitioning key is treated the same as filtering on a non-partitioning column, which is incorrect, as the presence or absence of a lower or upper range bound for the partitioning column in the dictionary of a specific segment provides no guarantee that this segment is or is not a candidate for further scan. 2) As a side effect of #1, it looks like after the first candidate segment is hit (the lower bound date-time value is found in the dictionary), the filter is modified in place (rewritten) to exclude the upper bound condition (if the upper bound condition is not found in the segment, which is always the case in our scenario). Partitioning keys require special handling: they need to be checked against segment range metadata and excluded from dictionary-based checks. > Partition elimination algorithm seems to be inefficient and have serious > issues with handling date/time ranges, can lead to very slow queries and > OOM/Java heap dump conditions > --- > > Key: KYLIN-3122 > URL: https://issues.apache.org/jira/browse/KYLIN-3122 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2.0 >Reporter: Vsevolod Ostapenko >Assignee: hongbin ma >Priority: Critical > Attachments: partition_elimination_bug_single_column_test.log > > > Current algorithm of cube segment elimination seems to be rather inefficient. 
> We are using a model where cubes are partitioned by date and time: > "partition_desc": > { "partition_date_column": "A_VL_HOURLY_V.THEDATE", "partition_time_column": > "A_VL_HOURLY_V.THEHOUR", "partition_date_start": 0, "partition_date_format": > "MMdd", "partition_time_format": "HH", "partition_type": "APPEND", > "partition_condition_builder": > "org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder" > } > , > Cubes contain partitions for multiple days and 24 hours for each day. Each > cube segment corresponds to just one hour. > When a query is issued where both date and hour are specified using equality > condition (e.g. thedate = '20171011' and thehour = '10') Kylin sequentially > iterates over all the segment cubes (hundreds of them) only to skip all > except for the one that needs to be scanned (which can be observed by looking > in the logs). > The expectation is that Kylin would use existing info on the partitioning > columns (date and time) and known hierarchical relations between date and > time to locate the required partition much more efficiently than a linear scan > through all the cube partitions. > Now, if the filtering condition is on a range of hours, behavior of the > partition pruning and scanning becomes not very logical, which suggests bugs > in the logic. > If the filtering condition is on a specific date and a closed-open range of hours > (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), in > addition to sequentially scanning all the cube partitions (as described > above), Kylin will scan HBase tables for all the hours from the specified > starting hour and till the last hour of the day (e.g. from hour 10 to 24, > instead of just hour 10). > As a result, the query will run much longer than necessary, and might run out > of memory, causing a JVM heap dump and a Kylin server crash. > If the filtering condition is on a specific date but the hour interval is specified as
> open-closed (e.g. thedate = '20171011' and thehour > '09' and thehour <= > '10'), Kylin will scan all HBase tables for all the later dates and hours > (e.g. from hour 10 and till the most recent hour on the most recent day, > which can be hundreds of tables and thousands of regions). > As a result, query execution time will increase dramatically and in most cases > the Kylin server will be terminated with an OOM error and a JVM heap dump. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
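Correct segment pruning for the ranges discussed in KYLIN-3122 amounts to an interval-overlap test on both bounds. The sketch below is a hypothetical simplification, not Kylin's actual scan planner: an hourly segment covering [start, end) should be scanned only when start < hi AND end > lo against the query range [lo, hi). The reported behavior looks as if the upper-bound check stops being applied once the lower bound matches.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentPruning {
    // Half-open interval overlap: both bounds must hold for a segment to qualify.
    static boolean overlaps(long segStart, long segEnd, long lo, long hi) {
        return segStart < hi && segEnd > lo;
    }

    // Model 'hours' hourly segments, where segment h covers [h, h + 1),
    // and return only those overlapping the query range [lo, hi).
    static List<Integer> segmentsToScan(int hours, long lo, long hi) {
        List<Integer> scan = new ArrayList<>();
        for (int h = 0; h < hours; h++) {
            if (overlaps(h, h + 1, lo, hi)) {
                scan.add(h);
            }
        }
        return scan;
    }

    public static void main(String[] args) {
        // 12 hourly segments (hours 00..11), query: time_key >= 01 and time_key < 04.
        // With both bounds enforced, only segments 1, 2, and 3 are scanned --
        // matching the 3 segments the reporter expected rather than 11.
        System.out.println(segmentsToScan(12, 1, 4));
    }
}
```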
[jira] [Comment Edited] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dum
[ https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343677#comment-16343677 ] Vsevolod Ostapenko edited comment on KYLIN-3122 at 1/29/18 5:31 PM: [~Shaofengshi], we tried using a single partitioning column with date and time fused together. The result is still not satisfactory, as cube segments are not properly eliminated even in this case. In our test we had a table with 12 hourly partitions defined for hours 00 to 11. A test query with condition _*where a.time_key >= '201711200100' and a.time_key < '201711200400'*_ is only filtering out the very first segment for the hour 00, and then progresses with scanning all the remaining 11 segments, instead of the expected 3 segments (for hours 01, 02 and 03). It looks very much like a bug, where as soon as the lower bound condition is satisfied, the upper bound condition is no longer checked. I'm attaching a log excerpt to illustrate the above-mentioned behavior. [^partition_elimination_bug_single_column_test.log] was (Author: seva_ostapenko): [~Shaofengshi], we tried using a single partitioning column with date and time fused together. The result is still not satisfactory, as cube segments are not properly eliminated even in this case. In our test we had a table with 12 hourly partitions defined for hours 00 to 11, a query with condition _*where a.time_key >= '201711200100' and a.time_key < '201711200400'*_ is only filtering out the very first segment for hour 00, then progresses with scanning all the remaining 11 segments, instead of the expected 3 segments (hours 01, 02 and 03). It looks very much like a bug, where as soon as the lower bound condition is satisfied, the upper bound condition is no longer checked. I'm attaching a log excerpt to illustrate the above-mentioned behavior. 
[^partition_elimination_bug_single_column_test.log] > Partition elimination algorithm seems to be inefficient and have serious > issues with handling date/time ranges, can lead to very slow queries and > OOM/Java heap dump conditions > --- > > Key: KYLIN-3122 > URL: https://issues.apache.org/jira/browse/KYLIN-3122 > Project: Kylin > Issue Type: Bug > Components: Query Engine > Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2.0 > Reporter: Vsevolod Ostapenko > Assignee: hongbin ma > Priority: Critical > Attachments: partition_elimination_bug_single_column_test.log > > > The current algorithm of cube segment elimination seems to be rather inefficient. > We are using a model where cubes are partitioned by date and time: > "partition_desc": > { "partition_date_column": "A_VL_HOURLY_V.THEDATE", "partition_time_column": "A_VL_HOURLY_V.THEHOUR", "partition_date_start": 0, "partition_date_format": "yyyyMMdd", "partition_time_format": "HH", "partition_type": "APPEND", "partition_condition_builder": "org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder" }, > Cubes contain partitions for multiple days and 24 hours for each day. Each cube segment corresponds to just one hour. > When a query is issued where both date and hour are specified using an equality condition (e.g. thedate = '20171011' and thehour = '10'), Kylin sequentially iterates over all the cube segments (hundreds of them) only to skip all except the one that needs to be scanned (which can be observed by looking in the logs). > The expectation is that Kylin would use the existing info on the partitioning columns (date and time) and the known hierarchical relation between date and time to locate the required partition much more efficiently than a linear scan through all the cube partitions. > Now, if the filtering condition is on a range of hours, the behavior of partition pruning and scanning becomes illogical, which suggests bugs in the logic. > If the filtering condition is on a specific date and a closed-open range of hours (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), in addition to sequentially scanning all the cube partitions (as described above), Kylin will scan HBase tables for all the hours from the specified starting hour to the last hour of the day (e.g. hours 10 to 24, instead of just hour 10). > As a result, the query will run much longer than necessary, and might run out of memory, causing a JVM heap dump and a Kylin server crash. > If the filtering condition is on a specific date but the hour interval is specified as open-closed (e.g. thedate = '20171011' and thehour > '09' and thehour <= '10'), Kylin will scan all HBase tables for all the later dates and hours (e.g. from hour 10 till the most recent hour on the most recent day, which can be hundreds of tables and thousands of regions). > As a result, query execution time will increase dramatically and in most cases the Kylin server will be terminated with an OOM error and a JVM heap dump.
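The closed-open interval check the reporter expects can be sketched in a few lines. This is illustrative Python only, not Kylin code; the segment boundaries are assumed to be zero-padded YYYYMMDDHHMM strings, so lexicographic order matches time order:

```python
def hour_segment(hour):
    # Segment covering [hour, hour+1) on 2017-11-20, keyed as YYYYMMDDHHMM strings.
    return ("20171120%02d00" % hour, "20171120%02d00" % (hour + 1))

segments = [hour_segment(h) for h in range(12)]  # hours 00..11

def overlapping_segments(segments, lo, hi):
    # A segment [start, end) intersects the query range [lo, hi) only when
    # start < hi AND end > lo. Checking only the lower bound (end > lo) is
    # exactly the behavior described above: only the hour-00 segment is
    # eliminated and the remaining 11 are scanned.
    return [s for s in segments if s[0] < hi and s[1] > lo]

hits = overlapping_segments(segments, "201711200100", "201711200400")
# hits covers exactly the segments for hours 01, 02 and 03.
```

With both bounds tested, the query range from the log excerpt prunes down to the expected three segments.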
[jira] [Commented] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump conditions
[ https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343677#comment-16343677 ] Vsevolod Ostapenko commented on KYLIN-3122: --- [~Shaofengshi], we tried using a single partitioning column with date and time fused together. The result is still not satisfactory, as cube segments are not properly eliminated even in this case. In our test we had a table with 12 hourly partitions defined for hours 00 to 11; a query with the condition _*where a.time_key >= '201711200100' and a.time_key < '201711200400'*_ filters out only the very first segment (hour 00), then progresses with scanning all the remaining 11 segments, instead of the expected 3 segments (hours 01, 02 and 03). It looks very much like a bug: as soon as the lower-bound condition is satisfied, the upper-bound condition is no longer checked. I'm attaching a log excerpt to illustrate the behavior. [^partition_elimination_bug_single_column_test.log] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump conditions
[ https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3122: -- Attachment: partition_elimination_bug_single_column_test.log -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump conditions
[ https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16339446#comment-16339446 ] Vsevolod Ostapenko commented on KYLIN-3122: --- [~Shaofengshi] or [~yimingliu], could one of you please assign this bug for proper investigation? This issue is derailing our product development plans and is a showstopper for any real production deployment. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump conditions
[ https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3122: -- Priority: Critical (was: Major) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump conditions
[ https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3122: -- Description: The current algorithm of cube segment elimination seems to be rather inefficient. We are using a model where cubes are partitioned by date and time: "partition_desc": { "partition_date_column": "A_VL_HOURLY_V.THEDATE", "partition_time_column": "A_VL_HOURLY_V.THEHOUR", "partition_date_start": 0, "partition_date_format": "yyyyMMdd", "partition_time_format": "HH", "partition_type": "APPEND", "partition_condition_builder": "org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder" }, Cubes contain partitions for multiple days and 24 hours for each day. Each cube segment corresponds to just one hour. When a query is issued where both date and hour are specified using an equality condition (e.g. thedate = '20171011' and thehour = '10'), Kylin sequentially iterates over all the cube segments (hundreds of them) only to skip all except the one that needs to be scanned (which can be observed by looking in the logs). The expectation is that Kylin would use the existing info on the partitioning columns (date and time) and the known hierarchical relation between date and time to locate the required partition much more efficiently than a linear scan through all the cube partitions. Now, if the filtering condition is on a range of hours, the behavior of partition pruning and scanning becomes illogical, which suggests bugs in the logic. If the filtering condition is on a specific date and a closed-open range of hours (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), in addition to sequentially scanning all the cube partitions (as described above), Kylin will scan HBase tables for all the hours from the specified starting hour to the last hour of the day (e.g. hours 10 to 24, instead of just hour 10). As a result, the query will run much longer than necessary, and might run out of memory, causing a JVM heap dump and a Kylin server crash. If the filtering condition is on a specific date but the hour interval is specified as open-closed (e.g. thedate = '20171011' and thehour > '09' and thehour <= '10'), Kylin will scan all HBase tables for all the later dates and hours (e.g. from hour 10 till the most recent hour on the most recent day, which can be hundreds of tables and thousands of regions). As a result, query execution time will increase dramatically and in most cases the Kylin server will be terminated with an OOM error and a JVM heap dump.
[jira] [Created] (KYLIN-3186) Add support for partitioning columns that combine date and time (e.g. YYYYMMDDHHMISS)
Vsevolod Ostapenko created KYLIN-3186: - Summary: Add support for partitioning columns that combine date and time (e.g. YYYYMMDDHHMISS) Key: KYLIN-3186 URL: https://issues.apache.org/jira/browse/KYLIN-3186 Project: Kylin Issue Type: Improvement Components: General Affects Versions: v2.2.0 Reporter: Vsevolod Ostapenko In a multitude of existing enterprise applications, partitioning is done on a single column that fuses date and time into a single value (string, integer or big integer). Typical formats are YYYYMMDDHHMM or YYYYMMDDHHMMSS (e.g. 201801181621 and 20180118154734). Such a representation is human-readable and provides natural sorting of the date/time values. The lack of support for such a date/time representation requires ugly workarounds, like creating views that split date and time into separate columns, or copying data into tables with a different partitioning scheme, neither of which is a particularly good solution. Moreover, the views approach on Hive causes severe performance issues, due to the inability of the Hive optimizer to correctly analyze the filtering conditions auto-generated by Kylin during the flat table build step. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
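A quick illustration of why the fused format needs no special handling beyond a combined parse pattern. Plain Python here, nothing Kylin-specific; the strptime patterns are assumptions matching the example values above:

```python
from datetime import datetime

# Parse a fused YYYYMMDDHHMMSS value with a single combined pattern.
ts = datetime.strptime("20180118154734", "%Y%m%d%H%M%S")

# Because the most significant fields come first, plain string comparison of
# equal-length values matches chronological order -- the "natural sorting"
# property mentioned above.
a, b = "201801181621", "201801181622"
sorted_ok = (a < b) and (datetime.strptime(a, "%Y%m%d%H%M")
                         < datetime.strptime(b, "%Y%m%d%H%M"))
```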
[jira] [Created] (KYLIN-3185) Change handling of new metrics in the hybrid cube scenario to return NULL values for segments built before metrics were added
Vsevolod Ostapenko created KYLIN-3185: - Summary: Change handling of new metrics in the hybrid cube scenario to return NULL values for segments built before metrics were added Key: KYLIN-3185 URL: https://issues.apache.org/jira/browse/KYLIN-3185 Project: Kylin Issue Type: Improvement Components: Query Engine Affects Versions: v2.2.0 Environment: HDP 2.5.6, Kylin 2.2.0 Reporter: Vsevolod Ostapenko Assignee: liyang Currently, when a hybrid cube is defined and a new metric is added, cube segments that were created before the metric was introduced are not consulted if a query contains this new metric. As a result, even data for metrics that existed and were computed is not returned. A better behavior would be to find the intersection between the metrics present in a segment and the metrics requested by the query, and for a given segment return all the available metrics while injecting NULL values for the ones that did not exist at the time the cube segment was populated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
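The suggested behavior amounts to a set intersection plus NULL padding. A minimal sketch follows; the function and metric names are hypothetical, not Kylin internals:

```python
def pad_segment_row(requested_metrics, segment_row):
    # segment_row maps metric name -> value for metrics that existed when the
    # segment was built; metrics added later are simply absent, so dict.get
    # injects None (SQL NULL) for them instead of dropping the whole segment.
    return {m: segment_row.get(m) for m in requested_metrics}

old_segment_row = {"gmv": 120.0}           # built before "clicks" was added
row = pad_segment_row(["gmv", "clicks"], old_segment_row)
# row == {"gmv": 120.0, "clicks": None}
```

The old segment still contributes its computed "gmv" value; only the genuinely missing metric comes back as NULL.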
[jira] [Commented] (KYLIN-3139) Failure in map-reduce job due to undefined hdp.version variable when using HDP stack and remote HBase cluster
[ https://issues.apache.org/jira/browse/KYLIN-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313524#comment-16313524 ] Vsevolod Ostapenko commented on KYLIN-3139: --- [~Shaofengshi], I'm not sure who to ask to review my proposed changes for this JIRA. Perhaps you could have a look or direct it to the correct reviewer? Thanks in advance. > Failure in map-reduce job due to undefined hdp.version variable when using > HDP stack and remote HBase cluster > - > > Key: KYLIN-3139 > URL: https://issues.apache.org/jira/browse/KYLIN-3139 > Project: Kylin > Issue Type: Bug > Components: General > Affects Versions: v2.2.0 > Environment: HDP 2.5.6, two-cluster setup, Kylin 2.2.0 in a cluster with Hive only, remote HBase cluster for data storage > Reporter: Vsevolod Ostapenko > Assignee: Vsevolod Ostapenko > Priority: Minor > Labels: hdp > Attachments: KYLIN-3139.master.001.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > When running on top of the HDP stack in a setup where Hive and HBase run in different clusters, cube build/refresh fails on the step "Extract Fact Table Distinct Columns" with the error > {quote}java.lang.IllegalArgumentException: Unable to parse '/hdp/apps/$\{hdp.version\}/mapreduce/mapreduce.tar.gz#mr-framework' as a URI, check the setting for mapreduce.application.framework.path{quote} > Based on existing JIRA discussions in the Ambari project, it's the responsibility of a service to set the hdp.version Java property. When HBase is not installed as a service in the cluster where the Kylin server is running, the hbase launcher (invoked by kylin.sh) does not set this property (presumably because HBase in that case is just a client and not a service). > The only suitable workaround found so far is to set the property as part of the conf/setenv.sh script. > To avoid hard-coding the HDP version info, the suggested change to setenv.sh will attempt to detect the HDP version at run time. It should work for all released HDP versions from 2.2.x to 2.6.x. > In addition, it will also try to locate and set the Java native library path when running on top of HDP. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
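The run-time detection idea can be sketched as follows. The actual patch modifies conf/setenv.sh; this is only a language-neutral illustration in Python, and the /usr/hdp layout with versioned directories like 2.5.6.0-40 next to a "current" symlink is an assumption about standard HDP installs:

```python
import os
import re

def detect_hdp_version(root="/usr/hdp"):
    # HDP installs versioned directories under /usr/hdp (e.g. 2.5.6.0-40);
    # pick the highest-sorting version directory, skipping non-version
    # entries such as the "current" symlink.
    if not os.path.isdir(root):
        return None  # not an HDP host
    versions = [d for d in os.listdir(root) if re.match(r"\d+\.\d+\.\d+", d)]
    return sorted(versions)[-1] if versions else None
```

The detected value can then be exported via -Dhdp.version=... in the launcher environment, which is what the setenv.sh change does.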
[jira] [Commented] (KYLIN-3069) Add proper time zone support to the WebUI instead of GMT/PST kludge
[ https://issues.apache.org/jira/browse/KYLIN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313505#comment-16313505 ] Vsevolod Ostapenko commented on KYLIN-3069: --- [~Zhixiong Chen], could you please review the changes and commit them into master? The patch looks fine to me (in case anyone is waiting for my feedback). > Add proper time zone support to the WebUI instead of GMT/PST kludge > --- > > Key: KYLIN-3069 > URL: https://issues.apache.org/jira/browse/KYLIN-3069 > Project: Kylin > Issue Type: Bug > Components: Web > Affects Versions: v2.2.0 > Environment: HDP 2.5.3, Kylin 2.2.0 > Reporter: Vsevolod Ostapenko > Assignee: peng.jianhua > Priority: Minor > Attachments: 0001-KYLIN-3069-Add-proper-time-zone-support-to-the-WebUI.patch, Screen Shot 2017-12-05 at 10.01.39 PM.png, kylin_pic1.png, kylin_pic2.png, kylin_pic3.png > > Original Estimate: 168h > Remaining Estimate: 168h > > The time zone handling logic in the WebUI is a kludge, coded to parse only "GMT-N" time zone specifications and defaulting to PST if parsing is not successful (kylin/webapp/app/js/filters/filter.js). > Integrating moment and moment-timezone (http://momentjs.com/timezone/docs/) into the product would allow correct time zone handling. > For users who reside in geographical locations that observe daylight saving time, the GMT-N format is very inconvenient, and the info reported by the UI in various places is perplexing. > Needless to say, the GMT moniker itself is long deprecated. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
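The DST problem with fixed GMT-N offsets is easy to demonstrate. Shown here in Python with zoneinfo purely as an illustration of the concept; the ticket itself proposes moment-timezone for the JavaScript UI:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# A UTC instant during northern-hemisphere summer.
summer = datetime(2017, 7, 1, 12, 0, tzinfo=timezone.utc)

fixed = summer.astimezone(timezone(timedelta(hours=-5)))  # a literal "GMT-5"
named = summer.astimezone(ZoneInfo("America/New_York"))   # DST-aware IANA zone

# In July, New York observes EDT (UTC-4), so the fixed offset is off by an
# hour -- exactly the confusion the reporter describes in the UI.
```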
[jira] [Updated] (KYLIN-3139) Failure in map-reduce job due to undefined hdp.version variable when using HDP stack and remote HBase cluster
[ https://issues.apache.org/jira/browse/KYLIN-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3139: -- Description: When running on top of the HDP stack in a setup where Hive and HBase run in different clusters, cube build/refresh fails on the step "Extract Fact Table Distinct Columns" with the error {quote}java.lang.IllegalArgumentException: Unable to parse '/hdp/apps/$\{hdp.version\}/mapreduce/mapreduce.tar.gz#mr-framework' as a URI, check the setting for mapreduce.application.framework.path{quote} Based on existing JIRA discussions in the Ambari project, it's the responsibility of a service to set the hdp.version Java property. When HBase is not installed as a service in the cluster where the Kylin server is running, the hbase launcher (invoked by kylin.sh) does not set this property (presumably because HBase in that case is just a client and not a service). The only suitable workaround found so far is to set the property as part of the conf/setenv.sh script. To avoid hard-coding the HDP version info, the suggested change to setenv.sh will attempt to detect the HDP version at run time. It should work for all released HDP versions from 2.2.x to 2.6.x. In addition, it will also try to locate and set the Java native library path when running on top of HDP. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KYLIN-3139) Failure in map-reduce job due to undefined hdp.version variable when using HDP stack and remote HBase cluster
[ https://issues.apache.org/jira/browse/KYLIN-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305767#comment-16305767 ] Vsevolod Ostapenko commented on KYLIN-3139: --- Proposed version of the patch is attached, please review and provide your feedback (or commit, if it looks OK). > Failure in map-reduce job due to undefined hdp.version variable when using > HDP stack and remote HBase cluster > - > > Key: KYLIN-3139 > URL: https://issues.apache.org/jira/browse/KYLIN-3139 > Project: Kylin > Issue Type: Bug > Components: General >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, two cluster setup, Kylin 2.2.0 in a cluster > with Hive only, remote HBase cluster for data storage >Reporter: Vsevolod Ostapenko >Assignee: Vsevolod Ostapenko >Priority: Minor > Labels: hdp > Attachments: KYLIN-3139.master.001.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > When running on top of HDP stack and using a setup where Hive and HBase run > in different clusters cube build/refresh fails on the step "Extract Fact > Table Distinct Columns" with the error > {quote}java.lang.IllegalArgumentException: Unable to parse > '/hdp/apps/$\{hdp.version\}/mapreduce/mapreduce.tar.gz#mr-framework' as a > URI, check the setting for mapreduce.application.framework.path{quote} > Based on existing JIRA discussions in Ambari project, it's responsibility of > a service to set hdp.version Java property. When HBase is not installed as a > service in a cluster hbase launcher does not set this property (presumably > because HBase in that case is just a client and not a service). > The only suitable workaround found so far is to set property as part of the > conf/setenv.sh script. > In order to avoid hard coding of the HDP version info, suggested change to > setenv.sh will attempt to detect HDP version at run-time. 
It should work for > all released HDP versions from 2.2.x to 2.6.x > In addition to that, it will also try to locate and set the Java native library > path when running on top of HDP. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
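The run-time detection described above can be sketched as follows. This is only a minimal illustration, not the attached KYLIN-3139.master.001.patch; the helper name, the `/usr/hdp/<version>` layout, and the use of `KYLIN_EXTRA_START_OPTS` in conf/setenv.sh are assumptions.

```shell
# Hypothetical sketch for conf/setenv.sh: detect the installed HDP version
# at run time instead of hard-coding it. Assumes the usual HDP layout where
# stack versions live under /usr/hdp/<version> (e.g. /usr/hdp/2.5.6.0-40).
detect_hdp_version() {
  local hdp_root="${1:-/usr/hdp}"
  # Skip the 'current' symlink; take the first version-shaped entry.
  ls "$hdp_root" 2>/dev/null | grep -E '^[0-9]+(\.[0-9]+)+' | head -n 1
}

hdp_version="$(detect_hdp_version)"
if [ -n "$hdp_version" ]; then
  # Pass -Dhdp.version to the JVMs Kylin launches (variable name assumed
  # to match the setenv.sh hook for extra JVM options).
  export KYLIN_EXTRA_START_OPTS="-Dhdp.version=${hdp_version} ${KYLIN_EXTRA_START_OPTS:-}"
fi
```

If several stack versions are installed side by side, the first match wins; a real patch would likely prefer resolving the `current` symlink instead.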
[jira] [Updated] (KYLIN-3139) Failure in map-reduce job due to undefined hdp.version variable when using HDP stack and remote HBase cluster
[ https://issues.apache.org/jira/browse/KYLIN-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3139: -- Attachment: KYLIN-3139.master.001.patch > Failure in map-reduce job due to undefined hdp.version variable when using > HDP stack and remote HBase cluster > - > > Key: KYLIN-3139 > URL: https://issues.apache.org/jira/browse/KYLIN-3139 > Project: Kylin > Issue Type: Bug > Components: General >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, two cluster setup, Kylin 2.2.0 in a cluster > with Hive only, remote HBase cluster for data storage >Reporter: Vsevolod Ostapenko >Assignee: Vsevolod Ostapenko >Priority: Minor > Labels: hdp > Attachments: KYLIN-3139.master.001.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > When running on top of HDP stack and using a setup where Hive and HBase run > in different clusters cube build/refresh fails on the step "Extract Fact > Table Distinct Columns" with the error > {quote}java.lang.IllegalArgumentException: Unable to parse > '/hdp/apps/$\{hdp.version\}/mapreduce/mapreduce.tar.gz#mr-framework' as a > URI, check the setting for mapreduce.application.framework.path{quote} > Based on existing JIRA discussions in Ambari project, it's responsibility of > a service to set hdp.version Java property. When HBase is not installed as a > service in a cluster hbase launcher does not set this property (presumably > because HBase in that case is just a client and not a service). > The only suitable workaround found so far is to set property as part of the > conf/setenv.sh script. > In order to avoid hard coding of the HDP version info, suggested change to > setenv.sh will attempt to detect HDP version at run-time. It should work for > all released HDP version from 2.2.x to 2.6.x > In addition to that, it will also try to locate and set Java native library > path, when running on top of HDP. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KYLIN-3139) Failure in map-reduce job due to undefined hdp.version variable when using HDP stack and remote HBase cluster
Vsevolod Ostapenko created KYLIN-3139: - Summary: Failure in map-reduce job due to undefined hdp.version variable when using HDP stack and remote HBase cluster Key: KYLIN-3139 URL: https://issues.apache.org/jira/browse/KYLIN-3139 Project: Kylin Issue Type: Bug Components: General Affects Versions: v2.2.0 Environment: HDP 2.5.6, two cluster setup, Kylin 2.2.0 in a cluster with Hive only, remote HBase cluster for data storage Reporter: Vsevolod Ostapenko Assignee: Vsevolod Ostapenko Priority: Minor When running on top of HDP stack and using a setup where Hive and HBase run in different clusters cube build/refresh fails on the step "Extract Fact Table Distinct Columns" with the error {quote}java.lang.IllegalArgumentException: Unable to parse '/hdp/apps/$\{hdp.version\}/mapreduce/mapreduce.tar.gz#mr-framework' as a URI, check the setting for mapreduce.application.framework.path{quote} Based on existing JIRA discussions in Ambari project, it's responsibility of a service to set hdp.version Java property. When HBase is not installed as a service in a cluster hbase launcher does not set this property (presumably because HBase in that case is just a client and not a service). The only suitable workaround found so far is to set property as part of the conf/setenv.sh script. In order to avoid hard coding of the HDP version info, suggested change to setenv.sh will attempt to detect HDP version at run-time. It should work for all released HDP version from 2.2.x to 2.6.x In addition to that, it will also try to locate and set Java native library path, when running on top of HDP. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
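The failure mode above (the literal ${hdp.version} placeholder surviving into mapreduce.application.framework.path) can be checked for before a build even runs. A small illustrative helper; the function is hypothetical, and only the commented usage line touches a real Hadoop client command:

```shell
# Illustrative check: does a configuration value still contain the literal,
# unsubstituted ${hdp.version} placeholder? That is the condition behind the
# IllegalArgumentException quoted in this issue.
has_unresolved_hdp_version() {
  case "$1" in
    *'${hdp.version}'*) return 0 ;;  # placeholder was not substituted
    *) return 1 ;;
  esac
}

# On a live client one would feed it the actual setting, e.g.:
#   has_unresolved_hdp_version "$(hdfs getconf -confKey mapreduce.application.framework.path)" \
#     && echo "hdp.version is not set; export it in conf/setenv.sh"
```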
[jira] [Updated] (KYLIN-3127) In the Insights tab, results section, make the list of Cubes hit by the query either scrollable or multiline
[ https://issues.apache.org/jira/browse/KYLIN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3127: -- Description: When query hits multiple cubes or the same cube multiple times, the list of cubes is truncated as it's a single line and non-scrollable element on the page. Please refer to the enclosed screenshot. !https://issues.apache.org/jira/secure/attachment/12903336/Screen%20Shot%202017-12-21%20at%207.49.46%20PM.png! Similar behavior can observed even with a single cube with a rather long cube name. was: When query hits multiple cubes or the same cube multiple times, the list of cubes is truncated as it's a single line and non-scrollable element on the page. Please refer to the enclosed screenshot. Similar behavior can observed even with a single cube with a rather long cube name. > In the Insights tab, results section, make the list of Cubes hit by the query > either scrollable or multiline > > > Key: KYLIN-3127 > URL: https://issues.apache.org/jira/browse/KYLIN-3127 > Project: Kylin > Issue Type: Improvement > Components: Web >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2.0 >Reporter: Vsevolod Ostapenko >Assignee: Zhixiong Chen >Priority: Minor > Attachments: Screen Shot 2017-12-21 at 7.49.46 PM.png > > > When query hits multiple cubes or the same cube multiple times, the list of > cubes is truncated as it's a single line and non-scrollable element on the > page. Please refer to the enclosed screenshot. > !https://issues.apache.org/jira/secure/attachment/12903336/Screen%20Shot%202017-12-21%20at%207.49.46%20PM.png! > Similar behavior can observed even with a single cube with a rather long cube > name. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-3127) In the Insights tab, results section, make the list of Cubes hit by the query either scrollable or multiline
[ https://issues.apache.org/jira/browse/KYLIN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3127: -- Description: When query hits multiple cubes or the same cube multiple times, the list of cubes is truncated as it's a single line and non-scrollable element on the page. Please refer to the enclosed screenshot. [^Screen Shot 2017-12-21 at 7.49.46 PM.png] Similar behavior can observed even with a single cube with a rather long cube name. was: When query hits multiple cubes or the same cube multiple times, the list of cubes is truncated as it's a single line and non-scrollable element on the page. Please refer to the enclosed screenshot. Similar behavior can observed even with a single cube with a rather long cube name. > In the Insights tab, results section, make the list of Cubes hit by the query > either scrollable or multiline > > > Key: KYLIN-3127 > URL: https://issues.apache.org/jira/browse/KYLIN-3127 > Project: Kylin > Issue Type: Improvement > Components: Web >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2.0 >Reporter: Vsevolod Ostapenko >Assignee: Zhixiong Chen >Priority: Minor > Attachments: Screen Shot 2017-12-21 at 7.49.46 PM.png > > > When query hits multiple cubes or the same cube multiple times, the list of > cubes is truncated as it's a single line and non-scrollable element on the > page. Please refer to the enclosed screenshot. > [^Screen Shot 2017-12-21 at 7.49.46 PM.png] > Similar behavior can observed even with a single cube with a rather long cube > name. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-3127) In the Insights tab, results section, make the list of Cubes hit by the query either scrollable or multiline
[ https://issues.apache.org/jira/browse/KYLIN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3127: -- Description: When query hits multiple cubes or the same cube multiple times, the list of cubes is truncated as it's a single line and non-scrollable element on the page. Please refer to the enclosed screenshot. Similar behavior can observed even with a single cube with a rather long cube name. was: When query hits multiple cubes or the same cube multiple times, the list of cubes is truncated as it's a single line and non-scrollable element on the page. Please refer to the enclosed screenshot. [^Screen Shot 2017-12-21 at 7.49.46 PM.png] Similar behavior can observed even with a single cube with a rather long cube name. > In the Insights tab, results section, make the list of Cubes hit by the query > either scrollable or multiline > > > Key: KYLIN-3127 > URL: https://issues.apache.org/jira/browse/KYLIN-3127 > Project: Kylin > Issue Type: Improvement > Components: Web >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2.0 >Reporter: Vsevolod Ostapenko >Assignee: Zhixiong Chen >Priority: Minor > Attachments: Screen Shot 2017-12-21 at 7.49.46 PM.png > > > When query hits multiple cubes or the same cube multiple times, the list of > cubes is truncated as it's a single line and non-scrollable element on the > page. Please refer to the enclosed screenshot. > Similar behavior can observed even with a single cube with a rather long cube > name. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-3127) In the Insights tab, results section, make the list of Cubes hit by the query either scrollable or multiline
[ https://issues.apache.org/jira/browse/KYLIN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3127: -- Description: When query hits multiple cubes or the same cube multiple times, the list of cubes is truncated as it's a single line and non-scrollable element on the page. Please refer to the enclosed screenshot. Similar behavior can observed even with a single cube with a rather long cube name. was: When query hits multiple cubes or the same cube multiple times, the list of cubes is truncated as it's a single line and non-scrollable element on the page. Please refer to the enclosed screenshot. !Screen Shot 2017-12-21 at 7.49.46 PM.png|thumbnail! Similar behavior can observed even with a single cube with a rather long cube name. > In the Insights tab, results section, make the list of Cubes hit by the query > either scrollable or multiline > > > Key: KYLIN-3127 > URL: https://issues.apache.org/jira/browse/KYLIN-3127 > Project: Kylin > Issue Type: Improvement > Components: Web >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2.0 >Reporter: Vsevolod Ostapenko >Assignee: Zhixiong Chen >Priority: Minor > Attachments: Screen Shot 2017-12-21 at 7.49.46 PM.png > > > When query hits multiple cubes or the same cube multiple times, the list of > cubes is truncated as it's a single line and non-scrollable element on the > page. Please refer to the enclosed screenshot. > > Similar behavior can observed even with a single cube with a rather long cube > name. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-3127) In the Insights tab, results section, make the list of Cubes hit by the query either scrollable or multiline
[ https://issues.apache.org/jira/browse/KYLIN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3127: -- Description: When query hits multiple cubes or the same cube multiple times, the list of cubes is truncated as it's a single line and non-scrollable element on the page. Please refer to the enclosed screenshot. Similar behavior can observed even with a single cube with a rather long cube name. was: When query hits multiple cubes or the same cube multiple times, the list of cubes is truncated as it's a single line and non-scrollable element on the page. Please refer to the enclosed screenshot. Similar behavior can observed even with a single cube with a rather long cube name. > In the Insights tab, results section, make the list of Cubes hit by the query > either scrollable or multiline > > > Key: KYLIN-3127 > URL: https://issues.apache.org/jira/browse/KYLIN-3127 > Project: Kylin > Issue Type: Improvement > Components: Web >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2.0 >Reporter: Vsevolod Ostapenko >Assignee: Zhixiong Chen >Priority: Minor > Attachments: Screen Shot 2017-12-21 at 7.49.46 PM.png > > > When query hits multiple cubes or the same cube multiple times, the list of > cubes is truncated as it's a single line and non-scrollable element on the > page. Please refer to the enclosed screenshot. > > Similar behavior can observed even with a single cube with a rather long cube > name. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-3127) In the Insights tab, results section, make the list of Cubes hit by the query either scrollable or multiline
[ https://issues.apache.org/jira/browse/KYLIN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3127: -- Description: When query hits multiple cubes or the same cube multiple times, the list of cubes is truncated as it's a single line and non-scrollable element on the page. Please refer to the enclosed screenshot. !Screen Shot 2017-12-21 at 7.49.46 PM.png|thumbnail! Similar behavior can observed even with a single cube with a rather long cube name. was: When query hits multiple cubes or the same cube multiple times, the list of cubes is truncated as it's a single line and non-scrollable element on the page. Please refer to the enclosed screenshot. !attachment-name.jpg|thumbnail! Similar behavior can observed even with a single cube with a rather long cube name. > In the Insights tab, results section, make the list of Cubes hit by the query > either scrollable or multiline > > > Key: KYLIN-3127 > URL: https://issues.apache.org/jira/browse/KYLIN-3127 > Project: Kylin > Issue Type: Improvement > Components: Web >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2.0 >Reporter: Vsevolod Ostapenko >Assignee: Zhixiong Chen >Priority: Minor > Attachments: Screen Shot 2017-12-21 at 7.49.46 PM.png > > > When query hits multiple cubes or the same cube multiple times, the list of > cubes is truncated as it's a single line and non-scrollable element on the > page. Please refer to the enclosed screenshot. > !Screen Shot 2017-12-21 at 7.49.46 PM.png|thumbnail! > > Similar behavior can observed even with a single cube with a rather long cube > name. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-3127) In the Insights tab, results section, make the list of Cubes hit by the query either scrollable or multiline
[ https://issues.apache.org/jira/browse/KYLIN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3127: -- Description: When query hits multiple cubes or the same cube multiple times, the list of cubes is truncated as it's a single line and non-scrollable element on the page. Please refer to the enclosed screenshot. !attachment-name.jpg|thumbnail! Similar behavior can observed even with a single cube with a rather long cube name. was: When query hits multiple cubes or the same cube multiple times, the list of cubes is truncated as it's a single line and non-scrollable element on the page. Please refer to the enclosed screenshot. !Screen Shot 2017-12-21 at 7.49.46 PM.png|thumbnail! Similar behavior can observed even with a single cube with a rather long cube name. > In the Insights tab, results section, make the list of Cubes hit by the query > either scrollable or multiline > > > Key: KYLIN-3127 > URL: https://issues.apache.org/jira/browse/KYLIN-3127 > Project: Kylin > Issue Type: Improvement > Components: Web >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2.0 >Reporter: Vsevolod Ostapenko >Assignee: Zhixiong Chen >Priority: Minor > Attachments: Screen Shot 2017-12-21 at 7.49.46 PM.png > > > When query hits multiple cubes or the same cube multiple times, the list of > cubes is truncated as it's a single line and non-scrollable element on the > page. Please refer to the enclosed screenshot. > !attachment-name.jpg|thumbnail! > > Similar behavior can observed even with a single cube with a rather long cube > name. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KYLIN-3127) In the Insights tab, results section, make the list of Cubes hit by the query either scrollable or multiline
Vsevolod Ostapenko created KYLIN-3127: - Summary: In the Insights tab, results section, make the list of Cubes hit by the query either scrollable or multiline Key: KYLIN-3127 URL: https://issues.apache.org/jira/browse/KYLIN-3127 Project: Kylin Issue Type: Improvement Components: Web Affects Versions: v2.2.0 Environment: HDP 2.5.6, Kylin 2.2.0 Reporter: Vsevolod Ostapenko Assignee: Zhixiong Chen Priority: Minor Attachments: Screen Shot 2017-12-21 at 7.49.46 PM.png When a query hits multiple cubes or the same cube multiple times, the list of cubes is truncated because it is rendered as a single-line, non-scrollable element on the page. Please refer to the enclosed screenshot. !Screen Shot 2017-12-21 at 7.49.46 PM.png|thumbnail! Similar behavior can be observed even with a single cube that has a rather long cube name. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-3126) Query fails with "Error while compiling generated Java code" when equality condition is used, and works when equivalent IN clause is specified
[ https://issues.apache.org/jira/browse/KYLIN-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3126: -- Description: The following query fails with "Error while compiling generated Java code", when equality condition is used {{(d0.year_beg_dt = '2012-01-01')}} and works when IN clause is used {{(d0.year_beg_dt in ('2012-01-01'))}} {code:sql} select d2.country, count(f.item_count) items_ttl from kylin_sales f join kylin_cal_dt d0 on f.part_dt = d0.cal_dt join kylin_account d1 on f.buyer_id = d1.account_id join kylin_country d2 on d1.account_country = d2.country where d0.year_beg_dt = '2012-01-01' -- blows up -- d0.year_beg_dt in ('2012-01-01') -- works and d2.country in ('US', 'JP') group by d2.country {code} was: The following query fails with "Error while compiling generated Java code", when equality condition is used (d0.year_beg_dt = '2012-01-01') and works when IN clause is used (d0.year_beg_dt in ('2012-01-01')) select d2.country, count(f.item_count) items_ttl from kylin_sales f join kylin_cal_dt d0 on f.part_dt = d0.cal_dt join kylin_account d1 on f.buyer_id = d1.account_id join kylin_country d2 on d1.account_country = d2.country where d0.year_beg_dt = '2012-01-01' -- blows up -- d0.year_beg_dt in ('2012-01-01') -- works and d2.country in ('US', 'JP') group by d2.country > Query fails with "Error while compiling generated Java code" when equality > condition is used, and works when equivalent IN clause is specified > -- > > Key: KYLIN-3126 > URL: https://issues.apache.org/jira/browse/KYLIN-3126 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2.0, sample cube >Reporter: Vsevolod Ostapenko >Assignee: liyang > > The following query fails with "Error while compiling generated Java code", > when equality condition is used {{(d0.year_beg_dt = '2012-01-01')}} and works > when IN clause is used {{(d0.year_beg_dt in ('2012-01-01'))}} > 
{code:sql} > select > d2.country, > count(f.item_count) items_ttl > from > kylin_sales f > join > kylin_cal_dt d0 > on > f.part_dt = d0.cal_dt > join > kylin_account d1 > on > f.buyer_id = d1.account_id > join > kylin_country d2 > on > d1.account_country = d2.country > where > d0.year_beg_dt = '2012-01-01' -- blows up > -- d0.year_beg_dt in ('2012-01-01') -- works > and > d2.country in ('US', 'JP') > group by > d2.country > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (KYLIN-3121) NPE while executing a query with two left outer joins and floating point expressions on nullable fields
[ https://issues.apache.org/jira/browse/KYLIN-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16300778#comment-16300778 ] Vsevolod Ostapenko edited comment on KYLIN-3121 at 12/22/17 12:57 AM: -- [~yimingliu], sure, here is an equivalent query against the sample cube. It fails with exactly the same errors. {code:sql} with t1 as ( select d2.country, count(f.item_count) items_ttl from kylin_sales f join kylin_cal_dt d0 on f.part_dt = d0.cal_dt join kylin_account d1 on f.buyer_id = d1.account_id join kylin_country d2 on d1.account_country = d2.country where d0.year_beg_dt in ('2012-01-01') and d2.country = 'US' group by d2.country ) , t2 as ( select d2.country, count(f.item_count) items_ttl from kylin_sales f join kylin_cal_dt d0 on f.part_dt = d0.cal_dt join kylin_account d1 on f.buyer_id = d1.account_id join kylin_country d2 on d1.account_country = d2.country where d0.year_beg_dt in ('2012-01-01') and d2.country = 'JP' group by d2.country ) , t3 as ( select d2.country, count(f.item_count) items_ttl from kylin_sales f join kylin_cal_dt d0 on f.part_dt = d0.cal_dt join kylin_account d1 on f.buyer_id = d1.account_id join kylin_country d2 on d1.account_country = d2.country where d0.year_beg_dt in ('2012-01-01') and d2.country in ('US', 'JP') group by d2.country ) select t3.country, t2.items_ttl, t3.items_ttl, -- 1 * t1.items_ttl expr1, -- works 1.0 * t1.items_ttl expr1, -- blows up, null while executing SQL -- 1 * NULLIF(t2.items_ttl, 0) expr2 -- works 1.0 * NULLIF(t2.items_ttl, 0) expr2 -- blows up, no error message, just Failed from t3 left outer join t1 on t3.country = t1.country left outer join t2 on t3.country = t2.country {code} was (Author: seva_ostapenko): [~yimingliu], sure, here is an equivalent query against the sample cube. It fails with exactly the same errors. 
with t1 as ( select d2.country, count(f.item_count) items_ttl from kylin_sales f join kylin_cal_dt d0 on f.part_dt = d0.cal_dt join kylin_account d1 on f.buyer_id = d1.account_id join kylin_country d2 on d1.account_country = d2.country where d0.year_beg_dt in ('2012-01-01') and d2.country = 'US' group by d2.country ) , t2 as ( select d2.country, count(f.item_count) items_ttl from kylin_sales f join kylin_cal_dt d0 on f.part_dt = d0.cal_dt join kylin_account d1 on f.buyer_id = d1.account_id join kylin_country d2 on d1.account_country = d2.country where d0.year_beg_dt in ('2012-01-01') and d2.country = 'JP' group by d2.country ) , t3 as ( select d2.country, count(f.item_count) items_ttl from kylin_sales f join kylin_cal_dt d0 on f.part_dt = d0.cal_dt join kylin_account d1 on f.buyer_id = d1.account_id join kylin_country d2 on d1.account_country = d2.country where d0.year_beg_dt in ('2012-01-01') and d2.country in ('US', 'JP') group by d2.country ) select t3.country, t2.items_ttl, t3.items_ttl, -- 1 * t1.items_ttl expr1, -- works 1.0 * t1.items_ttlexpr1, -- blows up, null while executing SQL -- 1 * NULLIF(t2.items_ttl, 0) expr2 -- works 1.0 * NULLIF(t2.items_ttl, 0) expr2 -- blows up, no error message, just Failed from t3 left outer join t1 on t3.country = t1.country left outer join t2 on t3.country = t2.country > NPE while executing a query with two left outer joins and floating point > expressions on nullable fields > --- > > Key: KYLIN-3121 > URL: https://issues.apache.org/jira/browse/KYLIN-3121 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2.0 >Reporter: Vsevolod Ostapenko >Assignee: liyang > > Queries that include two (or more) left outer joins and contain floating > point expressions that operate on the fields that contain integer NULL values > (due to left outer join) fail in-flight with NullPointerExceptions. > As an example, the following
[jira] [Commented] (KYLIN-3121) NPE while executing a query with two left outer joins and floating point expressions on nullable fields
[ https://issues.apache.org/jira/browse/KYLIN-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16300778#comment-16300778 ] Vsevolod Ostapenko commented on KYLIN-3121: --- [~yimingliu], sure, here is an equivalent query against the sample cube. It fails with exactly the same errors. with t1 as ( select d2.country, count(f.item_count) items_ttl from kylin_sales f join kylin_cal_dt d0 on f.part_dt = d0.cal_dt join kylin_account d1 on f.buyer_id = d1.account_id join kylin_country d2 on d1.account_country = d2.country where d0.year_beg_dt in ('2012-01-01') and d2.country = 'US' group by d2.country ) , t2 as ( select d2.country, count(f.item_count) items_ttl from kylin_sales f join kylin_cal_dt d0 on f.part_dt = d0.cal_dt join kylin_account d1 on f.buyer_id = d1.account_id join kylin_country d2 on d1.account_country = d2.country where d0.year_beg_dt in ('2012-01-01') and d2.country = 'JP' group by d2.country ) , t3 as ( select d2.country, count(f.item_count) items_ttl from kylin_sales f join kylin_cal_dt d0 on f.part_dt = d0.cal_dt join kylin_account d1 on f.buyer_id = d1.account_id join kylin_country d2 on d1.account_country = d2.country where d0.year_beg_dt in ('2012-01-01') and d2.country in ('US', 'JP') group by d2.country ) select t3.country, t2.items_ttl, t3.items_ttl, -- 1 * t1.items_ttl expr1, -- works 1.0 * t1.items_ttlexpr1, -- blows up, null while executing SQL -- 1 * NULLIF(t2.items_ttl, 0) expr2 -- works 1.0 * NULLIF(t2.items_ttl, 0) expr2 -- blows up, no error message, just Failed from t3 left outer join t1 on t3.country = t1.country left outer join t2 on t3.country = t2.country > NPE while executing a query with two left outer joins and floating point > expressions on nullable fields > --- > > Key: KYLIN-3121 > URL: https://issues.apache.org/jira/browse/KYLIN-3121 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2.0 >Reporter: Vsevolod 
Ostapenko >Assignee: liyang > > Queries that include two (or more) left outer joins and contain floating > point expressions that operate on the fields that contain integer NULL values > (due to left outer join) fail in-flight with NullPointerExceptions. > As an example, the following query generates NPE on either of the two > expressions: > * 100.0 * t2.media_gap_call_count > * 1.0 * NULLIF(t1.active_call_count, 0) > with > t1 > as > ( > select > d1.cell_name, > count(distinct a1.call_id) as active_call_count > from > zetticsdw.a_vl_hourly_v a1 > inner join > zetticsdw.d_cell_v d1 > on > a1.cell_key = d1.cell_key > where > d1.region_3 = 'Mumbai' > and > a1.thedate = '20171011' > and > a1.thehour = '00' > and > a1.active_call_flg = 1 > group by > d1.cell_name > ), > t2 > as > ( > select > d1.cell_name, > count(distinct a1.call_id) as media_gap_call_count > from > zetticsdw.a_vl_hourly_v a1 > inner join > zetticsdw.d_cell_v d1 > on > a1.cell_key = d1.cell_key > where > d1.region_3 = 'Mumbai' > and > a1.thedate='20171011' > and > a1.thehour = '00' > and > a1.media_gap_call_flg = 1 > group by > d1.cell_name > ) > , > t3 > as > ( > select > d1.cell_name, > sum(a1.ow_call_flg) one_way_call_count, > sum(a1.succ_call_flg) successfull_call_count > from > zetticsdw.a_vl_hourly_v a1 > inner join > zetticsdw.d_cell_v d1 > on > a1.cell_key = d1.cell_key > where > d1.region_3 = 'Mumbai' > and > a1.thedate='20171011' > and > a1.thehour = '00' > group by > d1.cell_name > ) > select >t3.cell_name, >t1.active_call_count, >t2.media_gap_call_count, >t3.one_way_call_count, >t3.successfull_call_count, >-- 100 * t2.media_gap_call_count nom, -- > works >-- 1 * NULLIF(t1.active_call_count, 0) denom-- > works >100.0 * t2.media_gap_call_count nom, -- fails, > NPE of one kind >1.0 * NULLIF(t1.active_call_count, 0) denom
[jira] [Created] (KYLIN-3126) Query fails with "Error while compiling generated Java code" when equality condition is used, and works when equivalent IN clause is specified
Vsevolod Ostapenko created KYLIN-3126: - Summary: Query fails with "Error while compiling generated Java code" when equality condition is used, and works when equivalent IN clause is specified Key: KYLIN-3126 URL: https://issues.apache.org/jira/browse/KYLIN-3126 Project: Kylin Issue Type: Bug Components: Query Engine Affects Versions: v2.2.0 Environment: HDP 2.5.6, Kylin 2.2.0, sample cube Reporter: Vsevolod Ostapenko Assignee: liyang The following query fails with "Error while compiling generated Java code", when equality condition is used (d0.year_beg_dt = '2012-01-01') and works when IN clause is used (d0.year_beg_dt in ('2012-01-01')) select d2.country, count(f.item_count) items_ttl from kylin_sales f join kylin_cal_dt d0 on f.part_dt = d0.cal_dt join kylin_account d1 on f.buyer_id = d1.account_id join kylin_country d2 on d1.account_country = d2.country where d0.year_beg_dt = '2012-01-01' -- blows up -- d0.year_beg_dt in ('2012-01-01') -- works and d2.country in ('US', 'JP') group by d2.country -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump condi
[ https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3122: -- Description: The current algorithm of cube segment elimination seems to be rather inefficient. We are using a model where cubes are partitioned by date and time:

"partition_desc": {
  "partition_date_column": "A_VL_HOURLY_V.THEDATE",
  "partition_time_column": "A_VL_HOURLY_V.THEHOUR",
  "partition_date_start": 0,
  "partition_date_format": "MMdd",
  "partition_time_format": "HH",
  "partition_type": "APPEND",
  "partition_condition_builder": "org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
},

Cubes contain partitions for multiple days, with 24 hours per day, and each cube segment corresponds to just one hour. When a query specifies both date and hour with an equality condition (e.g. thedate = '20171011' and thehour = '10'), Kylin sequentially iterates over all the cube segments (hundreds of them) only to skip every one except the single segment that needs to be scanned (which can be observed in the logs). The expectation is that Kylin would use the existing information about the partitioning columns (date and time), and the known hierarchical relation between date and time, to locate the required partition much more efficiently than a linear scan through all the cube partitions.

Now, if the filtering condition is on a range of hours, the behavior of partition pruning and scanning becomes illogical, which suggests bugs in the logic. If the filtering condition is on a specific date and a closed-open range of hours (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), then in addition to sequentially scanning all the cube partitions (as described above), Kylin scans the HBase tables for all the hours from the specified starting hour to the last hour of the day (e.g. hours 10 through 24, instead of just hour 10). As a result, the query runs much longer than necessary and might run out of memory, causing a JVM heap dump and a Kylin server crash. If the filtering condition is on a specific date but the hour interval is specified as open-closed (e.g. thedate = '20171011' and thehour > '09' and thehour <= '10'), Kylin scans the HBase tables for all the later dates and hours (e.g. from hour 10 to the most recent hour on the most recent day, which can be hundreds of regions). As a result, query execution time increases dramatically, and in most cases the Kylin server is terminated with an OOM error and a JVM heap dump.

was: The current algorithm of cube segment elimination seems to be rather inefficient. We are using a model where cubes are partitioned by date and time:
bq. "partition_desc": {
bq. "partition_date_column": "A_VL_HOURLY_V.THEDATE",
bq. "partition_time_column": "A_VL_HOURLY_V.THEHOUR",
bq. "partition_date_start": 0,
bq. "partition_date_format": "MMdd",
bq. "partition_time_format": "HH",
bq. "partition_type": "APPEND",
bq. "partition_condition_builder": "org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
bq. },
Cubes contain partitions for multiple days, with 24 hours per day, and each cube segment corresponds to just one hour. When a query specifies both date and hour with an equality condition (e.g. thedate = '20171011' and thehour = '10'), Kylin sequentially iterates over all the cube segments (hundreds of them) only to skip every one except the single segment that needs to be scanned (which can be observed in the logs). The expectation is that Kylin would use the existing information about the partitioning columns (date and time), and the known hierarchical relation between date and time, to locate the required partition much more efficiently than a linear scan through all the cube partitions. Now, if the filtering condition is on a range of hours, the behavior of partition pruning and scanning becomes illogical, which suggests bugs in the logic. If the filtering condition is on a specific date and a closed-open range of hours (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), then in addition to sequentially scanning all the cube partitions (as described above), Kylin scans the HBase tables for all the hours from the specified starting hour to the last hour of the day (e.g. hours 10 through 24, instead of just hour 10). As a result, the query runs much longer than necessary and might run out of memory, causing a JVM heap dump and a Kylin server crash. If the filtering condition is on a specific date but the hour interval is specified as open-closed (e.g. thedate = '20171011' and thehour > '09' and thehour <= '10'), Kylin scans the HBase tables for all the later dates and hours (e.g. from hour 10 to the most recent hour on the most recent day, which can be hundreds of regions). As a result, query execution time increases dramatically, and in most cases the Kylin server is terminated with an OOM error and a JVM heap dump.
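The linear scan over every segment described above could, in principle, be replaced by a range lookup once segments are kept sorted by their start key. A minimal sketch of that idea follows; the `Segment` type and the `yyyyMMddHH` key encoding are hypothetical illustrations, not Kylin's actual classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only: prune sorted, non-overlapping hourly segments
// with a binary search instead of iterating over every segment.
public class SegmentPruner {

    static final class Segment {
        final long startKey, endKey; // half-open [startKey, endKey), e.g. yyyyMMddHH
        Segment(long startKey, long endKey) { this.startKey = startKey; this.endKey = endKey; }
    }

    // Return the segments overlapping the half-open query range [queryStart, queryEnd).
    static List<Segment> prune(List<Segment> sorted, long queryStart, long queryEnd) {
        List<Long> starts = new ArrayList<>();
        for (Segment s : sorted) starts.add(s.startKey);
        int idx = Collections.binarySearch(starts, queryStart);
        // If queryStart falls inside a segment, step back to the segment that contains it.
        if (idx < 0) idx = Math.max(0, -idx - 2);
        List<Segment> hits = new ArrayList<>();
        for (int i = idx; i < sorted.size(); i++) {
            Segment s = sorted.get(i);
            if (s.startKey >= queryEnd) break;      // already past the query range
            if (s.endKey > queryStart) hits.add(s); // overlaps the query range
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Segment> day = new ArrayList<>();
        for (int h = 0; h < 24; h++) // one day of hourly segments: 2017101100 .. 2017101123
            day.add(new Segment(2017101100L + h, 2017101100L + h + 1));
        // thedate = '20171011' and thehour = '10' maps to the range [2017101110, 2017101111)
        System.out.println(prune(day, 2017101110L, 2017101111L).size()); // 1
    }
}
```

With hundreds of hourly segments per cube, this touches O(log n) start keys plus the overlapping segments, instead of visiting every segment per query.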
[jira] [Created] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and to have serious issues with handling date/time ranges; can lead to very slow queries and OOM/Java heap dump conditions
Vsevolod Ostapenko created KYLIN-3122: - Summary: Partition elimination algorithm seems to be inefficient and to have serious issues with handling date/time ranges; can lead to very slow queries and OOM/Java heap dump conditions Key: KYLIN-3122 URL: https://issues.apache.org/jira/browse/KYLIN-3122 Project: Kylin Issue Type: Bug Components: Storage - HBase Affects Versions: v2.2.0 Environment: HDP 2.5.6, Kylin 2.2.0 Reporter: Vsevolod Ostapenko Assignee: hongbin ma

The current algorithm of cube segment elimination seems to be rather inefficient. We are using a model where cubes are partitioned by date and time:

"partition_desc": {
  "partition_date_column": "A_VL_HOURLY_V.THEDATE",
  "partition_time_column": "A_VL_HOURLY_V.THEHOUR",
  "partition_date_start": 0,
  "partition_date_format": "MMdd",
  "partition_time_format": "HH",
  "partition_type": "APPEND",
  "partition_condition_builder": "org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
},

Cubes contain partitions for multiple days, with 24 hours per day, and each cube segment corresponds to just one hour. When a query specifies both date and hour with an equality condition (e.g. thedate = '20171011' and thehour = '00'), Kylin sequentially iterates over all the cube segments (hundreds of them) only to skip every one except the single segment that needs to be scanned (which can be observed in the logs). The expectation is that Kylin would use the existing information about the partitioning columns (date and time), and the known hierarchical relation between date and time, to locate the required partition much more efficiently than a linear scan through all the cube partitions.

Now, if the filtering condition is on a range of hours, the behavior of partition pruning and scanning becomes illogical, which suggests bugs in the logic. If the condition is on a specific date and a closed-open range of hours (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), then in addition to sequentially scanning all the cube partitions (as described above), Kylin scans the HBase regions for all the hours from the starting hour to the last hour of the day (e.g. hours 10 through 24). As a result, the query runs much longer than necessary and might run out of memory. If the condition is on a specific date but the hour interval is specified as open-closed (e.g. thedate = '20171011' and thehour > '10' and thehour <= '11'), Kylin scans all the HBase regions for all the later dates and hours (e.g. from hour 10 to the most recent hour on the most recent day). As a result, query execution time increases dramatically, and in most cases the Kylin server is terminated with an OOM error and a JVM heap dump. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
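Each of the range predicates above selects a single hour once it is normalized to a half-open [start, end) range; scanning past that end is exactly the over-scan the report describes. A small illustrative sketch of the normalization (not Kylin's code):

```java
// Illustrative only: normalize hour predicates such as thehour >= '10' and
// thehour < '11' into one half-open range [startHour, endHour). Scanning
// any segment outside that range is unnecessary work.
public class HourRange {

    static int[] normalize(String lowerOp, int lower, String upperOp, int upper) {
        int start = lowerOp.equals(">") ? lower + 1 : lower;  // '>' excludes the bound
        int end   = upperOp.equals("<=") ? upper + 1 : upper; // '<=' includes the bound
        return new int[] { start, end };
    }

    public static void main(String[] args) {
        int[] closedOpen = normalize(">=", 10, "<", 11); // thehour >= '10' and thehour < '11'
        int[] openClosed = normalize(">", 10, "<=", 11); // thehour > '10' and thehour <= '11'
        // closedOpen is [10, 11): only hour 10; openClosed is [11, 12): only hour 11.
        System.out.println(closedOpen[0] + "-" + closedOpen[1] + " " + openClosed[0] + "-" + openClosed[1]);
    }
}
```

Under this normalization neither predicate justifies scanning to the end of the day, let alone to the most recent segment.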
[jira] [Created] (KYLIN-3121) NPE while executing a query with two left outer joins and floating point expressions on nullable fields
Vsevolod Ostapenko created KYLIN-3121: - Summary: NPE while executing a query with two left outer joins and floating point expressions on nullable fields Key: KYLIN-3121 URL: https://issues.apache.org/jira/browse/KYLIN-3121 Project: Kylin Issue Type: Bug Components: Query Engine Affects Versions: v2.2.0 Environment: HDP 2.5.6, Kylin 2.2.0 Reporter: Vsevolod Ostapenko Assignee: liyang

Queries that include two (or more) left outer joins and contain floating-point expressions operating on fields that hold integer NULL values (due to the left outer join) fail in flight with NullPointerExceptions. As an example, the following query generates an NPE on either of these two expressions:
* 100.0 * t2.media_gap_call_count
* 1.0 * NULLIF(t1.active_call_count, 0)

{{with t1 as (
  select d1.cell_name, count(distinct a1.call_id) as active_call_count
  from zetticsdw.a_vl_hourly_v a1
  inner join zetticsdw.d_cell_v d1 on a1.cell_key = d1.cell_key
  where d1.region_3 = 'Mumbai' and a1.thedate = '20171011' and a1.thehour = '00' and a1.active_call_flg = 1
  group by d1.cell_name
),
t2 as (
  select d1.cell_name, count(distinct a1.call_id) as media_gap_call_count
  from zetticsdw.a_vl_hourly_v a1
  inner join zetticsdw.d_cell_v d1 on a1.cell_key = d1.cell_key
  where d1.region_3 = 'Mumbai' and a1.thedate = '20171011' and a1.thehour = '00' and a1.media_gap_call_flg = 1
  group by d1.cell_name
),
t3 as (
  select d1.cell_name, sum(a1.ow_call_flg) one_way_call_count, sum(a1.succ_call_flg) successfull_call_count
  from zetticsdw.a_vl_hourly_v a1
  inner join zetticsdw.d_cell_v d1 on a1.cell_key = d1.cell_key
  where d1.region_3 = 'Mumbai' and a1.thedate = '20171011' and a1.thehour = '00'
  group by d1.cell_name
)
select t3.cell_name, t1.active_call_count, t2.media_gap_call_count, t3.one_way_call_count, t3.successfull_call_count,
  -- 100 * t2.media_gap_call_count nom,                              -- works
  -- 1 * NULLIF(t1.active_call_count, 0) denom                       -- works
  100.0 * t2.media_gap_call_count nom,                               -- fails, NPE of one kind
  1.0 * NULLIF(t1.active_call_count, 0) denom                        -- fails, NPE of a different kind
  -- 100.0 * COALESCE(t2.media_gap_call_count, 0) nom,               -- works
  -- 1.0 * CAST(NULLIF(t1.active_call_count, 0) as DOUBLE) denom     -- works
from t3
left outer join t1 on t3.cell_name = t1.cell_name
left outer join t2 on t3.cell_name = t2.cell_name}}

In the first case (multiplication of an integer NULL and a double), the Kylin log contains a stack trace similar to the following:
null
at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:218)
at org.apache.kylin.rest.service.QueryService.execute(QueryService.java:834)
at org.apache.kylin.rest.service.QueryService.queryWithSqlMassage(QueryService.java:561)
at org.apache.kylin.rest.service.QueryService.query(QueryService.java:181)
at org.apache.kylin.rest.service.QueryService.doQueryWithCache(QueryService.java:415)
at org.apache.kylin.rest.controller.QueryController.query(QueryController.java:78)
at sun.reflect.GeneratedMethodAccessor545.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:133)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:97)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:967)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
[jira] [Commented] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable
[ https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297537#comment-16297537 ] Vsevolod Ostapenko commented on KYLIN-3114: --- I modified kylinConfig.isInitialized() method to use angular.isString(), which is a more appropriate check. Updated patch is attached. > Make timeout for the queries submitted through the Web UI configurable > -- > > Key: KYLIN-3114 > URL: https://issues.apache.org/jira/browse/KYLIN-3114 > Project: Kylin > Issue Type: Bug > Components: Web >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2.0 >Reporter: Vsevolod Ostapenko >Assignee: Vsevolod Ostapenko >Priority: Minor > Fix For: v2.3.0 > > Attachments: KYLIN-3114.master.002.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > Currently query.js hard codes timeout for the queries submitted via Web UI to > be 300_000 milliseconds. > Depending on the situation, the default value can be either too large, or too > small, especially when query does not hit any cube and is passed through to > Hive or Impala. > Query timeout should be made configurable via kylin.properties. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable
[ https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3114: -- Attachment: (was: KYLIN-3114.master.001.patch)
[jira] [Updated] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable
[ https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3114: -- Attachment: KYLIN-3114.master.002.patch
[jira] [Comment Edited] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable
[ https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297455#comment-16297455 ] Vsevolod Ostapenko edited comment on KYLIN-3114 at 12/19/17 9:31 PM: - Hi [~Shaofengshi], They are completely different things. kylin.web.query-timeout is used to set the "timeout" property on the REST API query action of the AngularJS QueryService controller (https://docs.angularjs.org/api/ngResource/service/$resource). This timeout is enforced by the AngularJS framework and is measured in milliseconds. Up to this point it was hardcoded to 300_000 milliseconds (5 minutes). kylin.query.timeout-seconds, despite its name, is not a query timeout at all, but a "soft" limit on how long query results can be fetched from a storage provider. It's measured in seconds, and it's enforced in SequentialCubeTupleIterator.java (btw, the check only happens on the .next() iterator call, so technically a query may never return and this limit will never be enforced). It defaults to 0 (zero), which indicates that there is no time limit (technically it's Integer.MAX_VALUE/1000 seconds). Just to summarize, those settings are completely different and apply to different parts of Kylin. Mine is for the Web UI, and the other one is for the Kylin back-end. was (Author: seva_ostapenko): They are completely different things. kylin.web.query-timeout is used to set the "timeout" property on the REST API query action of the AngularJS QueryService controller (https://docs.angularjs.org/api/ngResource/service/$resource). This timeout is enforced by the AngularJS framework and is measured in milliseconds. Up to this point it was hardcoded to 300_000 milliseconds (5 minutes). kylin.query.timeout-seconds, despite its name, is not a query timeout at all, but a "soft" limit on how long query results can be fetched from a storage provider. It's measured in seconds, and it's enforced in SequentialCubeTupleIterator.java (btw, the check only happens on the .next() iterator call, so technically a query may never return and this limit will never be enforced). It defaults to 0 (zero), which indicates that there is no time limit (technically it's Integer.MAX_VALUE/1000 seconds). Just to summarize, those settings are completely different and apply to different parts of Kylin. Mine is for the Web UI, and the other one is for the Kylin back-end.
[jira] [Commented] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable
[ https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297455#comment-16297455 ] Vsevolod Ostapenko commented on KYLIN-3114: --- They are completely different things. kylin.web.query-timeout is used to set the "timeout" property on the REST API query action of the AngularJS QueryService controller (https://docs.angularjs.org/api/ngResource/service/$resource). This timeout is enforced by the AngularJS framework and is measured in milliseconds. Up to this point it was hardcoded to 300_000 milliseconds (5 minutes). kylin.query.timeout-seconds, despite its name, is not a query timeout at all, but a "soft" limit on how long query results can be fetched from a storage provider. It's measured in seconds, and it's enforced in SequentialCubeTupleIterator.java (btw, the check only happens on the .next() iterator call, so technically a query may never return and this limit will never be enforced). It defaults to 0 (zero), which indicates that there is no time limit (technically it's Integer.MAX_VALUE/1000 seconds). Just to summarize, those settings are completely different and apply to different parts of Kylin. Mine is for the Web UI, and the other one is for the Kylin back-end.
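The back-end "soft" limit described in this comment can be sketched as follows. This is an illustrative hedged model of the described semantics, not Kylin's actual code; `fetchLimitSeconds` and `nextWithDeadline` are hypothetical helpers.

```javascript
// Hedged sketch of kylin.query.timeout-seconds semantics as described above:
// a configured value of 0 means "no limit", which is effectively
// Integer.MAX_VALUE / 1000 seconds. Illustrative code, not Kylin's.
var INT_MAX = 2147483647;

function fetchLimitSeconds(configuredSeconds) {
  return configuredSeconds > 0 ? configuredSeconds : Math.floor(INT_MAX / 1000);
}

// The limit is only observed when the consumer asks for the next row,
// mirroring the point made about SequentialCubeTupleIterator: if next()
// is never called again, the deadline is never checked.
function nextWithDeadline(iterator, startMillis, limitSeconds, nowMillis) {
  if ((nowMillis - startMillis) / 1000 > limitSeconds) {
    throw new Error('scan timeout exceeded');
  }
  return iterator.next();
}
```

This also makes the contrast with kylin.web.query-timeout concrete: the latter is a client-side millisecond timeout enforced by AngularJS, while this one is checked row-by-row on the server.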
[jira] [Commented] (KYLIN-3104) When the user log out from "Monitor" page, an alert dialog will pop up warning "Failed to load query."
[ https://issues.apache.org/jira/browse/KYLIN-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295769#comment-16295769 ] Vsevolod Ostapenko commented on KYLIN-3104: --- I would also suggest changing the error message to something like "Failed to retrieve information about a slow-running query" as the current message is too generic. Btw, the fix looks good to me, as I did exactly the same code change on my local copy of 2.2.x to get around this annoyance. > When the user log out from "Monitor" page, an alert dialog will pop up > warning "Failed to load query." > -- > > Key: KYLIN-3104 > URL: https://issues.apache.org/jira/browse/KYLIN-3104 > Project: Kylin > Issue Type: Bug > Components: General, Web >Affects Versions: v2.3.0 >Reporter: peng.jianhua >Assignee: peng.jianhua > Attachments: > 0001-KYLIN-3104-When-the-user-log-out-from-Monitor-page-a.patch, > alert_dialog_will_pop_up_when_log_out_from_Monitor_page.PNG > > > When the user log out from "Monitor" page, an alert dialog will pop up > warning "Failed to load query." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Issue Comment Deleted] (KYLIN-3104) When the user log out from "Monitor" page, an alert dialog will pop up warning "Failed to load query."
[ https://issues.apache.org/jira/browse/KYLIN-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3104: -- Comment: was deleted (was: I would also suggest changing the error message to something like "Failed to retrieve information about a slow-running query" as the current message is too generic. Btw, the fix looks good to me, as I did exactly the same code change on my local copy of 2.2.x to get around this annoyance. )
[jira] [Commented] (KYLIN-3104) When the user log out from "Monitor" page, an alert dialog will pop up warning "Failed to load query."
[ https://issues.apache.org/jira/browse/KYLIN-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295766#comment-16295766 ] Vsevolod Ostapenko commented on KYLIN-3104: --- I would also suggest changing the error message to something like "Failed to retrieve information about a slow-running query" as the current message is too generic. Btw, the fix looks good to me, as I did exactly the same code change on my local copy of 2.2.x to get around this annoyance.
[jira] [Comment Edited] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable
[ https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295202#comment-16295202 ] Vsevolod Ostapenko edited comment on KYLIN-3114 at 12/18/17 4:26 PM: - [~Shaofengshi], could you please look at the changes or perhaps ask the right person to do that and provide feedback? Thanks in advance, Vsevolod. was (Author: seva_ostapenko): [~Shaofengshi], could you please look at the changes or perhaps the right person to do that and provide feedback? Thanks in advance, Vsevolod.
[jira] [Commented] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable
[ https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295202#comment-16295202 ] Vsevolod Ostapenko commented on KYLIN-3114: --- [~Shaofengshi], could you please look at the changes or perhaps ask the right person to do that and provide feedback? Thanks in advance, Vsevolod.
[jira] [Commented] (KYLIN-3069) Add proper time zone support to the WebUI instead of GMT/PST kludge
[ https://issues.apache.org/jira/browse/KYLIN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293357#comment-16293357 ] Vsevolod Ostapenko commented on KYLIN-3069: --- [~peng.jianhua], I believe that instead of using

time = moment(item).tz(timezone).format(format) + " (" + timezone + ")";

it should be

time = moment(item).tz(timezone).format(format + " z");

or the formats should include a short time zone name element, e.g.

format = "YYYY-MM-DD HH:mm:ss z";

> Add proper time zone support to the WebUI instead of GMT/PST kludge > --- > > Key: KYLIN-3069 > URL: https://issues.apache.org/jira/browse/KYLIN-3069 > Project: Kylin > Issue Type: Bug > Components: Web >Affects Versions: v2.2.0 > Environment: HDP 2.5.3, Kylin 2.2.0 >Reporter: Vsevolod Ostapenko >Assignee: peng.jianhua >Priority: Minor > Attachments: > 0001-KYLIN-3069-Add-proper-time-zone-support-to-the-WebUI.patch, Screen Shot > 2017-12-05 at 10.01.39 PM.png, kylin_pic1.png, kylin_pic2.png, kylin_pic3.png > > Original Estimate: 168h > Remaining Estimate: 168h > > Time zone handling logic in the WebUI is a kludge, coded to parse only > "GMT-N" time zone specifications and defaulting to PST, if parsing is not > successful (kylin/webapp/app/js/filters/filter.js) > Integrating moment and moment time zone (http://momentjs.com/timezone/docs/) > into the product, would allow correct time zone handling. > For the users who happen to reside in the geographical locations that do > observe day light savings time, usage of GMT-N format is very inconvenient > and info reported by the UI in various places is perplexing. > Needless to say that the GMT moniker itself is long deprecated.
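The effect of moment-timezone's "z" token suggested above - rendering a timestamp with its abbreviated zone name - can also be sketched with the browser's built-in Intl API. This is a hedged, dependency-free illustration, not the proposed patch; `formatWithShortZone` is a hypothetical helper.

```javascript
// Hedged sketch: format a timestamp with an abbreviated time zone name
// (the effect of moment-timezone's "z" token), using only Intl.
// Illustrative helper, not Kylin's filter.js code.
function formatWithShortZone(epochMillis, timeZone) {
  var parts = new Intl.DateTimeFormat('en-US', {
    timeZone: timeZone,
    year: 'numeric', month: '2-digit', day: '2-digit',
    hour: '2-digit', minute: '2-digit', second: '2-digit',
    hour12: false,
    timeZoneName: 'short'   // yields e.g. "EST" / "EDT" for America/New_York
  }).formatToParts(new Date(epochMillis));
  var get = function (type) {
    return parts.filter(function (p) { return p.type === type; })[0].value;
  };
  return get('year') + '-' + get('month') + '-' + get('day') + ' ' +
         get('hour') + ':' + get('minute') + ':' + get('second') + ' ' +
         get('timeZoneName');
}
```

Note the short name tracks daylight saving time automatically, which is exactly what the fixed-offset "GMT-N" scheme in filter.js cannot do.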
[jira] [Commented] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable
[ https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293300#comment-16293300 ] Vsevolod Ostapenko commented on KYLIN-3114: --- I attached the patch for the proposed enhancement. Tested it internally and it seems to work as expected. Properties will be reloaded by QueryService only if kylinConfig has not yet been initialized (the use case for that is when a user hits refresh while on the "Insights" tab or navigates directly to the :7070/kylin/query URL in their browser). Please review and provide feedback.
[jira] [Updated] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable
[ https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3114: -- Attachment: KYLIN-3114.master.001.patch
[jira] [Created] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable
Vsevolod Ostapenko created KYLIN-3114: - Summary: Make timeout for the queries submitted through the Web UI configurable Key: KYLIN-3114 URL: https://issues.apache.org/jira/browse/KYLIN-3114 Project: Kylin Issue Type: Bug Components: Web Affects Versions: v2.2.0 Environment: HDP 2.5.6, Kylin 2.2.0 Reporter: Vsevolod Ostapenko Assignee: Vsevolod Ostapenko Priority: Minor Fix For: v2.3.0 Currently query.js hard codes timeout for the queries submitted via Web UI to be 300_000 milliseconds. Depending on the situation, the default value can be either too large, or too small, especially when query does not hit any cube and is passed through to Hive or Impala. Query timeout should be made configurable via kylin.properties. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
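The behavior this ticket asks for can be sketched as follows: read the Web UI query timeout from a kylin.properties-backed config object, falling back to the hard-coded 300_000 ms that query.js uses today. The property name `kylin.web.query-timeout` and the helper are illustrative, not the actual patch.

```javascript
// Hedged sketch of the requested behavior: a configurable UI query timeout
// with the historical 300_000 ms default. Illustrative helper, not Kylin code.
function uiQueryTimeoutMillis(props) {
  var DEFAULT_MS = 300000; // the value query.js hard-codes today
  var raw = props && props['kylin.web.query-timeout'];
  var parsed = parseInt(raw, 10);
  // Fall back to the default when the property is absent or not a
  // positive integer.
  return (isFinite(parsed) && parsed > 0) ? parsed : DEFAULT_MS;
}
```

The resulting value would then be passed as the "timeout" option of the AngularJS $resource query action instead of the literal constant.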
[jira] [Commented] (KYLIN-3070) Add a config property for flat table storage format
[ https://issues.apache.org/jira/browse/KYLIN-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284311#comment-16284311 ] Vsevolod Ostapenko commented on KYLIN-3070: --- [~yimingliu] or [~Shaofengshi], could one of you guys review my changes and provide feedback or, if the changes are ok, commit them into the master? > Add a config property for flat table storage format > --- > > Key: KYLIN-3070 > URL: https://issues.apache.org/jira/browse/KYLIN-3070 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2.0 >Reporter: Vsevolod Ostapenko >Assignee: Vsevolod Ostapenko >Priority: Minor > Labels: newbie > Attachments: KYLIN-3070.master.001.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > Flat table storage format is currently hard-coded as SEQUENCEFILE in the > core-job/src/main/java/org/apache/kylin/job/JoinedFlatTable.java > That prevents using Impala as a SQL engine while using beeline CLI (via > custom JDBC URL), as Impala cannot write sequence files. > Adding a parameter to kylin.properties to override the default setting would > address the issue. > Removing a hard-coded value for storage format might be good idea in and on > itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KYLIN-3084) File not found Exception when processing union-all in TEZ mode
[ https://issues.apache.org/jira/browse/KYLIN-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282609#comment-16282609 ] Vsevolod Ostapenko commented on KYLIN-3084: --- It's a tez "feature". In order to instruct tez to coalesce the results from multiple parallel writers and prevent it from writing into table storage subfolders set hive.merge.tezfiles to true. > File not found Exception when processing union-all in TEZ mode > -- > > Key: KYLIN-3084 > URL: https://issues.apache.org/jira/browse/KYLIN-3084 > Project: Kylin > Issue Type: Bug >Reporter: Wang Cheng >Assignee: Wang Cheng >Priority: Minor > > If hive.execution.engine=TEZ and hql contains union all, it causes exception > like: file not found when materializing the view or redistributing flat hive > table. > Here is the reason: > http://grokbase.com/t/hive/user/162r80a2g9/anyway-to-avoid-creating-subdirectories-by-insert-with-union > i.e. "The Tez execution of UNION is entirely parallel & > the task-ids overlaps - so the files created have to have unique names. > But the total counts for "Map 1" and "Map 2" are only available as the job > runs, so they write to different dirs." > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
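The workaround mentioned in the comment above is a Hive session setting; a minimal sketch (where exactly to set it - hive-site.xml or a session-level SET - depends on your deployment):

```sql
-- Hedged sketch: ask Tez to merge the per-writer output files of a
-- UNION ALL so the result lands in the table directory itself rather
-- than in per-branch subdirectories.
SET hive.merge.tezfiles=true;
```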
[jira] [Commented] (KYLIN-3070) Add a config property for flat table storage format
[ https://issues.apache.org/jira/browse/KYLIN-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281336#comment-16281336 ] Vsevolod Ostapenko commented on KYLIN-3070: --- Patch file is attached, please review. Let me know if you have any questions or comments.
[jira] [Updated] (KYLIN-3070) Add a config property for flat table storage format
[ https://issues.apache.org/jira/browse/KYLIN-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vsevolod Ostapenko updated KYLIN-3070: -- Attachment: KYLIN-3070.master.001.patch
[jira] [Commented] (KYLIN-3070) Add a config property for flat table storage format
[ https://issues.apache.org/jira/browse/KYLIN-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281003#comment-16281003 ] Vsevolod Ostapenko commented on KYLIN-3070: --- I made a fix and tested it on my copy of the master branch. My version of the fix introduces two new parameters in kylin.properties: * kylin.source.hive.flat-table-storage-format, which defaults to SEQUENCEFILE * kylin.source.hive.flat-table-field-delimiter, which defaults to \u001F (unit separator, the same default field separator that Hive uses) I tested my changes internally and confirmed that they work as expected. Btw, while making the change I found a problem with the existing handling of TEXTFILE field separators - namely, the value was always fetched from kylin.source.jdbc.field-delimiter (apparently a kludge), which technically has no direct relation to the flat table, so the introduction of kylin.source.hive.flat-table-field-delimiter seems warranted. If you don't have changes ready, please reassign this JIRA ticket to me.
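In kylin.properties, the two parameters proposed in the comment above would look like this (a sketch of the proposal; the values shown are the stated defaults):

```properties
# Storage format for the intermediate flat table (proposed; defaults to
# SEQUENCEFILE, the value currently hard-coded in JoinedFlatTable.java)
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE
# Field delimiter for TEXTFILE flat tables (proposed; \u001F is the
# unit-separator character, Hive's default field separator)
kylin.source.hive.flat-table-field-delimiter=\u001F
```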
[jira] [Comment Edited] (KYLIN-3069) Add proper time zone support to the WebUI instead of GMT/PST kludge
[ https://issues.apache.org/jira/browse/KYLIN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279656#comment-16279656 ] Vsevolod Ostapenko edited comment on KYLIN-3069 at 12/6/17 4:23 PM: Hi [~peng.jianhua], here is my use case. I have kylin.web.timezone set to America/New_York in my kylin.properties. The time zone is a perfectly valid canonical time zone name, and the JVM has no issues recognizing it as such. As a result, all times formatted in Java on the server have the correct short time zone moniker (EST) - note the job names in the attached screenshot. !https://issues.apache.org/jira/secure/attachment/12900799/Screen%20Shot%202017-12-05%20at%2010.01.39%20PM.png! At the same time, since the Web UI code does not handle time zone names correctly, the UI defaults to using PST when formatting time values - again this can be seen in the same screenshot in the "Last Modified Time" column. My expectation is that when moment/moment-timezone are integrated, canonical time zone names will be recognized properly and the correct 3-letter abbreviated time zone name will be used while formatting time values. So, when the issue is corrected, "Last Modified Time" would show times in the EST time zone. I suppose that after reading and checking the time zone settings, the Web UI should internally carry around an object with the original tz name specified in kylin.properties. The 3-letter abbreviated tz name and the tz offset from UTC should be computed for each time value that needs to be rendered on the screen. Moreover, if the time zone name happens to be incorrect (or not yet supported by moment-timezone), instead of defaulting to PST, the Web UI code should default to UTC. Also, since GMT has been deprecated, all references to GMT (if any are left after integrating support for moment-timezone) should be replaced with UTC. was (Author: seva_ostapenko): Hi [~peng.jianhua], here is my use case. I have kylin.web.timezone set to America/New_York in my kylin.properties. The time zone is a perfectly valid canonical time zone name, and the JVM has no issues recognizing it as such. As a result, all times formatted in Java on the server have the correct short time zone moniker (EST) - note the job names in the attached screenshot. !https://issues.apache.org/jira/secure/attachment/12900799/Screen%20Shot%202017-12-05%20at%2010.01.39%20PM.png! At the same time, since the Web UI code does not handle time zone names correctly, the UI defaults to using PST when formatting time values - again this can be seen in the same screenshot in the "Last Modified Time" column. My expectation is that when moment/moment-timezone are integrated, canonical time zone names will be recognized properly and the correct 3-letter abbreviated time zone name will be used while formatting time values. So, when the issue is corrected, "Last Modified Time" would show times in the EST time zone. I suppose that after reading and checking the time zone settings, the Web UI should internally carry around an object with at least three attributes - the original tz name specified in kylin.properties, the 3-letter abbreviated tz name, and the tz offset from UTC (the last two retrieved by calling moment-timezone functions). Moreover, if the time zone name happens to be incorrect (or not yet supported by moment-timezone), instead of defaulting to PST, the Web UI code should default to UTC. Also, since GMT has been deprecated, all references to GMT (if any are left after integrating support for moment-timezone) should be replaced with UTC.