[jira] [Commented] (KYLIN-3728) unexpected behavior when do fix holes for steaming cube
[ https://issues.apache.org/jira/browse/KYLIN-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728610#comment-16728610 ]

Lijun Cao commented on KYLIN-3728:
----------------------------------

I was able to fill the hole using a sample streaming cube. Could you provide more details?

> unexpected behavior when do fix holes for steaming cube
> -------------------------------------------------------
>
>                 Key: KYLIN-3728
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3728
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v2.5.1
>            Reporter: wangxianbin
>            Priority: Major
>         Attachments: after new segment ready.png, fix hole finished.png, fix_holes_kylin.log, in process of fix holes.png
>
>
> After we finished fixing holes, the existing holes were not filled, and sometimes more existing segments became holes. See the fix-hole log in the attachments.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3731) java.lang.IllegalArgumentException: Unsupported data type array at
[ https://issues.apache.org/jira/browse/KYLIN-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728600#comment-16728600 ]

Chao Long commented on KYLIN-3731:
----------------------------------

Sorry, I don't understand what you mean. If you use a Hive view, you can convert the array to a string in the view; Kylin will then load string data rather than array data, right? If you provide the query SQL, I may be able to understand better.

> java.lang.IllegalArgumentException: Unsupported data type array at
> ------------------------------------------------------------------
>
>                 Key: KYLIN-3731
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3731
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v2.5.1
>            Reporter: HongBo Dai
>            Assignee: Chao Long
>            Priority: Critical
>              Labels: build
>             Fix For: v2.5.1
>
>         Attachments: error of kylin.txt, image-2018-12-20-10-59-04-060.png
>
>
> After Kylin was upgraded from 2.3 to 2.5.1, the array data type in the metadata is reported as unsupported, with the exception "java.lang.IllegalArgumentException: Unsupported data type array". In Kylin 2.3, Hive tables storing array data worked without problems. In addition, the third step of building a cube fails with "org.apache.kylin.engine.mr.exception.MapReduceException: no counters for the job". Could you tell me how to solve the problem without changing the data structure? Please see the attachments.
> !image-2018-12-20-10-59-04-060.png!
[GitHub] codecov-io commented on issue #398: Kylin 3597 fix sonar issues
codecov-io commented on issue #398: Kylin 3597 fix sonar issues
URL: https://github.com/apache/kylin/pull/398#issuecomment-449808086

# [Codecov](https://codecov.io/gh/apache/kylin/pull/398?src=pr=h1) Report
> Merging [#398](https://codecov.io/gh/apache/kylin/pull/398?src=pr=desc) into [master](https://codecov.io/gh/apache/kylin/commit/1af62e46516b7b9a31d3de5a1a7867f9cb51799b?src=pr=desc) will **decrease** coverage by `<.01%`.
> The diff coverage is `0%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/kylin/pull/398/graphs/tree.svg?width=650=JawVgbgsVo=150=pr)](https://codecov.io/gh/apache/kylin/pull/398?src=pr=tree)

```diff
@@             Coverage Diff              @@
##             master     #398      +/-   ##
- Coverage     24.39%   24.38%   -0.01%
+ Complexity     4935     4934       -1
  Files          1143     1143
  Lines         69259    69259
  Branches       9859     9859
- Hits          16897    16892       -5
- Misses        50665    50668       +3
- Partials       1697     1699       +2
```

| [Impacted Files](https://codecov.io/gh/apache/kylin/pull/398?src=pr=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...che/kylin/cube/inmemcubing2/InMemCubeBuilder2.java](https://codecov.io/gh/apache/kylin/pull/398/diff?src=pr=tree#diff-Y29yZS1jdWJlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9jdWJlL2lubWVtY3ViaW5nMi9Jbk1lbUN1YmVCdWlsZGVyMi5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
| [...he/kylin/dict/lookup/cache/RocksDBLookupTable.java](https://codecov.io/gh/apache/kylin/pull/398/diff?src=pr=tree#diff-Y29yZS1kaWN0aW9uYXJ5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9kaWN0L2xvb2t1cC9jYWNoZS9Sb2Nrc0RCTG9va3VwVGFibGUuamF2YQ==) | `72.97% <0%> (-5.41%)` | `6% <0%> (-1%)` | |
| [...rg/apache/kylin/cube/inmemcubing/MemDiskStore.java](https://codecov.io/gh/apache/kylin/pull/398/diff?src=pr=tree#diff-Y29yZS1jdWJlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9jdWJlL2lubWVtY3ViaW5nL01lbURpc2tTdG9yZS5qYXZh) | `69.3% <0%> (-0.92%)` | `7% <0%> (ø)` | |

------

[Continue to review full report at Codecov](https://codecov.io/gh/apache/kylin/pull/398?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/kylin/pull/398?src=pr=footer). Last update [1af62e4...37389bf](https://codecov.io/gh/apache/kylin/pull/398?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] hit-lacus commented on a change in pull request #397: KYLIN-3722 Disable limit push down after join
hit-lacus commented on a change in pull request #397: KYLIN-3722 Disable limit push down after join
URL: https://github.com/apache/kylin/pull/397#discussion_r243877021

 ## File path: query/src/main/java/org/apache/kylin/query/relnode/OLAPLimitRel.java ##
 @@ -82,7 +82,8 @@ public void implementOLAP(OLAPImplementor implementor) {
          // ignore limit after having clause
          // ignore limit after another limit, e.g. select A, count(*) from (select A,B from fact group by A,B limit 100) limit 10
          // ignore limit after outer aggregate, e.g. select count(1) from (select A,B from fact group by A,B ) limit 10
 -        if (!context.afterHavingClauseFilter && !context.afterLimit && !context.afterOuterAggregate) {
 +        // ignore limit after join
 +        if (!context.afterHavingClauseFilter && !context.afterLimit && !context.afterOuterAggregate && !context.afterJoin) {

 Review comment:
   Good point, I am looking for a better solution now.
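The motivation for the `!context.afterJoin` guard in the hunk above can be illustrated with a toy example. This is only a sketch in Python, not Kylin code: the rows, the lookup table, and the helper names `inner_join` and `limit` are all hypothetical. It shows that applying a limit to the fact rows before an inner join can return fewer rows than applying it after the join.

```python
# Hypothetical toy data: fact rows are (key, value); the lookup table maps key -> name.
fact = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
lookup = {"c": "C-name", "d": "D-name"}


def inner_join(rows, dim):
    """Keep only the fact rows whose key exists in the lookup table."""
    return [(key, value, dim[key]) for key, value in rows if key in dim]


def limit(rows, n):
    """Return at most the first n rows."""
    return rows[:n]


# Join first, then limit: two matching rows are returned.
limit_after_join = limit(inner_join(fact, lookup), 2)

# Limit pushed below the join: only the first two fact rows are read,
# and neither of them has a match in the lookup table.
limit_pushed_down = inner_join(limit(fact, 2), lookup)

print(limit_after_join)
print(limit_pushed_down)
```

The trade-off raised in the review thread is the other side of this: without push-down the storage layer may have to scan far more rows for a join query that carries a limit.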
[GitHub] shaofengshi commented on a change in pull request #397: KYLIN-3722 Disable limit push down after join
shaofengshi commented on a change in pull request #397: KYLIN-3722 Disable limit push down after join
URL: https://github.com/apache/kylin/pull/397#discussion_r243876893

 ## File path: query/src/main/java/org/apache/kylin/query/relnode/OLAPLimitRel.java ##
 @@ -82,7 +82,8 @@ public void implementOLAP(OLAPImplementor implementor) {
          // ignore limit after having clause
          // ignore limit after another limit, e.g. select A, count(*) from (select A,B from fact group by A,B limit 100) limit 10
          // ignore limit after outer aggregate, e.g. select count(1) from (select A,B from fact group by A,B ) limit 10
 -        if (!context.afterHavingClauseFilter && !context.afterLimit && !context.afterOuterAggregate) {
 +        // ignore limit after join
 +        if (!context.afterHavingClauseFilter && !context.afterLimit && !context.afterOuterAggregate && !context.afterJoin) {

 Review comment:
   My concern is that this change may make many normal queries (a join with a limit) much less efficient.
[jira] [Commented] (KYLIN-3739) Use table alias rather than table identity for snapshots in CubeSegment, CubeInstance, CubeDesc
[ https://issues.apache.org/jira/browse/KYLIN-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728578#comment-16728578 ]

Shaofeng SHI commented on KYLIN-3739:
-------------------------------------

Today the table snapshots are shared across cubes, as the table ID is the same. If we change to this approach, can the snapshots still be shared as before?

> Use table alias rather than table identity for snapshots in CubeSegment, CubeInstance, CubeDesc
> -----------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-3739
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3739
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Metadata
>            Reporter: Zhong Yanghong
>            Priority: Major
>
> In 2.0.0, Kylin introduced the table alias in DataModelDesc. Most of the elements in CubeDesc, CubeInstance & CubeSegment use the table alias rather than the table identity. However, the lookup table snapshots still use the table identity.
> It's better for us to use only the table alias instead of the table identity in CubeDesc, CubeInstance & CubeSegment. Doing so provides several advantages:
> # For CubeDesc, CubeInstance & CubeSegment, what is exposed to them is only the snowflake model, and they don't need to care which real table is used.
> # If users want to change the table name in the snowflake model, we can keep the table alias unchanged, and only the DataModelDesc needs to change. No change is needed for CubeDesc, CubeInstance & CubeSegment.
[jira] [Commented] (KYLIN-3739) Use table alias rather than table identity for snapshots in CubeSegment, CubeInstance, CubeDesc
[ https://issues.apache.org/jira/browse/KYLIN-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728577#comment-16728577 ]

Zhong Yanghong commented on KYLIN-3739:
---------------------------------------

Hi [~liyang.g...@gmail.com] & [~Shaofengshi], what do you think?
[jira] [Resolved] (KYLIN-3559) Use Splitter for splitting String
[ https://issues.apache.org/jira/browse/KYLIN-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wu Bin resolved KYLIN-3559.
---------------------------
    Resolution: Fixed

> Use Splitter for splitting String
> ---------------------------------
>
>                 Key: KYLIN-3559
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3559
>             Project: Kylin
>          Issue Type: Task
>            Reporter: Ted Yu
>            Assignee: Wu Bin
>            Priority: Major
>             Fix For: v2.6.0
>
>
> See http://errorprone.info/bugpattern/StringSplitter for why Splitter is preferred.
[jira] [Commented] (KYLIN-3738) Edit cube measure may make the decimal type change unexpectly
[ https://issues.apache.org/jira/browse/KYLIN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728569#comment-16728569 ]

ASF GitHub Bot commented on KYLIN-3738:
---------------------------------------

shaofengshi commented on pull request #414: KYLIN-3738 Edit cube measure may make the decimal type change unexpectly
URL: https://github.com/apache/kylin/pull/414

> Edit cube measure may make the decimal type change unexpectly
> -------------------------------------------------------------
>
>                 Key: KYLIN-3738
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3738
>             Project: Kylin
>          Issue Type: Bug
>          Components: Web
>    Affects Versions: v2.5.2
>            Reporter: Pan, Julian
>            Assignee: Pan, Julian
>            Priority: Major
>
> When editing a cube's measure and clicking save, the original return type may be changed from decimal(19,4) to decimal(19), which causes incorrect cube build results and incorrect query results.
[jira] [Commented] (KYLIN-3738) Edit cube measure may make the decimal type change unexpectly
[ https://issues.apache.org/jira/browse/KYLIN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728568#comment-16728568 ]

Shaofeng SHI commented on KYLIN-3738:
-------------------------------------

Julian, thanks for the confirmation. Please go ahead.
[jira] [Commented] (KYLIN-2243) TopN memory estimation is inaccurate in some cases
[ https://issues.apache.org/jira/browse/KYLIN-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728571#comment-16728571 ]

ASF subversion and git services commented on KYLIN-2243:
--------------------------------------------------------

Commit 1af62e46516b7b9a31d3de5a1a7867f9cb51799b in kylin's branch refs/heads/master from liapan
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=1af62e4 ]

KYLIN-3738 Edit cube measure may make the decimal type change unexpectly

revert KYLIN-2243 8c0c44b887e2caa21b097c2334f8d21c42462e80

> TopN memory estimation is inaccurate in some cases
> --------------------------------------------------
>
>                 Key: KYLIN-2243
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2243
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: Shaofeng SHI
>            Assignee: Shaofeng SHI
>            Priority: Major
>             Fix For: v2.0.0
>
>
> TopNCounterSerializer.maxLength() and TopNCounterSerializer.getStorageBytesEstimate() might be inaccurate, especially when there are multiple "group by" columns in one TopN measure and some of them use a long byte encoding such as "fixed_length:16".
> The inaccurate estimation may cause memory issues when using in-mem cubing, and will make the estimate of the final cube size inaccurate.
> The root cause is that a data type like "top(100)" carries no information about how long a key can be. So far it uses a default value of 4, which is too small when the encoding is something like "fixed_length:16". The solution is to extend the data type expression to "top(100, 16)" to indicate that one key can be 16 bytes long. If the "scale" is absent, 4 bytes is used as the default.
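The extended type expression described in KYLIN-2243 can be sketched as a small parser. This is an illustration in Python under stated assumptions, not Kylin's actual data type parsing: the function name `parse_topn_type` and the constant `DEFAULT_KEY_BYTES` are hypothetical, and only the two forms named in the issue, "top(100)" and "top(100, 16)", are handled.

```python
import re

# Default key length named in the issue text for when the "scale" is absent.
DEFAULT_KEY_BYTES = 4

# Matches "top(N)" or "top(N, K)" with optional whitespace around the numbers.
_TOPN_PATTERN = re.compile(r"top\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\)")


def parse_topn_type(data_type):
    """Return (precision, key_bytes) for a TopN data type string (hypothetical helper)."""
    match = _TOPN_PATTERN.fullmatch(data_type.strip())
    if match is None:
        raise ValueError(f"not a TopN data type: {data_type!r}")
    precision = int(match.group(1))
    # Fall back to the 4-byte default when no key length is given.
    key_bytes = int(match.group(2)) if match.group(2) else DEFAULT_KEY_BYTES
    return precision, key_bytes


print(parse_topn_type("top(100)"))
print(parse_topn_type("top(100, 16)"))
```

With the key length available from the type string, a size estimate no longer has to assume 4 bytes for a key that is encoded as "fixed_length:16".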
[jira] [Commented] (KYLIN-3738) Edit cube measure may make the decimal type change unexpectly
[ https://issues.apache.org/jira/browse/KYLIN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728570#comment-16728570 ]

ASF subversion and git services commented on KYLIN-3738:
--------------------------------------------------------

Commit 1af62e46516b7b9a31d3de5a1a7867f9cb51799b in kylin's branch refs/heads/master from liapan
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=1af62e4 ]

KYLIN-3738 Edit cube measure may make the decimal type change unexpectly

revert KYLIN-2243 8c0c44b887e2caa21b097c2334f8d21c42462e80
[GitHub] shaofengshi closed pull request #414: KYLIN-3738 Edit cube measure may make the decimal type change unexpectly
shaofengshi closed pull request #414: KYLIN-3738 Edit cube measure may make the decimal type change unexpectly
URL: https://github.com/apache/kylin/pull/414

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/webapp/app/js/controllers/cubeMeasures.js b/webapp/app/js/controllers/cubeMeasures.js
index f1821dda87..fb5610f571 100644
--- a/webapp/app/js/controllers/cubeMeasures.js
+++ b/webapp/app/js/controllers/cubeMeasures.js
@@ -58,7 +58,6 @@ KylinApp.controller('CubeMeasuresCtrl', function ($scope, $modal,MetaModel,cubes
     $scope.nextParameters = [];
     $scope.measureParamValueColumn=$scope.getCommonMetricColumns();
     $scope.newMeasure = (!!measure)? jQuery.extend(true, {},measure):CubeDescModel.createMeasure();
-    $scope.newMeasure.function.returntype=$scope.newMeasure.function.returntype.replace(/\,\d+/,'');
     if(!!measure && measure.function.parameter.next_parameter){
       $scope.nextPara.value = measure.function.parameter.next_parameter.value;
     }
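The line removed in the diff above applied a non-global regular-expression replace to the measure's return type. A minimal Python sketch of what that expression does to a type string follows; the helper name `strip_scale` is hypothetical, and `re.sub` with `count=1` stands in for the JavaScript `replace` call, which only replaces the first match when the pattern has no `g` flag.

```python
import re


def strip_scale(return_type):
    """Remove the first ",<digits>" from a type string (hypothetical helper).

    Mirrors the JavaScript call returntype.replace(/\\,\\d+/, '').
    """
    return re.sub(r",\d+", "", return_type, count=1)


print(strip_scale("decimal(19,4)"))  # decimal(19)
print(strip_scale("decimal(19)"))    # decimal(19)
```

This matches the symptom reported in KYLIN-3738: a return type of decimal(19,4) loses its scale and is saved as decimal(19).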
[jira] [Commented] (KYLIN-3731) java.lang.IllegalArgumentException: Unsupported data type array at
[ https://issues.apache.org/jira/browse/KYLIN-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728560#comment-16728560 ]

HongBo Dai commented on KYLIN-3731:
-----------------------------------

Hi, a Kylin column with a complex data type such as array cannot be used directly in a dimension table query; it has to be converted in a Hive view, which is then joined to the fact table for querying. In the Hive view, the explodes function can convert multidimensional arrays into a queryable form. This applies to version 2.3 and higher. In 2.3, which did not yet include the DataTypeOrder class, there was no problem; the 2.5.x versions now have this problem.
[jira] [Comment Edited] (KYLIN-3731) java.lang.IllegalArgumentException: Unsupported data type array at
[ https://issues.apache.org/jira/browse/KYLIN-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728560#comment-16728560 ]

HongBo Dai edited comment on KYLIN-3731 at 12/25/18 1:47 AM:
-------------------------------------------------------------

Hi, a Kylin column with a complex data type such as array cannot be used directly in a dimension table query; it has to be converted in a Hive view, which is then joined to the fact table for querying. In the Hive view, the explode function can convert multidimensional arrays into a queryable form. This applies to version 2.3 and higher. In 2.3, which did not yet include the DataTypeOrder class, there was no problem; the 2.5.x versions now have this problem.
[jira] [Resolved] (KYLIN-3540) Improve Mandatory Cuboid Recommendation Algorithm
[ https://issues.apache.org/jira/browse/KYLIN-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhong Yanghong resolved KYLIN-3540.
-----------------------------------
    Resolution: Resolved

> Improve Mandatory Cuboid Recommendation Algorithm
> -------------------------------------------------
>
>                 Key: KYLIN-3540
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3540
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong
>            Priority: Major
>             Fix For: v2.6.0
>
>
> Previously, to add cuboids that are not prebuilt, the cube planner turned to mandatory cuboids, which are selected if their rollup row count is above some threshold. There are two shortcomings:
> * The way to estimate the rollup row count is not good
> * It's hard to determine the threshold of rollup row count for recommending mandatory cuboids
> bq. {color:#f79232}The improved way to estimate the rollup row count is as follows:{color}
> The current criterion for recommending mandatory cuboids is based on the average rollup count collected with query metrics. This has a disadvantage. An example is as follows:
> Cuboid (A,B) has 1000 rows, prebuilt; Cuboid (B) has 10 rows, not prebuilt;
> The ground truth for the rollup count from Cuboid (A,B) to Cuboid (B) is
> {code}
> Cuboid (A,B) - Cuboid (B) = 1000 - 10 = 990
> {code}
> Suppose the values of B are evenly distributed across A. Then for each value of B, the row count in Cuboid (A,B) is 1000 * (10/100) = 100.
> Now for sql
> {code}
> select B, count(*)
> from T
> where B = 'e1'
> group by B
> {code}
> Then the rollup count by the current algorithm will be
> {code}
> Cuboid (A,{'e1'}) - return count = 100 - 1 = 99
> {code}
> which is much smaller than 990, because most of the rows are filtered out.
> It's better to calculate the rollup rate first and then multiply it by the parent cuboid row count to estimate the rollup count. The refined formula is as follows:
> {code}
> Cuboid (A,B) - Cuboid (A,B) * (return count) / Cuboid (A,{'e1'}) = 1000-1000*1/100 = 990
> {code}
> Another sql
> {code}
> select count(*)
> from T
> where B in ('e1','e2')
> {code}
> The rollup count by the current algorithm will be
> {code}
> Cuboid (A,{'e1','e2'}) - return count = 100*2 - 1 = 199
> {code}
> The rollup count by the refined algorithm will be
> {code}
> Cuboid (A,B) - Cuboid (A,B) * (return count) / Cuboid (A,{'e1','e2'}) = 1000-1000*1/(100*2) = 995
> {code}
> Overall, the refined algorithm is much less influenced by filters in sql.
> bq. {color:#f79232}Don't recommend mandatory cuboids & don't need the threshold{color}
> Previously the reason to recommend mandatory cuboids was that they are not prebuilt and their row count statistics are not known, which makes it impossible to apply the cube planner algorithm to them. Now, with the improved way of estimating the rollup row count, we can better estimate the row count statistics for cuboids that are not prebuilt. The cost-based cube planner algorithm will then decide which cuboids are to be built, and the threshold is not needed.
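The arithmetic in the KYLIN-3540 description can be checked with a short sketch. This is Python written only to reproduce the numbers in the issue text, not the cube planner implementation; the function names `rollup_current` and `rollup_refined` and their parameter names are hypothetical.

```python
def rollup_current(scanned_rows, returned_rows):
    """Current estimate: rows scanned by the query minus rows returned."""
    return scanned_rows - returned_rows


def rollup_refined(parent_rows, scanned_rows, returned_rows):
    """Refined estimate: scale the query's rollup rate up to the whole parent cuboid."""
    return parent_rows - parent_rows * returned_rows / scanned_rows


PARENT_ROWS = 1000   # Cuboid (A,B)
ROWS_PER_B = 100     # rows of Cuboid (A,B) per value of B

# Query 1: where B = 'e1' group by B -> scans 100 rows, returns 1 row.
print(rollup_current(ROWS_PER_B, 1), rollup_refined(PARENT_ROWS, ROWS_PER_B, 1))

# Query 2: where B in ('e1','e2'), no group by -> scans 200 rows, returns 1 row.
print(rollup_current(2 * ROWS_PER_B, 1), rollup_refined(PARENT_ROWS, 2 * ROWS_PER_B, 1))
```

The current estimate changes with the width of the filter (99 versus 199), while the refined estimate stays close to the ground truth of 990 in both cases (990 and 995).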
[jira] [Commented] (KYLIN-3540) Improve Mandatory Cuboid Recommendation Algorithm
[ https://issues.apache.org/jira/browse/KYLIN-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728555#comment-16728555 ]

ASF subversion and git services commented on KYLIN-3540:
--------------------------------------------------------

Commit 8cfe32cf4f61d218439e57e134ccc3413aa98f89 in kylin's branch refs/heads/master from kyotoYaho
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=8cfe32c ]

KYLIN-3540 move queryService of CubeController to CubeService
[jira] [Commented] (KYLIN-3540) Improve Mandatory Cuboid Recommendation Algorithm
[ https://issues.apache.org/jira/browse/KYLIN-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728556#comment-16728556 ]

ASF subversion and git services commented on KYLIN-3540:
--------------------------------------------------------

Commit 4db6a37c7220c122cedb9fac1a2c735f10e27226 in kylin's branch refs/heads/master from kyotoYaho
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=4db6a37 ]

KYLIN-3540 refactor the interface of querying on SYSTEM project
[jira] [Commented] (KYLIN-3540) Improve Mandatory Cuboid Recommendation Algorithm
[ https://issues.apache.org/jira/browse/KYLIN-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728554#comment-16728554 ] ASF GitHub Bot commented on KYLIN-3540: --- kyotoYaho commented on pull request #407: KYLIN-3540 estimate the row counts of source cuboids which are not built & remove mandatory cuboids recommendation URL: https://github.com/apache/kylin/pull/407 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (KYLIN-3540) Improve Mandatory Cuboid Recommendation Algorithm
[ https://issues.apache.org/jira/browse/KYLIN-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728557#comment-16728557 ] ASF subversion and git services commented on KYLIN-3540: Commit 4850dacece8a2e90f4333b50e2e8304635a730a2 in kylin's branch refs/heads/master from kyotoYaho [ https://gitbox.apache.org/repos/asf?p=kylin.git;h=4850dac ] KYLIN-3540 estimate the row counts of source cuboids which are not built & remove mandatory cuboids recommendation
[GitHub] kyotoYaho closed pull request #407: KYLIN-3540 estimate the row counts of source cuboids which are not built & remove mandatory cuboids recommendation
kyotoYaho closed pull request #407: KYLIN-3540 estimate the row counts of source cuboids which are not built & remove mandatory cuboids recommendation
URL: https://github.com/apache/kylin/pull/407

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/core-common/src/main/java/org/apache/kylin/common/KylinConfigBase.java b/core-common/src/main/java/org/apache/kylin/common/KylinConfigBase.java
index f67f6b3479..b63062e31a 100644
--- a/core-common/src/main/java/org/apache/kylin/common/KylinConfigBase.java
+++ b/core-common/src/main/java/org/apache/kylin/common/KylinConfigBase.java
@@ -316,8 +316,7 @@ public String getMetastoreBigCellHdfsDirectory() {
     public String getReadHdfsWorkingDirectory() {
         if (StringUtils.isNotEmpty(getHBaseClusterFs())) {
             Path workingDir = new Path(getHdfsWorkingDirectory());
-            return new Path(getHBaseClusterFs(), Path.getPathWithoutSchemeAndAuthority(workingDir)).toString()
-                    + "/";
+            return new Path(getHBaseClusterFs(), Path.getPathWithoutSchemeAndAuthority(workingDir)).toString() + "/";
         }

         return getHdfsWorkingDirectory();
@@ -644,8 +643,12 @@ public int getCubePlannerRecommendCuboidCacheMaxSize() {
         return Integer.parseInt(getOptional("kylin.cube.cubeplanner.recommend-cache-max-size", "200"));
     }

-    public long getCubePlannerMandatoryRollUpThreshold() {
-        return Long.parseLong(getOptional("kylin.cube.cubeplanner.mandatory-rollup-threshold", "1000"));
+    public double getCubePlannerQueryUncertaintyRatio() {
+        return Double.parseDouble(getOptional("kylin.cube.cubeplanner.query-uncertainty-ratio", "0.1"));
+    }
+
+    public double getCubePlannerBPUSMinBenefitRatio() {
+        return Double.parseDouble(getOptional("kylin.cube.cubeplanner.bpus-min-benefit-ratio", "0.01"));
     }

     public int getCubePlannerAgreedyAlgorithmAutoThreshold() {
@@ -1910,12 +1913,13 @@ public boolean isJsonAlwaysSmallCell() {
     }

     public int getSmallCellMetadataWarningThreshold() {
-        return Integer.parseInt(getOptional("kylin.metadata.jdbc.small-cell-meta-size-warning-threshold",
-                String.valueOf(100 << 20))); //100mb
+        return Integer.parseInt(
+                getOptional("kylin.metadata.jdbc.small-cell-meta-size-warning-threshold", String.valueOf(100 << 20))); //100mb
     }

     public int getSmallCellMetadataErrorThreshold() {
-        return Integer.parseInt(getOptional("kylin.metadata.jdbc.small-cell-meta-size-error-threshold", String.valueOf(1 << 30))); // 1gb
+        return Integer.parseInt(
+                getOptional("kylin.metadata.jdbc.small-cell-meta-size-error-threshold", String.valueOf(1 << 30))); // 1gb
     }

     public int getJdbcResourceStoreMaxCellSize() {
diff --git a/core-cube/src/main/java/org/apache/kylin/cube/cuboid/algorithm/BPUSCalculator.java b/core-cube/src/main/java/org/apache/kylin/cube/cuboid/algorithm/BPUSCalculator.java
index 6316858d58..39c52dafe9 100755
--- a/core-cube/src/main/java/org/apache/kylin/cube/cuboid/algorithm/BPUSCalculator.java
+++ b/core-cube/src/main/java/org/apache/kylin/cube/cuboid/algorithm/BPUSCalculator.java
@@ -142,7 +142,7 @@ public boolean ifEfficient(CuboidBenefitModel best) {
     }

     public double getMinBenefitRatio() {
-        return 0.01;
+        return cuboidStats.getBpusMinBenefitRatio();
     }

     @Override
diff --git a/core-cube/src/main/java/org/apache/kylin/cube/cuboid/algorithm/CuboidRecommender.java b/core-cube/src/main/java/org/apache/kylin/cube/cuboid/algorithm/CuboidRecommender.java
index baacb51791..0e6a844a95 100644
--- a/core-cube/src/main/java/org/apache/kylin/cube/cuboid/algorithm/CuboidRecommender.java
+++ b/core-cube/src/main/java/org/apache/kylin/cube/cuboid/algorithm/CuboidRecommender.java
@@ -154,12 +154,11 @@ public static CuboidRecommender getInstance() {
         Map recommendCuboidsWithStats = Maps.newLinkedHashMap();
         for (Long cuboid : recommendCuboidList) {
-            if (cuboid.equals(cuboidStats.getBaseCuboid())) {
-                recommendCuboidsWithStats.put(cuboid, cuboidStats.getCuboidCount(cuboid));
-            } else if (cuboidStats.getAllCuboidsForSelection().contains(cuboid)) {
-                recommendCuboidsWithStats.put(cuboid, cuboidStats.getCuboidCount(cuboid));
+            if (cuboid == 0L) {
+                // for zero cuboid, just simply recommend the cheapest cuboid.
+                handleCuboidZeroRecommend(cuboidStats, recommendCuboidsWithStats);
             } else {
-                recommendCuboidsWithStats.put(cuboid, -1L);
+                recommendCuboidsWithStats.put(cuboid, cuboidStats.getCuboidCount(cuboid));
             }
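The two getters added to KylinConfigBase in the diff above read an optional property and fall back to a default ("0.1" and "0.01"). The following is a minimal sketch of that pattern, assuming a plain java.util.Properties as the backing store; the class PlannerConfigSketch and its getOptional helper are hypothetical stand-ins and do not reproduce how KylinConfigBase actually resolves properties.

```java
import java.util.Properties;

/** Hypothetical stand-in for a config class; not Kylin's KylinConfigBase. */
final class PlannerConfigSketch {
    private final Properties props;

    PlannerConfigSketch(Properties props) {
        this.props = props;
    }

    /** Returns the configured value, or the default when the key is absent. */
    String getOptional(String key, String defaultValue) {
        return props.getProperty(key, defaultValue);
    }

    double getQueryUncertaintyRatio() {
        return Double.parseDouble(getOptional("kylin.cube.cubeplanner.query-uncertainty-ratio", "0.1"));
    }

    double getBpusMinBenefitRatio() {
        return Double.parseDouble(getOptional("kylin.cube.cubeplanner.bpus-min-benefit-ratio", "0.01"));
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        // Override only one of the two keys; the other falls back to its default.
        p.setProperty("kylin.cube.cubeplanner.bpus-min-benefit-ratio", "0.05");
        PlannerConfigSketch config = new PlannerConfigSketch(p);
        System.out.println(config.getQueryUncertaintyRatio()); // 0.1
        System.out.println(config.getBpusMinBenefitRatio());   // 0.05
    }
}
```

With this shape, replacing the hard-coded `return 0.01;` in BPUSCalculator by a configurable value keeps the old behavior whenever the property is not set.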
[jira] [Commented] (KYLIN-3738) Edit cube measure may make the decimal type change unexpectly
[ https://issues.apache.org/jira/browse/KYLIN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728548#comment-16728548 ]

Pan, Julian commented on KYLIN-3738:
------------------------------------

Hi shaofeng,

I tested locally: after reverting the previous commit, the topN measure ($scope.newMeasure) is the same as before and the decimal issue is resolved.

> Edit cube measure may make the decimal type change unexpectly
> -------------------------------------------------------------
>
> Key: KYLIN-3738
> URL: https://issues.apache.org/jira/browse/KYLIN-3738
> Project: Kylin
> Issue Type: Bug
> Components: Web
> Affects Versions: v2.5.2
> Reporter: Pan, Julian
> Assignee: Pan, Julian
> Priority: Major
>
> When editing a cube's measure and clicking save, the original return type may
> be changed from decimal(19,4) to decimal(19), which makes the cube build
> result and query results incorrect.
[jira] [Commented] (KYLIN-3738) Edit cube measure may make the decimal type change unexpectly
[ https://issues.apache.org/jira/browse/KYLIN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728400#comment-16728400 ] Shaofeng SHI commented on KYLIN-3738: - Hi Julian, thanks for reporting. Is there any drawback to reverting the previous commit? Can it be fixed?
[jira] [Commented] (KYLIN-3738) Edit cube measure may make the decimal type change unexpectly
[ https://issues.apache.org/jira/browse/KYLIN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728334#comment-16728334 ] ASF GitHub Bot commented on KYLIN-3738: --- sanjulian commented on pull request #414: KYLIN-3738 Edit cube measure may make the decimal type change unexpectly URL: https://github.com/apache/kylin/pull/414 revert KYLIN-2243 8c0c44b887e2caa21b097c2334f8d21c42462e80
[GitHub] asfgit commented on issue #414: KYLIN-3738 Edit cube measure may make the decimal type change unexpectly
asfgit commented on issue #414: KYLIN-3738 Edit cube measure may make the decimal type change unexpectly URL: https://github.com/apache/kylin/pull/414#issuecomment-449721083 Can one of the admins verify this patch? With regards, Apache Git Services
[GitHub] sanjulian opened a new pull request #414: KYLIN-3738 Edit cube measure may make the decimal type change unexpectly
sanjulian opened a new pull request #414: KYLIN-3738 Edit cube measure may make the decimal type change unexpectly URL: https://github.com/apache/kylin/pull/414 revert KYLIN-2243 8c0c44b887e2caa21b097c2334f8d21c42462e80
[jira] [Commented] (KYLIN-3731) java.lang.IllegalArgumentException: Unsupported data type array at
[ https://issues.apache.org/jira/browse/KYLIN-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728330#comment-16728330 ]

Chao Long commented on KYLIN-3731:
----------------------------------

Yes, Kylin supports loading array type data from Hive and the build can succeed, but when I query an array type column, I get an error like "String cannot cast to array". So I think Kylin doesn't really support complex data types. That is why I want to know what your query SQL looked like when you were using Kylin 2.3.

> java.lang.IllegalArgumentException: Unsupported data type array at
> -------------------------------------------------------------------
>
> Key: KYLIN-3731
> URL: https://issues.apache.org/jira/browse/KYLIN-3731
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Affects Versions: v2.5.1
> Reporter: HongBo Dai
> Assignee: Chao Long
> Priority: Critical
> Labels: build
> Fix For: v2.5.1
>
> Attachments: error of kylin.txt, image-2018-12-20-10-59-04-060.png
>
>
> After Kylin was recently upgraded from 2.3 to 2.5.1, the array data type was
> found to be unsupported and the following exception occurred:
> "java.lang.IllegalArgumentException: Unsupported data type array". In
> Kylin 2.3, array columns stored in Hive ran without problems. Now the third
> step of the cube build fails with
> "org.apache.kylin.engine.mr.exception.MapReduceException: no counters
> for the job". Could you tell me how to solve the problem without changing the
> data structure? Please see the attachment.
> !image-2018-12-20-10-59-04-060.png!
[jira] [Commented] (KYLIN-3738) Edit cube measure may make the decimal type change unexpectly
[ https://issues.apache.org/jira/browse/KYLIN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728319#comment-16728319 ] Pan, Julian commented on KYLIN-3738: Hi [~Shaofengshi], should we revert the commit? I tested locally; the configuration attribute will override the encoding.
[jira] [Updated] (KYLIN-3738) Edit cube measure may make the decimal type change unexpectly
[ https://issues.apache.org/jira/browse/KYLIN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pan, Julian updated KYLIN-3738: --- Description: When edit cube's measure and click save, the origin return type maybe changed from decimal(19,4) to decimal(19), that will cause cube build result not correct and query result incorrectly. (was: When edit cube's measure and click save, the origin return type maybe changed from decimal(19,4) to decimal(19), that will cause cube build result not correct and query incorrectly.)
[jira] [Commented] (KYLIN-3731) java.lang.IllegalArgumentException: Unsupported data type array at
[ https://issues.apache.org/jira/browse/KYLIN-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728308#comment-16728308 ] HongBo Dai commented on KYLIN-3731: Hi, the Hive fact table stores a column of a complex data type (array). I created a Hive view on it, and the Kylin cube is built using that field, which fails immediately. From reading the class source code, it seems some judgment logic is missing, which leads to the error at the third step when building the cube.
[jira] [Comment Edited] (KYLIN-3731) java.lang.IllegalArgumentException: Unsupported data type array at
[ https://issues.apache.org/jira/browse/KYLIN-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728308#comment-16728308 ] HongBo Dai edited comment on KYLIN-3731 at 12/24/18 10:12 AM: --- Hi, the Hive fact table stores a column of a complex data type (array). I created a Hive view on it, and the Kylin cube is built using that field, which fails immediately. From reading the class source code, it seems some judgment logic is missing, which leads to the error at the third step when building the cube. Kylin itself supports complex data types; if it did not, Hive tables containing them could not be loaded and used in Kylin's web interface. was (Author: ville): Hi, the Hive fact table stores a column of a complex data type (array). I created a Hive view on it, and the Kylin cube is built using that field, which fails immediately. From reading the class source code, it seems some judgment logic is missing, which leads to the error at the third step when building the cube.
[jira] [Commented] (KYLIN-3731) java.lang.IllegalArgumentException: Unsupported data type array at
[ https://issues.apache.org/jira/browse/KYLIN-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728302#comment-16728302 ] Chao Long commented on KYLIN-3731: -- As far as I know, Kylin doesn't support complex data types such as array or map. How does it work in your scenario? In other words, what is your query pattern for the column whose data type is array?
[jira] [Commented] (KYLIN-3021) Check MapReduce job failed reason and include the diagnostics into email notification
[ https://issues.apache.org/jira/browse/KYLIN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728289#comment-16728289 ]

ASF subversion and git services commented on KYLIN-3021:
--------------------------------------------------------

Commit 0c60c6b9cad6ffefd716fd2a3e41a3b9b45788b5 in kylin's branch refs/heads/master from Wang Ken
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=0c60c6b ]

KYLIN-3021 check MapReduce job failed reason and include the diagnostics into email notification

> Check MapReduce job failed reason and include the diagnostics into email
> notification
> -------------------------------------------------------------------------
>
> Key: KYLIN-3021
> URL: https://issues.apache.org/jira/browse/KYLIN-3021
> Project: Kylin
> Issue Type: Improvement
> Reporter: Zhong Yanghong
> Assignee: Zhong Yanghong
> Priority: Major
> Fix For: v2.6.0
>
>
> In the current kylin.log and the failed-job email notification, we do not
> have detailed error information about why the MapReduce jobs failed. We just
> log "no counters for job" or "Counters: 0".
>
> 2017-08-03 18:24:10,197 WARN [pool-10-thread-17] common.HadoopCmdOutput:90 :
> no counters for job job_1497957612021_709431
>
> 2017-08-03 15:08:02,351 DEBUG [pool-10-thread-3] common.HadoopCmdOutput:95 :
> Counters: 0
[jira] [Assigned] (KYLIN-3291) Logically equivalent SQL queries submitted on a built cube return different results
[ https://issues.apache.org/jira/browse/KYLIN-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhong Yanghong reassigned KYLIN-3291:
-------------------------------------

Assignee: Zhong Yanghong

> Logically equivalent SQL queries submitted on a built cube return different results
> ------------------------------------------------------------------------------------
>
> Key: KYLIN-3291
> URL: https://issues.apache.org/jira/browse/KYLIN-3291
> Project: Kylin
> Issue Type: Bug
> Components: Query Engine
> Affects Versions: v2.0.0
> Environment: kylin 2.0, hbase 1.2.0
> Reporter: zhang
> Assignee: Zhong Yanghong
> Priority: Blocker
> Labels: patch
> Fix For: v2.6.0
>
>
> select
>  a.agent
>  ,b.channel_name
>  ,a.ONLINE_SECONDS_TYPE
>  ,a.pt_dt
>  ,count(*) ct
> from (
>  select
>  agent , ONLINE_SECONDS_TYPE ,pt_dt
>  from zhangyc02.DM_CHL_REGUSER_1D_WIDETABLE_D
>  where pt_dt>='2017-12-01' and pt_dt<='2017-12-03' and agent in (6,3)
> ) a
> left join zhangyc02.dim_res_info b
> on a.agent = b.channel_id
> group by a.agent,b.channel_name,a.ONLINE_SECONDS_TYPE,a.pt_dt
> order by agent, pt_dt , ONLINE_SECONDS_TYPE ;
> The result of this query is wrong.
> select
>  a.agent
>  ,b.channel_name agent_name
>  ,a.ONLINE_SECONDS_TYPE
>  ,pt_dt
>  ,count(*) ct
> from zhangyc02.DM_CHL_REGUSER_1D_WIDETABLE_D a
> left join zhangyc02.dim_res_info b
> on a.agent = b.channel_id
> where pt_dt>='2017-12-01' and pt_dt<='2017-12-03' and a.agent in (6,3)
> group by a.agent,b.channel_name,a.ONLINE_SECONDS_TYPE,a.pt_dt
> order by agent, pt_dt , ONLINE_SECONDS_TYPE ;
> The result of this query is correct.
> Verification: I ran both SQL statements separately in Impala; their results
> are identical and match the result of the second SQL statement in Kylin.
[jira] [Resolved] (KYLIN-3021) Check MapReduce job failed reason and include the diagnostics into email notification
[ https://issues.apache.org/jira/browse/KYLIN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhong Yanghong resolved KYLIN-3021. --- Resolution: Resolved
[jira] [Commented] (KYLIN-3559) Use Splitter for splitting String
[ https://issues.apache.org/jira/browse/KYLIN-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728294#comment-16728294 ] Zhong Yanghong commented on KYLIN-3559: --- Hi [~skywind2006], thanks very much. I think you can mark it by yourself:D > Use Splitter for splitting String > - > > Key: KYLIN-3559 > URL: https://issues.apache.org/jira/browse/KYLIN-3559 > Project: Kylin > Issue Type: Task >Reporter: Ted Yu >Assignee: Wu Bin >Priority: Major > Fix For: v2.6.0 > > > See http://errorprone.info/bugpattern/StringSplitter for why Splitter is > preferred . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
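As background for KYLIN-3559, the sketch below shows a few behaviors of java.lang.String.split that are commonly cited as reasons to prefer an explicit splitter such as Guava's Splitter. It uses only the JDK; the helper names are made up for this illustration and nothing here is taken from the Kylin patch.

```java
import java.util.Arrays;

/** Illustration of String.split edge cases; helper names are hypothetical. */
final class SplitPitfalls {

    /** Plain split on a comma: trailing empty fields are silently dropped. */
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    /** A negative limit keeps trailing empty fields. */
    static String[] keepAllSplit(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(naiveSplit("a,b,,")));   // [a, b]
        System.out.println(Arrays.toString(keepAllSplit("a,b,,"))); // [a, b, , ]
        // A leading empty field is kept, unlike trailing ones.
        System.out.println(Arrays.toString(naiveSplit(",a,b")));    // [, a, b]
        // Empty input yields one empty string, not an empty array.
        System.out.println(naiveSplit("").length);                  // 1
    }
}
```

Code that counts fields after a plain split can therefore miscount rows with empty trailing columns, which is the kind of mistake the linked error-prone check is meant to flag.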
[jira] [Commented] (KYLIN-3738) Edit cube measure may make the decimal type change unexpectly
[ https://issues.apache.org/jira/browse/KYLIN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728292#comment-16728292 ] Pan, Julian commented on KYLIN-3738: The bug was introduced by KYLIN-2243: {code:java} $scope.newMeasure.function.returntype=$scope.newMeasure.function.returntype.replace(/\,\d+/,''); {code} This code in addNewMeasure changes decimal(19,4) to decimal(19).
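The effect of the replace call quoted above can be reproduced outside the web UI. The sketch below is a Java analogue of the JavaScript expression (a non-global regex replace corresponds to replaceFirst). The second method is only a hypothetical guard that restricts the rewrite to topN-style types; it is an assumption for illustration and not the fix adopted by the project, whose pull request reverts the KYLIN-2243 commit. The string "topn(100,4)" is likewise an illustrative value.

```java
import java.util.Locale;

/** Java analogue of the JavaScript regex from the comment; illustration only. */
final class ReturnTypeRegexDemo {

    /** Mirrors returntype.replace(/\,\d+/, ''): removes the first ",<digits>". */
    static String stripFirstCommaDigits(String returnType) {
        return returnType.replaceFirst(",\\d+", "");
    }

    /** Hypothetical guard: only rewrite types whose name starts with "topn". */
    static String stripOnlyForTopN(String returnType) {
        boolean isTopN = returnType.toLowerCase(Locale.ROOT).startsWith("topn");
        return isTopN ? stripFirstCommaDigits(returnType) : returnType;
    }

    public static void main(String[] args) {
        // The unconditional rewrite also drops the scale of a decimal type.
        System.out.println(stripFirstCommaDigits("decimal(19,4)")); // decimal(19)
        // With the guard, decimal types pass through unchanged.
        System.out.println(stripOnlyForTopN("decimal(19,4)"));      // decimal(19,4)
        System.out.println(stripOnlyForTopN("topn(100,4)"));        // topn(100)
    }
}
```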
[jira] [Updated] (KYLIN-2924) Utilize error-prone to discover common coding mistakes
[ https://issues.apache.org/jira/browse/KYLIN-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shaofeng SHI updated KYLIN-2924: Fix Version/s: (was: v2.6.0) > Utilize error-prone to discover common coding mistakes > -- > > Key: KYLIN-2924 > URL: https://issues.apache.org/jira/browse/KYLIN-2924 > Project: Kylin > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Billy Liu >Priority: Major > > http://errorprone.info/ is a tool which detects common coding mistakes. > We should incorporate into Kylin build. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3698) check-env.sh should print more details about checking items
[ https://issues.apache.org/jira/browse/KYLIN-3698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728279#comment-16728279 ] Zhong Yanghong commented on KYLIN-3698: --- Hi [~DDDQ], what's the status of this? Can we move this to the next release? > check-env.sh should print more details about checking items > --- > > Key: KYLIN-3698 > URL: https://issues.apache.org/jira/browse/KYLIN-3698 > Project: Kylin > Issue Type: Improvement >Affects Versions: v2.5.1 >Reporter: May Zhou >Assignee: May Zhou >Priority: Minor > Fix For: v2.6.0 > > > In the current version, when users run _check-env.sh_, if there's no error > message, it means everything is OK. > From my perspective, adding more details about the checking items when > executing check-env.sh is better. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KYLIN-3738) Edit cube measure may make the decimal type change unexpectly
Pan, Julian created KYLIN-3738: -- Summary: Edit cube measure may make the decimal type change unexpectly Key: KYLIN-3738 URL: https://issues.apache.org/jira/browse/KYLIN-3738 Project: Kylin Issue Type: Bug Components: Web Affects Versions: v2.5.2 Reporter: Pan, Julian Assignee: Pan, Julian When a cube's measure is edited and saved, the original return type may be changed from decimal(19,4) to decimal(19), which makes the cube build result incorrect. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3021) Check MapReduce job failed reason and include the diagnostics into email notification
[ https://issues.apache.org/jira/browse/KYLIN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728288#comment-16728288 ] ASF GitHub Bot commented on KYLIN-3021: --- shaofengshi commented on pull request #413: KYLIN-3021 check MapReduce job failed reason and include the diagnost… URL: https://github.com/apache/kylin/pull/413 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Check MapReduce job failed reason and include the diagnostics into email > notification > - > > Key: KYLIN-3021 > URL: https://issues.apache.org/jira/browse/KYLIN-3021 > Project: Kylin > Issue Type: Improvement >Reporter: Zhong Yanghong >Assignee: Zhong Yanghong >Priority: Major > Fix For: v2.6.0 > > > In the current kylin.log and the failed-job email notification, there is no > detailed error information about why a MapReduce job failed. We only log "no > counters for job" or "Counters: 0". > > 2017-08-03 18:24:10,197 WARN [pool-10-thread-17] common.HadoopCmdOutput:90 : > no counters for job job_1497957612021_709431 > > 2017-08-03 15:08:02,351 DEBUG [pool-10-thread-3] common.HadoopCmdOutput:95 : > Counters: 0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] shaofengshi closed pull request #413: KYLIN-3021 check MapReduce job failed reason and include the diagnost…
shaofengshi closed pull request #413: KYLIN-3021 check MapReduce job failed reason and include the diagnost…
URL: https://github.com/apache/kylin/pull/413

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/engine-mr/src/main/java/org/apache/kylin/engine/mr/common/HadoopCmdOutput.java b/engine-mr/src/main/java/org/apache/kylin/engine/mr/common/HadoopCmdOutput.java
index d82b988665..df89ed8553 100644
--- a/engine-mr/src/main/java/org/apache/kylin/engine/mr/common/HadoopCmdOutput.java
+++ b/engine-mr/src/main/java/org/apache/kylin/engine/mr/common/HadoopCmdOutput.java
@@ -18,6 +18,7 @@
 package org.apache.kylin.engine.mr.common;
 
+import java.io.IOException;
 import java.util.Collections;
 import java.util.HashMap;
 import java.util.Map;
@@ -26,6 +27,8 @@
 import org.apache.hadoop.mapreduce.Counters;
 import org.apache.hadoop.mapreduce.FileSystemCounter;
 import org.apache.hadoop.mapreduce.Job;
+import org.apache.hadoop.mapreduce.JobStatus;
+import org.apache.hadoop.mapreduce.TaskCompletionEvent;
 import org.apache.hadoop.mapreduce.TaskCounter;
 import org.apache.kylin.common.KylinConfig;
 import org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper.RawDataCounter;
@@ -92,29 +95,66 @@ public void updateJobCounter() {
                 String errorMsg = "no counters for job " + getMrJobId();
                 logger.warn(errorMsg);
                 output.append(errorMsg);
-                return;
+            } else {
+                this.output.append(counters.toString()).append("\n");
+                logger.debug(counters.toString());
+
+                mapInputRecords = String.valueOf(counters.findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue());
+                rawInputBytesRead = String.valueOf(counters.findCounter(RawDataCounter.BYTES).getValue());
+
+                String outputFolder = job.getConfiguration().get("mapreduce.output.fileoutputformat.outputdir",
+                        KylinConfig.getInstanceFromEnv().getHdfsWorkingDirectory());
+                logger.debug("outputFolder is " + outputFolder);
+                Path outputPath = new Path(outputFolder);
+                String fsScheme = outputPath.getFileSystem(job.getConfiguration()).getScheme();
+                long bytesWritten = counters.findCounter(fsScheme, FileSystemCounter.BYTES_WRITTEN).getValue();
+                if (bytesWritten == 0) {
+                    logger.debug("Seems no counter found for " + fsScheme);
+                    bytesWritten = counters.findCounter("FileSystemCounters", "HDFS_BYTES_WRITTEN").getValue();
+                }
+                hdfsBytesWritten = String.valueOf(bytesWritten);
             }
-            this.output.append(counters.toString()).append("\n");
-            logger.debug(counters.toString());
-
-            mapInputRecords = String.valueOf(counters.findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue());
-            rawInputBytesRead = String.valueOf(counters.findCounter(RawDataCounter.BYTES).getValue());
-
-            String outputFolder = job.getConfiguration().get("mapreduce.output.fileoutputformat.outputdir", KylinConfig.getInstanceFromEnv().getHdfsWorkingDirectory());
-            logger.debug("outputFolder is " + outputFolder);
-            Path outputPath = new Path(outputFolder);
-            String fsScheme = outputPath.getFileSystem(job.getConfiguration()).getScheme();
-            long bytesWritten = counters.findCounter(fsScheme, FileSystemCounter.BYTES_WRITTEN).getValue();
-            if (bytesWritten == 0) {
-                logger.debug("Seems no counter found for " + fsScheme);
-                bytesWritten = counters.findCounter("FileSystemCounters", "HDFS_BYTES_WRITTEN").getValue();
+            JobStatus jobStatus = job.getStatus();
+            if (jobStatus.getState() == JobStatus.State.FAILED) {
+                logger.warn("Job Diagnostics:" + jobStatus.getFailureInfo());
+                output.append("Job Diagnostics:").append(jobStatus.getFailureInfo()).append("\n");
+                TaskCompletionEvent taskEvent = getOneTaskFailure(job);
+                if (taskEvent != null) {
+                    String[] fails = job.getTaskDiagnostics(taskEvent.getTaskAttemptId());
+                    logger.warn("Failure task Diagnostics:");
+                    output.append("Failure task Diagnostics:").append("\n");
+                    for (String failure : fails) {
+                        logger.warn(failure);
+                        output.append(failure).append("\n");
+                    }
+                }
             }
-            hdfsBytesWritten = String.valueOf(bytesWritten);
-
         } catch (Exception e) {
             logger.error(e.getLocalizedMessage(), e);
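The control-flow change in the pull request #413 diff (replacing the early `return` with an `else` branch, so that failure diagnostics are still collected when a job has no counters) can be sketched without Hadoop on the classpath. Everything in this sketch is a hypothetical stand-in: `JobState`, `buildOutput`, and its parameters are illustration-only names, not Hadoop or Kylin APIs, and the extra newline after the "no counters" message is a simplification.

```java
import java.util.Arrays;
import java.util.List;

class JobOutputSketch {

    enum JobState { SUCCEEDED, FAILED }

    // counters == null models the "no counters for job" case.
    static String buildOutput(String jobId, String counters, JobState state,
                              String failureInfo, List<String> taskDiagnostics) {
        StringBuilder output = new StringBuilder();
        if (counters == null) {
            output.append("no counters for job ").append(jobId).append("\n");
            // No early return here: the failure branch below must still run.
        } else {
            output.append(counters).append("\n");
        }
        if (state == JobState.FAILED) {
            output.append("Job Diagnostics:").append(failureInfo).append("\n");
            if (taskDiagnostics != null) {
                output.append("Failure task Diagnostics:").append("\n");
                for (String failure : taskDiagnostics) {
                    output.append(failure).append("\n");
                }
            }
        }
        return output.toString();
    }

    public static void main(String[] args) {
        // A failed job without counters still reports its diagnostics.
        System.out.print(buildOutput("job_1", null, JobState.FAILED,
                "Task failed", Arrays.asList("Container killed")));
    }
}
```

With the early `return` kept in place, the failed job in `main` would produce only the "no counters" line, which is the situation KYLIN-3021 describes.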
[GitHub] shaofengshi commented on issue #382: Kylin-3654 New Kylin Streaming
shaofengshi commented on issue #382: Kylin-3654 New Kylin Streaming URL: https://github.com/apache/kylin/pull/382#issuecomment-449708009 Staged in realtime-streaming branch; Close this PR now. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] shaofengshi closed pull request #386: Kylin on Druid blog
shaofengshi closed pull request #386: Kylin on Druid blog URL: https://github.com/apache/kylin/pull/386 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/website/_posts/blog/2018-12-12-why-did-meituan-develop-kylin-on-druid-part1-of-2.md b/website/_posts/blog/2018-12-12-why-did-meituan-develop-kylin-on-druid-part1-of-2.md new file mode 100644 index 00..11c7df5527 --- /dev/null +++ b/website/_posts/blog/2018-12-12-why-did-meituan-develop-kylin-on-druid-part1-of-2.md @@ -0,0 +1,186 @@ +--- +layout: post-blog +title: Why did Meituan develop Kylin On Druid (part 1 of 2)? +date: 2018-12-12 17:30:00 +author: Xiaoxiang Yu +categories: blog +--- + +## Preface + +In the Big Data field, Apache Kylin and Apache Druid(incubating) are two commonly adopted OLAP engines, both of which enable fast querying on huge datasets. In the enterprises that heavily rely on big data analytics, they often run both for different use cases. + +During the Apache Kylin Meetup in August 2018, the Meituan team shared their Kylin on Druid (KoD) solution. Why did they develop this hybrid system? What’s the rationale behind it? This article will answer these questions and help you to understand the differences and the pros and cons of each OLAP engine. + +## 01 Introduction to Apache Kylin +Apache Kylin is an open source distributed big data analytics engine. It constructs data models on top of huge datasets, builds pre-calculated Cubes to support multi-dimensional analysis, and provides a SQL query interface and multi-dimensional analysis on top of Hadoop, with general ODBC, JDBC, and RESTful API interfaces. Apache Kylin’s unique pre-calculation ability enables it to handle extremely large datasets with sub-second query response times. 
+![](/images/blog/Kylin-On-Durid/1 kylin_architecture.png) +Graphic 1 Kylin Architecture + +## 02 Apache Kylin’s Advantage +1. The mature, Hadoop-based computing engines (MapReduce and Spark) that provide strong capability of pre-calculation on super large datasets, which can be deployed out-of-the-box on any mainstream Hadoop platform. +2. Support of ANSI SQL that allows users to do data analysis with SQL directly. +3. Sub-second, low-latency query response times. +4. Common OLAP Star/Snowflake Schema data modeling. +5. A rich OLAP function set including Sum, Count Distinct, Top N, Percentile, etc. +6. Intelligent trimming of Cuboids that reduces consumption of storage and computing power. +7. Direct integration with mainstream BI tools and rich interfaces. +8. Support of both batch loading of super large historical datasets and micro-batches of data streams. + +## 03 Introduction to Apache Druid (incubating) +Druid was created in 2012. It’s an open source distributed data store. Its core design combines the concept of analytical databases, time-series databases, and search systems, and it can support data collection and analytics on fairly large datasets. Druid uses an Apache V2 license and is an Apache incubator project. + +Druid Architecture +From the perspective of deployment architectures, Druid’s processes mostly fall into 3 categories based on their roles. + +### • Data Node (Slave node for data ingestion and calculation) +The Historical node is in charge of loading segments (committed immutable data) and receiving queries on historical data. +Middle Manager is in charge of data ingestion and commit segments. Each task is done by a separate JVM. +Peon is in charge of completing a single task, which is managed and monitored by the Middle Manager. + +### • Query Node +Broker receives query requests, determines on which segment the data resides, and distributes sub-queries and merges query results. 
+ +### • Master Node (Task Coordinator and Cluster Manager) +Coordinator monitors Historical nodes, dispatches segments and monitor workload. +Overlord monitors Middle Manager, dispatches tasks to Middle Manager, and assists releasing of segments. + + +### External Dependency +At the same time, Druid has 3 replaceable external dependencies. + +### • Deep Storage (distributed storage) +Druid uses Deep storage to transfer data files between nodes. + +### • Metadata Storage +Metadata Storage stores the metadata about segment positions and task output. + +### • Zookeeper (cluster management and task coordination) +Druid uses Zookeeper (ZK) to ensure consistency of the cluster status. +![](/images/blog/Kylin-On-Durid/2 druid_architecture.png) +Graphic 2 Druid Architecture + +## Data Source and Segment +Druid stores data in Data Source. Data Source is equivalent to Table in RDBMS. Data Source is divided into multiple Chunks based on timestamps, and data within the same time range will be organized into the same Chunk. Each
[jira] [Commented] (KYLIN-3559) Use Splitter for splitting String
[ https://issues.apache.org/jira/browse/KYLIN-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728262#comment-16728262 ] Wu Bin commented on KYLIN-3559: --- [~yaho] I think it's resolved. Should I mark it or wait for the admin? > Use Splitter for splitting String > - > > Key: KYLIN-3559 > URL: https://issues.apache.org/jira/browse/KYLIN-3559 > Project: Kylin > Issue Type: Task >Reporter: Ted Yu >Assignee: Wu Bin >Priority: Major > Fix For: v2.6.0 > > > See http://errorprone.info/bugpattern/StringSplitter for why Splitter is > preferred . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3597) Fix sonar reported static code issues
[ https://issues.apache.org/jira/browse/KYLIN-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728271#comment-16728271 ] ASF GitHub Bot commented on KYLIN-3597: --- shaofengshi commented on pull request #401: KYLIN-3597 fix sonar issues URL: https://github.com/apache/kylin/pull/401 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix sonar reported static code issues > - > > Key: KYLIN-3597 > URL: https://issues.apache.org/jira/browse/KYLIN-3597 > Project: Kylin > Issue Type: Improvement > Components: Others >Reporter: Shaofeng SHI >Priority: Major > Fix For: v2.6.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3597) Fix sonar reported static code issues
[ https://issues.apache.org/jira/browse/KYLIN-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728272#comment-16728272 ] ASF subversion and git services commented on KYLIN-3597: Commit 3b9c5a55139ca85e60e45bb5c748f480178a93d5 in kylin's branch refs/heads/master from whuwb [ https://gitbox.apache.org/repos/asf?p=kylin.git;h=3b9c5a5 ] KYLIN-3597 fix sonar issues > Fix sonar reported static code issues > - > > Key: KYLIN-3597 > URL: https://issues.apache.org/jira/browse/KYLIN-3597 > Project: Kylin > Issue Type: Improvement > Components: Others >Reporter: Shaofeng SHI >Priority: Major > Fix For: v2.6.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] shaofengshi closed pull request #401: KYLIN-3597 fix sonar issues
shaofengshi closed pull request #401: KYLIN-3597 fix sonar issues URL: https://github.com/apache/kylin/pull/401 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/core-cube/src/main/java/org/apache/kylin/cube/cuboid/algorithm/CuboidStatsUtil.java b/core-cube/src/main/java/org/apache/kylin/cube/cuboid/algorithm/CuboidStatsUtil.java index def1e68116..dc3471b4b7 100644 --- a/core-cube/src/main/java/org/apache/kylin/cube/cuboid/algorithm/CuboidStatsUtil.java +++ b/core-cube/src/main/java/org/apache/kylin/cube/cuboid/algorithm/CuboidStatsUtil.java @@ -55,25 +55,26 @@ for (Map.Entry hitFrequency : hitFrequencyMap.entrySet()) { long cuboid = hitFrequency.getKey(); -if (statistics.get(cuboid) != null) { -continue; -} -if (rollingUpCountSourceMap.get(cuboid) == null || rollingUpCountSourceMap.get(cuboid).isEmpty()) { -continue; -} -long totalEstScanCount = 0L; -for (long estScanCount : rollingUpCountSourceMap.get(cuboid).values()) { -totalEstScanCount += estScanCount; -} -totalEstScanCount /= rollingUpCountSourceMap.get(cuboid).size(); -if ((hitFrequency.getValue() * 1.0 / totalHitFrequency) -* totalEstScanCount >= rollUpThresholdForMandatory) { -mandatoryCuboidSet.add(cuboid); + +if (isCuboidMandatory(cuboid, statistics, rollingUpCountSourceMap)) { +long totalEstScanCount = 0L; +for (long estScanCount : rollingUpCountSourceMap.get(cuboid).values()) { +totalEstScanCount += estScanCount; +} +totalEstScanCount /= rollingUpCountSourceMap.get(cuboid).size(); +if ((hitFrequency.getValue() * 1.0 / totalHitFrequency) +* totalEstScanCount >= rollUpThresholdForMandatory) { +mandatoryCuboidSet.add(cuboid); +} } } return mandatoryCuboidSet; } +private static boolean isCuboidMandatory(Long cuboid, Map statistics, Map> rollingUpCountSourceMap) { +return 
!statistics.containsKey(cuboid) && rollingUpCountSourceMap.containsKey(cuboid) && !rollingUpCountSourceMap.get(cuboid).isEmpty(); +} + /** * Complement row count for mandatory cuboids * with its best parent's row count @@ -81,7 +82,7 @@ public static void complementRowCountForMandatoryCuboids(Map statistics, long baseCuboid, Set mandatoryCuboidSet) { // Sort entries order by row count asc -SortedSet> sortedStatsSet = new TreeSet>( +SortedSet> sortedStatsSet = new TreeSet<>( new Comparator>() { public int compare(Map.Entry o1, Map.Entry o2) { return o1.getValue().compareTo(o2.getValue()); diff --git a/core-cube/src/main/java/org/apache/kylin/cube/cuboid/algorithm/greedy/GreedyAlgorithm.java b/core-cube/src/main/java/org/apache/kylin/cube/cuboid/algorithm/greedy/GreedyAlgorithm.java index 7f415de0bc..e8b0ae894a 100755 --- a/core-cube/src/main/java/org/apache/kylin/cube/cuboid/algorithm/greedy/GreedyAlgorithm.java +++ b/core-cube/src/main/java/org/apache/kylin/cube/cuboid/algorithm/greedy/GreedyAlgorithm.java @@ -110,12 +110,12 @@ public GreedyAlgorithm(final long timeout, BenefitPolicy benefitPolicy, CuboidSt List excluded = Lists.newArrayList(remaining); remaining.retainAll(selected); -Preconditions.checkArgument(remaining.size() == 0, +Preconditions.checkArgument(remaining.isEmpty(), "There should be no intersection between excluded list and selected list."); logger.info("Greedy Algorithm finished."); if (logger.isTraceEnabled()) { -logger.trace("Excluded cuboidId size:" + excluded.size()); +logger.trace(String.format(Locale.ROOT, "Excluded cuboidId size:%d", excluded.size())); logger.trace("Excluded cuboidId detail:"); for (Long cuboid : excluded) { logger.trace(String.format(Locale.ROOT, "cuboidId %d and Cost: %d and Space: %f", cuboid, diff --git a/core-storage/src/main/java/org/apache/kylin/storage/gtrecord/CubeScanRangePlanner.java b/core-storage/src/main/java/org/apache/kylin/storage/gtrecord/CubeScanRangePlanner.java index 1a02e1aa3a..3095c8f708 100644 --- 
a/core-storage/src/main/java/org/apache/kylin/storage/gtrecord/CubeScanRangePlanner.java +++ b/core-storage/src/main/java/org/apache/kylin/storage/gtrecord/CubeScanRangePlanner.java @@ -24,6 +24,7 @@ import java.util.Collections; import java.util.Comparator;
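The selection rule applied by the refactored loop in `CuboidStatsUtil` can be sketched as below. This is a simplified sketch under stated assumptions: the generic types were lost in the diff as shown, so `Map<Long, Long>` and `Map<Long, Map<Long, Long>>` are assumed; `totalHitFrequency` is assumed to be the sum of all hit frequencies, because its computation lies outside the shown hunk; `selectMandatory` and `isCandidate` are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class MandatoryCuboidSketch {

    // Candidate: no statistics recorded yet, and at least one roll-up source exists.
    static boolean isCandidate(Long cuboid, Map<Long, Long> statistics,
                               Map<Long, Map<Long, Long>> rollingUpCountSourceMap) {
        return !statistics.containsKey(cuboid)
                && rollingUpCountSourceMap.containsKey(cuboid)
                && !rollingUpCountSourceMap.get(cuboid).isEmpty();
    }

    static Set<Long> selectMandatory(Map<Long, Long> statistics, Map<Long, Long> hitFrequencyMap,
                                     Map<Long, Map<Long, Long>> rollingUpCountSourceMap, long threshold) {
        // Assumption: total hit frequency is the sum over all queried cuboids.
        long totalHitFrequency = 0L;
        for (long hits : hitFrequencyMap.values()) {
            totalHitFrequency += hits;
        }

        Set<Long> mandatory = new HashSet<>();
        for (Map.Entry<Long, Long> hitFrequency : hitFrequencyMap.entrySet()) {
            Long cuboid = hitFrequency.getKey();
            if (!isCandidate(cuboid, statistics, rollingUpCountSourceMap)) {
                continue;
            }
            // Average estimated scan count over all roll-up sources (integer division, as in the diff).
            Map<Long, Long> sources = rollingUpCountSourceMap.get(cuboid);
            long totalEstScanCount = 0L;
            for (long estScanCount : sources.values()) {
                totalEstScanCount += estScanCount;
            }
            totalEstScanCount /= sources.size();
            // Weight the average by the cuboid's share of all hits.
            if ((hitFrequency.getValue() * 1.0 / totalHitFrequency) * totalEstScanCount >= threshold) {
                mandatory.add(cuboid);
            }
        }
        return mandatory;
    }

    public static void main(String[] args) {
        Map<Long, Long> statistics = new HashMap<>();
        statistics.put(1L, 100L);                 // cuboid 1 already has statistics
        Map<Long, Long> hits = new HashMap<>();
        hits.put(1L, 50L);
        hits.put(2L, 50L);
        Map<Long, Long> sourcesOf2 = new HashMap<>();
        sourcesOf2.put(3L, 1000L);
        sourcesOf2.put(4L, 3000L);
        Map<Long, Map<Long, Long>> rollups = new HashMap<>();
        rollups.put(2L, sourcesOf2);
        // Cuboid 2: average 2000, hit share 0.5, score 1000 >= 500.
        System.out.println(selectMandatory(statistics, hits, rollups, 500L)); // [2]
    }
}
```

One behavioral detail of the refactoring is worth noting: `containsKey` differs from the earlier `get(cuboid) != null` check only when a key is mapped to `null`.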
[jira] [Commented] (KYLIN-3737) Refactor cache part for RDBMS
[ https://issues.apache.org/jira/browse/KYLIN-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728266#comment-16728266 ] ASF subversion and git services commented on KYLIN-3737: Commit 5982bb79bacc5e0728df52a6e602239e1d2c6b26 in kylin's branch refs/heads/master from woyumen4597 [ https://gitbox.apache.org/repos/asf?p=kylin.git;h=5982bb7 ] KYLIN-3737 refactor cache part for RDBMS > Refactor cache part for RDBMS > - > > Key: KYLIN-3737 > URL: https://issues.apache.org/jira/browse/KYLIN-3737 > Project: Kylin > Issue Type: Improvement > Components: RDBMS Source >Affects Versions: v2.6.0 > Environment: MacOSx,JDK1.8 >Reporter: rongchuan.jin >Assignee: rongchuan.jin >Priority: Major > Fix For: v2.6.0 > > > Currently, the Kylin cache for RDBMS performs poorly when loading many > tables with sql-case-sensitive: it takes a long time to load the > database, table, and column identifiers into the cache in order to fix the sql-case-sensitive > problem for RDBMS. I found there is room to improve it, so I'd like to contribute > a patch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3737) Refactor cache part for RDBMS
[ https://issues.apache.org/jira/browse/KYLIN-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728265#comment-16728265 ] ASF GitHub Bot commented on KYLIN-3737: --- shaofengshi commented on pull request #411: KYLIN-3737 refactor cache part for RDBMS URL: https://github.com/apache/kylin/pull/411 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor cache part for RDBMS > - > > Key: KYLIN-3737 > URL: https://issues.apache.org/jira/browse/KYLIN-3737 > Project: Kylin > Issue Type: Improvement > Components: RDBMS Source >Affects Versions: v2.6.0 > Environment: MacOSx,JDK1.8 >Reporter: rongchuan.jin >Assignee: rongchuan.jin >Priority: Major > Fix For: v2.6.0 > > > Currently, the Kylin cache for RDBMS performs poorly when loading many > tables with sql-case-sensitive: it takes a long time to load the > database, table, and column identifiers into the cache in order to fix the sql-case-sensitive > problem for RDBMS. I found there is room to improve it, so I'd like to contribute > a patch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] shaofengshi closed pull request #411: KYLIN-3737 refactor cache part for RDBMS
shaofengshi closed pull request #411: KYLIN-3737 refactor cache part for RDBMS URL: https://github.com/apache/kylin/pull/411 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/datasource-sdk/src/main/java/org/apache/kylin/sdk/datasource/adaptor/AbstractJdbcAdaptor.java b/datasource-sdk/src/main/java/org/apache/kylin/sdk/datasource/adaptor/AbstractJdbcAdaptor.java index 3e36faedf3..3a66499b33 100644 --- a/datasource-sdk/src/main/java/org/apache/kylin/sdk/datasource/adaptor/AbstractJdbcAdaptor.java +++ b/datasource-sdk/src/main/java/org/apache/kylin/sdk/datasource/adaptor/AbstractJdbcAdaptor.java @@ -53,11 +53,11 @@ protected final DataSourceDef dataSourceDef; protected SqlConverter.IConfigurer configurer; protected final Cache> columnsCache = CacheBuilder.newBuilder() -.expireAfterWrite(1, TimeUnit.DAYS).maximumSize(30).build(); +.expireAfterWrite(1, TimeUnit.DAYS).maximumSize(4096).build(); protected final Cache> databasesCache = CacheBuilder.newBuilder() -.expireAfterWrite(1, TimeUnit.DAYS).maximumSize(30).build(); +.expireAfterWrite(1, TimeUnit.DAYS).maximumSize(4096).build(); protected final Cache> tablesCache = CacheBuilder.newBuilder() -.expireAfterWrite(1, TimeUnit.DAYS).maximumSize(30).build(); +.expireAfterWrite(1, TimeUnit.DAYS).maximumSize(4096).build(); private static Joiner joiner = Joiner.on("_"); @@ -308,7 +308,7 @@ public String getDataSourceId() { */ public List listDatabasesWithCache(boolean init) throws SQLException { if (configurer.enableCache()) { -String cacheKey = config.datasourceId + config.url + "_databases"; +String cacheKey = joiner.join(config.datasourceId, config.url, "databases"); List cachedDatabases; if (init || (cachedDatabases = databasesCache.getIfPresent(cacheKey)) == null) { cachedDatabases = 
listDatabases(); @@ -429,7 +429,7 @@ public String getDataSourceId() { */ public List listColumnsWithCache(String database, String tableName, boolean init) throws SQLException { if (configurer.enableCache()) { -String cacheKey = config.datasourceId + config.url + "_" + tableName + "_columns"; +String cacheKey = joiner.join(config.datasourceId, config.url, database, tableName, "columns"); List cachedColumns; if (init || (cachedColumns = columnsCache.getIfPresent(cacheKey)) == null) { cachedColumns = listColumns(database, tableName); diff --git a/datasource-sdk/src/main/java/org/apache/kylin/sdk/datasource/adaptor/DefaultAdaptor.java b/datasource-sdk/src/main/java/org/apache/kylin/sdk/datasource/adaptor/DefaultAdaptor.java index 66c45e1dcf..da24e9831b 100644 --- a/datasource-sdk/src/main/java/org/apache/kylin/sdk/datasource/adaptor/DefaultAdaptor.java +++ b/datasource-sdk/src/main/java/org/apache/kylin/sdk/datasource/adaptor/DefaultAdaptor.java @@ -28,6 +28,7 @@ import java.util.Map; import javax.sql.rowset.CachedRowSet; +import com.google.common.base.Joiner; import org.apache.commons.lang.StringUtils; /** @@ -36,9 +37,7 @@ */ public class DefaultAdaptor extends AbstractJdbcAdaptor { -protected static final String QUOTE_REG_LFT = "[`\"\\[]*"; -protected static final String QUOTE_REG_RHT = "[`\"\\]]*"; -private final static String [] POSSIBLE_TALBE_END= {",", " ", ")", "\r", "\n", "."}; +private static Joiner joiner = Joiner.on('_'); public DefaultAdaptor(AdaptorConfig config) throws Exception { super(config); @@ -140,19 +139,6 @@ public String fixSql(String sql) { return sql; } -private boolean checkSqlContainstable(String orig, String table) { -// ensure table is single match(e.g match account but not match accountant) -if (orig.endsWith(table.toUpperCase(Locale.ROOT))) { -return true; -} -for (String end:POSSIBLE_TALBE_END) { -if (orig.contains(table.toUpperCase(Locale.ROOT) + end)){ -return true; -} -} -return false; -} - /** * By default, use schema as database 
of kylin. * @return @@ -270,26 +256,100 @@ public CachedRowSet getTableColumns(String schema, String table) throws SQLExcep public String fixIdentifierCaseSensitve(String identifier) { try { List databases = listDatabasesWithCache(); -for (String db : databases) { -if (db.equalsIgnoreCase(identifier)) { -return db; +for (String database : databases) { +if (identifier.equalsIgnoreCase(database)) { +
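One effect of the new column cache key in the pull request #411 diff is that the database name becomes part of the key. A small sketch shows why that matters. It uses `String.join` from the JDK as a stand-in for Guava's `Joiner.on("_")`; `oldColumnsKey` and `newColumnsKey` are hypothetical helper names, and the datasource id, URL, database, and table values are made up for illustration.

```java
class CacheKeySketch {

    // Key shape before the patch: the database is not part of the key.
    static String oldColumnsKey(String datasourceId, String url, String table) {
        return datasourceId + url + "_" + table + "_columns";
    }

    // Key shape after the patch: every component, including the database, is joined with "_".
    static String newColumnsKey(String datasourceId, String url, String database, String table) {
        return String.join("_", datasourceId, url, database, table, "columns");
    }

    public static void main(String[] args) {
        String id = "mysql";
        String url = "jdbc:mysql://host:3306/";
        // Two tables named "orders" in different databases share one old-style key.
        System.out.println(oldColumnsKey(id, url, "orders"));
        // The new-style keys are distinct.
        System.out.println(newColumnsKey(id, url, "sales", "orders"));
        System.out.println(newColumnsKey(id, url, "hr", "orders"));
    }
}
```

Under the old key shape, a cached column list for `sales.orders` could be returned for `hr.orders`; with the database in the key, the two entries no longer collide.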
[jira] [Resolved] (KYLIN-3724) Kylin IT test sql is unreasonable
[ https://issues.apache.org/jira/browse/KYLIN-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiaoXiang Yu resolved KYLIN-3724. - Resolution: Fixed > Kylin IT test sql is unreasonable > - > > Key: KYLIN-3724 > URL: https://issues.apache.org/jira/browse/KYLIN-3724 > Project: Kylin > Issue Type: Bug >Reporter: XiaoXiang Yu >Assignee: XiaoXiang Yu >Priority: Major > Fix For: v2.6.0 > > Attachments: image-2018-12-17-16-55-13-816.png, > image-2018-12-17-16-58-28-349.png > > > In {color:#33}*kylin-it,*{color} we use query under > +_sql_distinct_precisely_+ folder to test the +*COUNT_DISTINCT(Bitmap)*+ . > But we find that query04 using a COUNT_DISTINCT(HLL) in having condition, it > is unreasonable and can cause some data reduction. And I think it maybe > causing some unpredictable test failure. > > {quote}select test_cal_dt.cal_dt,sum(test_kylin_fact.price) as GMV > , count(1) as TRANS_CNT > , count(distinct TEST_COUNT_DISTINCT_BITMAP) as user_count > , count(distinct site_name) as site_count > from test_kylin_fact > inner JOIN edw.test_cal_dt as test_cal_dt > ON test_kylin_fact.cal_dt = test_cal_dt.cal_dt > inner JOIN test_category_groupings > on test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id and > test_kylin_fact.lstg_site_id = test_category_groupings.site_id > inner JOIN edw.test_sites as test_sites > on test_kylin_fact.lstg_site_id = test_sites.site_id > inner JOIN edw.test_seller_type_dim as test_seller_type_dim > on test_kylin_fact.slr_segment_cd = test_seller_type_dim.seller_type_cd > where test_kylin_fact.lstg_format_name='FP-GTC' > and test_cal_dt.cal_dt between DATE '2013-05-01' and DATE '2013-08-01' > group by test_cal_dt.cal_dt > having count(distinct seller_id) > 2 > {quote} > > > > !image-2018-12-17-16-58-28-349.png! > In our jenkin server, sometime we got a build failure, but when I run again > without modify code, the CI test pass. > !image-2018-12-17-16-55-13-816.png! 
> > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KYLIN-3411) kylin scan different in same sql
[ https://issues.apache.org/jira/browse/KYLIN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhong Yanghong updated KYLIN-3411: -- Issue Type: Improvement (was: Bug) > kylin scan different in same sql > > > Key: KYLIN-3411 > URL: https://issues.apache.org/jira/browse/KYLIN-3411 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Affects Versions: v2.3.1 >Reporter: Lemont >Assignee: Zhong Yanghong >Priority: Minor > Fix For: v2.6.0 > > > There are two SQL queries: > select sum(value) from test where time > 1524326400 group by id > and > select sum(value) from test where time > (1524931200-7*86400) group by id > As we can see, 1524326400 = (1524931200-7*86400), > but the second query is slower than the first. > Cuboid Ids: [3904] > Total scan count: 1157959 > Total scan bytes: 265530668 > Result row count: 34991 > Cuboid Ids: [3904] > Total scan count: 611795 > Total scan bytes: 140681855 > Result row count: 34991 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3411) kylin scan different in same sql
[ https://issues.apache.org/jira/browse/KYLIN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728260#comment-16728260 ] Zhong Yanghong commented on KYLIN-3411: --- Good catch. > kylin scan different in same sql > > > Key: KYLIN-3411 > URL: https://issues.apache.org/jira/browse/KYLIN-3411 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.3.1 >Reporter: Lemont >Priority: Minor > Fix For: v2.6.0 > > > There are two SQL queries: > select sum(value) from test where time > 1524326400 group by id > and > select sum(value) from test where time > (1524931200-7*86400) group by id > As we can see, 1524326400 = (1524931200-7*86400), > but the second query is slower than the first. > Cuboid Ids: [3904] > Total scan count: 1157959 > Total scan bytes: 265530668 > Result row count: 34991 > Cuboid Ids: [3904] > Total scan count: 611795 > Total scan bytes: 140681855 > Result row count: 34991 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KYLIN-3411) kylin scan different in same sql
[ https://issues.apache.org/jira/browse/KYLIN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhong Yanghong reassigned KYLIN-3411: - Assignee: Zhong Yanghong > kylin scan different in same sql > > > Key: KYLIN-3411 > URL: https://issues.apache.org/jira/browse/KYLIN-3411 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.3.1 >Reporter: Lemont >Assignee: Zhong Yanghong >Priority: Minor > Fix For: v2.6.0 > > > There are two SQL queries: > select sum(value) from test where time > 1524326400 group by id > and > select sum(value) from test where time > (1524931200-7*86400) group by id > As we can see, 1524326400 = (1524931200-7*86400), > but the second query is slower than the first. > Cuboid Ids: [3904] > Total scan count: 1157959 > Total scan bytes: 265530668 > Result row count: 34991 > Cuboid Ids: [3904] > Total scan count: 611795 > Total scan bytes: 140681855 > Result row count: 34991 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
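The equality stated in the KYLIN-3411 report is easy to verify. The sketch below only checks the arithmetic of the two filter constants; it makes no claim about why Kylin reported different scan counts for the two queries. `sevenDaysBefore` is a hypothetical helper name.

```java
class FilterConstantCheck {

    // 86400 seconds per day; the literal is a long to keep the arithmetic in 64 bits.
    static long sevenDaysBefore(long epochSeconds) {
        return epochSeconds - 7 * 86400L;
    }

    public static void main(String[] args) {
        // 7 * 86400 = 604800, and 1524931200 - 604800 = 1524326400.
        System.out.println(sevenDaysBefore(1524931200L)); // 1524326400
    }
}
```

Since both filters compare `time` against the same value, the two queries are logically identical and return the same 34991 rows.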
[jira] [Commented] (KYLIN-3724) Kylin IT test sql is unreasonable
[ https://issues.apache.org/jira/browse/KYLIN-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728253#comment-16728253 ] XiaoXiang Yu commented on KYLIN-3724: - [~yaho], I have marked this as Resolved. > Kylin IT test sql is unreasonable > - > > Key: KYLIN-3724 > URL: https://issues.apache.org/jira/browse/KYLIN-3724 > Project: Kylin > Issue Type: Bug >Reporter: XiaoXiang Yu >Assignee: XiaoXiang Yu >Priority: Major > Fix For: v2.6.0 > > Attachments: image-2018-12-17-16-55-13-816.png, > image-2018-12-17-16-58-28-349.png > > > In *kylin-it*, we use the queries under the > +_sql_distinct_precisely_+ folder to test +*COUNT_DISTINCT(Bitmap)*+. > However, query04 uses a COUNT_DISTINCT(HLL) in its HAVING condition, which > is unreasonable and can cause some data reduction. I think it may be > causing some unpredictable test failures. > > {quote}select test_cal_dt.cal_dt,sum(test_kylin_fact.price) as GMV > , count(1) as TRANS_CNT > , count(distinct TEST_COUNT_DISTINCT_BITMAP) as user_count > , count(distinct site_name) as site_count > from test_kylin_fact > inner JOIN edw.test_cal_dt as test_cal_dt > ON test_kylin_fact.cal_dt = test_cal_dt.cal_dt > inner JOIN test_category_groupings > on test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id and > test_kylin_fact.lstg_site_id = test_category_groupings.site_id > inner JOIN edw.test_sites as test_sites > on test_kylin_fact.lstg_site_id = test_sites.site_id > inner JOIN edw.test_seller_type_dim as test_seller_type_dim > on test_kylin_fact.slr_segment_cd = test_seller_type_dim.seller_type_cd > where test_kylin_fact.lstg_format_name='FP-GTC' > and test_cal_dt.cal_dt between DATE '2013-05-01' and DATE '2013-08-01' > group by test_cal_dt.cal_dt > having count(distinct seller_id) > 2 > {quote} > > > > !image-2018-12-17-16-58-28-349.png! > On our Jenkins server, we sometimes get a build failure, but when I rerun > without modifying the code, the CI tests pass. 
> !image-2018-12-17-16-55-13-816.png! > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-2924) Utilize error-prone to discover common coding mistakes
[ https://issues.apache.org/jira/browse/KYLIN-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728249#comment-16728249 ] Billy Liu commented on KYLIN-2924: -- [~yaho] I think [~Shaofengshi] has disabled this feature to avoid excessive log output. > Utilize error-prone to discover common coding mistakes > -- > > Key: KYLIN-2924 > URL: https://issues.apache.org/jira/browse/KYLIN-2924 > Project: Kylin > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Billy Liu >Priority: Major > Fix For: v2.6.0 > > > http://errorprone.info/ is a tool which detects common coding mistakes. > We should incorporate it into the Kylin build. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3724) Kylin IT test sql is unreasonable
[ https://issues.apache.org/jira/browse/KYLIN-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728252#comment-16728252 ] Zhong Yanghong commented on KYLIN-3724: --- Hi [~hit_lacus], should we mark this as Resolved? > Kylin IT test sql is unreasonable > - > > Key: KYLIN-3724 > URL: https://issues.apache.org/jira/browse/KYLIN-3724 > Project: Kylin > Issue Type: Bug >Reporter: XiaoXiang Yu >Assignee: XiaoXiang Yu >Priority: Major > Fix For: v2.6.0 > > Attachments: image-2018-12-17-16-55-13-816.png, > image-2018-12-17-16-58-28-349.png > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3597) Fix sonar reported static code issues
[ https://issues.apache.org/jira/browse/KYLIN-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728242#comment-16728242 ] Zhong Yanghong commented on KYLIN-3597: --- Hi [~Shaofengshi], what's the status of this? Can it be marked as resolved? > Fix sonar reported static code issues > - > > Key: KYLIN-3597 > URL: https://issues.apache.org/jira/browse/KYLIN-3597 > Project: Kylin > Issue Type: Improvement > Components: Others >Reporter: Shaofeng SHI >Priority: Major > Fix For: v2.6.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3658) The keywords of Hive are not supported By Kylin
[ https://issues.apache.org/jira/browse/KYLIN-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728250#comment-16728250 ] Zhong Yanghong commented on KYLIN-3658: --- Hi [~zhixin], what's the status of this issue? Should we move this to the next release? > The keywords of Hive are not supported By Kylin > --- > > Key: KYLIN-3658 > URL: https://issues.apache.org/jira/browse/KYLIN-3658 > Project: Kylin > Issue Type: Bug >Affects Versions: v2.5.0 >Reporter: liuzhixin >Priority: Major > Fix For: v2.6.0 > > > Hive 2.x is strict about SQL keywords: when they are used as identifiers they > must be quoted with backticks, > e.g. `date`, `timestamp` ... > When Kylin accesses Hive, the generated SQL statement does not put backticks > around these keywords, which causes problems. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
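The quoting described in KYLIN-3658 can be sketched roughly as follows. This is only an illustration, not Kylin's actual SQL generator: the class `HiveIdentifierQuoting`, its methods, and the two-word reserved list are hypothetical, and the real set of reserved words depends on the Hive version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Kylin: wraps reserved words in backticks
// before they are placed into a generated Hive statement.
class HiveIdentifierQuoting {
    // Illustrative subset only; taken from the two examples in the issue text.
    private static final Set<String> RESERVED =
            new HashSet<>(Arrays.asList("date", "timestamp"));

    // Returns the identifier wrapped in backticks if it is in the reserved set.
    static String quoteIfReserved(String identifier) {
        if (RESERVED.contains(identifier.toLowerCase(Locale.ROOT))) {
            return "`" + identifier + "`";
        }
        return identifier;
    }

    // Builds a simple SELECT list, quoting each column name where needed.
    static String buildSelect(String table, String... columns) {
        List<String> quoted = new ArrayList<>();
        for (String column : columns) {
            quoted.add(quoteIfReserved(column));
        }
        return "SELECT " + String.join(", ", quoted) + " FROM " + table;
    }

    public static void main(String[] args) {
        // prints: SELECT id, `date`, `timestamp` FROM sales
        System.out.println(buildSelect("sales", "id", "date", "timestamp"));
    }
}
```

A simpler alternative would be to quote every identifier unconditionally, which avoids maintaining a keyword list at the cost of noisier SQL.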
[jira] [Commented] (KYLIN-3559) Use Splitter for splitting String
[ https://issues.apache.org/jira/browse/KYLIN-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728240#comment-16728240 ] Zhong Yanghong commented on KYLIN-3559: --- Hi [~skywind2006], what's the status of this? > Use Splitter for splitting String > - > > Key: KYLIN-3559 > URL: https://issues.apache.org/jira/browse/KYLIN-3559 > Project: Kylin > Issue Type: Task >Reporter: Ted Yu >Assignee: Wu Bin >Priority: Major > Fix For: v2.6.0 > > > See http://errorprone.info/bugpattern/StringSplitter for why Splitter is > preferred . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
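For context on KYLIN-3559, the sketch below shows the kind of `String.split` behavior that tends to surprise callers, using only the JDK. The class `SplitPitfalls` and the helper `splitLiteral` are hypothetical names for illustration; Guava's `Splitter` (for example `Splitter.on(',').splitToList(...)`), which the linked bug pattern recommends, treats the delimiter literally and keeps empty fields by default.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; shows String.split surprises with plain JDK calls.
class SplitPitfalls {
    // Treats the delimiter as literal text and keeps trailing empty fields
    // (a negative limit disables the removal of trailing empty strings).
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        // Trailing empty fields are silently dropped: prints [a, b]
        System.out.println(Arrays.toString("a,b,,".split(",")));

        // The argument is a regex, so "." matches every character: prints 0
        System.out.println("a.b".split(".").length);

        // An empty input still yields one (empty) element: prints 1
        System.out.println("".split(",").length);

        // With the helper, all four fields survive: prints 4
        System.out.println(splitLiteral("a,b,,", ",").length);
    }
}
```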
[jira] [Commented] (KYLIN-3628) The wrong result when a query with one lookup table
[ https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728247#comment-16728247 ] Zhong Yanghong commented on KYLIN-3628: --- Hi [~Na Zhai], could you explain more details about your solution for this issue? And what's the status of the PR now? > The wrong result when a query with one lookup table > --- > > Key: KYLIN-3628 > URL: https://issues.apache.org/jira/browse/KYLIN-3628 > Project: Kylin > Issue Type: Improvement >Reporter: Na Zhai >Assignee: Na Zhai >Priority: Major > Fix For: v2.6.0 > > > Two cubes use the same lookup table, and then the lookup table data in Hive > changes. One of Kylin's cubes builds the new data, and the other doesn't. > When Kylin queries the lookup table, the data gets confused. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KYLIN-1295) Add new document to describe key concepts like model/cube/segment etc
[ https://issues.apache.org/jira/browse/KYLIN-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shaofeng SHI updated KYLIN-1295: Fix Version/s: (was: v2.6.0) Move to future release. > Add new document to describe key concepts like model/cube/segment etc > - > > Key: KYLIN-1295 > URL: https://issues.apache.org/jira/browse/KYLIN-1295 > Project: Kylin > Issue Type: Improvement > Components: Documentation >Reporter: liyang >Assignee: Shaofeng SHI >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3571) Not build Spark in Kylin's binary package
[ https://issues.apache.org/jira/browse/KYLIN-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728241#comment-16728241 ] Zhong Yanghong commented on KYLIN-3571: --- Hi [~Wayne0101], what's the status of this issue? > Not build Spark in Kylin's binary package > - > > Key: KYLIN-3571 > URL: https://issues.apache.org/jira/browse/KYLIN-3571 > Project: Kylin > Issue Type: Improvement > Components: Environment >Reporter: Shaofeng SHI >Assignee: Chao Long >Priority: Major > Fix For: v2.6.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (KYLIN-3559) Use Splitter for splitting String
[ https://issues.apache.org/jira/browse/KYLIN-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728240#comment-16728240 ] Zhong Yanghong edited comment on KYLIN-3559 at 12/24/18 8:20 AM: - Hi [~skywind2006], what's the status of this? And can it be marked as Resolved? was (Author: yaho): Hi [~skywind2006], what's the status of this? > Use Splitter for splitting String > - > > Key: KYLIN-3559 > URL: https://issues.apache.org/jira/browse/KYLIN-3559 > Project: Kylin > Issue Type: Task >Reporter: Ted Yu >Assignee: Wu Bin >Priority: Major > Fix For: v2.6.0 > > > See http://errorprone.info/bugpattern/StringSplitter for why Splitter is > preferred . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] woyumen4597 commented on issue #411: KYLIN-3737 refactor cache part for RDBMS
woyumen4597 commented on issue #411: KYLIN-3737 refactor cache part for RDBMS URL: https://github.com/apache/kylin/pull/411#issuecomment-449701482 Local CI has passed. ![image](https://user-images.githubusercontent.com/24585832/50394312-a74fed00-0797-11e9-935e-dca7fe728b50.png) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (KYLIN-3310) Use lint for maven-compiler-plugin
[ https://issues.apache.org/jira/browse/KYLIN-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728236#comment-16728236 ] Zhong Yanghong commented on KYLIN-3310: --- Hi [~Aron.tao], what's the status of this? > Use lint for maven-compiler-plugin > -- > > Key: KYLIN-3310 > URL: https://issues.apache.org/jira/browse/KYLIN-3310 > Project: Kylin > Issue Type: Improvement > Components: Tools, Build and Test >Reporter: Ted Yu >Assignee: Jiatao Tao >Priority: Major > Fix For: v2.6.0 > > > lint helps identify structural problems. > We should enable lint for maven-compiler-plugin > {code} > maven-compiler-plugin > ${maven-compiler-plugin.version} > > 1.8 > 1.8 > > -Xlint:all > ${compiler.error.flag} > > -Xlint:-options > > -Xlint:-cast > -Xlint:-deprecation > -Xlint:-processing > -Xlint:-rawtypes > -Xlint:-serial > -Xlint:-try > -Xlint:-unchecked > -Xlint:-varargs > > > > > true > > false > > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3093) Upgrade curator to 2.12
[ https://issues.apache.org/jira/browse/KYLIN-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728234#comment-16728234 ] Zhong Yanghong commented on KYLIN-3093: --- Hi [~Shaofengshi], what's the status of this issue? Will it be fixed in future? > Upgrade curator to 2.12 > --- > > Key: KYLIN-3093 > URL: https://issues.apache.org/jira/browse/KYLIN-3093 > Project: Kylin > Issue Type: Improvement > Components: Tools, Build and Test >Reporter: Ted Yu >Assignee: Shaofeng SHI >Priority: Major > Fix For: v2.6.0 > > > curator-2.10.0 has several bug fixes over current version (2.7.1), updating > would help improve stability. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KYLIN-2973) Potential issue of not atomically update cube instance map
[ https://issues.apache.org/jira/browse/KYLIN-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhong Yanghong resolved KYLIN-2973. --- Resolution: Fixed > Potential issue of not atomically update cube instance map > -- > > Key: KYLIN-2973 > URL: https://issues.apache.org/jira/browse/KYLIN-2973 > Project: Kylin > Issue Type: Bug >Reporter: Zhong Yanghong >Assignee: Zhong Yanghong >Priority: Major > Fix For: v2.6.0 > > > P1 > {code} > try { > getStore().putResource(cube.getResourcePath(), cube, CUBE_SERIALIZER); > } catch (IllegalStateException ise) { > logger.warn("Write conflict to update cube " + cube.getName() + " at try " + retry + ", will retry..."); > if (retry >= 7) { > logger.error("Retried 7 times till got error, abandoning...", ise); > throw ise; > } > cube = reloadCubeLocal(cube.getName()); > update.setCubeInstance(cube); > retry++; > cube = updateCubeWithRetry(update, retry); > } > {code} > P2 > {code} > if (toRemoveResources.size() > 0) { > for (String resource : toRemoveResources) { > try { > getStore().deleteResource(resource); > } catch (IOException ioe) { > logger.error("Failed to delete resource " + toRemoveResources.toString()); > } > } > } > {code} > P3 > {code} > cubeMap.put(cube.getName(), cube); > {code} > The following interleaving is possible: > # Thread t1 reaches P2; > # Thread t2 then runs P1, P2 and P3, so the cube instance in the map is > updated by t2; > # Thread t1 then runs P3 and overwrites the map entry with its own, now > stale, cube instance, which is not correct. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
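One way to make the P3 step safe against the interleaving in KYLIN-2973 is to let the map itself decide, atomically, which instance is newer. The sketch below is an assumption-laden illustration, not the fix that was actually committed to Kylin: `CubeCacheSketch`, `CubeSnapshot`, `putIfNewer`, and the `version` field are all hypothetical, and it assumes every successful store write produces a strictly larger version number.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; none of these names come from the Kylin code base.
class CubeCacheSketch {
    // Minimal stand-in for a cube instance.
    static final class CubeSnapshot {
        final String name;
        final long version; // assumed to increase with each successful store write

        CubeSnapshot(String name, long version) {
            this.name = name;
            this.version = version;
        }
    }

    private final ConcurrentMap<String, CubeSnapshot> cubeMap = new ConcurrentHashMap<>();

    // Replaces the cached entry only if the candidate is at least as new.
    // merge() runs the comparison atomically per key, so a late writer holding
    // a stale instance cannot overwrite a fresher one.
    CubeSnapshot putIfNewer(CubeSnapshot candidate) {
        return cubeMap.merge(candidate.name, candidate,
                (current, incoming) -> incoming.version >= current.version ? incoming : current);
    }

    CubeSnapshot get(String name) {
        return cubeMap.get(name);
    }

    public static void main(String[] args) {
        CubeCacheSketch cache = new CubeCacheSketch();
        // t2 finishes P1..P3 first with the newer instance.
        cache.putIfNewer(new CubeSnapshot("sales_cube", 2));
        // t1 reaches P3 late with its older instance; the update is ignored.
        cache.putIfNewer(new CubeSnapshot("sales_cube", 1));
        System.out.println(cache.get("sales_cube").version); // prints 2
    }
}
```

Holding one lock across P1 to P3 would also close the window, at the cost of serializing all cube updates; the version comparison keeps the store write outside any lock.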
[jira] [Commented] (KYLIN-2924) Utilize error-prone to discover common coding mistakes
[ https://issues.apache.org/jira/browse/KYLIN-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728231#comment-16728231 ] Zhong Yanghong commented on KYLIN-2924: --- Hi [~yimingliu], what's the status of this? > Utilize error-prone to discover common coding mistakes > -- > > Key: KYLIN-2924 > URL: https://issues.apache.org/jira/browse/KYLIN-2924 > Project: Kylin > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Billy Liu >Priority: Major > Fix For: v2.6.0 > > > http://errorprone.info/ is a tool which detects common coding mistakes. > We should incorporate it into the Kylin build. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-1577) make kylin metadata store support multiple replication
[ https://issues.apache.org/jira/browse/KYLIN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728224#comment-16728224 ] Zhong Yanghong commented on KYLIN-1577: --- Hi [~liyang.g...@gmail.com] and [~Shaofengshi], what's the status of this patch? > make kylin metadata store support multiple replication > -- > > Key: KYLIN-1577 > URL: https://issues.apache.org/jira/browse/KYLIN-1577 > Project: Kylin > Issue Type: Sub-task > Components: Metadata >Reporter: yunjiong zhao >Assignee: yunjiong zhao >Priority: Major > Fix For: v2.6.0 > > Attachments: KYLIN-1577.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (KYLIN-1295) Add new document to describe key concepts like model/cube/segment etc
[ https://issues.apache.org/jira/browse/KYLIN-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728220#comment-16728220 ] Zhong Yanghong edited comment on KYLIN-1295 at 12/24/18 7:59 AM: - Hi [~Shaofengshi], how about the progress? And is it feasible to finish this by v2.6.0? was (Author: yaho): Hi [~Shaofengshi], how about the progress? > Add new document to describe key concepts like model/cube/segment etc > - > > Key: KYLIN-1295 > URL: https://issues.apache.org/jira/browse/KYLIN-1295 > Project: Kylin > Issue Type: Improvement > Components: Documentation >Reporter: liyang >Assignee: Shaofeng SHI >Priority: Major > Fix For: v2.6.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)