[GitHub] [kylin] nichunen merged pull request #849: Revert "Kylin 3392 support sum(null)"
nichunen merged pull request #849: Revert "Kylin 3392 support sum(null)" URL: https://github.com/apache/kylin/pull/849 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [kylin] coveralls edited a comment on issue #845: KYLIN-4100 run load overview func when change tab and css refine
coveralls edited a comment on issue #845: KYLIN-4100 run load overview func when change tab and css refine URL: https://github.com/apache/kylin/pull/845#issuecomment-533254450
## Pull Request Test Coverage Report for [Build 5031](https://coveralls.io/builds/25805273)
* **0** of **0** changed or added relevant lines in **0** files are covered.
* No unchanged relevant lines lost coverage.
* Overall coverage remained the same at ?**%**

---

| Totals | [![Coverage Status](https://coveralls.io/builds/25805273/badge)](https://coveralls.io/builds/25805273) |
| :-- | --: |
| Change from base [Build 5023](https://coveralls.io/builds/25775999): | 0.0% |
| Covered Lines: | |
| Relevant Lines: | 0 |

---

[Coveralls](https://coveralls.io)
[jira] [Updated] (KYLIN-4167) Refactor streaming coordinator
[ https://issues.apache.org/jira/browse/KYLIN-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoxiang Yu updated KYLIN-4167:
Description:
h2. Summary
# Currently, the *coordinator* has too many responsibilities, which violates the single responsibility principle and makes it hard to extend; a clean separation of responsibilities is recommended.
# Some cluster-level operations have no atomicity guarantee; we should implement them in an idempotent way to achieve eventual consistency.
# Resubmit the build job when it was discarded.
# Clarify the overall design of real-time OLAP.

h4. StreamingCoordinator
Facade of the coordinator; it controls BuildJobSubmitter/ReceiverClusterManager and delegates operations to them.

h4. BuildJobSubmitter
The main responsibilities of BuildJobSubmitter include:
1. Find candidate segments that are ready to have a build job submitted.
2. Track the status of each candidate segment's build job and promote the segment once it has met the requirements.

h4. ReceiverClusterManager
This class manages operations that involve multiple streaming receivers. These operations are often not atomic and should be idempotent.

h4. ClusterStateChecker
Basic steps of this class:
1. stop/pause the coordinator to avoid underlying concurrency issues
2. check all receiver clusters for inconsistent state
3. send a summary to the Kylin admin via mail
4. if needed, call ClusterDoctor to repair the inconsistency

h4. ClusterDoctor
Repairs inconsistent state according to the result of ClusterStateChecker.

h3. Candidate Segment
Candidate segments are the segments that can be seen/perceived by the streaming coordinator. A candidate segment is in one of the following states/queues:
1. segment whose data are uploaded *PARTLY*
2. segment whose data are uploaded completely and which is *WAITING* to be built
3. segment in *BUILDING* state; the job's state should be one of (NEW/RUNNING/ERROR/DISCARD)
4. segment which was built *successfully* and is waiting to be delivered to the historical part (and to be deleted from the real-time part)
5. segment which is *in the historical part* (HBase Ready Segment)
By design, a segment should move to the next queue sequentially (it must not jump the queue); do not break this invariant.

h3. Atomicity
In a multi-step transaction, the following aspects should be thought through:
1. should we *fail fast* or continue when an exception is thrown
2. should an API (remote call) be *synchronous* or asynchronous
3. when a transaction fails, will *rollback* always succeed
4. a transaction should be *idempotent*, so that when it fails it can be fixed by retry
To ensure whole-cluster operations proceed smoothly without blocking, I divided all multi-step transactions into three kinds: NotAtomicIdempotent, NotAtomicAndNotIdempotent, NonSideEffect.

was:
1. Currently, the coordinator has too many responsibilities, which violates the single responsibility principle and makes it hard to extend; a clean separation of responsibilities is recommended.
2. Some cluster-level operations have no atomicity guarantee; we should implement them in an idempotent way to achieve eventual consistency.
3. Resubmit the build job when it was discarded.
4. Clarify the overall design of real-time OLAP.

> Refactor streaming coordinator
> ------------------------------
>
>                 Key: KYLIN-4167
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4167
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Real-time Streaming
>            Reporter: Xiaoxiang Yu
>            Assignee: Xiaoxiang Yu
>            Priority: Major
>             Fix For: v3.0.0-beta
>
> h2. Summary
> # Currently, the *coordinator* has too many responsibilities, which violates the single responsibility principle and makes it hard to extend; a clean separation of responsibilities is recommended.
> # Some cluster-level operations have no atomicity guarantee; we should implement them in an idempotent way to achieve eventual consistency.
> # Resubmit the build job when it was discarded.
> # Clarify the overall design of real-time OLAP.
>
> h4. StreamingCoordinator
> Facade of the coordinator; it controls BuildJobSubmitter/ReceiverClusterManager and delegates operations to them.
> h4. BuildJobSubmitter
> The main responsibilities of BuildJobSubmitter include:
> 1. Find candidate segments that are ready to have a build job submitted.
> 2. Track the status of each candidate segment's build job and promote the segment once it has met the requirements.
> h4. ReceiverClusterManager
> This class manages operations that involve multiple streaming receivers. These operations are often not atomic and should be idempotent.
> h4. ClusterStateChecker
> Basic steps of this class:
> 1. stop/pause the coordinator to avoid underlying concurrency issues
> 2. check all receiver clusters for inconsistent state
> 3. send a summary to the Kylin admin via mail
> 4. if needed, call ClusterDoctor to repair the inconsistency
> h4. ClusterDoctor
> Repairs inconsistent state according to the result of ClusterStateChecker
>
>
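The sequential queue discipline described for candidate segments (a segment must advance one queue at a time and never jump) can be sketched as a small state machine. A minimal sketch, assuming hypothetical names; these are not actual Kylin classes:

```java
// Illustrative model of the candidate-segment lifecycle from KYLIN-4167.
// Class and enum names are hypothetical, not Kylin code.
class SegmentLifecycle {

    // The candidate-segment queues, in the order a segment must traverse them.
    enum State {
        PARTLY_UPLOADED,        // data uploaded partly
        WAITING_TO_BUILD,       // data complete, waiting for build job
        BUILDING,               // build job in NEW/RUNNING/ERROR/DISCARD
        BUILT_WAITING_DELIVERY, // built, waiting to move to historical part
        HISTORICAL_READY        // HBase Ready Segment
    }

    // A segment may only advance to the immediately following queue;
    // "jumping the queue" is rejected.
    static boolean canAdvance(State from, State to) {
        return to.ordinal() == from.ordinal() + 1;
    }
}
```

A coordinator enforcing this invariant would call `canAdvance` before any promotion and treat a `false` result as an inconsistency for ClusterStateChecker to report.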
[GitHub] [kylin] codecov-io commented on issue #849: Revert "Kylin 3392 support sum(null)"
codecov-io commented on issue #849: Revert "Kylin 3392 support sum(null)" URL: https://github.com/apache/kylin/pull/849#issuecomment-533572399
# [Codecov](https://codecov.io/gh/apache/kylin/pull/849?src=pr=h1) Report
> :exclamation: No coverage uploaded for pull request base (`master@2091f1e`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
> The diff coverage is `39.62%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/kylin/pull/849/graphs/tree.svg?width=650=JawVgbgsVo=150=pr)](https://codecov.io/gh/apache/kylin/pull/849?src=pr=tree)

```diff
@@            Coverage Diff            @@
##             master    #849    +/-  ##
========================================
  Coverage          ?   25.61%
  Complexity        ?     6069
========================================
  Files             ?     1399
  Lines             ?    83654
  Branches          ?    11722
========================================
  Hits              ?    21428
  Misses            ?    60168
  Partials          ?     2058
```

| [Impacted Files](https://codecov.io/gh/apache/kylin/pull/849?src=pr=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...apache/kylin/metadata/datatype/LongSerializer.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWV0YWRhdGEvZGF0YXR5cGUvTG9uZ1NlcmlhbGl6ZXIuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (?)` | |
| [...org/apache/kylin/measure/basic/DoubleIngester.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWVhc3VyZS9iYXNpYy9Eb3VibGVJbmdlc3Rlci5qYXZh) | `0% <0%> (ø)` | `0 <0> (?)` | |
| [...apache/kylin/measure/basic/BigDecimalIngester.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWVhc3VyZS9iYXNpYy9CaWdEZWNpbWFsSW5nZXN0ZXIuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (?)` | |
| [...g/apache/kylin/measure/basic/BasicMeasureType.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWVhc3VyZS9iYXNpYy9CYXNpY01lYXN1cmVUeXBlLmphdmE=) | `10.52% <0%> (ø)` | `1 <0> (?)` | |
| [...a/org/apache/kylin/measure/basic/LongIngester.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWVhc3VyZS9iYXNpYy9Mb25nSW5nZXN0ZXIuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (?)` | |
| [...ache/kylin/metadata/datatype/DoubleSerializer.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWV0YWRhdGEvZGF0YXR5cGUvRG91YmxlU2VyaWFsaXplci5qYXZh) | `0% <0%> (ø)` | `0 <0> (?)` | |
| [.../kylin/metadata/datatype/BigDecimalSerializer.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWV0YWRhdGEvZGF0YXR5cGUvQmlnRGVjaW1hbFNlcmlhbGl6ZXIuamF2YQ==) | `72.97% <100%> (ø)` | `8 <2> (?)` | |
| [...n/java/org/apache/kylin/common/util/BytesUtil.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2t5bGluL2NvbW1vbi91dGlsL0J5dGVzVXRpbC5qYXZh) | `25% <100%> (ø)` | `27 <0> (?)` | |
| [...e/kylin/measure/basic/BigDecimalMinAggregator.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWVhc3VyZS9iYXNpYy9CaWdEZWNpbWFsTWluQWdncmVnYXRvci5qYXZh) | `27.77% <25%> (ø)` | `3 <1> (?)` | |
| [...pache/kylin/measure/basic/DoubleMinAggregator.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWVhc3VyZS9iYXNpYy9Eb3VibGVNaW5BZ2dyZWdhdG9yLmphdmE=) | `41.66% <25%> (ø)` | `3 <1> (?)` | |
| ... and [7 more](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree-more) | |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/kylin/pull/849?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/kylin/pull/849?src=pr=footer). Last update [2091f1e...f475c11](https://codecov.io/gh/apache/kylin/pull/849?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] [kylin] codecov-io commented on issue #848: KYLIN-4095: Add RESOURCE_PATH_PREFIX option in ResourceTool (addendum)
codecov-io commented on issue #848: KYLIN-4095: Add RESOURCE_PATH_PREFIX option in ResourceTool (addendum) URL: https://github.com/apache/kylin/pull/848#issuecomment-533569512
# [Codecov](https://codecov.io/gh/apache/kylin/pull/848?src=pr=h1) Report
> :exclamation: No coverage uploaded for pull request base (`master@2091f1e`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/kylin/pull/848/graphs/tree.svg?width=650=JawVgbgsVo=150=pr)](https://codecov.io/gh/apache/kylin/pull/848?src=pr=tree)

```diff
@@            Coverage Diff            @@
##             master    #848    +/-  ##
========================================
  Coverage          ?   25.69%
  Complexity        ?     6123
========================================
  Files             ?     1400
  Lines             ?    83729
  Branches          ?    11740
========================================
  Hits              ?    21516
  Misses            ?    60142
  Partials          ?     2071
```

[Continue to review full report at Codecov](https://codecov.io/gh/apache/kylin/pull/848?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/kylin/pull/848?src=pr=footer). Last update [2091f1e...2823341](https://codecov.io/gh/apache/kylin/pull/848?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[jira] [Created] (KYLIN-4174) job statistics info is incorrect
ZhouKang created KYLIN-4174:
---
Summary: job statistics info is incorrect
Key: KYLIN-4174
URL: https://issues.apache.org/jira/browse/KYLIN-4174
Project: Kylin
Issue Type: Bug
Reporter: ZhouKang

Kylin: master branch
The job statistics info is incorrect: the number shown next to each job type does not equal the size of the job list.
Step 1: log in to the Kylin web UI and open the Monitor page.
Step 2: refresh the web page.
You can see that the query parameters of the RESTful API call are wrong; projectName is "_null":
{code:java}
curl 'http://zjy-bigdata-prc-kylin04.bj:7070/kylin/api/jobs/overview?jobSearchMode=ALL=_null=1'
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
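The broken request suggests a missing project name is being interpolated as the literal string "_null" when the query string is built. A null-safe way to assemble such a query string might look like the following sketch; the class and method names are hypothetical, not Kylin's actual code:

```java
// Hypothetical sketch: build the jobs-overview query string without
// emitting a placeholder like "_null" when no project is selected.
class JobsQuery {
    static String overviewQuery(String jobSearchMode, String projectName) {
        StringBuilder sb = new StringBuilder("/kylin/api/jobs/overview?jobSearchMode=")
                .append(jobSearchMode);
        // Only append projectName when one is actually selected.
        if (projectName != null && !projectName.isEmpty()) {
            sb.append("&projectName=").append(projectName);
        }
        return sb.toString();
    }
}
```

With this guard, refreshing the Monitor page without a selected project would query all projects instead of a nonexistent "_null" project.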
[GitHub] [kylin] RupengWang opened a new pull request #849: Revert "Kylin 3392 support sum(null)"
RupengWang opened a new pull request #849: Revert "Kylin 3392 support sum(null)" URL: https://github.com/apache/kylin/pull/849 This reverts commit 1a2f779a074c1d03703eebbb7506d171ef26ce56.
[GitHub] [kylin] asf-ci commented on issue #849: Revert "Kylin 3392 support sum(null)"
asf-ci commented on issue #849: Revert "Kylin 3392 support sum(null)" URL: https://github.com/apache/kylin/pull/849#issuecomment-533558965 Can one of the admins verify this patch?
[jira] [Updated] (KYLIN-3392) Support NULL value in Sum, Max, Min Aggregation
[ https://issues.apache.org/jira/browse/KYLIN-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangrupeng updated KYLIN-3392: -- Fix Version/s: (was: v2.6.4) (was: v3.0.0-beta) > Support NULL value in Sum, Max, Min Aggregation > --- > > Key: KYLIN-3392 > URL: https://issues.apache.org/jira/browse/KYLIN-3392 > Project: Kylin > Issue Type: Bug >Reporter: Yifei Wu >Assignee: Yifei Wu >Priority: Major > Attachments: KYLIN-3392-2.png, KYLIN-3392.png, kylin-3.0.0-alpha2.png > > > It is treated as 0 when confronted with NULL value in KYLIN's basic aggregate > measure (like sum, max, min). However, to distinguish the NULL value with 0 > is very necessary. > It should be like this > *sum(null, null) = null* > *sum(null, 1) = 1* > *max(null, null) = null* > *max(null, -1) = -1* > *min(null, -1)= -1* > in accordance with Hive and SparkSQL -- This message was sent by Atlassian Jira (v8.3.4#803005)
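The null-handling semantics requested in KYLIN-3392 (a NULL input is ignored unless all inputs are NULL, matching Hive and SparkSQL) can be sketched as below. This is an illustrative model of the intended behavior, not Kylin's actual aggregator code, and the class name is hypothetical:

```java
// Illustrative null-aware aggregation semantics from KYLIN-3392:
// sum(null, null) = null, sum(null, 1) = 1, max(null, -1) = -1, min(null, -1) = -1.
// Not actual Kylin code.
class NullAwareAgg {

    static Long sum(Long a, Long b) {
        if (a == null) return b;   // NULL contributes nothing
        if (b == null) return a;
        return a + b;
    }

    static Long max(Long a, Long b) {
        if (a == null) return b;   // only all-NULL inputs yield NULL
        if (b == null) return a;
        return Math.max(a, b);
    }

    static Long min(Long a, Long b) {
        if (a == null) return b;
        if (b == null) return a;
        return Math.min(a, b);
    }
}
```

The key difference from treating NULL as 0 shows up in min/max over negative values: min(null, -1) must be -1, not 0.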
[jira] [Reopened] (KYLIN-3392) Support NULL value in Sum, Max, Min Aggregation
[ https://issues.apache.org/jira/browse/KYLIN-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangrupeng reopened KYLIN-3392: --- > Support NULL value in Sum, Max, Min Aggregation > --- > > Key: KYLIN-3392 > URL: https://issues.apache.org/jira/browse/KYLIN-3392 > Project: Kylin > Issue Type: Bug >Reporter: Yifei Wu >Assignee: Yifei Wu >Priority: Major > Fix For: v3.0.0-beta, v2.6.4 > > Attachments: KYLIN-3392-2.png, KYLIN-3392.png, kylin-3.0.0-alpha2.png > > > It is treated as 0 when confronted with NULL value in KYLIN's basic aggregate > measure (like sum, max, min). However, to distinguish the NULL value with 0 > is very necessary. > It should be like this > *sum(null, null) = null* > *sum(null, 1) = 1* > *max(null, null) = null* > *max(null, -1) = -1* > *min(null, -1)= -1* > in accordance with Hive and SparkSQL -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-3392) Support NULL value in Sum, Max, Min Aggregation
[ https://issues.apache.org/jira/browse/KYLIN-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934404#comment-16934404 ] wangrupeng commented on KYLIN-3392:
---
This PR can cause another serious problem; here are the details of what I observed. When I query with an aggregate function such as count, sum, max, or min, if the column values are all NULL, or the values selected by the where condition contain NULL, the query thread hangs and after about one minute it throws an exception.
To reproduce the problem, I created a new Hive table with the same structure as KYLIN_SALES, containing 4 rows; the column with two NULL values is PRICE:

1  2013/10/19  FP-non  37831   0    13  30    3     1209  10003717  ANALYST  Beijing
2  2012/10/22  Others  140746  100  11  70    20    1154  10006076  ADMIN    Shanghai
3  2013/10/19  FP-non  37831   0    13  NULL  3     1209  10003717  ANALYST  Beijing
4  2012/10/22  Others  140746  100  11  NULL  NULL  1154  10006076  ADMIN    Shanghai

I then built a model and cube using this table, KYLIN_SALES_3392, as the fact table, with no lookup table, and compared a Kylin build that includes this PR with one that does not.
With this PR merged, when the where condition selects column values that contain NULL, the query throws an exception:
select min(price) from kylin_sales_3392 where trans_id>=1 and trans_id<=4 !KYLIN-3392.png|width=654,height=296!
If the selected column values do not contain NULL, it works properly:
select min(price) from kylin_sales_3392 where trans_id>=1 and trans_id<=2 !KYLIN-3392-2.png|width=680,height=256!
Below is the result from the original Kylin version:
select min(price) from kylin_sales_3392 where trans_id>=1 and trans_id<=4 !kylin-3.0.0-alpha2.png|width=668,height=215!
> Support NULL value in Sum, Max, Min Aggregation > --- > > Key: KYLIN-3392 > URL: https://issues.apache.org/jira/browse/KYLIN-3392 > Project: Kylin > Issue Type: Bug >Reporter: Yifei Wu >Assignee: Yifei Wu >Priority: Major > Fix For: v3.0.0-beta, v2.6.4 > > Attachments: KYLIN-3392-2.png, KYLIN-3392.png, kylin-3.0.0-alpha2.png > > > It is treated as 0 when confronted with NULL value in KYLIN's basic aggregate > measure (like sum, max, min). However, to distinguish the NULL value with 0 > is very necessary. > It should be like this > *sum(null, null) = null* > *sum(null, 1) = 1* > *max(null, null) = null* > *max(null, -1) = -1* > *min(null, -1)= -1* > in accordance with Hive and SparkSQL -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-3392) Support NULL value in Sum, Max, Min Aggregation
[ https://issues.apache.org/jira/browse/KYLIN-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangrupeng updated KYLIN-3392: -- Attachment: kylin-3.0.0-alpha2.png > Support NULL value in Sum, Max, Min Aggregation > --- > > Key: KYLIN-3392 > URL: https://issues.apache.org/jira/browse/KYLIN-3392 > Project: Kylin > Issue Type: Bug >Reporter: Yifei Wu >Assignee: Yifei Wu >Priority: Major > Fix For: v3.0.0-beta, v2.6.4 > > Attachments: KYLIN-3392-2.png, KYLIN-3392.png, kylin-3.0.0-alpha2.png > > > It is treated as 0 when confronted with NULL value in KYLIN's basic aggregate > measure (like sum, max, min). However, to distinguish the NULL value with 0 > is very necessary. > It should be like this > *sum(null, null) = null* > *sum(null, 1) = 1* > *max(null, null) = null* > *max(null, -1) = -1* > *min(null, -1)= -1* > in accordance with Hive and SparkSQL -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-3392) Support NULL value in Sum, Max, Min Aggregation
[ https://issues.apache.org/jira/browse/KYLIN-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangrupeng updated KYLIN-3392: -- Attachment: KYLIN-3392-2.png > Support NULL value in Sum, Max, Min Aggregation > --- > > Key: KYLIN-3392 > URL: https://issues.apache.org/jira/browse/KYLIN-3392 > Project: Kylin > Issue Type: Bug >Reporter: Yifei Wu >Assignee: Yifei Wu >Priority: Major > Fix For: v3.0.0-beta, v2.6.4 > > Attachments: KYLIN-3392-2.png, KYLIN-3392.png > > > It is treated as 0 when confronted with NULL value in KYLIN's basic aggregate > measure (like sum, max, min). However, to distinguish the NULL value with 0 > is very necessary. > It should be like this > *sum(null, null) = null* > *sum(null, 1) = 1* > *max(null, null) = null* > *max(null, -1) = -1* > *min(null, -1)= -1* > in accordance with Hive and SparkSQL -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [kylin] lshmouse opened a new pull request #848: KYLIN-4095: Add RESOURCE_PATH_PREFIX option in ResourceTool (addendum)
lshmouse opened a new pull request #848: KYLIN-4095: Add RESOURCE_PATH_PREFIX option in ResourceTool (addendum) URL: https://github.com/apache/kylin/pull/848 https://issues.apache.org/jira/browse/KYLIN-4095
[GitHub] [kylin] asf-ci commented on issue #848: KYLIN-4095: Add RESOURCE_PATH_PREFIX option in ResourceTool (addendum)
asf-ci commented on issue #848: KYLIN-4095: Add RESOURCE_PATH_PREFIX option in ResourceTool (addendum) URL: https://github.com/apache/kylin/pull/848#issuecomment-533556510 Can one of the admins verify this patch?
[jira] [Commented] (KYLIN-4095) Add RESOURCE_PATH_PREFIX option in ResourceTool
[ https://issues.apache.org/jira/browse/KYLIN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934402#comment-16934402 ] ASF GitHub Bot commented on KYLIN-4095: --- lshmouse commented on pull request #848: KYLIN-4095: Add RESOURCE_PATH_PREFIX option in ResourceTool (addendum) URL: https://github.com/apache/kylin/pull/848 https://issues.apache.org/jira/browse/KYLIN-4095 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add RESOURCE_PATH_PREFIX option in ResourceTool > --- > > Key: KYLIN-4095 > URL: https://issues.apache.org/jira/browse/KYLIN-4095 > Project: Kylin > Issue Type: Improvement >Reporter: Liu Shaohui >Priority: Major > Fix For: v3.0.0-beta > > Attachments: image-2019-09-20-10-16-37-603.png, > image-2019-09-20-10-17-31-459.png > > > ResourceTool is very useful to fix the metadata with overlap segments. > But downloading and uploading entire metadata is too heavy. > It's better to have a RESOURCE_PATH_PREFIX option for downloading and > uploading cmds. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-3392) Support NULL value in Sum, Max, Min Aggregation
[ https://issues.apache.org/jira/browse/KYLIN-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangrupeng updated KYLIN-3392: -- Attachment: KYLIN-3392.png > Support NULL value in Sum, Max, Min Aggregation > --- > > Key: KYLIN-3392 > URL: https://issues.apache.org/jira/browse/KYLIN-3392 > Project: Kylin > Issue Type: Bug >Reporter: Yifei Wu >Assignee: Yifei Wu >Priority: Major > Fix For: v3.0.0-beta, v2.6.4 > > Attachments: KYLIN-3392.png > > > It is treated as 0 when confronted with NULL value in KYLIN's basic aggregate > measure (like sum, max, min). However, to distinguish the NULL value with 0 > is very necessary. > It should be like this > *sum(null, null) = null* > *sum(null, 1) = 1* > *max(null, null) = null* > *max(null, -1) = -1* > *min(null, -1)= -1* > in accordance with Hive and SparkSQL -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4141) Build Global Dictionary in no time
[ https://issues.apache.org/jira/browse/KYLIN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoxiang Yu updated KYLIN-4141:
Description:
h2. Background
Currently, real-time OLAP does not support COUNT_DISTINCT(bitmap) for non-integer types, because it lacks the ability to encode strings on the fly. I therefore want to use RocksDB & HBase as the implementation of a streaming distributed dictionary.
h2. Design
# each receiver will own a local dict cache
# all receivers will share a remote dict storage
# we choose RocksDB as the local dict cache
# we choose HBase as the remote dict storage
# for each cube, we will create a local dict and an HBase table
# we will create a column family in both RocksDB and HBase for each column that occurs in COUNT_DISTINCT
h2. Design Diagram
!image-2019-09-20-19-04-47-937.png!
!image-2019-09-20-19-04-55-935.png!
!image-2019-09-20-20-06-15-960.png!

was:
h2. Background
Currently, real-time OLAP does not support COUNT_DISTINCT(bitmap) for non-integer types, because it lacks the ability to encode strings on the fly. I therefore want to use RocksDB & HBase as the implementation of a streaming distributed dictionary.
h2. Design
# each receiver will own a local dict cache
# all receivers will share a remote dict storage
# we choose RocksDB as the local dict cache
# we choose HBase as the remote dict storage
# for each cube, we will create a local dict and an HBase table
# we will create a column family in both RocksDB and HBase for each column that occurs in COUNT_DISTINCT
!image-2019-09-20-19-04-47-937.png!
!image-2019-09-20-19-04-55-935.png!

> Build Global Dictionary in no time
> ----------------------------------
>
>                 Key: KYLIN-4141
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4141
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Real-time Streaming
>    Affects Versions: v3.0.0-beta
>            Reporter: Xiaoxiang Yu
>            Assignee: Xiaoxiang Yu
>            Priority: Major
>             Fix For: v3.0.0-beta
>
>         Attachments: image-2019-09-20-19-04-47-937.png, image-2019-09-20-19-04-55-935.png, image-2019-09-20-20-06-15-960.png
>
> h2. Background
> Currently, real-time OLAP does not support COUNT_DISTINCT(bitmap) for non-integer types, because it lacks the ability to encode strings on the fly. I therefore want to use RocksDB & HBase as the implementation of a streaming distributed dictionary.
> h2. Design
> # each receiver will own a local dict cache
> # all receivers will share a remote dict storage
> # we choose RocksDB as the local dict cache
> # we choose HBase as the remote dict storage
> # for each cube, we will create a local dict and an HBase table
> # we will create a column family in both RocksDB and HBase for each column that occurs in COUNT_DISTINCT
> h2. Design Diagram
> !image-2019-09-20-19-04-47-937.png!
> !image-2019-09-20-19-04-55-935.png!
> !image-2019-09-20-20-06-15-960.png!
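The two-tier lookup described in the design (a per-receiver local cache, with a shared remote store as the source of truth that assigns new ids) can be sketched as follows. Plain maps stand in for RocksDB and HBase, and all names are illustrative assumptions, not Kylin code:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative two-tier streaming dictionary: a per-receiver local cache
// (standing in for RocksDB) backed by a store shared by all receivers
// (standing in for HBase). Hypothetical names, not Kylin code.
class TwoTierDictionary {
    private final Map<String, Integer> localCache = new HashMap<>();
    private final Map<String, Integer> remoteStore; // shared across receivers

    TwoTierDictionary(Map<String, Integer> remoteStore) {
        this.remoteStore = remoteStore;
    }

    // Encode a string value to a dictionary id, assigning a new id on first sight.
    int encode(String value) {
        Integer id = localCache.get(value);
        if (id != null) {
            return id;                       // local cache hit, no remote trip
        }
        synchronized (remoteStore) {         // remote store is the source of truth
            id = remoteStore.get(value);
            if (id == null) {
                id = remoteStore.size() + 1; // assign the next id
                remoteStore.put(value, id);
            }
        }
        localCache.put(value, id);           // warm the local cache
        return id;
    }
}
```

The point of the design is that two receivers sharing the remote store always agree on the id for a given string, while repeated values on one receiver are served from its local cache.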
[jira] [Updated] (KYLIN-4141) Build Global Dictionary in no time
[ https://issues.apache.org/jira/browse/KYLIN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoxiang Yu updated KYLIN-4141: Attachment: image-2019-09-20-20-06-15-960.png > Build Global Dictionary in no time > -- > > Key: KYLIN-4141 > URL: https://issues.apache.org/jira/browse/KYLIN-4141 > Project: Kylin > Issue Type: Improvement > Components: Real-time Streaming >Affects Versions: v3.0.0-beta >Reporter: Xiaoxiang Yu >Assignee: Xiaoxiang Yu >Priority: Major > Fix For: v3.0.0-beta > > Attachments: image-2019-09-20-19-04-47-937.png, > image-2019-09-20-19-04-55-935.png, image-2019-09-20-20-06-15-960.png > > > h2. Backgroud > Currently, realtime OLAP do not support COUNT_DISTINCT(bitmap) for non > interger type. > Because of the lack the ability of encoding string at once, so I want to use > RocksDB & HBase as implementation of streaming distributed dictionary. > h2. Design > # each receiver will own a local dict cache > # all receiver will share a remote dict storage > # we choose to use RocksDB as local dict cache > # we choose to use HBase as remote dict storage > > # for each cube, we will create a local dict and a hbase table > # we will create column family both in RocksDB and HBase for each column > which occur in COUNT_DISTINCT > > !image-2019-09-20-19-04-47-937.png! > !image-2019-09-20-19-04-55-935.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4141) Build Global Dictionary in no time
[ https://issues.apache.org/jira/browse/KYLIN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoxiang Yu updated KYLIN-4141: Description: h2. Backgroud Currently, realtime OLAP do not support COUNT_DISTINCT(bitmap) for non interger type. Because of the lack the ability of encoding string at once, so I want to use RocksDB & HBase as implementation of streaming distributed dictionary. h2. Design # each receiver will own a local dict cache # all receiver will share a remote dict storage # we choose to use RocksDB as local dict cache # we choose to use HBase as remote dict storage # for each cube, we will create a local dict and a hbase table # we will create column family both in RocksDB and HBase for each column which occur in COUNT_DISTINCT !image-2019-09-20-19-04-47-937.png! !image-2019-09-20-19-04-55-935.png! was: h2. Backgroud Currently, realtime OLAP do not support COUNT_DISTINCT(bitmap) for non interger type. Because of the lack the ability of encoding string at once, so I want to use RocksDB & HBase as implementation of streaming distributed dictionary. h2. Design # each receiver will own a local dict cache # all receiver will share a remote dict storage # we choose to use RocksDB as local dict cache # we choose to use HBase as remote dict storage # for each cube, we will create a local dict and a hbase table # we will create column family both in RocksDB and HBase for each column which occur in COUNT_DISTINCT > Build Global Dictionary in no time > -- > > Key: KYLIN-4141 > URL: https://issues.apache.org/jira/browse/KYLIN-4141 > Project: Kylin > Issue Type: Improvement > Components: Real-time Streaming >Affects Versions: v3.0.0-beta >Reporter: Xiaoxiang Yu >Assignee: Xiaoxiang Yu >Priority: Major > Fix For: v3.0.0-beta > > Attachments: image-2019-09-20-19-04-47-937.png, > image-2019-09-20-19-04-55-935.png > > > h2. Backgroud > Currently, realtime OLAP do not support COUNT_DISTINCT(bitmap) for non > interger type. 
> Because of the lack the ability of encoding string at once, so I want to use > RocksDB & HBase as implementation of streaming distributed dictionary. > h2. Design > # each receiver will own a local dict cache > # all receiver will share a remote dict storage > # we choose to use RocksDB as local dict cache > # we choose to use HBase as remote dict storage > > # for each cube, we will create a local dict and a hbase table > # we will create column family both in RocksDB and HBase for each column > which occur in COUNT_DISTINCT > > !image-2019-09-20-19-04-47-937.png! > !image-2019-09-20-19-04-55-935.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4141) Build Global Dictionary in no time
[ https://issues.apache.org/jira/browse/KYLIN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoxiang Yu updated KYLIN-4141: Attachment: image-2019-09-20-19-04-47-937.png > Build Global Dictionary in no time > -- > > Key: KYLIN-4141 > URL: https://issues.apache.org/jira/browse/KYLIN-4141 > Project: Kylin > Issue Type: Improvement > Components: Real-time Streaming >Affects Versions: v3.0.0-beta >Reporter: Xiaoxiang Yu >Assignee: Xiaoxiang Yu >Priority: Major > Fix For: v3.0.0-beta > > Attachments: image-2019-09-20-19-04-47-937.png, > image-2019-09-20-19-04-55-935.png > > > h2. Backgroud > Currently, realtime OLAP do not support COUNT_DISTINCT(bitmap) for non > interger type. > Because of the lack the ability of encoding string at once, so I want to use > RocksDB & HBase as implementation of streaming distributed dictionary. > h2. Design > # each receiver will own a local dict cache > # all receiver will share a remote dict storage > # we choose to use RocksDB as local dict cache > # we choose to use HBase as remote dict storage > > # for each cube, we will create a local dict and a hbase table > # we will create column family both in RocksDB and HBase for each column > which occur in COUNT_DISTINCT -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4141) Build Global Dictionary in no time
[ https://issues.apache.org/jira/browse/KYLIN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoxiang Yu updated KYLIN-4141: Attachment: image-2019-09-20-19-04-55-935.png > Build Global Dictionary in no time > -- > > Key: KYLIN-4141 > URL: https://issues.apache.org/jira/browse/KYLIN-4141 > Project: Kylin > Issue Type: Improvement > Components: Real-time Streaming >Affects Versions: v3.0.0-beta >Reporter: Xiaoxiang Yu >Assignee: Xiaoxiang Yu >Priority: Major > Fix For: v3.0.0-beta > > Attachments: image-2019-09-20-19-04-47-937.png, > image-2019-09-20-19-04-55-935.png > > > h2. Background > Currently, realtime OLAP does not support COUNT_DISTINCT(bitmap) for non-integer types. > Because of the lack of the ability to encode strings on the fly, I want to use > RocksDB & HBase as the implementation of a streaming distributed dictionary. > h2. Design > # each receiver will own a local dict cache > # all receivers will share a remote dict storage > # we choose to use RocksDB as the local dict cache > # we choose to use HBase as the remote dict storage > > # for each cube, we will create a local dict and an HBase table > # we will create a column family in both RocksDB and HBase for each column > which occurs in COUNT_DISTINCT
[jira] [Updated] (KYLIN-4141) Build Global Dictionary in no time
[ https://issues.apache.org/jira/browse/KYLIN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoxiang Yu updated KYLIN-4141: Description: h2. Background Currently, realtime OLAP does not support COUNT_DISTINCT(bitmap) for non-integer types. Because of the lack of the ability to encode strings on the fly, I want to use RocksDB & HBase as the implementation of a streaming distributed dictionary. h2. Design # each receiver will own a local dict cache # all receivers will share a remote dict storage # we choose to use RocksDB as the local dict cache # we choose to use HBase as the remote dict storage # for each cube, we will create a local dict and an HBase table # we will create a column family in both RocksDB and HBase for each column which occurs in COUNT_DISTINCT > Build Global Dictionary in no time > -- > > Key: KYLIN-4141 > URL: https://issues.apache.org/jira/browse/KYLIN-4141 > Project: Kylin > Issue Type: Improvement > Components: Real-time Streaming >Affects Versions: v3.0.0-beta >Reporter: Xiaoxiang Yu >Assignee: Xiaoxiang Yu >Priority: Major > Fix For: v3.0.0-beta > > > h2. Background > Currently, realtime OLAP does not support COUNT_DISTINCT(bitmap) for non-integer types. > Because of the lack of the ability to encode strings on the fly, I want to use > RocksDB & HBase as the implementation of a streaming distributed dictionary. > h2. Design > # each receiver will own a local dict cache > # all receivers will share a remote dict storage > # we choose to use RocksDB as the local dict cache > # we choose to use HBase as the remote dict storage > > # for each cube, we will create a local dict and an HBase table > # we will create a column family in both RocksDB and HBase for each column > which occurs in COUNT_DISTINCT
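The two-tier design above (a per-receiver RocksDB local cache plus a shared HBase remote store) can be sketched as follows. This is a minimal illustration only: plain HashMaps stand in for RocksDB and HBase, the class and method names are hypothetical, and real id allocation on the remote store would need to be atomic (e.g. an HBase check-and-put), which this sketch glosses over.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the two-tier streaming dictionary: look up the local
// cache first (RocksDB stand-in), fall back to the shared remote store
// (HBase stand-in), and allocate a new id on a full miss.
class TwoTierDictionary {
    private final Map<String, Integer> localCache = new HashMap<>(); // RocksDB stand-in
    private final Map<String, Integer> remoteStore;                  // HBase stand-in
    private int nextId = 1; // real impl must allocate ids atomically on the remote store

    TwoTierDictionary(Map<String, Integer> remoteStore) {
        this.remoteStore = remoteStore;
    }

    // Encode a string value to a dictionary id for COUNT_DISTINCT(bitmap).
    synchronized int encode(String value) {
        Integer id = localCache.get(value);
        if (id == null) {
            id = remoteStore.get(value);
            if (id == null) {
                id = nextId++;
                remoteStore.put(value, id); // publish so other receivers see it
            }
            localCache.put(value, id);      // warm the local cache
        }
        return id;
    }
}
```

With this shape, repeated values hit the local cache and never touch the remote store, which is the point of giving each receiver its own RocksDB instance.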
[jira] [Updated] (KYLIN-4010) Auto adjust offset according to query server's timezone for time derived column
[ https://issues.apache.org/jira/browse/KYLIN-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoxiang Yu updated KYLIN-4010: Description: h2. Background In realtime OLAP, we index real-time events in the streaming receiver. We know that each event must contain a timestamp column (we often call it event time), whose value should represent when the event was produced. Because events may come from different timezones and using the local timezone is always *error-prone*, we recommend using a {color:#DE350B}GMT+0{color} timestamp (System.currentTimeMillis()) to avoid such issues. I think this is good by design; it is easy to understand and always correct. But the *side effect* is that the end user (a business manager behind a BI tool) is unhappy, because he has to use GMT+0 in date/time related filters in SQL and must understand that the result should be *shifted* by his local timezone. It is not user-friendly and is inconvenient for normal users. Because users may compare and summarize query results from different data sources, using GMT+0 may trouble them. h2. Example For example, a Kylin user working in *GMT+8* (maybe in Shanghai) wants to know some metrics which occurred from {color:#DE350B}2019-09-01 12:00:00{color} to {color:#DE350B}2019-09-01 14:00:00{color} in his {color:#DE350B}local timezone{color}, so he has to {color:#DE350B}rewrite{color} his query (with an eight-hour offset) to the following: {code:sql} select hour_start, count(*) from realtime_table where hour_start >= "2019-09-01 04:00:00" and hour_start < "2019-09-01 06:00:00" group by hour_start {code} And he will get a result like: ||hour_start ||count|| |2019-09-01 04:00:00|139202| |2019-09-01 05:00:00|89398| And he must convert it to a more meaningful result in his mind, which is really annoying! ||hour_start ||count|| |2019-09-01 12:00:00|139202| |2019-09-01 13:00:00|89398| h2. Design We should not change the way the receiver indexes events; event time should be stored as a UTC timestamp. 
We should auto rewrite the SQL's event-time related filters. In Kylin, a filter condition in the where clause will be converted to a *TupleFilter*, which looks like a *RelNode* in Apache Calcite. For where hour_start >= "2019-09-01 12:00:00" and hour_start < "2019-09-01 14:00:00", we will send a TupleFilter to the streaming receiver or region server which looks like this: {noformat} AND GreaterThanOrEqual hour_start CAST "2019-09-01 12:00:00" timestamp LessThanOrEqual hour_start CAST "2019-09-01 14:00:00" timestamp {noformat} But for a streaming query, we want to change each ConstantTupleFilter and subtract the offset from that timestamp. So the TupleFilter which is sent will be the following: {noformat} AND GreaterThanOrEqual hour_start CAST "2019-09-01 04:00:00" timestamp LessThanOrEqual hour_start CAST "2019-09-01 06:00:00" timestamp {noformat} Before the query result is processed by *OLAPEnumerator*, Kylin will add the offset back to each value of a time derived column, thus protecting rows from being filtered out by Calcite-generated code. So the user will get what he wants in his timezone without any burden. h2. How to use To enable auto shift by time zone, please set {color:#DE350B}kylin.stream.auto.just.by.timezone{color} to true. You can specify the time zone with {color:#DE350B}kylin.web.timezone{color}; otherwise, the time zone will be auto detected. Only *time derived columns* will be affected. h2. Related Issue Originally, the event time could only be in the format of a long value (UTC timestamp). But in some cases, the event time is in the format "yyyy-MM-dd HH:mm:ss"; we use a new class DateTimeParser (introduced in KYLIN-4001) to convert such a format into a UTC timestamp. h3. Old Describe In a Real-Time Streaming Cube, when I send some records to a kafka topic, the timestamp for the record is 2019-01-01 00:00:00.000, but Kylin creates a segment named 2018123116_2018123117. Then I found that the TimeZone is hard-coded to "GMT" in the function makeSegmentName of class CubeSegment. I think that it should be configurable in kylin.properties. was: h2. 
Background In realtime OLAP, we index real-time events in the streaming receiver. We know that each event must contain a timestamp column (we often call it event time), whose value should represent when the event was produced. Because events may come from different timezones and using the local timezone is always error-prone, we recommend using a GMT+0 timestamp (System.currentTimeMillis()) to avoid such issues. I think this is good by design; it is easy to understand and always correct. But the side effect is that the end user (a business manager behind a BI tool) is unhappy, because he has to use GMT+0 in date/time related filters in SQL and must understand that the result should be shifted by his local timezone. It is not user-friendly and is inconvenient for normal users. Because users may
[jira] [Updated] (KYLIN-4010) Auto adjust offset according to query server's timezone for time derived column
[ https://issues.apache.org/jira/browse/KYLIN-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoxiang Yu updated KYLIN-4010: Summary: Auto adjust offset according to query server's timezone for time derived column (was: TimeZone is hard-coded in function makeSegmentName for class CubeSegment) > Auto adjust offset according to query server's timezone for time derived > column > --- > > Key: KYLIN-4010 > URL: https://issues.apache.org/jira/browse/KYLIN-4010 > Project: Kylin > Issue Type: Improvement > Components: Others >Affects Versions: v3.0.0-alpha >Reporter: zengrui >Assignee: Xiaoxiang Yu >Priority: Minor > Fix For: v3.0.0-beta > > Attachments: image-2019-07-15-17-15-31-209.png, > image-2019-07-15-17-17-04-029.png, image-2019-07-15-17-17-39-568.png > > > h2. Background > In realtime OLAP, we index real-time events in the streaming receiver. We know > that each event must contain a timestamp column (we often call it event > time), whose value should represent when the event was produced. Because > events may come from different timezones and using the local timezone is always > error-prone, we recommend using a GMT+0 > timestamp (System.currentTimeMillis()) to avoid such issues. > I think this is good by design; it is easy to understand and always correct. > But the side effect is that the end user (a business manager behind a BI tool) > is unhappy, because he has to use GMT+0 in date/time related filters in SQL > and must understand that the result should be shifted by his local timezone. > It is not user-friendly and is inconvenient for normal users. Because users may > compare and summarize query results from different data sources, so using GMT+0 > may trouble them. > h2. 
Example > For example, a Kylin user working in GMT+8 (maybe in Shanghai) wants to know some > metrics which occurred from {color:#DE350B}2019-09-01 12:00:00{color} to > {color:#DE350B}2019-09-01 14:00:00{color} in his {color:#DE350B}local > timezone{color}, so he has to {color:#DE350B}rewrite{color} his query (with an > eight-hour offset) to the following: > {code:sql} > select hour_start, count(*) > from realtime_table > where hour_start >= "2019-09-01 04:00:00" and hour_start < "2019-09-01 > 06:00:00" > group by hour_start > {code} > And he will get a result like: > ||hour_start ||count|| > |2019-09-01 04:00:00 |139202| > |2019-09-01 05:00:00 |89398| > And he must convert it to a more meaningful result in his mind, which is really > annoying! > ||hour_start ||count|| > |2019-09-01 12:00:00 |139202| > |2019-09-01 13:00:00 |89398| > h2. Design > We should not change the way the receiver indexes events; event time should be > stored as a UTC timestamp. We should auto rewrite the SQL's event-time related > filters. > In Kylin, a filter condition in the where clause will be converted to a > *TupleFilter*, which looks like a *RelNode* in Apache Calcite. > For where hour_start >= "2019-09-01 12:00:00" and hour_start < "2019-09-01 > 14:00:00", we will send a TupleFilter to the streaming receiver or region server > which looks like this: > {noformat} > AND > GreaterThanOrEqual > hour_start > CAST > "2019-09-01 12:00:00" > timestamp > LessThanOrEqual > hour_start > CAST > "2019-09-01 14:00:00" > timestamp > {noformat} > But for a streaming query, we want to change each ConstantTupleFilter and subtract > the offset from that timestamp. So the TupleFilter which is sent will be the following: > {noformat} > AND > GreaterThanOrEqual > hour_start > CAST > "2019-09-01 04:00:00" > timestamp > LessThanOrEqual > hour_start > CAST > "2019-09-01 06:00:00" > timestamp > {noformat} > Before the query result is processed by *OLAPEnumerator*, Kylin will add the offset back to each > value of a time derived column, thus protecting rows from being filtered out by Calcite-generated > code. 
> So the user will get what he wants in his timezone without any burden. > h2. How to use > To enable auto shift by time zone, please set kylin.stream.auto-timezone to > true. > You can specify the time zone with kylin.web.timezone; otherwise, the time zone will > be auto detected. > Only *time derived columns* will be affected. > h2. Related Issue > Originally, the event time could only be in the format of a long value (UTC > timestamp). But in some cases, the event time is in the format "yyyy-MM-dd > HH:mm:ss"; we use a new class DateTimeParser (introduced in KYLIN-4001) to > convert such a format into a UTC timestamp. > In a Real-Time Streaming Cube, when I send some records to a kafka topic, the > timestamp for the record is 2019-01-01 00:00:00.000, but Kylin creates a > segment named 2018123116_2018123117. > Then I found that the TimeZone is hard-coded to "GMT" in the function makeSegmentName > of class
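The filter rewrite described above (shift constants in time-derived filters from the user's timezone to GMT+0 before the scan, then shift result values back) can be sketched with java.time. This is an illustrative helper, not Kylin's actual TupleFilter code; the fixed 8-hour offset and the class/method names are assumptions for the GMT+8 example in the issue.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Sketch of the two-way timestamp shift for time-derived columns:
// filter constants go local -> GMT+0 before the scan, and result
// values go GMT+0 -> local before being returned to the user.
class TimezoneFilterRewrite {
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    static final int OFFSET_HOURS = 8; // GMT+8 query server, per the example

    // "2019-09-01 12:00:00" (local) -> "2019-09-01 04:00:00" (GMT+0)
    static String toUtc(String localLiteral) {
        return LocalDateTime.parse(localLiteral, FMT)
                .minusHours(OFFSET_HOURS).format(FMT);
    }

    // "2019-09-01 04:00:00" (GMT+0) -> "2019-09-01 12:00:00" (local)
    static String toLocal(String utcLiteral) {
        return LocalDateTime.parse(utcLiteral, FMT)
                .plusHours(OFFSET_HOURS).format(FMT);
    }
}
```

Applying toUtc to both constants of the example WHERE clause reproduces exactly the rewritten filter shown in the description, and toLocal undoes the shift on the hour_start values in the result rows.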
[jira] [Assigned] (KYLIN-4170) Add cube information and query information when hbase deadline happened
[ https://issues.apache.org/jira/browse/KYLIN-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nichunen reassigned KYLIN-4170: --- Assignee: 柳攀 > Add cube information and query information when hbase deadline happened > --- > > Key: KYLIN-4170 > URL: https://issues.apache.org/jira/browse/KYLIN-4170 > Project: Kylin > Issue Type: Improvement > Components: Storage - HBase >Affects Versions: v2.6.2 >Reporter: 柳攀 >Assignee: 柳攀 >Priority: Major > Fix For: v3.0.0 > > Original Estimate: 168h > Remaining Estimate: 168h > > When there is a large number of cube queries and the HBase server is overloaded, > an HBase deadline would happen. But there is no information about the cube and > query for us to locate the problem. If we could get cube and query information > in the query log, we could solve the query problem quickly.
[jira] [Updated] (KYLIN-4010) TimeZone is hard-coded in function makeSegmentName for class CubeSegment
[ https://issues.apache.org/jira/browse/KYLIN-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoxiang Yu updated KYLIN-4010: Description: h2. Background In realtime OLAP, we index real-time events in the streaming receiver. We know that each event must contain a timestamp column (we often call it event time), whose value should represent when the event was produced. Because events may come from different timezones and using the local timezone is always error-prone, we recommend using a GMT+0 timestamp (System.currentTimeMillis()) to avoid such issues. I think this is good by design; it is easy to understand and always correct. But the side effect is that the end user (a business manager behind a BI tool) is unhappy, because he has to use GMT+0 in date/time related filters in SQL and must understand that the result should be shifted by his local timezone. It is not user-friendly and is inconvenient for normal users. Because users may compare and summarize query results from different data sources, using GMT+0 may trouble them. h2. Example For example, a Kylin user working in GMT+8 (maybe in Shanghai) wants to know some metrics which occurred from {color:#DE350B}2019-09-01 12:00:00{color} to {color:#DE350B}2019-09-01 14:00:00{color} in his {color:#DE350B}local timezone{color}, so he has to {color:#DE350B}rewrite{color} his query (with an eight-hour offset) to the following: {code:sql} select hour_start, count(*) from realtime_table where hour_start >= "2019-09-01 04:00:00" and hour_start < "2019-09-01 06:00:00" group by hour_start {code} And he will get a result like: ||hour_start ||count|| |2019-09-01 04:00:00|139202| |2019-09-01 05:00:00|89398| And he must convert it to a more meaningful result in his mind, which is really annoying! ||hour_start ||count|| |2019-09-01 12:00:00|139202| |2019-09-01 13:00:00|89398| h2. Design We should not change the way the receiver indexes events; event time should be stored as a UTC timestamp. 
We should auto rewrite the SQL's event-time related filters. In Kylin, a filter condition in the where clause will be converted to a *TupleFilter*, which looks like a *RelNode* in Apache Calcite. For where hour_start >= "2019-09-01 12:00:00" and hour_start < "2019-09-01 14:00:00", we will send a TupleFilter to the streaming receiver or region server which looks like this: {noformat} AND GreaterThanOrEqual hour_start CAST "2019-09-01 12:00:00" timestamp LessThanOrEqual hour_start CAST "2019-09-01 14:00:00" timestamp {noformat} But for a streaming query, we want to change each ConstantTupleFilter and subtract the offset from that timestamp. So the TupleFilter which is sent will be the following: {noformat} AND GreaterThanOrEqual hour_start CAST "2019-09-01 04:00:00" timestamp LessThanOrEqual hour_start CAST "2019-09-01 06:00:00" timestamp {noformat} Before the query result is processed by *OLAPEnumerator*, Kylin will add the offset back to each value of a time derived column, thus protecting rows from being filtered out by Calcite-generated code. So the user will get what he wants in his timezone without any burden. h2. How to use To enable auto shift by time zone, please set kylin.stream.auto-timezone to true. You can specify the time zone with kylin.web.timezone; otherwise, the time zone will be auto detected. Only *time derived columns* will be affected. h2. Related Issue Originally, the event time could only be in the format of a long value (UTC timestamp). But in some cases, the event time is in the format "yyyy-MM-dd HH:mm:ss"; we use a new class DateTimeParser (introduced in KYLIN-4001) to convert such a format into a UTC timestamp. In a Real-Time Streaming Cube, when I send some records to a kafka topic, the timestamp for the record is 2019-01-01 00:00:00.000, but Kylin creates a segment named 2018123116_2018123117. Then I found that the TimeZone is hard-coded to "GMT" in the function makeSegmentName of class CubeSegment. I think that it should be configurable in kylin.properties. was: h2. Background In realtime OLAP, we index real-time events in the streaming receiver. 
We know that each event must contain a timestamp column (we often call it event time), whose value should represent when the event was produced. Because events may come from different timezones and using the local timezone is always error-prone, we recommend using a GMT+0 timestamp (System.currentTimeMillis()) to avoid such issues. I think this is good by design; it is easy to understand and always correct. But the side effect is that the end user (a business manager behind a BI tool) is unhappy, because he has to use GMT+0 in date/time related filters in SQL and must understand that the result should be shifted by his local timezone. It is not user-friendly and is inconvenient for normal users. Because users may compare and summarize query results from different data sources, using GMT+0 may
[jira] [Updated] (KYLIN-4010) TimeZone is hard-coded in function makeSegmentName for class CubeSegment
[ https://issues.apache.org/jira/browse/KYLIN-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoxiang Yu updated KYLIN-4010: Description: h2. Background In realtime OLAP, we index real-time events in the streaming receiver. We know that each event must contain a timestamp column (we often call it event time), whose value should represent when the event was produced. Because events may come from different timezones and using the local timezone is always error-prone, we recommend using a GMT+0 timestamp (System.currentTimeMillis()) to avoid such issues. I think this is good by design; it is easy to understand and always correct. But the side effect is that the end user (a business manager behind a BI tool) is unhappy, because he has to use GMT+0 in date/time related filters in SQL and must understand that the result should be shifted by his local timezone. It is not user-friendly and is inconvenient for normal users. Because users may compare and summarize query results from different data sources, using GMT+0 may trouble them. h2. Example For example, a Kylin user working in GMT+8 (maybe in Shanghai) wants to know some metrics which occurred from 2019-09-01 12:00:00 to 2019-09-01 14:00:00 in his local timezone, so he has to rewrite his query (with an eight-hour offset) to the following: select hour_start, count(*) from realtime_table where hour_start >= "2019-09-01 04:00:00" and hour_start < "2019-09-01 06:00:00" group by hour_start And he will get a result like : And he must convert it to a more meaningful result in his mind, which is really annoying! h2. Design We should not change the way the receiver indexes events; event time should be stored as a UTC timestamp. We should auto rewrite the SQL's event-time related filters. In Kylin, a filter condition in the where clause will be converted to a TupleFilter, which looks like a RelNode in Apache Calcite. 
For where hour_start >= "2019-09-01 12:00:00" and hour_start < "2019-09-01 14:00:00", we will send a TupleFilter to the streaming receiver or region server which looks like this: AND GreaterThanOrEqual hour_start CAST "2019-09-01 12:00:00" timestamp LessThanOrEqual hour_start CAST "2019-09-01 14:00:00" timestamp But for a streaming query, we want to change each ConstantTupleFilter and subtract the offset from that timestamp. So the TupleFilter which is sent will be the following: AND GreaterThanOrEqual hour_start CAST "2019-09-01 04:00:00" timestamp LessThanOrEqual hour_start CAST "2019-09-01 06:00:00" timestamp Before the query result is processed by OLAPEnumerator, Kylin will add the offset back to each value of a time derived column, thus protecting rows from being filtered out by generated code. So the user will get what he wants in his timezone without any burden. h2. How to use To enable auto shift by time zone, please set kylin.stream.auto-timezone to true. You can specify the time zone with kylin.web.timezone; otherwise, the time zone will be auto detected. Only time derived columns will be affected. h2. Related Issue Originally, the event time could only be in the format of a long value (UTC timestamp). But in some cases, the event time is in the format "yyyy-MM-dd HH:mm:ss"; we use a new class DateTimeParser (introduced in KYLIN-4001) to convert such a format into a UTC timestamp. In a Real-Time Streaming Cube, when I send some records to a kafka topic, the timestamp for the record is 2019-01-01 00:00:00.000, but Kylin creates a segment named 2018123116_2018123117. Then I found that the TimeZone is hard-coded to "GMT" in the function makeSegmentName of class CubeSegment. I think that it should be configurable in kylin.properties. was: In a Real-Time Streaming Cube, when I send some records to a kafka topic, the timestamp for the record is 2019-01-01 00:00:00.000, but Kylin creates a segment named 2018123116_2018123117. Then I found that the TimeZone is hard-coded to "GMT" in the function makeSegmentName of class CubeSegment. I think that it should be configurable in kylin.properties. 
> TimeZone is hard-coded in function makeSegmentName for class CubeSegment > > > Key: KYLIN-4010 > URL: https://issues.apache.org/jira/browse/KYLIN-4010 > Project: Kylin > Issue Type: Improvement > Components: Others >Affects Versions: v3.0.0-alpha >Reporter: zengrui >Assignee: Xiaoxiang Yu >Priority: Minor > Fix For: v3.0.0-beta > > Attachments: image-2019-07-15-17-15-31-209.png, > image-2019-07-15-17-17-04-029.png, image-2019-07-15-17-17-39-568.png > > > h2. Background > In realtime OLAP, we index real-time events in the streaming receiver. We know > that each event must contain a timestamp column (we often call it event > time), whose value should represent when the event was produced. Because > events may come from different timezones and using the local timezone is always > error-prone, so we
[jira] [Updated] (KYLIN-4166) kylin parse sql error
[ https://issues.apache.org/jira/browse/KYLIN-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] phil.zhang updated KYLIN-4166: -- Description: 1. I use Tableau on Kylin, and Tableau has generated a SQL query: ''' SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE", SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" FROM ( select *, case when cast(DURATION_LEVEL as varchar) like '1' then '>=0s,<3s' when cast(DURATION_LEVEL as varchar) like '2' then '>=3s,<4s' when cast(DURATION_LEVEL as varchar) like '3' then '>=4s,<5s' when cast(DURATION_LEVEL as varchar) like '4' then '>=5s,<6s' when cast(DURATION_LEVEL as varchar) like '5' then '>=6s,<7s' when cast(DURATION_LEVEL as varchar) like '6' then '>=7s,<8s' when cast(DURATION_LEVEL as varchar) like '7' then '>=8s,<9s' when cast(DURATION_LEVEL as varchar) like '8' then '>=9s,<10s' when cast(DURATION_LEVEL as varchar) like '9' then '>=10s,<12s' when cast(DURATION_LEVEL as varchar) like '10' then '>=12s,<14s' when cast(DURATION_LEVEL as varchar) like '11' then '>=14s,<16s' when cast(DURATION_LEVEL as varchar) like '12' then '>=16s,<18s' when cast(DURATION_LEVEL as varchar) like '13' then '>=18s,<20s' when cast(DURATION_LEVEL as varchar) like '14' then '>=20s,<30s' when cast(DURATION_LEVEL as varchar) like '15' then '>=30s,<40s' when cast(DURATION_LEVEL as varchar) like '16' then '>=40s,<50s' when cast(DURATION_LEVEL as varchar) like '17' then '>=50s' else 'null' end as DURATION_LEVEL_TRANS from DM_AIR_API_DURATION_SPB_D ) "X___SQL___" WHERE ("X___SQL___"."DURATION_LEVEL_TRANS" = '>=5s,<6s') GROUP BY 1 ''' It seems that the SQL is correct, but the result is unexpected: it returns null. !image-2019-09-20-15-22-15-356.png! But actually, it should return some rows because there exists data fitting the query; let's change the SQL form to check it. 
''' SELECT "FLIGHT_TYPE" AS "FLIGHT_TYPE", SUM("SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" from DM_AIR_API_DURATION_SPB_D WHERE ("DURATION_LEVEL" = 4 and "DAY_NUM"='2019-09-10') GROUP BY 1 ''' !image-2019-09-20-15-23-29-881.png! 2. And when I write SQL like this ''' SELECT flight_type,SUM("SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" from DM_AIR_API_DURATION_SPB_D group by flight_type ''' I got a result {color:#33}like this{color} {color:#33}!http://mail.163.com/js6/s?func=mbox:getMessageData=270:xtbBDhAtQVrbEIH+vQAAsh=5!{color} It is the same as the result of ''' SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE", SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" FROM ( select *, case when cast(DURATION_LEVEL as varchar) like '1' then *'>=0s,<3s'* when cast(DURATION_LEVEL as varchar) like '2' then '>=3s,<4s' when cast(DURATION_LEVEL as varchar) like '3' then '>=4s,<5s' when cast(DURATION_LEVEL as varchar) like '4' then '>=5s,<6s' when cast(DURATION_LEVEL as varchar) like '5' then '>=6s,<7s' when cast(DURATION_LEVEL as varchar) like '6' then '>=7s,<8s' when cast(DURATION_LEVEL as varchar) like '7' then '>=8s,<9s' when cast(DURATION_LEVEL as varchar) like '8' then '>=9s,<10s' when cast(DURATION_LEVEL as varchar) like '9' then '>=10s,<12s' when cast(DURATION_LEVEL as varchar) like '10' then '>=12s,<14s' when cast(DURATION_LEVEL as varchar) like '11' then '>=14s,<16s' when cast(DURATION_LEVEL as varchar) like '12' then '>=16s,<18s' when cast(DURATION_LEVEL as varchar) like '13' then '>=18s,<20s' when cast(DURATION_LEVEL as varchar) like '14' then '>=20s,<30s' when cast(DURATION_LEVEL as varchar) like '15' then '>=30s,<40s' when cast(DURATION_LEVEL as varchar) like '16' then '>=40s,<50s' when cast(DURATION_LEVEL as varchar) like '17' then '>=50s' else 'null' end as DURATION_LEVEL_TRANS from DM_AIR_API_DURATION_SPB_D ) "X___SQL___" WHERE ("X___SQL___"."DURATION_LEVEL_TRANS" = *'>=0s,<3s'*) GROUP BY 1 ''' 
!http://mail.163.com/js6/s?func=mbox:getMessageData=270:xtbBDhAtQVrbEIH+vQAAsh=6! {quote} *I guess maybe there are some bugs in parsing SQL like the above? I hope someone can help me deal with this problem.* {quote} was: {quote} 1. I use Tableau on Kylin, and Tableau has generated a SQL query: ''' SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE", SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" FROM ( select *, case when cast(DURATION_LEVEL as varchar) like '1' then '>=0s,<3s' when cast(DURATION_LEVEL as varchar) like '2' then '>=3s,<4s' when cast(DURATION_LEVEL as varchar) like '3' then '>=4s,<5s' when cast(DURATION_LEVEL as varchar) like '4' then '>=5s,<6s' when cast(DURATION_LEVEL as varchar) like '5' then '>=6s,<7s' when
[jira] [Updated] (KYLIN-4166) kylin parse sql error
[ https://issues.apache.org/jira/browse/KYLIN-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

phil.zhang updated KYLIN-4166:
------------------------------
Description:

{quote}
1. I use Tableau on Kylin, and Tableau generated this SQL:

'''
SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE",
       SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok"
FROM (
  select *,
         case when cast(DURATION_LEVEL as varchar) like '1' then '>=0s,<3s'
              when cast(DURATION_LEVEL as varchar) like '2' then '>=3s,<4s'
              when cast(DURATION_LEVEL as varchar) like '3' then '>=4s,<5s'
              when cast(DURATION_LEVEL as varchar) like '4' then *'>=5s,<6s'*
              when cast(DURATION_LEVEL as varchar) like '5' then '>=6s,<7s'
              when cast(DURATION_LEVEL as varchar) like '6' then '>=7s,<8s'
              when cast(DURATION_LEVEL as varchar) like '7' then '>=8s,<9s'
              when cast(DURATION_LEVEL as varchar) like '8' then '>=9s,<10s'
              when cast(DURATION_LEVEL as varchar) like '9' then '>=10s,<12s'
              when cast(DURATION_LEVEL as varchar) like '10' then '>=12s,<14s'
              when cast(DURATION_LEVEL as varchar) like '11' then '>=14s,<16s'
              when cast(DURATION_LEVEL as varchar) like '12' then '>=16s,<18s'
              when cast(DURATION_LEVEL as varchar) like '13' then '>=18s,<20s'
              when cast(DURATION_LEVEL as varchar) like '14' then '>=20s,<30s'
              when cast(DURATION_LEVEL as varchar) like '15' then '>=30s,<40s'
              when cast(DURATION_LEVEL as varchar) like '16' then '>=40s,<50s'
              when cast(DURATION_LEVEL as varchar) like '17' then '>=50s'
              else 'null' end as DURATION_LEVEL_TRANS
  from DM_AIR_API_DURATION_SPB_D
) "X___SQL___"
WHERE ("X___SQL___"."DURATION_LEVEL_TRANS" = *'>=5s,<6s'*)
GROUP BY 1
'''

The SQL looks correct, but the result is unexpected: it returns null.
{quote}

!image-2019-09-20-15-22-15-356.png!

{quote}
2. Actually the query should return some rows, because data matching the filter does exist. Changing the form of the SQL to filter on the underlying column confirms this:

'''
SELECT "FLIGHT_TYPE" AS "FLIGHT_TYPE",
       SUM("SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok"
from DM_AIR_API_DURATION_SPB_D
WHERE ("DURATION_LEVEL" = 4 and "DAY_NUM"='2019-09-10')
GROUP BY 1
'''
{quote}

!image-2019-09-20-15-23-29-881.png!

{quote}
3. And when I write SQL like this:

'''
SELECT flight_type, SUM("SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok"
from DM_AIR_API_DURATION_SPB_D
group by flight_type
'''

I get a result like this:
{quote}

!http://mail.163.com/js6/s?func=mbox:getMessageData=270:xtbBDhAtQVrbEIH+vQAAsh=5!

{quote}
It is the same as the result of the query from point 1 with only the filter value changed:

'''
SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE",
       SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok"
FROM ( ... same subquery with the CASE expression as in point 1 ... ) "X___SQL___"
WHERE ("X___SQL___"."DURATION_LEVEL_TRANS" = *'>=0s,<3s'*)
GROUP BY 1
'''
{quote}

!http://mail.163.com/js6/s?func=mbox:getMessageData=270:xtbBDhAtQVrbEIH+vQAAsh=6!

{quote}
*I guess there may be a bug in how SQL like the above is parsed. I hope someone can help me with this problem.*
{quote}
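The behavior the reporter expects — filtering a subquery on a CASE-derived column — can be sketched outside Kylin. The following hypothetical reproduction runs the same query shape against an in-memory SQLite database (not Kylin, and with invented rows), to show that the derived-column filter should return the same rows as filtering on DURATION_LEVEL directly:

```python
# Hypothetical reproduction of the KYLIN-4166 query shape using SQLite.
# Table and column names mirror the report; the data rows are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DM_AIR_API_DURATION_SPB_D (
        FLIGHT_TYPE TEXT,
        DURATION_LEVEL INTEGER,
        SHOPPING_SUCCESS_COUNT INTEGER
    )
""")
conn.executemany(
    "INSERT INTO DM_AIR_API_DURATION_SPB_D VALUES (?, ?, ?)",
    [("domestic", 4, 10), ("international", 4, 5), ("domestic", 2, 7)],
)

# Same shape as the Tableau-generated query: derive a label with CASE in a
# subquery, then filter the outer query on the derived column.
rows = conn.execute("""
    SELECT FLIGHT_TYPE, SUM(SHOPPING_SUCCESS_COUNT)
    FROM (
        SELECT *,
               CASE WHEN CAST(DURATION_LEVEL AS TEXT) LIKE '4'
                    THEN '>=5s,<6s' ELSE 'other' END AS DURATION_LEVEL_TRANS
        FROM DM_AIR_API_DURATION_SPB_D
    )
    WHERE DURATION_LEVEL_TRANS = '>=5s,<6s'
    GROUP BY 1
""").fetchall()

# Both DURATION_LEVEL = 4 rows survive the derived-column filter,
# which is the behavior the reporter expected from Kylin.
print(sorted(rows))
```

Under standard SQL semantics the two formulations are equivalent, which supports the reporter's suspicion that Kylin's query parsing or rewriting mishandles this pattern.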
[jira] [Commented] (KYLIN-4166) kylin parse sql error
[ https://issues.apache.org/jira/browse/KYLIN-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934159#comment-16934159 ]

phil.zhang commented on KYLIN-4166:
-----------------------------------
I am sorry for not describing the question exactly. I have updated it. I hope someone can fix it.

> kylin parse sql error
> ---------------------
>
>                 Key: KYLIN-4166
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4166
>             Project: Kylin
>          Issue Type: Bug
>          Components: Query Engine
>    Affects Versions: v3.0.0-alpha2
>            Reporter: phil.zhang
>            Priority: Critical
>         Attachments: 1.png, 2.png, 3.png, 4.png, image-2019-09-20-15-22-15-356.png, image-2019-09-20-15-23-29-881.png
[jira] [Updated] (KYLIN-4166) kylin parse sql error

[ https://issues.apache.org/jira/browse/KYLIN-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

phil.zhang updated KYLIN-4166:
------------------------------
Attachment: image-2019-09-20-15-22-15-356.png
            image-2019-09-20-15-23-29-881.png
[GitHub] [kylin] nichunen merged pull request #847: The first step of build cube job will fail and throw "Column 'xx' in …
nichunen merged pull request #847: The first step of build cube job will fail and throw "Column 'xx' in … URL: https://github.com/apache/kylin/pull/847 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services