[GitHub] [kylin] nichunen merged pull request #849: Revert "Kylin 3392 support sum(null)"

2019-09-20 Thread GitBox
nichunen merged pull request #849: Revert "Kylin 3392 support sum(null)"
URL: https://github.com/apache/kylin/pull/849
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [kylin] coveralls edited a comment on issue #845: KYLIN-4100 run load overview func when change tab and css refine

2019-09-20 Thread GitBox
coveralls edited a comment on issue #845: KYLIN-4100 run load overview func 
when change tab and css refine
URL: https://github.com/apache/kylin/pull/845#issuecomment-533254450
 
 
   ## Pull Request Test Coverage Report for [Build 
5031](https://coveralls.io/builds/25805273)
   
   * **0** of **0**   changed or added relevant lines in **0** files are 
covered.
   * No unchanged relevant lines lost coverage.
   * Overall coverage remained the same at ?**%**
   
   ---
   
   
   
   |  Totals | [![Coverage 
Status](https://coveralls.io/builds/25805273/badge)](https://coveralls.io/builds/25805273)
 |
   | :-- | --: |
   | Change from base [Build 5023](https://coveralls.io/builds/25775999): |  
0.0% |
   | Covered Lines: |  |
   | Relevant Lines: | 0 |
   
   ---
   #   - [Coveralls](https://coveralls.io)
   




[jira] [Updated] (KYLIN-4167) Refactor streaming coordinator

2019-09-20 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4167:

Description: 
h2. Summary
 # Currently, the *coordinator* has too many responsibilities, which violates the single responsibility principle and makes it hard to extend; a clean separation of responsibilities is the recommended way forward.
 # Some cluster-level operations have no atomicity guarantee; we should implement them in an idempotent way to achieve eventual consistency.
 # Resubmit the build job when a job was discarded.
 # Clarify the overall design for realtime OLAP.

 
h4. StreamingCoordinator

Facade of the coordinator; it controls BuildJobSubmitter/ReceiverClusterManager and delegates operations to them.
h4. BuildJobSubmitter

The main responsibilities of BuildJobSubmitter include:

1. Finding candidate segments which are ready to have a build job submitted

2. Tracking the status of each candidate segment's build job, and promoting the segment once it has met the requirements
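The two responsibilities above can be sketched roughly as follows (an illustrative Python sketch; the class shape, state names, and job-engine API are hypothetical stand-ins, not Kylin's actual code):

```python
from enum import Enum

class SegmentState(Enum):
    WAITING = "waiting"    # data fully uploaded, build job not yet submitted
    BUILDING = "building"  # build job submitted and in flight
    READY = "ready"        # build succeeded, segment promoted

class BuildJobSubmitter:
    """Hypothetical sketch of the submitter's periodic scan."""

    def __init__(self, job_engine):
        self.job_engine = job_engine
        self.jobs = {}  # segment name -> build job id

    def run_once(self, segments):
        for seg in segments:
            if seg.state == SegmentState.WAITING:
                # responsibility 1: submit a build job for a ready candidate
                self.jobs[seg.name] = self.job_engine.submit(seg)
                seg.state = SegmentState.BUILDING
            elif seg.state == SegmentState.BUILDING:
                # responsibility 2: trace the job and promote on success
                if self.job_engine.status(self.jobs[seg.name]) == "SUCCEED":
                    seg.state = SegmentState.READY
```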
h4. ReceiverClusterManager

This class manages operations that involve multiple streaming receivers. These operations are often not atomic, so they should be made idempotent.
h4. ClusterStateChecker

Basic steps of this class:

1. stop/pause the coordinator to avoid underlying concurrency issues

2. check for inconsistent state across the whole receiver cluster

3. send a summary mail to the Kylin admin

4. if needed, call ClusterDoctor to repair the inconsistencies
h4. ClusterDoctor

Repair inconsistent state according to the result of ClusterStateChecker.
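The checker/doctor pair described above can be illustrated with a minimal check-then-repair sketch (Python for illustration; the receiver fields and the reassignment callback are hypothetical, not Kylin's real data model):

```python
def check_cluster(receivers):
    """ClusterStateChecker-style step: collect inconsistencies across
    receivers while the coordinator is stopped/paused."""
    issues = []
    for r in receivers:
        expected = r["assigned_partitions"]
        actual = r["consuming_partitions"]
        if expected != actual:
            issues.append((r["name"], expected, actual))
    return issues

def repair(issues, reassign):
    """ClusterDoctor-style step: apply a fix for each inconsistency
    found by the checker (here, re-push the expected assignment)."""
    for name, expected, _actual in issues:
        reassign(name, expected)
```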

 

h3. Candidate Segment

The candidate segments are those segments that can be seen/perceived by the streaming coordinator.

A candidate segment can be in one of the following states/queues:

1. segments whose data are uploaded only *PARTLY*

2. segments whose data are uploaded completely and which are *WAITING* to build

3. segments in the *BUILDING* state; the build job's state should be one of (NEW/RUNNING/ERROR/DISCARDED)

4. segments whose build *succeeded* and which are waiting to be delivered to the historical part (and to be deleted from the realtime part)

5. segments already *in the historical part* (HBase-ready segments)

 

By design, a segment should move to the next queue sequentially (it must not jump the queue); do not break this invariant.
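The sequential-transfer rule can be made explicit with a small guard (illustrative Python; the queue names are shorthand for the five states listed above, not Kylin identifiers):

```python
# The five candidate-segment queues, in the only order a segment may traverse.
QUEUE_ORDER = ["PARTLY_UPLOADED", "WAITING", "BUILDING", "BUILT", "HISTORICAL"]

def advance(current, target):
    """Allow a segment to move only to the immediately following queue."""
    i, j = QUEUE_ORDER.index(current), QUEUE_ORDER.index(target)
    if j != i + 1:
        raise ValueError(f"segment must not jump the queue: {current} -> {target}")
    return target
```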
h3. Atomicity

In a multi-step transaction, the following aspects should be thought through:

1. whether to *fail fast* or continue when an exception is thrown

2. whether an API (remote call) should be *synchronous* or asynchronous

3. whether a *rollback* can always succeed when the transaction fails

4. whether the transaction is *idempotent*, so that when it fails it can be fixed by retrying

 

How can we ensure that cluster-wide operations run smoothly without blocking problems? I divided all multi-step transactions into three kinds:

NotAtomicIdempotent

NotAtomicAndNotIdempotent

NonSideEffect
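For the NotAtomicIdempotent kind, a plain retry loop is enough to reach eventual consistency, because repeating the operation after a partial failure converges to the same end state. A minimal sketch (illustrative Python, not Kylin code):

```python
import time

def run_idempotent(op, retries=3, delay=0.0):
    """Retry a non-atomic but idempotent operation until it succeeds.

    This is only safe because `op` is idempotent: re-running it after a
    partial failure leaves the system in the same final state.
    """
    last = None
    for _ in range(retries):
        try:
            return op()
        except Exception as e:  # transient failure: retry the whole step
            last = e
            time.sleep(delay)
    raise last
```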

  was:

1. Currently, coordinator has too many responsibility, which violate single 
responsibility principle, and it not easy for extension, a good separation of 
responsibilities is a recommended way
2. Some cluster level operation has no atomicity guarantee, we should implement 
then in idempotent way to achieve final consistency
3. Resubmit when job was discarded
4. Clarify overall design for realtime OLAP


> Refactor streaming coordinator
> --
>
> Key: KYLIN-4167
> URL: https://issues.apache.org/jira/browse/KYLIN-4167
> Project: Kylin
>  Issue Type: Improvement
>  Components: Real-time Streaming
>Reporter: Xiaoxiang Yu
>Assignee: Xiaoxiang Yu
>Priority: Major
> Fix For: v3.0.0-beta
>
>
> h2. Summary
>  # Currently, *coordinator* has too many responsibility, which violate single 
> responsibility principle, and it not easy for extension, a good separation of 
> responsibilities is a recommended way.
>  # Some cluster level operation has no atomicity guarantee, we should 
> implement then in idempotent way to achieve final consistency
>  #  Resubmit when job was discarded
>  # Clarify overall design for realtime OLAP
>  
> h4. StreamingCoordinator
> Facade of coordinator, will controll BuildJobSummitter/ReceiverClusterMangaer 
> and delegate operation to them.
> h4. BuildJobSubmitter
> The main responsibility of BuildJobSubmitter including:
> 1. Try to find candidate segment which ready to submit a build job
> 2. Trace the status of candidate segment's build job and promote segment if 
> it is has met requirements
> h4.  
> h4. ReceiverClusterManager
> This class manage operation related to multi streaming receivers. They are 
> often not atomic and maybe idempotent.
> h4. ClusterStateChecker
> Basic step of this class:
> 1. stop/pause coordinator to avoid underlying concurrency issue
> 2. check inconsistent state of all receiver cluster
> 3. send summary via mail to kylin admin
> 4. if need, call ClusterDoctor to repair inconsistent issue
> h4. ClusterDoctor
> Repair inconsistent state according to result of ClusterStateChecker
>  
> 
> 

[GitHub] [kylin] codecov-io commented on issue #849: Revert "Kylin 3392 support sum(null)"

2019-09-20 Thread GitBox
codecov-io commented on issue #849: Revert "Kylin 3392 support sum(null)"
URL: https://github.com/apache/kylin/pull/849#issuecomment-533572399
 
 
   # [Codecov](https://codecov.io/gh/apache/kylin/pull/849?src=pr=h1) Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@2091f1e`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `39.62%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/kylin/pull/849/graphs/tree.svg?width=650=JawVgbgsVo=150=pr)](https://codecov.io/gh/apache/kylin/pull/849?src=pr=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##             master     #849    +/-   ##
   ===========================================
     Coverage          ?   25.61%           
     Complexity        ?     6069           
   ===========================================
     Files             ?     1399           
     Lines             ?    83654           
     Branches          ?    11722           
   ===========================================
     Hits              ?    21428           
     Misses            ?    60168           
     Partials          ?     2058
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/kylin/pull/849?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/kylin/metadata/datatype/LongSerializer.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWV0YWRhdGEvZGF0YXR5cGUvTG9uZ1NlcmlhbGl6ZXIuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...org/apache/kylin/measure/basic/DoubleIngester.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWVhc3VyZS9iYXNpYy9Eb3VibGVJbmdlc3Rlci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...apache/kylin/measure/basic/BigDecimalIngester.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWVhc3VyZS9iYXNpYy9CaWdEZWNpbWFsSW5nZXN0ZXIuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...g/apache/kylin/measure/basic/BasicMeasureType.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWVhc3VyZS9iYXNpYy9CYXNpY01lYXN1cmVUeXBlLmphdmE=)
 | `10.52% <0%> (ø)` | `1 <0> (?)` | |
   | 
[...a/org/apache/kylin/measure/basic/LongIngester.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWVhc3VyZS9iYXNpYy9Mb25nSW5nZXN0ZXIuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...ache/kylin/metadata/datatype/DoubleSerializer.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWV0YWRhdGEvZGF0YXR5cGUvRG91YmxlU2VyaWFsaXplci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[.../kylin/metadata/datatype/BigDecimalSerializer.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWV0YWRhdGEvZGF0YXR5cGUvQmlnRGVjaW1hbFNlcmlhbGl6ZXIuamF2YQ==)
 | `72.97% <100%> (ø)` | `8 <2> (?)` | |
   | 
[...n/java/org/apache/kylin/common/util/BytesUtil.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2t5bGluL2NvbW1vbi91dGlsL0J5dGVzVXRpbC5qYXZh)
 | `25% <100%> (ø)` | `27 <0> (?)` | |
   | 
[...e/kylin/measure/basic/BigDecimalMinAggregator.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWVhc3VyZS9iYXNpYy9CaWdEZWNpbWFsTWluQWdncmVnYXRvci5qYXZh)
 | `27.77% <25%> (ø)` | `3 <1> (?)` | |
   | 
[...pache/kylin/measure/basic/DoubleMinAggregator.java](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWVhc3VyZS9iYXNpYy9Eb3VibGVNaW5BZ2dyZWdhdG9yLmphdmE=)
 | `41.66% <25%> (ø)` | `3 <1> (?)` | |
   | ... and [7 
more](https://codecov.io/gh/apache/kylin/pull/849/diff?src=pr=tree-more) | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/kylin/pull/849?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/kylin/pull/849?src=pr=footer). Last 
update 
[2091f1e...f475c11](https://codecov.io/gh/apache/kylin/pull/849?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



[GitHub] [kylin] codecov-io commented on issue #848: KYLIN-4095: Add RESOURCE_PATH_PREFIX option in ResourceTool (addendum)

2019-09-20 Thread GitBox
codecov-io commented on issue #848: KYLIN-4095: Add RESOURCE_PATH_PREFIX option 
in ResourceTool (addendum)
URL: https://github.com/apache/kylin/pull/848#issuecomment-533569512
 
 
   # [Codecov](https://codecov.io/gh/apache/kylin/pull/848?src=pr=h1) Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@2091f1e`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/kylin/pull/848/graphs/tree.svg?width=650=JawVgbgsVo=150=pr)](https://codecov.io/gh/apache/kylin/pull/848?src=pr=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##             master     #848    +/-   ##
   ===========================================
     Coverage          ?   25.69%           
     Complexity        ?     6123           
   ===========================================
     Files             ?     1400           
     Lines             ?    83729           
     Branches          ?    11740           
   ===========================================
     Hits              ?    21516           
     Misses            ?    60142           
     Partials          ?     2071
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/kylin/pull/848?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/kylin/pull/848?src=pr=footer). Last 
update 
[2091f1e...2823341](https://codecov.io/gh/apache/kylin/pull/848?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[jira] [Created] (KYLIN-4174) job statistics info is incorrect

2019-09-20 Thread ZhouKang (Jira)
ZhouKang created KYLIN-4174:
---

 Summary: job statistics info is incorrect
 Key: KYLIN-4174
 URL: https://issues.apache.org/jira/browse/KYLIN-4174
 Project: Kylin
  Issue Type: Bug
Reporter: ZhouKang


Kylin: master branch

The job statistics info is incorrect: the number shown after each job type is not equal to the size of the corresponding job list.

Step 1: login kylin webui, open the Monitor

Step 2: refresh the web page

 

You can see that the query params of the RESTful API are wrong; projectName is "_null":
{code:java}
curl 'http://zjy-bigdata-prc-kylin04.bj:7070/kylin/api/jobs/overview?jobSearchMode=ALL=_null=1'
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kylin] RupengWang opened a new pull request #849: Revert "Kylin 3392 support sum(null)"

2019-09-20 Thread GitBox
RupengWang opened a new pull request #849: Revert "Kylin 3392 support sum(null)"
URL: https://github.com/apache/kylin/pull/849
 
 
   This reverts commit 1a2f779a074c1d03703eebbb7506d171ef26ce56.




[GitHub] [kylin] asf-ci commented on issue #849: Revert "Kylin 3392 support sum(null)"

2019-09-20 Thread GitBox
asf-ci commented on issue #849: Revert "Kylin 3392 support sum(null)"
URL: https://github.com/apache/kylin/pull/849#issuecomment-533558965
 
 
   Can one of the admins verify this patch?




[jira] [Updated] (KYLIN-3392) Support NULL value in Sum, Max, Min Aggregation

2019-09-20 Thread wangrupeng (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangrupeng updated KYLIN-3392:
--
Fix Version/s: (was: v2.6.4)
   (was: v3.0.0-beta)

> Support NULL value in Sum, Max, Min Aggregation
> ---
>
> Key: KYLIN-3392
> URL: https://issues.apache.org/jira/browse/KYLIN-3392
> Project: Kylin
>  Issue Type: Bug
>Reporter: Yifei Wu
>Assignee: Yifei Wu
>Priority: Major
> Attachments: KYLIN-3392-2.png, KYLIN-3392.png, kylin-3.0.0-alpha2.png
>
>
> It is treated as 0 when confronted with NULL value in KYLIN's basic aggregate 
> measure (like sum, max, min). However, to distinguish the NULL value with 0 
> is very necessary.
> It should be like this
> *sum(null, null) = null*
> *sum(null, 1) = 1*
> *max(null, null) = null*
> *max(null, -1) = -1*
> *min(null,  -1)= -1*
>  in accordance with Hive and SparkSQL





[jira] [Reopened] (KYLIN-3392) Support NULL value in Sum, Max, Min Aggregation

2019-09-20 Thread wangrupeng (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangrupeng reopened KYLIN-3392:
---

> Support NULL value in Sum, Max, Min Aggregation
> ---
>
> Key: KYLIN-3392
> URL: https://issues.apache.org/jira/browse/KYLIN-3392
> Project: Kylin
>  Issue Type: Bug
>Reporter: Yifei Wu
>Assignee: Yifei Wu
>Priority: Major
> Fix For: v3.0.0-beta, v2.6.4
>
> Attachments: KYLIN-3392-2.png, KYLIN-3392.png, kylin-3.0.0-alpha2.png
>
>
> It is treated as 0 when confronted with NULL value in KYLIN's basic aggregate 
> measure (like sum, max, min). However, to distinguish the NULL value with 0 
> is very necessary.
> It should be like this
> *sum(null, null) = null*
> *sum(null, 1) = 1*
> *max(null, null) = null*
> *max(null, -1) = -1*
> *min(null,  -1)= -1*
>  in accordance with Hive and SparkSQL





[jira] [Commented] (KYLIN-3392) Support NULL value in Sum, Max, Min Aggregation

2019-09-20 Thread wangrupeng (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934404#comment-16934404
 ] 

wangrupeng commented on KYLIN-3392:
---

This PR can cause another serious problem; here are the details of the problem I met.

When I query with an aggregate function such as count, sum, max, or min, if the column values are all NULL, or the values filtered by the where condition contain NULL, the query thread hangs, and after about one minute it throws an exception.

To reproduce this problem, I created a new Hive table whose structure is the same as KYLIN_SALES and which contains 4 rows; the column containing the two NULL values is PRICE:

{code}
1   2013/10/19  FP-non  37831   0    13  30     3     1209  10003717  ANALYST  Beijing
2   2012/10/22  Others  140746  100  11  70     20    1154  10006076  ADMIN    Shanghai
3   2013/10/19  FP-non  37831   0    13  NULL   3     1209  10003717  ANALYST  Beijing
4   2012/10/22  Others  140746  100  11  NULL   NULL  1154  10006076  ADMIN    Shanghai
{code}

I then built a model and cube using this table, named KYLIN_SALES_3392, as the fact table, with no lookup table, and compared a Kylin build that merged this PR with one that did not.

With this PR merged, as I said, when the where condition filters column values that contain NULL, the query throws an exception:

select min(price) from kylin_sales_3392 where trans_id>=1 and trans_id<=4
!KYLIN-3392.png|width=654,height=296!

If the filtered column values do not contain NULL, the query works properly:

select min(price) from kylin_sales_3392 where trans_id>=1 and trans_id<=2
!KYLIN-3392-2.png|width=680,height=256!

And below is the result from the original Kylin version:

select min(price) from kylin_sales_3392 where trans_id>=1 and trans_id<=4
!kylin-3.0.0-alpha2.png|width=668,height=215!
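For reference, the null-aware semantics the issue originally asked for (sum(null, null) = null, sum(null, 1) = 1, min(null, -1) = -1, matching Hive and SparkSQL) can be expressed compactly (illustrative Python, not Kylin's aggregator code):

```python
def null_aware_agg(fn, values):
    """Aggregate over the non-NULL values only; return NULL (None) when
    every input is NULL, instead of treating NULL as 0."""
    non_null = [v for v in values if v is not None]
    return fn(non_null) if non_null else None
```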

> Support NULL value in Sum, Max, Min Aggregation
> ---
>
> Key: KYLIN-3392
> URL: https://issues.apache.org/jira/browse/KYLIN-3392
> Project: Kylin
>  Issue Type: Bug
>Reporter: Yifei Wu
>Assignee: Yifei Wu
>Priority: Major
> Fix For: v3.0.0-beta, v2.6.4
>
> Attachments: KYLIN-3392-2.png, KYLIN-3392.png, kylin-3.0.0-alpha2.png
>
>
> It is treated as 0 when confronted with NULL value in KYLIN's basic aggregate 
> measure (like sum, max, min). However, to distinguish the NULL value with 0 
> is very necessary.
> It should be like this
> *sum(null, null) = null*
> *sum(null, 1) = 1*
> *max(null, null) = null*
> *max(null, -1) = -1*
> *min(null,  -1)= -1*
>  in accordance with Hive and SparkSQL





[jira] [Updated] (KYLIN-3392) Support NULL value in Sum, Max, Min Aggregation

2019-09-20 Thread wangrupeng (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangrupeng updated KYLIN-3392:
--
Attachment: kylin-3.0.0-alpha2.png

> Support NULL value in Sum, Max, Min Aggregation
> ---
>
> Key: KYLIN-3392
> URL: https://issues.apache.org/jira/browse/KYLIN-3392
> Project: Kylin
>  Issue Type: Bug
>Reporter: Yifei Wu
>Assignee: Yifei Wu
>Priority: Major
> Fix For: v3.0.0-beta, v2.6.4
>
> Attachments: KYLIN-3392-2.png, KYLIN-3392.png, kylin-3.0.0-alpha2.png
>
>
> It is treated as 0 when confronted with NULL value in KYLIN's basic aggregate 
> measure (like sum, max, min). However, to distinguish the NULL value with 0 
> is very necessary.
> It should be like this
> *sum(null, null) = null*
> *sum(null, 1) = 1*
> *max(null, null) = null*
> *max(null, -1) = -1*
> *min(null,  -1)= -1*
>  in accordance with Hive and SparkSQL





[jira] [Updated] (KYLIN-3392) Support NULL value in Sum, Max, Min Aggregation

2019-09-20 Thread wangrupeng (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangrupeng updated KYLIN-3392:
--
Attachment: KYLIN-3392-2.png

> Support NULL value in Sum, Max, Min Aggregation
> ---
>
> Key: KYLIN-3392
> URL: https://issues.apache.org/jira/browse/KYLIN-3392
> Project: Kylin
>  Issue Type: Bug
>Reporter: Yifei Wu
>Assignee: Yifei Wu
>Priority: Major
> Fix For: v3.0.0-beta, v2.6.4
>
> Attachments: KYLIN-3392-2.png, KYLIN-3392.png
>
>
> It is treated as 0 when confronted with NULL value in KYLIN's basic aggregate 
> measure (like sum, max, min). However, to distinguish the NULL value with 0 
> is very necessary.
> It should be like this
> *sum(null, null) = null*
> *sum(null, 1) = 1*
> *max(null, null) = null*
> *max(null, -1) = -1*
> *min(null,  -1)= -1*
>  in accordance with Hive and SparkSQL





[GitHub] [kylin] lshmouse opened a new pull request #848: KYLIN-4095: Add RESOURCE_PATH_PREFIX option in ResourceTool (addendum)

2019-09-20 Thread GitBox
lshmouse opened a new pull request #848: KYLIN-4095: Add RESOURCE_PATH_PREFIX 
option in ResourceTool (addendum)
URL: https://github.com/apache/kylin/pull/848
 
 
   https://issues.apache.org/jira/browse/KYLIN-4095




[GitHub] [kylin] asf-ci commented on issue #848: KYLIN-4095: Add RESOURCE_PATH_PREFIX option in ResourceTool (addendum)

2019-09-20 Thread GitBox
asf-ci commented on issue #848: KYLIN-4095: Add RESOURCE_PATH_PREFIX option in 
ResourceTool (addendum)
URL: https://github.com/apache/kylin/pull/848#issuecomment-533556510
 
 
   Can one of the admins verify this patch?




[jira] [Commented] (KYLIN-4095) Add RESOURCE_PATH_PREFIX option in ResourceTool

2019-09-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934402#comment-16934402
 ] 

ASF GitHub Bot commented on KYLIN-4095:
---

lshmouse commented on pull request #848: KYLIN-4095: Add RESOURCE_PATH_PREFIX 
option in ResourceTool (addendum)
URL: https://github.com/apache/kylin/pull/848
 
 
   https://issues.apache.org/jira/browse/KYLIN-4095
 



> Add RESOURCE_PATH_PREFIX option in ResourceTool
> ---
>
> Key: KYLIN-4095
> URL: https://issues.apache.org/jira/browse/KYLIN-4095
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Liu Shaohui
>Priority: Major
> Fix For: v3.0.0-beta
>
> Attachments: image-2019-09-20-10-16-37-603.png, 
> image-2019-09-20-10-17-31-459.png
>
>
> ResourceTool is very useful to fix the metadata with overlap segments.
> But downloading and uploading entire metadata is too heavy.
> It's better to have a RESOURCE_PATH_PREFIX option for downloading and 
> uploading cmds.
>  





[jira] [Updated] (KYLIN-3392) Support NULL value in Sum, Max, Min Aggregation

2019-09-20 Thread wangrupeng (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangrupeng updated KYLIN-3392:
--
Attachment: KYLIN-3392.png

> Support NULL value in Sum, Max, Min Aggregation
> ---
>
> Key: KYLIN-3392
> URL: https://issues.apache.org/jira/browse/KYLIN-3392
> Project: Kylin
>  Issue Type: Bug
>Reporter: Yifei Wu
>Assignee: Yifei Wu
>Priority: Major
> Fix For: v3.0.0-beta, v2.6.4
>
> Attachments: KYLIN-3392.png
>
>
> It is treated as 0 when confronted with NULL value in KYLIN's basic aggregate 
> measure (like sum, max, min). However, to distinguish the NULL value with 0 
> is very necessary.
> It should be like this
> *sum(null, null) = null*
> *sum(null, 1) = 1*
> *max(null, null) = null*
> *max(null, -1) = -1*
> *min(null,  -1)= -1*
>  in accordance with Hive and SparkSQL





[jira] [Updated] (KYLIN-4141) Build Global Dictionary in no time

2019-09-20 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4141:

Description: 
h2. Background

Currently, realtime OLAP does not support COUNT_DISTINCT(bitmap) for non-integer types.

Because realtime OLAP lacks the ability to encode strings on the fly, I want to use RocksDB & HBase as the implementation of a streaming distributed dictionary.
h2. Design
 # each receiver will own a local dict cache
 # all receivers will share a remote dict storage
 # we choose RocksDB as the local dict cache
 # we choose HBase as the remote dict storage

 # for each cube, we will create a local dict and an HBase table
 # we will create a column family in both RocksDB and HBase for each column that occurs in a COUNT_DISTINCT measure
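The local-cache/remote-storage design in the list above can be sketched as a two-tier lookup (illustrative Python; plain dicts stand in for the RocksDB cache and the HBase table, and the id-allocation step glosses over the atomicity a real multi-receiver deployment would need):

```python
class TwoTierDict:
    """Hypothetical sketch of the streaming global dictionary:
    a per-receiver local cache backed by a shared remote store."""

    def __init__(self, remote):
        self.local = {}       # local dict cache (RocksDB stand-in)
        self.remote = remote  # shared remote storage (HBase stand-in)

    def encode(self, column, value):
        key = (column, value)
        if key in self.local:       # 1. local cache hit
            return self.local[key]
        if key in self.remote:      # 2. remote hit: warm the local cache
            self.local[key] = self.remote[key]
            return self.local[key]
        # 3. miss: allocate the next id in the remote store (a real
        # implementation would need an atomic increment in HBase here)
        new_id = len(self.remote) + 1
        self.remote[key] = new_id
        self.local[key] = new_id
        return new_id
```

Two receivers sharing the same remote store will then agree on the id of a value even when only one of them allocated it.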

h2. Design Diagram

!image-2019-09-20-19-04-47-937.png!

!image-2019-09-20-19-04-55-935.png!

 

!image-2019-09-20-20-06-15-960.png!

 

  was:
h2. Backgroud

Currently, realtime OLAP do not support COUNT_DISTINCT(bitmap) for non interger 
type.

Because of the lack the ability of encoding string at once, so I want to use 
RocksDB & HBase as implementation of streaming distributed dictionary. 
h2. Design
 # each receiver will own a local dict cache
 # all receiver will share a remote dict storage
 # we choose to use RocksDB as local dict cache
 # we choose to use HBase as remote dict storage

 
 # for each cube, we will create a local dict and a hbase table
 # we will create column family both in RocksDB and HBase for each column which 
occur in COUNT_DISTINCT

 

!image-2019-09-20-19-04-47-937.png!

!image-2019-09-20-19-04-55-935.png!


> Build Global Dictionary in no time
> --
>
> Key: KYLIN-4141
> URL: https://issues.apache.org/jira/browse/KYLIN-4141
> Project: Kylin
>  Issue Type: Improvement
>  Components: Real-time Streaming
>Affects Versions: v3.0.0-beta
>Reporter: Xiaoxiang Yu
>Assignee: Xiaoxiang Yu
>Priority: Major
> Fix For: v3.0.0-beta
>
> Attachments: image-2019-09-20-19-04-47-937.png, 
> image-2019-09-20-19-04-55-935.png, image-2019-09-20-20-06-15-960.png
>
>
> h2.  
> h2. Backgroud
> Currently, realtime OLAP do not support COUNT_DISTINCT(bitmap) for non 
> interger type.
> Because of the lack the ability of encoding string at once, so I want to use 
> RocksDB & HBase as implementation of streaming distributed dictionary. 
> h2. Design
>  # each receiver will own a local dict cache
>  # all receiver will share a remote dict storage
>  # we choose to use RocksDB as local dict cache
>  # we choose to use HBase as remote dict storage
>  
>  # for each cube, we will create a local dict and a hbase table
>  # we will create column family both in RocksDB and HBase for each column 
> which occur in COUNT_DISTINCT
> h2. Design Diagram
> !image-2019-09-20-19-04-47-937.png!
> !image-2019-09-20-19-04-55-935.png!
>  
> !image-2019-09-20-20-06-15-960.png!
>  





[jira] [Updated] (KYLIN-4141) Build Global Dictionary in no time

2019-09-20 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4141:

Attachment: image-2019-09-20-20-06-15-960.png

> Build Global Dictionary in no time
> --
>
> Key: KYLIN-4141
> URL: https://issues.apache.org/jira/browse/KYLIN-4141
> Project: Kylin
>  Issue Type: Improvement
>  Components: Real-time Streaming
>Affects Versions: v3.0.0-beta
>Reporter: Xiaoxiang Yu
>Assignee: Xiaoxiang Yu
>Priority: Major
> Fix For: v3.0.0-beta
>
> Attachments: image-2019-09-20-19-04-47-937.png, 
> image-2019-09-20-19-04-55-935.png, image-2019-09-20-20-06-15-960.png
>
>
> h2. Backgroud
> Currently, realtime OLAP do not support COUNT_DISTINCT(bitmap) for non 
> interger type.
> Because of the lack the ability of encoding string at once, so I want to use 
> RocksDB & HBase as implementation of streaming distributed dictionary. 
> h2. Design
>  # each receiver will own a local dict cache
>  # all receiver will share a remote dict storage
>  # we choose to use RocksDB as local dict cache
>  # we choose to use HBase as remote dict storage
>  
>  # for each cube, we will create a local dict and a hbase table
>  # we will create column family both in RocksDB and HBase for each column 
> which occur in COUNT_DISTINCT
>  
> !image-2019-09-20-19-04-47-937.png!
> !image-2019-09-20-19-04-55-935.png!





[jira] [Updated] (KYLIN-4141) Build Global Dictionary in no time

2019-09-20 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4141:

Description: 
h2. Background

Currently, realtime OLAP does not support COUNT_DISTINCT(bitmap) for non-integer types.

Because realtime OLAP lacks the ability to encode strings on the fly, I want to use RocksDB & HBase as the implementation of a streaming distributed dictionary.
h2. Design
 # each receiver will own a local dict cache
 # all receivers will share a remote dict storage
 # we choose RocksDB as the local dict cache
 # we choose HBase as the remote dict storage

 # for each cube, we will create a local dict and an HBase table
 # we will create a column family in both RocksDB and HBase for each column that occurs in a COUNT_DISTINCT measure

 

!image-2019-09-20-19-04-47-937.png!

!image-2019-09-20-19-04-55-935.png!

  was:
h2. Background

Currently, realtime OLAP does not support COUNT_DISTINCT(bitmap) for non-integer types.

Because of the lack of the ability to encode strings on the fly, I want to use RocksDB & HBase as the implementation of a streaming distributed dictionary.
h2. Design
 # each receiver will own a local dict cache
 # all receivers will share a remote dict storage
 # we choose RocksDB as the local dict cache
 # we choose HBase as the remote dict storage

 
 # for each cube, we will create a local dict and an HBase table
 # we will create a column family in both RocksDB and HBase for each column that occurs in COUNT_DISTINCT


> Build Global Dictionary in no time
> --
>
> Key: KYLIN-4141
> URL: https://issues.apache.org/jira/browse/KYLIN-4141
> Project: Kylin
>  Issue Type: Improvement
>  Components: Real-time Streaming
>Affects Versions: v3.0.0-beta
>Reporter: Xiaoxiang Yu
>Assignee: Xiaoxiang Yu
>Priority: Major
> Fix For: v3.0.0-beta
>
> Attachments: image-2019-09-20-19-04-47-937.png, 
> image-2019-09-20-19-04-55-935.png
>
>
> h2. Background
> Currently, realtime OLAP does not support COUNT_DISTINCT(bitmap) for
> non-integer types.
> Because of the lack of the ability to encode strings on the fly, I want to use
> RocksDB & HBase as the implementation of a streaming distributed dictionary.
> h2. Design
>  # each receiver will own a local dict cache
>  # all receivers will share a remote dict storage
>  # we choose RocksDB as the local dict cache
>  # we choose HBase as the remote dict storage
>  
>  # for each cube, we will create a local dict and an HBase table
>  # we will create a column family in both RocksDB and HBase for each column
> that occurs in COUNT_DISTINCT
>  
> !image-2019-09-20-19-04-47-937.png!
> !image-2019-09-20-19-04-55-935.png!





[jira] [Updated] (KYLIN-4141) Build Global Dictionary in no time

2019-09-20 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4141:

Attachment: image-2019-09-20-19-04-47-937.png

> Build Global Dictionary in no time
> --
>
> Key: KYLIN-4141
> URL: https://issues.apache.org/jira/browse/KYLIN-4141
> Project: Kylin
>  Issue Type: Improvement
>  Components: Real-time Streaming
>Affects Versions: v3.0.0-beta
>Reporter: Xiaoxiang Yu
>Assignee: Xiaoxiang Yu
>Priority: Major
> Fix For: v3.0.0-beta
>
> Attachments: image-2019-09-20-19-04-47-937.png, 
> image-2019-09-20-19-04-55-935.png
>
>
> h2. Background
> Currently, realtime OLAP does not support COUNT_DISTINCT(bitmap) for
> non-integer types.
> Because of the lack of the ability to encode strings on the fly, I want to use
> RocksDB & HBase as the implementation of a streaming distributed dictionary.
> h2. Design
>  # each receiver will own a local dict cache
>  # all receivers will share a remote dict storage
>  # we choose RocksDB as the local dict cache
>  # we choose HBase as the remote dict storage
>  
>  # for each cube, we will create a local dict and an HBase table
>  # we will create a column family in both RocksDB and HBase for each column
> that occurs in COUNT_DISTINCT





[jira] [Updated] (KYLIN-4141) Build Global Dictionary in no time

2019-09-20 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4141:

Attachment: image-2019-09-20-19-04-55-935.png

> Build Global Dictionary in no time
> --
>
> Key: KYLIN-4141
> URL: https://issues.apache.org/jira/browse/KYLIN-4141
> Project: Kylin
>  Issue Type: Improvement
>  Components: Real-time Streaming
>Affects Versions: v3.0.0-beta
>Reporter: Xiaoxiang Yu
>Assignee: Xiaoxiang Yu
>Priority: Major
> Fix For: v3.0.0-beta
>
> Attachments: image-2019-09-20-19-04-47-937.png, 
> image-2019-09-20-19-04-55-935.png
>
>
> h2. Background
> Currently, realtime OLAP does not support COUNT_DISTINCT(bitmap) for
> non-integer types.
> Because of the lack of the ability to encode strings on the fly, I want to use
> RocksDB & HBase as the implementation of a streaming distributed dictionary.
> h2. Design
>  # each receiver will own a local dict cache
>  # all receivers will share a remote dict storage
>  # we choose RocksDB as the local dict cache
>  # we choose HBase as the remote dict storage
>  
>  # for each cube, we will create a local dict and an HBase table
>  # we will create a column family in both RocksDB and HBase for each column
> that occurs in COUNT_DISTINCT





[jira] [Updated] (KYLIN-4141) Build Global Dictionary in no time

2019-09-20 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4141:

Description: 
h2. Background

Currently, realtime OLAP does not support COUNT_DISTINCT(bitmap) for non-integer types.

Because of the lack of the ability to encode strings on the fly, I want to use RocksDB & HBase as the implementation of a streaming distributed dictionary.
h2. Design
 # each receiver will own a local dict cache
 # all receivers will share a remote dict storage
 # we choose RocksDB as the local dict cache
 # we choose HBase as the remote dict storage

 
 # for each cube, we will create a local dict and an HBase table
 # we will create a column family in both RocksDB and HBase for each column that occurs in COUNT_DISTINCT

> Build Global Dictionary in no time
> --
>
> Key: KYLIN-4141
> URL: https://issues.apache.org/jira/browse/KYLIN-4141
> Project: Kylin
>  Issue Type: Improvement
>  Components: Real-time Streaming
>Affects Versions: v3.0.0-beta
>Reporter: Xiaoxiang Yu
>Assignee: Xiaoxiang Yu
>Priority: Major
> Fix For: v3.0.0-beta
>
>
> h2. Background
> Currently, realtime OLAP does not support COUNT_DISTINCT(bitmap) for
> non-integer types.
> Because of the lack of the ability to encode strings on the fly, I want to use
> RocksDB & HBase as the implementation of a streaming distributed dictionary.
> h2. Design
>  # each receiver will own a local dict cache
>  # all receivers will share a remote dict storage
>  # we choose RocksDB as the local dict cache
>  # we choose HBase as the remote dict storage
>  
>  # for each cube, we will create a local dict and an HBase table
>  # we will create a column family in both RocksDB and HBase for each column
> that occurs in COUNT_DISTINCT





[jira] [Updated] (KYLIN-4010) Auto adjust offset according to query server's timezone for time derived column

2019-09-20 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4010:

Description: 
h2. Background
In realtime OLAP, we index real-time events in the streaming receiver. We know that each event must contain a timestamp column (often called the event time), whose value should represent when the event was produced. Because events may come from different timezones and using the local timezone is always *error-prone*, we recommend using a {color:#DE350B}GMT+0{color} timestamp (System.currentTimeMillis()) to avoid such issues.

I think this is good by design: it is easy to understand and always correct. But the *side effect* is that the end user (a business manager behind a BI tool) is unhappy, because he has to use GMT+0 in date/time related filters in SQL and must understand that the result should be *shifted* to his local timezone. This is not user-friendly and is inconvenient for normal users: a user may compare and summarize query results from different data sources, and using GMT+0 may trouble him.

h2. Example
For example, a Kylin user working in *GMT+8* (maybe in Shanghai) wants to know some metrics which occurred from {color:#DE350B}2019-09-01 12:00:00{color} to {color:#DE350B}2019-09-01 14:00:00{color} in his {color:#DE350B}local timezone{color}, so he has to {color:#DE350B}rewrite{color} his query (with an eight-hour offset) to the following:


{code:sql}
select hour_start, count(*)
from realtime_table
where hour_start >= "2019-09-01 04:00:00" and hour_start < "2019-09-01 
06:00:00"  
group by hour_start
{code}


And he will get result like :

||hour_start ||count||
|2019-09-01 04:00:00|139202|
|2019-09-01 05:00:00|89398|

And he must convert it to a more meaningful result in his mind, which is really annoying!

||hour_start ||count||
|2019-09-01 12:00:00|139202|
|2019-09-01 13:00:00|89398|



h2. Design
We should not change the way the receiver indexes events; the event time should still be stored as a UTC timestamp. Instead, we should automatically rewrite the SQL's event-time related filters.

In Kylin, a filter condition in the where clause is converted to a *TupleFilter*, which looks like a *RelNode* in Apache Calcite.
For where hour_start >= "2019-09-01 12:00:00" and hour_start < "2019-09-01 14:00:00", we will send a TupleFilter to the streaming receiver or region server which looks like this:


{noformat}
AND
  GreatThanOrEqual
    hour_start
    CAST
      "2019-09-01 12:00:00"
      timestamp
  LessThanOrEqual
    hour_start
    CAST
      "2019-09-01 14:00:00"
      timestamp
{noformat}


But for a streaming query, we want to rewrite each ConstantTupleFilter and subtract the timezone offset from its timestamp value. So the TupleFilter that is actually sent will be the following:


{noformat}
AND
  GreatThanOrEqual
    hour_start
    CAST
      "2019-09-01 04:00:00"
      timestamp
  LessThanOrEqual
    hour_start
    CAST
      "2019-09-01 06:00:00"
      timestamp
{noformat}


Before the query result is processed by *OLAPEnumerator*, Kylin will add the offset back to each value of the time derived columns, thus protecting rows from being filtered out by Calcite-generated code.

So, the user will get what he wants in his timezone without any burden.
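The two shifts described above (subtract the offset from filter constants before the scan, add it back before results reach the user) can be illustrated with java.time. This is a hypothetical sketch, not Kylin's actual TupleFilter-rewriting code; the class and method names are invented for the example.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch of the constant-literal shift described above.
public class TimezoneShiftSketch {
    private static final DateTimeFormatter F =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Shift a filter literal from the query server's zone to GMT+0 before the scan.
    static String toUtc(String literal, ZoneId serverZone) {
        return LocalDateTime.parse(literal, F)
                .atZone(serverZone)
                .withZoneSameInstant(ZoneId.of("UTC"))
                .toLocalDateTime()
                .format(F);
    }

    // Shift a returned time-derived value back into the query server's zone.
    static String fromUtc(String literal, ZoneId serverZone) {
        return LocalDateTime.parse(literal, F)
                .atZone(ZoneId.of("UTC"))
                .withZoneSameInstant(serverZone)
                .toLocalDateTime()
                .format(F);
    }

    public static void main(String[] args) {
        ZoneId shanghai = ZoneId.of("Asia/Shanghai"); // GMT+8, as in the example above
        System.out.println(toUtc("2019-09-01 12:00:00", shanghai));   // 2019-09-01 04:00:00
        System.out.println(fromUtc("2019-09-01 04:00:00", shanghai)); // 2019-09-01 12:00:00
    }
}
```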

h2. How to use
To enable auto shifting by time zone, please set {color:#DE350B}kylin.stream.auto.just.by.timezone{color} to true.
You can specify the time zone via {color:#DE350B}kylin.web.timezone{color}; otherwise, the time zone will be auto-detected.
Only *time derived columns* will be affected.
h2. Related Issue
Originally, the event time could only be a long value (a UTC timestamp). But in some cases the event time is in a format like "yyyy-MM-dd HH:mm:ss"; we use a new class, DateTimeParser (introduced in KYLIN-4001), to convert such a format into a UTC timestamp.


h3. Old Description
In a Real-Time Streaming Cube, when I send some records to a Kafka topic, the timestamp for the record is 2019-01-01 00:00:00.000, but Kylin creates a segment named 2018123116_2018123117.

Then I found that the TimeZone is hard-coded to "GMT" in the function makeSegmentName of class CubeSegment. I think it should be configurable in kylin.properties.
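A configurable time zone for segment naming could look roughly like the sketch below. This is hypothetical: the real CubeSegment.makeSegmentName signature and pattern may differ, and the yyyyMMddHHmmss pattern is an assumption based on the segment name in the report.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical sketch: segment names formatted in a configurable time zone
// instead of the hard-coded "GMT" mentioned above.
public class SegmentNameSketch {
    static String makeSegmentName(long startMillis, long endMillis, TimeZone tz) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss");
        fmt.setTimeZone(tz); // could be read from kylin.properties instead of hard-coding GMT
        return fmt.format(new Date(startMillis)) + "_" + fmt.format(new Date(endMillis));
    }

    public static void main(String[] args) {
        long start = 1546272000000L;  // 2019-01-01 00:00:00 in GMT+8
        long end = start + 3600_000L; // one hour later
        // With the hard-coded GMT, the name starts at 2018-12-31 16:00, as in the report:
        System.out.println(makeSegmentName(start, end, TimeZone.getTimeZone("GMT")));
        // With a configured GMT+8, the name matches the sender's local day:
        System.out.println(makeSegmentName(start, end, TimeZone.getTimeZone("GMT+8")));
    }
}
```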

  was:
h2. Background
In realtime OLAP, we index real-time events in the streaming receiver. We know that each event must contain a timestamp column (often called the event time), whose value should represent when the event was produced. Because events may come from different timezones and using the local timezone is always error-prone, we recommend using a GMT+0 timestamp (System.currentTimeMillis()) to avoid such issues.

I think this is good by design: it is easy to understand and always correct. But the side effect is that the end user (a business manager behind a BI tool) is unhappy, because he has to use GMT+0 in date/time related filters in SQL and must understand that the result should be shifted to his local timezone. This is not user-friendly and is inconvenient for normal users. Because a user may

[jira] [Updated] (KYLIN-4010) Auto adjust offset according to query server's timezone for time derived column

2019-09-20 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4010:

Summary: Auto adjust offset according to query server's timezone for time 
derived column  (was: TimeZone is hard-coded in function makeSegmentName for 
class CubeSegment)

> Auto adjust offset according to query server's timezone for time derived 
> column
> ---
>
> Key: KYLIN-4010
> URL: https://issues.apache.org/jira/browse/KYLIN-4010
> Project: Kylin
>  Issue Type: Improvement
>  Components: Others
>Affects Versions: v3.0.0-alpha
>Reporter: zengrui
>Assignee: Xiaoxiang Yu
>Priority: Minor
> Fix For: v3.0.0-beta
>
> Attachments: image-2019-07-15-17-15-31-209.png, 
> image-2019-07-15-17-17-04-029.png, image-2019-07-15-17-17-39-568.png
>
>
> h2. Background
> In realtime OLAP, we index real-time events in the streaming receiver. We know
> that each event must contain a timestamp column (often called the event time),
> whose value should represent when the event was produced. Because events may
> come from different timezones and using the local timezone is always
> error-prone, we recommend using a GMT+0
> timestamp (System.currentTimeMillis()) to avoid such issues.
> I think this is good by design: it is easy to understand and always correct.
> But the side effect is that the end user (a business manager behind a BI tool)
> is unhappy, because he has to use GMT+0 in date/time related filters in SQL
> and must understand that the result should be shifted to his local timezone.
> This is not user-friendly and is inconvenient for normal users: a user may
> compare and summarize query results from different data sources, and using
> GMT+0 may trouble him.
> h2. Example
> For example, a Kylin user working in GMT+8 (maybe in Shanghai) wants to know
> some metrics which occurred from {color:#DE350B}2019-09-01 12:00:00{color} to
> {color:#DE350B}2019-09-01 14:00:00{color} in his {color:#DE350B}local
> timezone{color}, so he has to {color:#DE350B}rewrite{color} his query (with
> an eight-hour offset) to the following:
> {code:sql}
> select hour_start, count(*)
> from realtime_table
> where hour_start >= "2019-09-01 04:00:00" and hour_start < "2019-09-01 
> 06:00:00"  
> group by hour_start
> {code}
> And he will get result like :
> ||hour_start ||count||
> |2019-09-01 04:00:00  |139202|
> |2019-09-01 05:00:00  |89398|
> And he must convert it to a more meaningful result in his mind, which is
> really annoying!
> ||hour_start ||count||
> |2019-09-01 12:00:00  |139202|
> |2019-09-01 13:00:00  |89398|
> h2. Design
> We should not change the way the receiver indexes events; the event time
> should still be stored as a UTC timestamp. Instead, we should automatically
> rewrite the SQL's event-time related filters.
> In Kylin, a filter condition in the where clause is converted to a
> *TupleFilter*, which looks like a *RelNode* in Apache Calcite.
> For where hour_start >= "2019-09-01 12:00:00" and hour_start < "2019-09-01
> 14:00:00", we will send a TupleFilter to the streaming receiver or region
> server which looks like this:
> {noformat}
> AND
>   GreatThanOrEqual
>     hour_start
>     CAST
>       "2019-09-01 12:00:00"
>       timestamp
>   LessThanOrEqual
>     hour_start
>     CAST
>       "2019-09-01 14:00:00"
>       timestamp
> {noformat}
> But for a streaming query, we want to rewrite each ConstantTupleFilter and
> subtract the timezone offset from its timestamp value. So the TupleFilter
> that is actually sent will be the following:
> {noformat}
> AND
>   GreatThanOrEqual
>     hour_start
>     CAST
>       "2019-09-01 04:00:00"
>       timestamp
>   LessThanOrEqual
>     hour_start
>     CAST
>       "2019-09-01 06:00:00"
>       timestamp
> {noformat}
> Before the query result is processed by *OLAPEnumerator*, Kylin will add the
> offset back to each value of the time derived columns, thus protecting rows
> from being filtered out by Calcite-generated code.
> So, the user will get what he wants in his timezone without any burden.
> h2. How to use
> To enable auto shifting by time zone, please set kylin.stream.auto-timezone
> to true.
> You can specify the time zone via kylin.web.timezone; otherwise, the time
> zone will be auto-detected.
> Only *time derived columns* will be affected.
> h2. Related Issue
> Originally, the event time could only be a long value (a UTC timestamp). But
> in some cases the event time is in a format like "yyyy-MM-dd HH:mm:ss"; we
> use a new class, DateTimeParser (introduced in KYLIN-4001), to convert such a
> format into a UTC timestamp.
> In a Real-Time Streaming Cube, when I send some records to a Kafka topic, the
> timestamp for the record is 2019-01-01 00:00:00.000, but Kylin creates a
> segment named 2018123116_2018123117.
> Then I found that the TimeZone is hard-coded to "GMT" in the function
> makeSegmentName of class 

[jira] [Assigned] (KYLIN-4170) Add cube information and query information when hbase deadline happened

2019-09-20 Thread nichunen (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nichunen reassigned KYLIN-4170:
---

Assignee: 柳攀

> Add cube information and query information when hbase deadline happened
> ---
>
> Key: KYLIN-4170
> URL: https://issues.apache.org/jira/browse/KYLIN-4170
> Project: Kylin
>  Issue Type: Improvement
>  Components: Storage - HBase
>Affects Versions: v2.6.2
>Reporter: 柳攀
>Assignee: 柳攀
>Priority: Major
> Fix For: v3.0.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When there is a large number of cube queries and the HBase server is
> overloaded, HBase deadlines can happen. But there is no information about the
> cube and the query for us to locate the problem. If we could get cube and
> query information in the query log, we could solve such query problems quickly.





[jira] [Updated] (KYLIN-4010) TimeZone is hard-coded in function makeSegmentName for class CubeSegment

2019-09-20 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4010:

Description: 
h2. Background
In realtime OLAP, we index real-time events in the streaming receiver. We know that each event must contain a timestamp column (often called the event time), whose value should represent when the event was produced. Because events may come from different timezones and using the local timezone is always error-prone, we recommend using a GMT+0 timestamp (System.currentTimeMillis()) to avoid such issues.

I think this is good by design: it is easy to understand and always correct. But the side effect is that the end user (a business manager behind a BI tool) is unhappy, because he has to use GMT+0 in date/time related filters in SQL and must understand that the result should be shifted to his local timezone. This is not user-friendly and is inconvenient for normal users: a user may compare and summarize query results from different data sources, and using GMT+0 may trouble him.

h2. Example
For example, a Kylin user working in GMT+8 (maybe in Shanghai) wants to know some metrics which occurred from {color:#DE350B}2019-09-01 12:00:00{color} to {color:#DE350B}2019-09-01 14:00:00{color} in his {color:#DE350B}local timezone{color}, so he has to {color:#DE350B}rewrite{color} his query (with an eight-hour offset) to the following:


{code:sql}
select hour_start, count(*)
from realtime_table
where hour_start >= "2019-09-01 04:00:00" and hour_start < "2019-09-01 
06:00:00"  
group by hour_start
{code}


And he will get result like :

||hour_start ||count||
|2019-09-01 04:00:00|139202|
|2019-09-01 05:00:00|89398|

And he must convert it to a more meaningful result in his mind, which is really annoying!

||hour_start ||count||
|2019-09-01 12:00:00|139202|
|2019-09-01 13:00:00|89398|



h2. Design
We should not change the way the receiver indexes events; the event time should still be stored as a UTC timestamp. Instead, we should automatically rewrite the SQL's event-time related filters.

In Kylin, a filter condition in the where clause is converted to a *TupleFilter*, which looks like a *RelNode* in Apache Calcite.
For where hour_start >= "2019-09-01 12:00:00" and hour_start < "2019-09-01 14:00:00", we will send a TupleFilter to the streaming receiver or region server which looks like this:


{noformat}
AND
  GreatThanOrEqual
    hour_start
    CAST
      "2019-09-01 12:00:00"
      timestamp
  LessThanOrEqual
    hour_start
    CAST
      "2019-09-01 14:00:00"
      timestamp
{noformat}


But for a streaming query, we want to rewrite each ConstantTupleFilter and subtract the timezone offset from its timestamp value. So the TupleFilter that is actually sent will be the following:


{noformat}
AND
  GreatThanOrEqual
    hour_start
    CAST
      "2019-09-01 04:00:00"
      timestamp
  LessThanOrEqual
    hour_start
    CAST
      "2019-09-01 06:00:00"
      timestamp
{noformat}


Before the query result is processed by *OLAPEnumerator*, Kylin will add the offset back to each value of the time derived columns, thus protecting rows from being filtered out by Calcite-generated code.

So, the user will get what he wants in his timezone without any burden.

h2. How to use
To enable auto shifting by time zone, please set kylin.stream.auto-timezone to true.
You can specify the time zone via kylin.web.timezone; otherwise, the time zone will be auto-detected.
Only *time derived columns* will be affected.
h2. Related Issue
Originally, the event time could only be a long value (a UTC timestamp). But in some cases the event time is in a format like "yyyy-MM-dd HH:mm:ss"; we use a new class, DateTimeParser (introduced in KYLIN-4001), to convert such a format into a UTC timestamp.


In a Real-Time Streaming Cube, when I send some records to a Kafka topic, the timestamp for the record is 2019-01-01 00:00:00.000, but Kylin creates a segment named 2018123116_2018123117.

Then I found that the TimeZone is hard-coded to "GMT" in the function makeSegmentName of class CubeSegment. I think it should be configurable in kylin.properties.

  was:


h2. Background
In realtime OLAP, we index real-time events in the streaming receiver. We know that each event must contain a timestamp column (often called the event time), whose value should represent when the event was produced. Because events may come from different timezones and using the local timezone is always error-prone, we recommend using a GMT+0 timestamp (System.currentTimeMillis()) to avoid such issues.

I think this is good by design: it is easy to understand and always correct. But the side effect is that the end user (a business manager behind a BI tool) is unhappy, because he has to use GMT+0 in date/time related filters in SQL and must understand that the result should be shifted to his local timezone. This is not user-friendly and is inconvenient for normal users: a user may compare and summarize query results from different data sources, and using GMT+0 may 

[jira] [Updated] (KYLIN-4010) TimeZone is hard-coded in function makeSegmentName for class CubeSegment

2019-09-20 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4010:

Description: 


h2. Background
In realtime OLAP, we index real-time events in the streaming receiver. We know that each event must contain a timestamp column (often called the event time), whose value should represent when the event was produced. Because events may come from different timezones and using the local timezone is always error-prone, we recommend using a GMT+0 timestamp (System.currentTimeMillis()) to avoid such issues.

I think this is good by design: it is easy to understand and always correct. But the side effect is that the end user (a business manager behind a BI tool) is unhappy, because he has to use GMT+0 in date/time related filters in SQL and must understand that the result should be shifted to his local timezone. This is not user-friendly and is inconvenient for normal users: a user may compare and summarize query results from different data sources, and using GMT+0 may trouble him.

h2. Example
For example, a Kylin user working in GMT+8 (maybe in Shanghai) wants to know some metrics which occurred from 2019-09-01 12:00:00 to 2019-09-01 14:00:00 in his local timezone, so he has to rewrite his query (with an eight-hour offset) to the following:

select hour_start, count(*)
from realtime_table
where hour_start >= "2019-09-01 04:00:00" and hour_start < "2019-09-01 
06:00:00"  
group by hour_start

And he will get result like :
And he must convert it to a more meaningful result in his mind, which is really annoying!

h2. Design
We should not change the way the receiver indexes events; the event time should still be stored as a UTC timestamp. Instead, we should automatically rewrite the SQL's event-time related filters.

In Kylin, a filter condition in the where clause is converted to a TupleFilter, which looks like a RelNode in Apache Calcite.
For where hour_start >= "2019-09-01 12:00:00" and hour_start < "2019-09-01 14:00:00", we will send a TupleFilter to the streaming receiver or region server which looks like this:

AND
  GreatThanOrEqual
    hour_start
    CAST
      "2019-09-01 12:00:00"
      timestamp
  LessThanOrEqual
    hour_start
    CAST
      "2019-09-01 14:00:00"
      timestamp
But for a streaming query, we want to rewrite each ConstantTupleFilter and subtract the timezone offset from its timestamp value. So the TupleFilter that is actually sent will be the following:

AND
  GreatThanOrEqual
    hour_start
    CAST
      "2019-09-01 04:00:00"
      timestamp
  LessThanOrEqual
    hour_start
    CAST
      "2019-09-01 06:00:00"
      timestamp

Before the query result is processed by OLAPEnumerator, Kylin will add the offset back to each value of the time derived columns, thus protecting rows from being filtered out by the generated code.

So, the user will get what he wants in his timezone without any burden.
h2. How to use
To enable auto shifting by time zone, please set kylin.stream.auto-timezone to true.
You can specify the time zone via kylin.web.timezone; otherwise, the time zone will be auto-detected.
Only time derived columns will be affected.
h2. Related Issue
Originally, the event time could only be a long value (a UTC timestamp). But in some cases the event time is in a format like "yyyy-MM-dd HH:mm:ss"; we use a new class, DateTimeParser (introduced in KYLIN-4001), to convert such a format into a UTC timestamp.


In a Real-Time Streaming Cube, when I send some records to a Kafka topic, the timestamp for the record is 2019-01-01 00:00:00.000, but Kylin creates a segment named 2018123116_2018123117.

Then I found that the TimeZone is hard-coded to "GMT" in the function makeSegmentName of class CubeSegment. I think it should be configurable in kylin.properties.

  was:
In a Real-Time Streaming Cube, when I send some records to a Kafka topic, the timestamp for the record is 2019-01-01 00:00:00.000, but Kylin creates a segment named 2018123116_2018123117.

Then I found that the TimeZone is hard-coded to "GMT" in the function makeSegmentName of class CubeSegment. I think it should be configurable in kylin.properties.


> TimeZone is hard-coded in function makeSegmentName for class CubeSegment
> 
>
> Key: KYLIN-4010
> URL: https://issues.apache.org/jira/browse/KYLIN-4010
> Project: Kylin
>  Issue Type: Improvement
>  Components: Others
>Affects Versions: v3.0.0-alpha
>Reporter: zengrui
>Assignee: Xiaoxiang Yu
>Priority: Minor
> Fix For: v3.0.0-beta
>
> Attachments: image-2019-07-15-17-15-31-209.png, 
> image-2019-07-15-17-17-04-029.png, image-2019-07-15-17-17-39-568.png
>
>
> h2. Background
> In realtime OLAP, we index real-time events in the streaming receiver. We know
> that each event must contain a timestamp column (often called the event time),
> whose value should represent when the event was produced. Because events may
> come from different timezones and using the local timezone is always
> error-prone, so we 

[jira] [Updated] (KYLIN-4166) kylin parse sql error

2019-09-20 Thread phil.zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

phil.zhang updated KYLIN-4166:
--
Description: 
 

   1.  I use Tableau on Kylin, and Tableau generated this SQL:
 '''
SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE", 
SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" 
FROM ( select *, case when cast(DURATION_LEVEL as varchar) like '1' then 
'>=0s,<3s' when cast(DURATION_LEVEL as varchar) like '2' then '>=3s,<4s' when 
cast(DURATION_LEVEL as varchar) like '3' then '>=4s,<5s' when 
cast(DURATION_LEVEL as varchar) like '4' then '>=5s,<6s' when 
cast(DURATION_LEVEL as varchar) like '5' then '>=6s,<7s' when 
cast(DURATION_LEVEL as varchar) like '6' then '>=7s,<8s' when 
cast(DURATION_LEVEL as varchar) like '7' then '>=8s,<9s' when 
cast(DURATION_LEVEL as varchar) like '8' then '>=9s,<10s' when 
cast(DURATION_LEVEL as varchar) like '9' then '>=10s,<12s' when 
cast(DURATION_LEVEL as varchar) like '10' then '>=12s,<14s' when 
cast(DURATION_LEVEL as varchar) like '11' then '>=14s,<16s' when 
cast(DURATION_LEVEL as varchar) like '12' then '>=16s,<18s' when 
cast(DURATION_LEVEL as varchar) like '13' then '>=18s,<20s' when 
cast(DURATION_LEVEL as varchar) like '14' then '>=20s,<30s' when 
cast(DURATION_LEVEL as varchar) like '15' then '>=30s,<40s' when 
cast(DURATION_LEVEL as varchar) like '16' then '>=40s,<50s' when 
cast(DURATION_LEVEL as varchar) like '17' then '>=50s' else 'null' end as 
DURATION_LEVEL_TRANS from DM_AIR_API_DURATION_SPB_D ) "X___SQL___" WHERE 
("X___SQL___"."DURATION_LEVEL_TRANS" = '>=5s,<6s') GROUP BY 1
  '''
     It seems that the SQL is correct, but the result is unexpected: it returns null.

!image-2019-09-20-15-22-15-356.png! 

But actually, it should return some rows, because data matching the query exists. Let's change the SQL form to check it.

'''
 SELECT "FLIGHT_TYPE" AS "FLIGHT_TYPE",   SUM("SHOPPING_SUCCESS_COUNT") AS 
"sum_SHOPPING_SUCCESS_COUNT_ok" from DM_AIR_API_DURATION_SPB_D  WHERE 
("DURATION_LEVEL" = 4 and "DAY_NUM"='2019-09-10') GROUP BY 1
 '''

!image-2019-09-20-15-23-29-881.png!
  

2. And when I write SQL like this:

'''
 SELECT  flight_type,SUM("SHOPPING_SUCCESS_COUNT") AS 
"sum_SHOPPING_SUCCESS_COUNT_ok" from DM_AIR_API_DURATION_SPB_D group by 
flight_type
 '''
 I got a result {color:#33}like this{color}

{color:#33}!http://mail.163.com/js6/s?func=mbox:getMessageData=270:xtbBDhAtQVrbEIH+vQAAsh=5!{color}
 it is the same as the result of

'''
SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE",   
SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" 
FROM (   select *,        case         when cast(DURATION_LEVEL as varchar) 
like '1' then *'>=0s,<3s'*         when cast(DURATION_LEVEL as varchar) like 
'2' then '>=3s,<4s'         when cast(DURATION_LEVEL as varchar) like '3' then 
'>=4s,<5s'         when cast(DURATION_LEVEL as varchar) like '4' then 
'>=5s,<6s'         when cast(DURATION_LEVEL as varchar) like '5' then 
'>=6s,<7s'         when cast(DURATION_LEVEL as varchar) like '6' then 
'>=7s,<8s'         when cast(DURATION_LEVEL as varchar) like '7' then 
'>=8s,<9s'         when cast(DURATION_LEVEL as varchar) like '8' then 
'>=9s,<10s'         when cast(DURATION_LEVEL as varchar) like '9' then 
'>=10s,<12s'         when cast(DURATION_LEVEL as varchar) like '10' then 
'>=12s,<14s'         when cast(DURATION_LEVEL as varchar) like '11' then 
'>=14s,<16s'         when cast(DURATION_LEVEL as varchar) like '12' then 
'>=16s,<18s'         when cast(DURATION_LEVEL as varchar) like '13' then 
'>=18s,<20s'         when cast(DURATION_LEVEL as varchar) like '14' then 
'>=20s,<30s'         when cast(DURATION_LEVEL as varchar) like '15' then 
'>=30s,<40s'         when cast(DURATION_LEVEL as varchar) like '16' then 
'>=40s,<50s'         when cast(DURATION_LEVEL as varchar) like '17' then 
'>=50s'            else 'null'         end as DURATION_LEVEL_TRANS    from 
DM_AIR_API_DURATION_SPB_D ) "X___SQL___" WHERE 
("X___SQL___"."DURATION_LEVEL_TRANS" = *'>=0s,<3s'*) GROUP BY 1
 '''

!http://mail.163.com/js6/s?func=mbox:getMessageData=270:xtbBDhAtQVrbEIH+vQAAsh=6!
  
  
{quote} *I guess there may be some bugs in parsing SQL like the above. I hope someone can help me deal with this problem.*
{quote}

  was:
{quote} 

   1.  I use Tableau on Kylin, and Tableau generated this SQL:
 '''
 SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE",   
SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" 
FROM (   select *,        case         when cast(DURATION_LEVEL as varchar) 
like '1' then '>=0s,<3s'         when cast(DURATION_LEVEL as varchar) like '2' 
then '>=3s,<4s'         when cast(DURATION_LEVEL as varchar) like '3' then 
'>=4s,<5s'         when cast(DURATION_LEVEL as varchar) like '4' then 
*'>=5s,<6s'*         when cast(DURATION_LEVEL as varchar) like '5' then 
'>=6s,<7s'         when 

[jira] [Updated] (KYLIN-4166) kylin parse sql error

2019-09-20 Thread phil.zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

phil.zhang updated KYLIN-4166:
--
Description: 
{quote} 

   1.  I use Tableau on Kylin; Tableau has generated a SQL query:
 '''
 SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE",   
SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" 
FROM (   select *,        case         when cast(DURATION_LEVEL as varchar) 
like '1' then '>=0s,<3s'         when cast(DURATION_LEVEL as varchar) like '2' 
then '>=3s,<4s'         when cast(DURATION_LEVEL as varchar) like '3' then 
'>=4s,<5s'         when cast(DURATION_LEVEL as varchar) like '4' then 
*'>=5s,<6s'*         when cast(DURATION_LEVEL as varchar) like '5' then 
'>=6s,<7s'         when cast(DURATION_LEVEL as varchar) like '6' then 
'>=7s,<8s'         when cast(DURATION_LEVEL as varchar) like '7' then 
'>=8s,<9s'         when cast(DURATION_LEVEL as varchar) like '8' then 
'>=9s,<10s'         when cast(DURATION_LEVEL as varchar) like '9' then 
'>=10s,<12s'         when cast(DURATION_LEVEL as varchar) like '10' then 
'>=12s,<14s'         when cast(DURATION_LEVEL as varchar) like '11' then 
'>=14s,<16s'         when cast(DURATION_LEVEL as varchar) like '12' then 
'>=16s,<18s'         when cast(DURATION_LEVEL as varchar) like '13' then 
'>=18s,<20s'         when cast(DURATION_LEVEL as varchar) like '14' then 
'>=20s,<30s'         when cast(DURATION_LEVEL as varchar) like '15' then 
'>=30s,<40s'         when cast(DURATION_LEVEL as varchar) like '16' then 
'>=40s,<50s'         when cast(DURATION_LEVEL as varchar) like '17' then 
'>=50s'            else 'null'         end as DURATION_LEVEL_TRANS    from 
DM_AIR_API_DURATION_SPB_D ) "X___SQL___" WHERE 
("X___SQL___"."DURATION_LEVEL_TRANS" = *'>=5s,<6s'*) GROUP BY 1
  '''
     it seems that the SQL is correct, but the result is unexpected: it returns null.
{quote}
!image-2019-09-20-15-22-15-356.png!
{quote}  2. But actually, it should return some rows because there is data matching the query; let's change the SQL form to check it.
 '''
 SELECT "FLIGHT_TYPE" AS "FLIGHT_TYPE",   SUM("SHOPPING_SUCCESS_COUNT") AS 
"sum_SHOPPING_SUCCESS_COUNT_ok" from DM_AIR_API_DURATION_SPB_D  WHERE 
("DURATION_LEVEL" = 4 and "DAY_NUM"='2019-09-10') GROUP BY 1
 '''
{quote}
!image-2019-09-20-15-23-29-881.png!
  

2. 
{quote}and when i write sql like this 
 '''
 SELECT  flight_type,SUM("SHOPPING_SUCCESS_COUNT") AS 
"sum_SHOPPING_SUCCESS_COUNT_ok" from DM_AIR_API_DURATION_SPB_D group by 
flight_type
 '''
 I got a result {color:#33}like this{color}
{quote}
{color:#33}!http://mail.163.com/js6/s?func=mbox:getMessageData=270:xtbBDhAtQVrbEIH+vQAAsh=5!{color}
  
{quote}it is the same as the result of 
 '''
 SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE",   
SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" 
FROM (   select *,        case         when cast(DURATION_LEVEL as varchar) 
like '1' then *'>=0s,<3s'*         when cast(DURATION_LEVEL as varchar) like 
'2' then '>=3s,<4s'         when cast(DURATION_LEVEL as varchar) like '3' then 
'>=4s,<5s'         when cast(DURATION_LEVEL as varchar) like '4' then 
'>=5s,<6s'         when cast(DURATION_LEVEL as varchar) like '5' then 
'>=6s,<7s'         when cast(DURATION_LEVEL as varchar) like '6' then 
'>=7s,<8s'         when cast(DURATION_LEVEL as varchar) like '7' then 
'>=8s,<9s'         when cast(DURATION_LEVEL as varchar) like '8' then 
'>=9s,<10s'         when cast(DURATION_LEVEL as varchar) like '9' then 
'>=10s,<12s'         when cast(DURATION_LEVEL as varchar) like '10' then 
'>=12s,<14s'         when cast(DURATION_LEVEL as varchar) like '11' then 
'>=14s,<16s'         when cast(DURATION_LEVEL as varchar) like '12' then 
'>=16s,<18s'         when cast(DURATION_LEVEL as varchar) like '13' then 
'>=18s,<20s'         when cast(DURATION_LEVEL as varchar) like '14' then 
'>=20s,<30s'         when cast(DURATION_LEVEL as varchar) like '15' then 
'>=30s,<40s'         when cast(DURATION_LEVEL as varchar) like '16' then 
'>=40s,<50s'         when cast(DURATION_LEVEL as varchar) like '17' then 
'>=50s'            else 'null'         end as DURATION_LEVEL_TRANS    from 
DM_AIR_API_DURATION_SPB_D ) "X___SQL___" WHERE 
("X___SQL___"."DURATION_LEVEL_TRANS" = *'>=0s,<3s'*) GROUP BY 1
 '''
{quote}
!http://mail.163.com/js6/s?func=mbox:getMessageData=270:xtbBDhAtQVrbEIH+vQAAsh=6!
  
  
{quote} *I guess there may be a bug in parsing SQL like the above. I hope someone can help me deal with this problem.*
{quote}
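A side note on the CASE expression quoted above: every branch uses LIKE with no wildcard characters, which behaves as an exact string comparison, so `'10' LIKE '1'` is false and the branch order is safe. A tiny sketch confirms this; a plain `=` on the numeric column would express the same mapping more directly.

```python
# Sketch: LIKE without wildcards is an exact match, so the '1' branch does
# not accidentally capture levels '10'..'17'.
import sqlite3

conn = sqlite3.connect(":memory:")
ten_vs_one, one_vs_one = conn.execute(
    "SELECT '10' LIKE '1', '1' LIKE '1'"
).fetchone()
print(ten_vs_one, one_vs_one)  # 0 1
```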

  was:
{quote} 

   1.  I use Tableau on Kylin; Tableau has generated a SQL query:
 '''
 SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE",   
SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" 
FROM (   select *,        case         when cast(DURATION_LEVEL as varchar) 
like '1' then '>=0s,<3s'         when cast(DURATION_LEVEL as varchar) like '2' 
then 

[jira] [Commented] (KYLIN-4166) kylin parse sql error

2019-09-20 Thread phil.zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934159#comment-16934159
 ] 

phil.zhang commented on KYLIN-4166:
---

I am sorry for not describing the question exactly. I have updated it. I hope someone can fix it.

> kylin parse sql error
> -
>
> Key: KYLIN-4166
> URL: https://issues.apache.org/jira/browse/KYLIN-4166
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v3.0.0-alpha2
>Reporter: phil.zhang
>Priority: Critical
> Attachments: 1.png, 2.png, 3.png, 4.png, 
> image-2019-09-20-15-22-15-356.png, image-2019-09-20-15-23-29-881.png
>
>
> {quote} 
>    1.  I use Tableau on Kylin; Tableau has generated a SQL query:
>  '''
>  SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE",   
> SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" 
> FROM (   select *,        case         when cast(DURATION_LEVEL as varchar) 
> like '1' then '>=0s,<3s'         when cast(DURATION_LEVEL as varchar) like 
> '2' then '>=3s,<4s'         when cast(DURATION_LEVEL as varchar) like '3' 
> then '>=4s,<5s'         when cast(DURATION_LEVEL as varchar) like '4' then 
> *'>=5s,<6s'*         when cast(DURATION_LEVEL as varchar) like '5' then 
> '>=6s,<7s'         when cast(DURATION_LEVEL as varchar) like '6' then 
> '>=7s,<8s'         when cast(DURATION_LEVEL as varchar) like '7' then 
> '>=8s,<9s'         when cast(DURATION_LEVEL as varchar) like '8' then 
> '>=9s,<10s'         when cast(DURATION_LEVEL as varchar) like '9' then 
> '>=10s,<12s'         when cast(DURATION_LEVEL as varchar) like '10' then 
> '>=12s,<14s'         when cast(DURATION_LEVEL as varchar) like '11' then 
> '>=14s,<16s'         when cast(DURATION_LEVEL as varchar) like '12' then 
> '>=16s,<18s'         when cast(DURATION_LEVEL as varchar) like '13' then 
> '>=18s,<20s'         when cast(DURATION_LEVEL as varchar) like '14' then 
> '>=20s,<30s'         when cast(DURATION_LEVEL as varchar) like '15' then 
> '>=30s,<40s'         when cast(DURATION_LEVEL as varchar) like '16' then 
> '>=40s,<50s'         when cast(DURATION_LEVEL as varchar) like '17' then 
> '>=50s'            else 'null'         end as DURATION_LEVEL_TRANS    from 
> DM_AIR_API_DURATION_SPB_D ) "X___SQL___" WHERE 
> ("X___SQL___"."DURATION_LEVEL_TRANS" = *'>=5s,<6s'*) GROUP BY 1
>   '''
>      it seems that the SQL is correct, but the result is unexpected: it returns null.
> {quote}
> !image-2019-09-20-15-22-15-356.png!
> {quote}  2. But actually, it should return some rows because there is data matching the query; let's change the SQL form to check it.
>  '''
>  SELECT "FLIGHT_TYPE" AS "FLIGHT_TYPE",   SUM("SHOPPING_SUCCESS_COUNT") AS 
> "sum_SHOPPING_SUCCESS_COUNT_ok" from DM_AIR_API_DURATION_SPB_D  WHERE 
> ("DURATION_LEVEL" = 4 and "DAY_NUM"='2019-09-10') GROUP BY 1
>  '''
> {quote}
> !image-2019-09-20-15-23-29-881.png!
>   
> 2. 
> {quote}and when i write sql like this 
>  '''
>  SELECT  flight_type,SUM("SHOPPING_SUCCESS_COUNT") AS 
> "sum_SHOPPING_SUCCESS_COUNT_ok" from DM_AIR_API_DURATION_SPB_D group by 
> flight_type
>  '''
>  I got a result {color:#33}like this{color}
> {quote}
> {color:#33}!http://mail.163.com/js6/s?func=mbox:getMessageData=270:xtbBDhAtQVrbEIH+vQAAsh=5!{color}
>   
> {quote}it is the same as the result of 
>  '''
> SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE",   
> SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" 
> FROM (   select *,        case         when cast(DURATION_LEVEL as varchar) 
> like '1' then *'>=0s,<3s'*         when cast(DURATION_LEVEL as varchar) like 
> '2' then '>=3s,<4s'         when cast(DURATION_LEVEL as varchar) like '3' 
> then '>=4s,<5s'         when cast(DURATION_LEVEL as varchar) like '4' then 
> '>=5s,<6s'         when cast(DURATION_LEVEL as varchar) like '5' then 
> '>=6s,<7s'         when cast(DURATION_LEVEL as varchar) like '6' then 
> '>=7s,<8s'         when cast(DURATION_LEVEL as varchar) like '7' then 
> '>=8s,<9s'         when cast(DURATION_LEVEL as varchar) like '8' then 
> '>=9s,<10s'         when cast(DURATION_LEVEL as varchar) like '9' then 
> '>=10s,<12s'         when cast(DURATION_LEVEL as varchar) like '10' then 
> '>=12s,<14s'         when cast(DURATION_LEVEL as varchar) like '11' then 
> '>=14s,<16s'         when cast(DURATION_LEVEL as varchar) like '12' then 
> '>=16s,<18s'         when cast(DURATION_LEVEL as varchar) like '13' then 
> '>=18s,<20s'         when cast(DURATION_LEVEL as varchar) like '14' then 
> '>=20s,<30s'         when cast(DURATION_LEVEL as varchar) like '15' then 
> '>=30s,<40s'         when cast(DURATION_LEVEL as varchar) like '16' then 
> '>=40s,<50s'         when cast(DURATION_LEVEL as varchar) like '17' then 
> '>=50s'            else 'null'         end as DURATION_LEVEL_TRANS 

[jira] [Updated] (KYLIN-4166) kylin parse sql error

2019-09-20 Thread phil.zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

phil.zhang updated KYLIN-4166:
--
Description: 
{quote} 

   1.  I use Tableau on Kylin; Tableau has generated a SQL query:
 '''
 SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE",   
SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" 
FROM (   select *,        case         when cast(DURATION_LEVEL as varchar) 
like '1' then '>=0s,<3s'         when cast(DURATION_LEVEL as varchar) like '2' 
then '>=3s,<4s'         when cast(DURATION_LEVEL as varchar) like '3' then 
'>=4s,<5s'         when cast(DURATION_LEVEL as varchar) like '4' then 
*'>=5s,<6s'*         when cast(DURATION_LEVEL as varchar) like '5' then 
'>=6s,<7s'         when cast(DURATION_LEVEL as varchar) like '6' then 
'>=7s,<8s'         when cast(DURATION_LEVEL as varchar) like '7' then 
'>=8s,<9s'         when cast(DURATION_LEVEL as varchar) like '8' then 
'>=9s,<10s'         when cast(DURATION_LEVEL as varchar) like '9' then 
'>=10s,<12s'         when cast(DURATION_LEVEL as varchar) like '10' then 
'>=12s,<14s'         when cast(DURATION_LEVEL as varchar) like '11' then 
'>=14s,<16s'         when cast(DURATION_LEVEL as varchar) like '12' then 
'>=16s,<18s'         when cast(DURATION_LEVEL as varchar) like '13' then 
'>=18s,<20s'         when cast(DURATION_LEVEL as varchar) like '14' then 
'>=20s,<30s'         when cast(DURATION_LEVEL as varchar) like '15' then 
'>=30s,<40s'         when cast(DURATION_LEVEL as varchar) like '16' then 
'>=40s,<50s'         when cast(DURATION_LEVEL as varchar) like '17' then 
'>=50s'            else 'null'         end as DURATION_LEVEL_TRANS    from 
DM_AIR_API_DURATION_SPB_D ) "X___SQL___" WHERE 
("X___SQL___"."DURATION_LEVEL_TRANS" = *'>=5s,<6s'*) GROUP BY 1
  '''
     it seems that the SQL is correct, but the result is unexpected: it returns null.
{quote}
!image-2019-09-20-15-22-15-356.png!
{quote}  2. But actually, it should return some rows because there is data matching the query; let's change the SQL form to check it.
 '''
 SELECT "FLIGHT_TYPE" AS "FLIGHT_TYPE",   SUM("SHOPPING_SUCCESS_COUNT") AS 
"sum_SHOPPING_SUCCESS_COUNT_ok" from DM_AIR_API_DURATION_SPB_D  WHERE 
("DURATION_LEVEL" = 4 and "DAY_NUM"='2019-09-10') GROUP BY 1
 '''
{quote}
!image-2019-09-20-15-23-29-881.png!
  

2. 
{quote}and when i write sql like this 
 '''
 SELECT  flight_type,SUM("SHOPPING_SUCCESS_COUNT") AS 
"sum_SHOPPING_SUCCESS_COUNT_ok" from DM_AIR_API_DURATION_SPB_D group by 
flight_type
 '''
 I got a result {color:#33}like this{color}
{quote}
{color:#33}!http://mail.163.com/js6/s?func=mbox:getMessageData=270:xtbBDhAtQVrbEIH+vQAAsh=5!{color}
  
{quote}it is the same as the result of 
 '''
SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE",   
SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" 
FROM (   select *,        case         when cast(DURATION_LEVEL as varchar) 
like '1' then *'>=0s,<3s'*         when cast(DURATION_LEVEL as varchar) like 
'2' then '>=3s,<4s'         when cast(DURATION_LEVEL as varchar) like '3' then 
'>=4s,<5s'         when cast(DURATION_LEVEL as varchar) like '4' then 
'>=5s,<6s'         when cast(DURATION_LEVEL as varchar) like '5' then 
'>=6s,<7s'         when cast(DURATION_LEVEL as varchar) like '6' then 
'>=7s,<8s'         when cast(DURATION_LEVEL as varchar) like '7' then 
'>=8s,<9s'         when cast(DURATION_LEVEL as varchar) like '8' then 
'>=9s,<10s'         when cast(DURATION_LEVEL as varchar) like '9' then 
'>=10s,<12s'         when cast(DURATION_LEVEL as varchar) like '10' then 
'>=12s,<14s'         when cast(DURATION_LEVEL as varchar) like '11' then 
'>=14s,<16s'         when cast(DURATION_LEVEL as varchar) like '12' then 
'>=16s,<18s'         when cast(DURATION_LEVEL as varchar) like '13' then 
'>=18s,<20s'         when cast(DURATION_LEVEL as varchar) like '14' then 
'>=20s,<30s'         when cast(DURATION_LEVEL as varchar) like '15' then 
'>=30s,<40s'         when cast(DURATION_LEVEL as varchar) like '16' then 
'>=40s,<50s'         when cast(DURATION_LEVEL as varchar) like '17' then 
'>=50s'            else 'null'         end as DURATION_LEVEL_TRANS    from 
DM_AIR_API_DURATION_SPB_D ) "X___SQL___" WHERE 
("X___SQL___"."DURATION_LEVEL_TRANS" = *'>=0s,<3s'*) GROUP BY 1
 '''
{quote}
!http://mail.163.com/js6/s?func=mbox:getMessageData=270:xtbBDhAtQVrbEIH+vQAAsh=6!
  
  
{quote} *I guess there may be a bug in parsing SQL like the above. I hope someone can help me deal with this problem.*
{quote}
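The reporter's working rewrite above sidesteps the derived column by filtering on the base DURATION_LEVEL directly. A sketch of that workaround shape (assumed toy data; level 4 corresponds to '>=5s,<6s' in the CASE mapping above):

```python
# Workaround sketch with assumed toy data: push the filter down to the base
# DURATION_LEVEL column instead of the CASE-derived label, mirroring the
# rewrite that returned rows for the reporter.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DM_AIR_API_DURATION_SPB_D (
        FLIGHT_TYPE TEXT, DURATION_LEVEL INTEGER, SHOPPING_SUCCESS_COUNT INTEGER
    )
""")
conn.executemany(
    "INSERT INTO DM_AIR_API_DURATION_SPB_D VALUES (?, ?, ?)",
    [("D", 4, 7), ("I", 4, 2), ("D", 1, 10)],
)
rows = conn.execute("""
    SELECT FLIGHT_TYPE, SUM(SHOPPING_SUCCESS_COUNT) AS sum_ok
    FROM DM_AIR_API_DURATION_SPB_D
    WHERE DURATION_LEVEL = 4
    GROUP BY FLIGHT_TYPE
    ORDER BY FLIGHT_TYPE
""").fetchall()
print(rows)  # [('D', 7), ('I', 2)]
```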

  was:
{quote} 

   1.  I use Tableau on Kylin; Tableau has generated a SQL query:
 '''
SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE",   
SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" 
FROM (   select *,        case         when cast(DURATION_LEVEL as varchar) 
like '1' then '>=0s,<3s'         when cast(DURATION_LEVEL as varchar) like '2' 
then '>=3s,<4s'   

[jira] [Updated] (KYLIN-4166) kylin parse sql error

2019-09-20 Thread phil.zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

phil.zhang updated KYLIN-4166:
--
 Attachment: image-2019-09-20-15-22-15-356.png
 image-2019-09-20-15-23-29-881.png
Description: 
{quote} 

   1.  I use Tableau on Kylin; Tableau has generated a SQL query:
 '''
SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE",   
SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" 
FROM (   select *,        case         when cast(DURATION_LEVEL as varchar) 
like '1' then '>=0s,<3s'         when cast(DURATION_LEVEL as varchar) like '2' 
then '>=3s,<4s'         when cast(DURATION_LEVEL as varchar) like '3' then 
'>=4s,<5s'         when cast(DURATION_LEVEL as varchar) like '4' then 
*'>=5s,<6s'*         when cast(DURATION_LEVEL as varchar) like '5' then 
'>=6s,<7s'         when cast(DURATION_LEVEL as varchar) like '6' then 
'>=7s,<8s'         when cast(DURATION_LEVEL as varchar) like '7' then 
'>=8s,<9s'         when cast(DURATION_LEVEL as varchar) like '8' then 
'>=9s,<10s'         when cast(DURATION_LEVEL as varchar) like '9' then 
'>=10s,<12s'         when cast(DURATION_LEVEL as varchar) like '10' then 
'>=12s,<14s'         when cast(DURATION_LEVEL as varchar) like '11' then 
'>=14s,<16s'         when cast(DURATION_LEVEL as varchar) like '12' then 
'>=16s,<18s'         when cast(DURATION_LEVEL as varchar) like '13' then 
'>=18s,<20s'         when cast(DURATION_LEVEL as varchar) like '14' then 
'>=20s,<30s'         when cast(DURATION_LEVEL as varchar) like '15' then 
'>=30s,<40s'         when cast(DURATION_LEVEL as varchar) like '16' then 
'>=40s,<50s'         when cast(DURATION_LEVEL as varchar) like '17' then 
'>=50s'            else 'null'         end as DURATION_LEVEL_TRANS    from 
DM_AIR_API_DURATION_SPB_D ) "X___SQL___" WHERE 
("X___SQL___"."DURATION_LEVEL_TRANS" = *'>=5s,<6s'*) GROUP BY 1
  '''
     it seems that the SQL is correct, but the result is unexpected: it returns null.
{quote}
!image-2019-09-20-15-22-15-356.png!
{quote}  2. But actually, it should return some rows because there is data matching the query; let's change the SQL form to check it.
 '''
 SELECT "FLIGHT_TYPE" AS "FLIGHT_TYPE",   SUM("SHOPPING_SUCCESS_COUNT") AS 
"sum_SHOPPING_SUCCESS_COUNT_ok" from DM_AIR_API_DURATION_SPB_D  WHERE 
("DURATION_LEVEL" = 4 and "DAY_NUM"='2019-09-10') GROUP BY 1
 '''
{quote}
!image-2019-09-20-15-23-29-881.png!
  

2. 
{quote}and when i write sql like this 
 '''
 SELECT  flight_type,SUM("SHOPPING_SUCCESS_COUNT") AS 
"sum_SHOPPING_SUCCESS_COUNT_ok" from DM_AIR_API_DURATION_SPB_D group by 
flight_type
 '''
 I got a result {color:#33}like this{color}
{quote}
{color:#33}!http://mail.163.com/js6/s?func=mbox:getMessageData=270:xtbBDhAtQVrbEIH+vQAAsh=5!{color}
  
{quote}it is the same as the result of 
 '''
 SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE",   
SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" 
FROM (   select *,        case         when cast(DURATION_LEVEL as varchar) 
like '1' then *'>=0s,<3s'*         when cast(DURATION_LEVEL as varchar) like 
'2' then '>=3s,<4s'         when cast(DURATION_LEVEL as varchar) like '3' then 
'>=4s,<5s'         when cast(DURATION_LEVEL as varchar) like '4' then 
'>=5s,<6s'         when cast(DURATION_LEVEL as varchar) like '5' then 
'>=6s,<7s'         when cast(DURATION_LEVEL as varchar) like '6' then 
'>=7s,<8s'         when cast(DURATION_LEVEL as varchar) like '7' then 
'>=8s,<9s'         when cast(DURATION_LEVEL as varchar) like '8' then 
'>=9s,<10s'         when cast(DURATION_LEVEL as varchar) like '9' then 
'>=10s,<12s'         when cast(DURATION_LEVEL as varchar) like '10' then 
'>=12s,<14s'         when cast(DURATION_LEVEL as varchar) like '11' then 
'>=14s,<16s'         when cast(DURATION_LEVEL as varchar) like '12' then 
'>=16s,<18s'         when cast(DURATION_LEVEL as varchar) like '13' then 
'>=18s,<20s'         when cast(DURATION_LEVEL as varchar) like '14' then 
'>=20s,<30s'         when cast(DURATION_LEVEL as varchar) like '15' then 
'>=30s,<40s'         when cast(DURATION_LEVEL as varchar) like '16' then 
'>=40s,<50s'         when cast(DURATION_LEVEL as varchar) like '17' then 
'>=50s'            else 'null'         end as DURATION_LEVEL_TRANS    from 
DM_AIR_API_DURATION_SPB_D ) "X___SQL___" WHERE 
("X___SQL___"."DURATION_LEVEL_TRANS" = *'>=0s,<3s'*) GROUP BY 1
 '''
{quote}
!http://mail.163.com/js6/s?func=mbox:getMessageData=270:xtbBDhAtQVrbEIH+vQAAsh=6!
  
  
{quote} *I guess there may be a bug in parsing SQL like the above. I hope someone can help me deal with this problem.*
{quote}

  was:
{quote}   1.  I use Tableau on Kylin; Tableau has generated a SQL query:
'''
SELECT "X___SQL___"."FLIGHT_TYPE" AS "FLIGHT_TYPE",   
SUM("X___SQL___"."SHOPPING_SUCCESS_COUNT") AS "sum_SHOPPING_SUCCESS_COUNT_ok" 
FROM (   select *,        case         when cast(DURATION_LEVEL as varchar) 

[GitHub] [kylin] nichunen merged pull request #847: The first step of build cube job will fail and throw "Column 'xx' in …

2019-09-20 Thread GitBox
nichunen merged pull request #847: The first step of build cube job will fail 
and throw "Column 'xx' in …
URL: https://github.com/apache/kylin/pull/847
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services