[jira] [Assigned] (KYLIN-2264) Date error when use new streaming cube in Kylin1.6.0

2016-12-08 Thread Billy Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billy Liu reassigned KYLIN-2264:


Assignee: Shaofeng SHI  (was: Zhong,Jason)

> Date error when use new streaming cube in Kylin1.6.0
> 
>
> Key: KYLIN-2264
> URL: https://issues.apache.org/jira/browse/KYLIN-2264
> Project: Kylin
>  Issue Type: Bug
>  Components: streaming, Web 
>Affects Versions: v1.6.0
> Environment: Debian 3.2.54-2 x86_64 GNU/Linux
>Reporter: WangSheng
>Assignee: Shaofeng SHI
>
> I installed Kylin 1.6.0 and built a streaming cube successfully, but I found 
> two problems that I did not encounter in Kylin 1.5.*.
> First, the segments' start/end times displayed on Kylin Web are 8 hours 
> earlier than my PC time, while the streaming cube's Last Build Time and 
> Create Time displayed on Kylin Web match my PC time. Something may be going 
> wrong when Kylin Web converts the segments' start/end timestamps into dates, 
> but I'm not sure.
> Second, I ran a SQL query against the streaming cube, and the records' 
> time-related columns such as "HOUR_START" and "MINUTE_START" are all 8 hours 
> earlier than my PC time. Through remote debugging I found that the timestamps 
> of these columns in HBase are correct, so I guess something goes wrong when 
> the Kylin server converts these timestamps into dates.
> By the way, the only change I made was setting "kylin.rest.timezone=GM+8" in 
> "kylin.properties", and my PC time is the same as my server time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-2261) Cleanup Hbase Storage issue

2016-12-08 Thread Billy Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734594#comment-15734594
 ] 

Billy Liu commented on KYLIN-2261:
--

The retrieveDependency function in kylin.sh shows how the dependencies are 
resolved. They are exported as HBASE_CLASSPATH_PREFIX, which HBase then loads.

In your shell, try running:
`export HBASE_CLASSPATH_PREFIX=${KYLIN_HOME}/conf:${KYLIN_HOME}/lib/*:${KYLIN_HOME}/tool/*:${KYLIN_HOME}/ext/*:${HBASE_CLASSPATH_PREFIX}`
`hbase classpath`

On my HDP sandbox this prints 
"/opt/apache-kylin-1.6.1-SNAPSHOT-bin/conf:/opt/apache-kylin-1.6.1-SNAPSHOT-bin/lib/*:/opt/apache-kylin-1.6.1-SNAPSHOT-bin/tool/*:/opt/apache-kylin-1.6.1-SNAPSHOT-bin/ext/*::/usr/hdp/2.2.4.2-2/hbase/conf"
Note the Kylin libraries in the classpath.
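
One way to confirm whether a class is visible on a given classpath is to try loading it by name. The sketch below is a hypothetical helper (`ClasspathProbe` is not part of Kylin); it assumes you compile it yourself and run it with the same classpath that `hbase classpath` prints.

```java
// Hypothetical diagnostic helper; not part of Kylin or HBase.
final class ClasspathProbe {

    // Returns true if className can be located by this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the reported error message.
        String name = args.length > 0 ? args[0] : "org.apache.kylin.tool.StorageCleanupJob";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

If the probe reports `false` for `org.apache.kylin.tool.StorageCleanupJob`, the jar that contains it is probably not on that classpath.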



> Cleanup Hbase Storage issue
> ---
>
> Key: KYLIN-2261
> URL: https://issues.apache.org/jira/browse/KYLIN-2261
> Project: Kylin
>  Issue Type: Test
>  Components: Client - CLI
>Affects Versions: v1.6.0
> Environment: CDH-5.7.2
> Hbase1.2
> Kylin1.6
>Reporter: QiLiFei
>Priority: Critical
>  Labels: test
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When I run the command below, following the doc 
> (http://kylin.apache.org/docs16/howto/howto_cleanup_storage.html), it 
> always fails with "Error: Could not find or load main class 
> org.apache.kylin.tool.StorageCleanupJob".
> Command: 
>  /opt/kylin/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete 
> false
>  
> Is the class in 'kylin-storage-hbase.jar'?
> And should that jar be placed in $KYLIN_HOME/lib/?
> I have put the jar file in $KYLIN_HOME/lib and $HBase_Home/lib/ and set its 
> permissions to 777, but it still does not work.
> If I'm wrong, please correct me. Thanks.
>  





[jira] [Created] (KYLIN-2264) Date error when use new streaming cube in Kylin1.6.0

2016-12-08 Thread WangSheng (JIRA)
WangSheng created KYLIN-2264:


 Summary: Date error when use new streaming cube in Kylin1.6.0
 Key: KYLIN-2264
 URL: https://issues.apache.org/jira/browse/KYLIN-2264
 Project: Kylin
  Issue Type: Bug
  Components: streaming, Web 
Affects Versions: v1.6.0
 Environment: Debian 3.2.54-2 x86_64 GNU/Linux
Reporter: WangSheng
Assignee: Zhong,Jason


I installed Kylin 1.6.0 and built a streaming cube successfully, but I found 
two problems that I did not encounter in Kylin 1.5.*.

First, the segments' start/end times displayed on Kylin Web are 8 hours earlier 
than my PC time, while the streaming cube's Last Build Time and Create Time 
displayed on Kylin Web match my PC time. Something may be going wrong when 
Kylin Web converts the segments' start/end timestamps into dates, but I'm not 
sure.

Second, I ran a SQL query against the streaming cube, and the records' 
time-related columns such as "HOUR_START" and "MINUTE_START" are all 8 hours 
earlier than my PC time. Through remote debugging I found that the timestamps 
of these columns in HBase are correct, so I guess something goes wrong when the 
Kylin server converts these timestamps into dates.
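
An 8-hour shift is what one would see if a timestamp were formatted in GMT instead of GMT+8. As a hedged illustration only: `java.util.TimeZone.getTimeZone` falls back to GMT when it cannot understand a zone ID, so an ID such as "GM+8" (the value quoted in this report) would behave like GMT, while "GMT+8" would not. Whether Kylin resolves `kylin.rest.timezone` this way is an assumption, and `TimezoneIdDemo` is a hypothetical class, not Kylin code.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; not part of Kylin.
final class TimezoneIdDemo {

    // Formats an epoch-millisecond timestamp in the zone named by zoneId.
    static String format(long epochMillis, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        // getTimeZone returns the GMT zone when the given ID cannot be understood.
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long ts = 1481184000000L; // 2016-12-08 08:00:00 GMT
        System.out.println("GMT+8 -> " + format(ts, "GMT+8")); // 2016-12-08 16:00:00
        System.out.println("GM+8  -> " + format(ts, "GM+8"));  // 2016-12-08 08:00:00 (GMT fallback)
    }
}
```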

By the way, the only change I made was setting "kylin.rest.timezone=GM+8" in 
"kylin.properties", and my PC time is the same as my server time.





[jira] [Commented] (KYLIN-2261) Cleanup Hbase Storage issue

2016-12-08 Thread QiLiFei (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734556#comment-15734556
 ] 

QiLiFei commented on KYLIN-2261:


Hi Billy,

I had already copied kylin-tool-1.6.0.jar from the tool folder into both 
kylin/lib and hbase/lib before your message, and then added the line below to 
the beginning of kylin.sh. Unfortunately it still does not work. Could you 
please provide a sample?

export HBASE_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hbase/lib

 



Sent from Mail for Windows 10

From: Billy Liu (JIRA)
Sent: December 9, 2016 14:03
To: zjsearch...@163.com
Subject: [jira] [Commented] (KYLIN-2261) Cleanup Hbase Storage issue


[ 
https://issues.apache.org/jira/browse/KYLIN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734443#comment-15734443
 ] 

Billy Liu commented on KYLIN-2261:
--

Something may be wrong with your HBASE_CLASSPATH, which is used to load Kylin 
dependencies. I could not reproduce your issue. 









[jira] [Resolved] (KYLIN-2263) Display reasonable exception message if could not find kafka dependency for streaming build

2016-12-08 Thread Billy Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billy Liu resolved KYLIN-2263.
--
   Resolution: Fixed
Fix Version/s: v1.6.1

The build will now show "Could not find Kafka dependency" if that happens.

> Display reasonable exception message if could not find kafka dependency for 
> streaming build 
> 
>
> Key: KYLIN-2263
> URL: https://issues.apache.org/jira/browse/KYLIN-2263
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v1.6.0
>Reporter: Billy Liu
>Assignee: Billy Liu
>Priority: Minor
> Fix For: v1.6.1
>
>
> Kafka is an optional dependency for a Kylin installation, but it is mandatory 
> for streaming builds. Currently, if KAFKA_HOME is not exported, the build 
> shows "Error" without any further detail, which is inconvenient for new 
> users.





[jira] [Created] (KYLIN-2263) Display reasonable exception message if could not find kafka dependency for streaming build

2016-12-08 Thread Billy Liu (JIRA)
Billy Liu created KYLIN-2263:


 Summary: Display reasonable exception message if could not find 
kafka dependency for streaming build 
 Key: KYLIN-2263
 URL: https://issues.apache.org/jira/browse/KYLIN-2263
 Project: Kylin
  Issue Type: Improvement
  Components: Web 
Affects Versions: v1.6.0
Reporter: Billy Liu
Assignee: Billy Liu
Priority: Minor


Kafka is an optional dependency for a Kylin installation, but it is mandatory 
for streaming builds. Currently, if KAFKA_HOME is not exported, the build shows 
"Error" without any further detail, which is inconvenient for new users.
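
The requested behaviour can be sketched as a fail-fast check with a descriptive message. This is a minimal illustration, not the actual fix: `KafkaDependencyCheck` and `requireKafkaHome` are hypothetical names, and the environment is passed in as a map so the check is easy to exercise. The leading message text follows the wording used in the resolution note for this issue.

```java
import java.util.Map;

// Hypothetical sketch; not the code that resolved this issue.
final class KafkaDependencyCheck {

    // Returns the KAFKA_HOME value, or fails with a message that names the missing setting.
    static String requireKafkaHome(Map<String, String> env) {
        String home = env.get("KAFKA_HOME");
        if (home == null || home.trim().isEmpty()) {
            throw new IllegalStateException(
                "Could not find Kafka dependency: KAFKA_HOME is not set. "
                    + "Export KAFKA_HOME before starting a streaming build.");
        }
        return home;
    }

    public static void main(String[] args) {
        try {
            System.out.println("Using Kafka at " + requireKafkaHome(System.getenv()));
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```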





[jira] [Commented] (KYLIN-2261) Cleanup Hbase Storage issue

2016-12-08 Thread Billy Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734443#comment-15734443
 ] 

Billy Liu commented on KYLIN-2261:
--

Something may be wrong with your HBASE_CLASSPATH, which is used to load Kylin 
dependencies. I could not reproduce your issue. 



[jira] [Comment Edited] (KYLIN-2261) Cleanup Hbase Storage issue

2016-12-08 Thread Billy Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734223#comment-15734223
 ] 

Billy Liu edited comment on KYLIN-2261 at 12/9/16 5:57 AM:
---

For Kylin 1.6, it's in the tool/kylin-tool-1.6.0.jar. 


was (Author: yimingliu):
For Kylin 1.6, it's in tool/kylin-tool-1.6.0.jar.  Could you try copying this 
library into lib?



[jira] [Commented] (KYLIN-2261) Cleanup Hbase Storage issue

2016-12-08 Thread Billy Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734223#comment-15734223
 ] 

Billy Liu commented on KYLIN-2261:
--

For Kylin 1.6, it's in tool/kylin-tool-1.6.0.jar.  Could you try copying this 
library into lib?



[jira] [Created] (KYLIN-2262) Kafka Streaming Cube build error

2016-12-08 Thread QiLiFei (JIRA)
QiLiFei created KYLIN-2262:
--

 Summary: Kafka Streaming Cube build error  
 Key: KYLIN-2262
 URL: https://issues.apache.org/jira/browse/KYLIN-2262
 Project: Kylin
  Issue Type: Test
  Components: Client - CLI
Affects Versions: v1.6.0
 Environment: CDH1.5.7
Kylin1.6 
KAFKA-2.0.2-1.2.0.2.p0.5
Reporter: QiLiFei
Priority: Blocker


When I build the Kafka streaming cube according to the doc 
(http://kylin.apache.org/docs16/tutorial/cube_streaming.html), the CLI always 
returns this error:
{"url":"http://172.31.18.12:7070/kylin/api/cubes/StreamingCube9/build2","exception":null}

In kylin.log there is only a "java.lang.NullPointerException". I'm not sure 
what exactly happened there. Please give me some support.






[jira] [Closed] (KYLIN-2260) have a try

2016-12-08 Thread lpstart (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lpstart closed KYLIN-2260.
--
Resolution: Fixed

> have a try
> --
>
> Key: KYLIN-2260
> URL: https://issues.apache.org/jira/browse/KYLIN-2260
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Reporter: lpstart
>Assignee: Dong Li
>Priority: Trivial
>






[jira] [Closed] (KYLIN-2259) have a try

2016-12-08 Thread lpstart (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lpstart closed KYLIN-2259.
--
Resolution: Fixed

> have a try
> --
>
> Key: KYLIN-2259
> URL: https://issues.apache.org/jira/browse/KYLIN-2259
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Reporter: lpstart
>Assignee: Zhong,Jason
>






[jira] [Updated] (KYLIN-2260) have a try

2016-12-08 Thread lpstart (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lpstart updated KYLIN-2260:
---
Priority: Trivial  (was: Major)



[jira] [Commented] (KYLIN-2259) have a try

2016-12-08 Thread lpstart (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732267#comment-15732267
 ] 

lpstart commented on KYLIN-2259:


temp



[jira] [Created] (KYLIN-2259) have a try

2016-12-08 Thread lpstart (JIRA)
lpstart created KYLIN-2259:
--

 Summary: have a try
 Key: KYLIN-2259
 URL: https://issues.apache.org/jira/browse/KYLIN-2259
 Project: Kylin
  Issue Type: Bug
  Components: Web 
Reporter: lpstart
Assignee: Zhong,Jason








[jira] [Commented] (KYLIN-2180) Add project config and make config priority become "cube > project > server"

2016-12-08 Thread kangkaisen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732236#comment-15732236
 ] 

kangkaisen commented on KYLIN-2180:
---

OK, thank you!

> Add project config and make config priority become "cube > project > server"
> 
>
> Key: KYLIN-2180
> URL: https://issues.apache.org/jira/browse/KYLIN-2180
> Project: Kylin
>  Issue Type: New Feature
>  Components: Metadata
>Affects Versions: v1.5.4.1
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v1.6.1
>
> Attachments: KYLIN-2180-tmp.patch, KYLIN-2180.patch
>
>
> There are cases where we want to override the global kylin.properties within 
> the scope of a project, e.g. the queue name of Hadoop jobs.
> The resulting config priority for Kylin should be "cube > project > server", 
> which I think is reasonable.





[jira] [Resolved] (KYLIN-2180) Add project config and make config priority become "cube > project > server"

2016-12-08 Thread liyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyang resolved KYLIN-2180.
---
   Resolution: Fixed
Fix Version/s: v1.6.1



[jira] [Commented] (KYLIN-2180) Add project config and make config priority become "cube > project > server"

2016-12-08 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732227#comment-15732227
 ] 

liyang commented on KYLIN-2180:
---

Merged! Many thanks to Kaisen!

Did a minor revision commit: 
https://github.com/apache/kylin/commit/c31c8490b05d5f9618464f431cc3d923b012e9b8

Renamed the attribute to "override_kylin_properties" to keep it consistent with 
CubeDesc and other naming conventions.
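
The "cube > project > server" priority can be pictured as a layered lookup that returns the first layer defining a key. The sketch below is illustrative only and is not the merged patch: `LayeredConfig` is a hypothetical class and `example.job.queue` is a made-up property key standing in for a Hadoop queue setting.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of override resolution; not Kylin's implementation.
final class LayeredConfig {

    // Layers ordered from highest to lowest priority: cube, project, server.
    private final List<Map<String, String>> layers;

    LayeredConfig(Map<String, String> cube, Map<String, String> project, Map<String, String> server) {
        this.layers = Arrays.asList(cube, project, server);
    }

    // Returns the value from the highest-priority layer that defines the key, or null.
    String get(String key) {
        for (Map<String, String> layer : layers) {
            String value = layer.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> server = new HashMap<>();
        server.put("example.job.queue", "default");
        Map<String, String> project = new HashMap<>();
        project.put("example.job.queue", "project_queue");
        Map<String, String> cube = new HashMap<>(); // no cube-level override

        LayeredConfig config = new LayeredConfig(cube, project, server);
        System.out.println(config.get("example.job.queue")); // prints "project_queue"
    }
}
```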



[jira] [Resolved] (KYLIN-2254) A kind of sub-query does not work

2016-12-08 Thread liyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyang resolved KYLIN-2254.
---
   Resolution: Fixed
 Assignee: liyang
Fix Version/s: v1.6.1

> A kind of sub-query does not work
> -
>
> Key: KYLIN-2254
> URL: https://issues.apache.org/jira/browse/KYLIN-2254
> Project: Kylin
>  Issue Type: Bug
>Reporter: liyang
>Assignee: liyang
> Fix For: v1.6.1
>
>
> For example, the query below does not work.
> SELECT
>   f.lstg_format_name
>   ,sum(price) as sum_price
> FROM
>   test_kylin_fact f
>   inner join
>   (
>     select
>       lstg_format_name,
>       min(slr_segment_cd) as min_seg
>     from
>       test_kylin_fact
>     group by
>       lstg_format_name
>   ) t on f.lstg_format_name = t.lstg_format_name
> where
>   f.slr_segment_cd = min_seg
> group by
>   f.lstg_format_name





[jira] [Commented] (KYLIN-2253) sql union did not remove duplicated records (distinct)

2016-12-08 Thread zhou degao (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731845#comment-15731845
 ] 

zhou degao commented on KYLIN-2253:
---

Thanks. But Calcite 1.11.0 has not been released yet. I will download the 
latest Calcite code to test it.

> sql union  did not remove duplicated records (distinct)
> ---
>
> Key: KYLIN-2253
> URL: https://issues.apache.org/jira/browse/KYLIN-2253
> Project: Kylin
>  Issue Type: Bug
>  Components: Driver - JDBC
>Affects Versions: v1.5.4.1
> Environment: apache-kylin-1.6.0-hbase1.x-bin.tar.gz
>Reporter: zhou degao
>Priority: Critical
>
> The SQL is as follows:
> select "LOOKUP_TIME_BY_DAY"."THE_YEAR" as "c0", 
> "LOOKUP_TIME_BY_DAY"."MONTH_OF_YEAR" as "c1" from "VCGBI_FACT_SALE" as 
> "VCGBI_FACT_SALE" join "LOOKUP_TIME_BY_DAY" as "LOOKUP_TIME_BY_DAY" on 
> "VCGBI_FACT_SALE"."DEAL_TIME" = "LOOKUP_TIME_BY_DAY"."THE_DATE" group by 
> "LOOKUP_TIME_BY_DAY"."THE_YEAR", "LOOKUP_TIME_BY_DAY"."MONTH_OF_YEAR"
> union
> select "LOOKUP_TIME_BY_DAY"."THE_YEAR" as "c0", 
> "LOOKUP_TIME_BY_DAY"."MONTH_OF_YEAR" as "c1" from 
> "VCGBI_FACT_PERSON_MISC_DATA" as "VCGBI_FACT_PERSON_MISC_DATA" join 
> "LOOKUP_TIME_BY_DAY" as "LOOKUP_TIME_BY_DAY" on 
> "VCGBI_FACT_PERSON_MISC_DATA"."TIME" = "LOOKUP_TIME_BY_DAY"."THE_DATE" group 
> by "LOOKUP_TIME_BY_DAY"."THE_YEAR", "LOOKUP_TIME_BY_DAY"."MONTH_OF_YEAR" 
> order by 1 ASC, 2 ASC
> I got the following result:
> 2016  1
> 2016  1
> 2016  2
> 2016  2
> 2016  3
> 2016  3
> 2016  4
> 2016  4
> 2016  5
> 2016  5
> 2016  6
> 2016  6
> 2016  7
> 2016  7
> 2016  8
> 2016  8
> 2016  9
> 2016  9
> 2016  10
> 2016  11
> 2016  12





[jira] [Commented] (KYLIN-2135) Enlarge FactDistinctColumns reducer number

2016-12-08 Thread kangkaisen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731733#comment-15731733
 ] 

kangkaisen commented on KYLIN-2135:
---

Hi Shaofeng, sorry for the delay in adding the test. What do you think is the 
best way to add it?

> Enlarge FactDistinctColumns reducer number
> --
>
> Key: KYLIN-2135
> URL: https://issues.apache.org/jira/browse/KYLIN-2135
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v1.5.4.1
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v1.6.1
>
> Attachments: KYLIN-2135.patch, new.png, old.png
>
>
> When the Hive table has billions of rows and uses a global dictionary for 
> precise count distinct measures, the {{Extract Fact Table Distinct Columns}} 
> job runs for a long time.
> So we could use more reducers to handle a single column.
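
One common way to spread a single column's values over several reducers is to hash each value into a bucket. The sketch below only illustrates that idea and is not the attached patch: `ColumnReducerPartitioner` and its parameters are hypothetical names.

```java
// Hypothetical sketch of hash partitioning; not the KYLIN-2135 patch.
final class ColumnReducerPartitioner {

    // Maps a column value to one of reducersForColumn consecutive reducers,
    // starting at index firstReducer.
    static int reducerFor(String value, int firstReducer, int reducersForColumn) {
        // floorMod keeps the bucket non-negative even when hashCode() is negative.
        int bucket = Math.floorMod(value.hashCode(), reducersForColumn);
        return firstReducer + bucket;
    }

    public static void main(String[] args) {
        // Assume reducers 4..7 are reserved for one high-cardinality column.
        for (String v : new String[] {"user_1", "user_2", "user_3"}) {
            System.out.println(v + " -> reducer " + reducerFor(v, 4, 4));
        }
    }
}
```

Because equal values always hash to the same bucket, each reducer sees a disjoint subset of the column's values, so distinct values can still be collected per reducer.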





[jira] [Commented] (KYLIN-2253) sql union did not remove duplicated records (distinct)

2016-12-08 Thread Dayue Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731699#comment-15731699
 ] 

Dayue Gao commented on KYLIN-2253:
--

Hi [~zhoudegao], it's due to a bug described in CALCITE-1501
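
For reference, the semantics the reporter expects: UNION returns distinct rows, while UNION ALL keeps every row from both inputs. The sketch below models rows as lists to show the difference. It illustrates standard SQL semantics only and says nothing about how Calcite or Kylin implement the operator; `UnionSemantics` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of UNION vs UNION ALL semantics.
final class UnionSemantics {

    // UNION ALL: concatenates both inputs and keeps duplicates.
    static <T> List<T> unionAll(List<T> left, List<T> right) {
        List<T> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    // UNION: concatenates both inputs and removes duplicate rows,
    // keeping first-seen order.
    static <T> List<T> union(List<T> left, List<T> right) {
        Set<T> seen = new LinkedHashSet<>(left);
        seen.addAll(right);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<List<Integer>> sales = Arrays.asList(Arrays.asList(2016, 1), Arrays.asList(2016, 2));
        List<List<Integer>> misc = Arrays.asList(Arrays.asList(2016, 1), Arrays.asList(2016, 10));
        System.out.println(unionAll(sales, misc).size()); // 4 rows
        System.out.println(union(sales, misc).size());    // 3 rows
    }
}
```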



[jira] [Created] (KYLIN-2258) sql query(avg/min/max over) result turns to be replicated

2016-12-08 Thread Luyuan Zhai (JIRA)
Luyuan Zhai created KYLIN-2258:
--

 Summary: sql query(avg/min/max over) result turns to be replicated
 Key: KYLIN-2258
 URL: https://issues.apache.org/jira/browse/KYLIN-2258
 Project: Kylin
  Issue Type: Bug
  Components: Query Engine
 Environment: windows
Reporter: Luyuan Zhai
Assignee: liyang
Priority: Minor


Situation: I ran the SQL query below. The results are correct but contain 
duplicates. It seems the over(partition by ...) clause doesn't work.

>>data source: learn_kylin
Table kylin_sales looks like:
 
| Row | TRANS_ID | PART_DT | LSTG_FORMAT_NAME | LEAF_CATEG_ID | LSTG_SITE_ID | SLR_SEGMENT_CD | PRICE | ITEM_COUNT | SELLER_ID | USER_ID | REGION |
| 1 | 0 | 2012-12-14 | Others | 88750 | 0 | 11 | 36.2828 | 0 | 1349 | ANALYST | Beijing |


>>SQL query:
select LSTG_FORMAT_NAME, avg(price) over(partition by LSTG_FORMAT_NAME) as 
price_level, max(price) over(partition by LSTG_FORMAT_NAME) as price_max, 
min(price) over(partition by LSTG_FORMAT_NAME) as price_min from kylin_sales

>>results returned (see the attached file):
LSTG_FORMAT_NAME  PRICE_LEVEL  PRICE_MAX  PRICE_MIN
FP-non GTC  49.81376145.495 0.0537
FP-non GTC  49.81376145.495 0.0537
FP-non GTC  49.81376145.495 0.0537
FP-non GTC  49.81376145.495 0.0537
FP-non GTC  49.81376145.495 0.0537
FP-non GTC  49.81376145.495 0.0537
FP-non GTC  49.81376145.495 0.0537
FP-non GTC  49.81376145.495 0.0537
FP-non GTC  49.81376145.495 0.0537
FP-non GTC  49.81376145.495 0.0537
FP-non GTC  49.81376145.495 0.0537
FP-non GTC  49.81376145.495 0.0537
FP-non GTC  49.81376145.495 0.0537
FP-non GTC  49.81376145.495 0.0537
FP-non GTC  49.81376145.495 0.0537
FP-non GTC  49.81376145.495 0.0537
FP-non GTC  49.81376145.495 0.0537
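
For context, in standard SQL an aggregate with OVER (PARTITION BY ...) is evaluated for every input row, so it yields one output row per input row, each carrying its partition's value; GROUP BY is what collapses a partition to a single row. The repeated rows above may therefore be the expected shape of a window query. The sketch below contrasts the two shapes on toy data; `WindowVsGroupBy` is a hypothetical class, not Kylin or Calcite code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of window-aggregate vs GROUP BY result shapes.
final class WindowVsGroupBy {

    // GROUP BY shape: one entry per distinct key, holding that key's maximum price.
    static Map<String, Double> maxByGroup(List<String> keys, List<Double> prices) {
        Map<String, Double> max = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            max.merge(keys.get(i), prices.get(i), Double::max);
        }
        return max;
    }

    // Window shape: one output row per input row, each carrying its partition's maximum.
    static List<String> maxOverPartition(List<String> keys, List<Double> prices) {
        Map<String, Double> max = maxByGroup(keys, prices);
        List<String> rows = new ArrayList<>();
        for (String key : keys) {
            rows.add(key + "\t" + max.get(key));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("A", "A", "B");
        List<Double> prices = Arrays.asList(1.0, 3.0, 2.0);
        System.out.println(maxByGroup(keys, prices).size());       // 2 rows
        System.out.println(maxOverPartition(keys, prices).size()); // 3 rows
    }
}
```

If a single row per LSTG_FORMAT_NAME is wanted, the usual options are GROUP BY with plain aggregates, or SELECT DISTINCT over the windowed query.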






[jira] [Commented] (KYLIN-2237) Ensure dimensions and measures of model don't have null column

2016-12-08 Thread kangkaisen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731611#comment-15731611
 ] 

kangkaisen commented on KYLIN-2237:
---

I agree with you. We need coding conventions, code style checks, and test code 
for the front end.

> Ensure dimensions and measures of model don't have null column
> --
>
> Key: KYLIN-2237
> URL: https://issues.apache.org/jira/browse/KYLIN-2237
> Project: Kylin
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: v1.5.4.1
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: KYLIN-2237.patch
>
>
> Currently, the dimensions or measures of a model may contain a null column, 
> like this: 
> u'dimensions': [{u'table': u'TEST.KYLIN_CAL_DT_KKS', u'columns': [u'CAL_DT', 
> u'YEAR_BEG_DT', u'QTR_BEG_DT', None, u'DAY_OF_CAL_ID_KKS']}]
> This can be produced by the following steps:
> 1. Rename a Hive column used in the model's dimensions or measures.
> 2. Reload the Hive table.
> 3. Carelessly leave the null column in place and update the model.
> 4. Edit the model again; the dimensions or measures can no longer be selected.
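
A guard against the state described above could reject a model whose column list contains a null entry before it is saved. This is a minimal sketch under that assumption, not the attached patch; `ModelColumnCheck` and `requireNoNullColumns` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical validation sketch; not the KYLIN-2237 patch.
final class ModelColumnCheck {

    // Throws if any column name in the list is null, naming the table and position.
    static void requireNoNullColumns(String table, List<String> columns) {
        for (int i = 0; i < columns.size(); i++) {
            if (columns.get(i) == null) {
                throw new IllegalArgumentException(
                    "Null column at index " + i + " in table " + table);
            }
        }
    }

    public static void main(String[] args) {
        // Mirrors the example from the report: the fourth entry is null.
        List<String> columns = Arrays.asList(
            "CAL_DT", "YEAR_BEG_DT", "QTR_BEG_DT", null, "DAY_OF_CAL_ID_KKS");
        try {
            requireNoNullColumns("TEST.KYLIN_CAL_DT_KKS", columns);
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
        }
    }
}
```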



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KYLIN-2216) Potential NPE in model#findTable() call

2016-12-08 Thread liyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyang resolved KYLIN-2216.
---
   Resolution: Fixed
 Assignee: liyang
Fix Version/s: v1.6.1

> Potential NPE in model#findTable() call
> ---
>
> Key: KYLIN-2216
> URL: https://issues.apache.org/jira/browse/KYLIN-2216
> Project: Kylin
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: liyang
>Priority: Minor
> Fix For: v1.6.1
>
>
> In DimensionDesc :
> {code}
> if (table != null)
>     table = table.toUpperCase();
> DataModelDesc model = cubeDesc.getModel();
> tableRef = model.findTable(this.getTable());
> {code}
> If table is null, there would be NPE in findTable():
> {code}
> public TableRef findTable(String table) throws IllegalArgumentException {
>     TableRef result = tableNameMap.get(table.toUpperCase());
> {code}
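
One way to avoid the NPE described above is to check for null before calling `toUpperCase()` and fail with the `IllegalArgumentException` the method already declares. The sketch below is a self-contained stand-in, not the actual Kylin fix: `TableLookup` is hypothetical and uses a plain `String` in place of `TableRef`.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the lookup; not Kylin's DataModelDesc.
final class TableLookup {

    private final Map<String, String> tableNameMap = new HashMap<>();

    // Registers a table reference under its upper-cased name.
    void register(String name, String ref) {
        tableNameMap.put(name.toUpperCase(Locale.ROOT), ref);
    }

    // Null-safe lookup: a null name is rejected explicitly instead of causing an NPE.
    String findTable(String table) throws IllegalArgumentException {
        if (table == null) {
            throw new IllegalArgumentException("Table name must not be null");
        }
        String result = tableNameMap.get(table.toUpperCase(Locale.ROOT));
        if (result == null) {
            throw new IllegalArgumentException("Table not found by " + table);
        }
        return result;
    }

    public static void main(String[] args) {
        TableLookup lookup = new TableLookup();
        lookup.register("default.example_fact", "ref-1"); // made-up table name
        System.out.println(lookup.findTable("DEFAULT.EXAMPLE_FACT")); // prints "ref-1"
    }
}
```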



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-2216) Potential NPE in model#findTable() call

2016-12-08 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731602#comment-15731602
 ] 

liyang commented on KYLIN-2216:
---

Thanks for reporting, Ted!



[jira] [Commented] (KYLIN-2216) Potential NPE in model#findTable() call

2016-12-08 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731600#comment-15731600
 ] 

liyang commented on KYLIN-2216:
---

This is fixed accidentally by 
https://github.com/apache/kylin/commit/967ef18062048e199cd0dd351ba82618a10b08e5








[jira] [Commented] (KYLIN-2239) Remove refreshSegment in JobService

2016-12-08 Thread kangkaisen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731598#comment-15731598
 ] 

kangkaisen commented on KYLIN-2239:
---

OK. Do you mean we could remove the refresh API from both the front end and the back end?

> Remove refreshSegment in JobService
> ---
>
> Key: KYLIN-2239
> URL: https://issues.apache.org/jira/browse/KYLIN-2239
> Project: Kylin
>  Issue Type: Improvement
>Reporter: kangkaisen
>Assignee: kangkaisen
>Priority: Minor
>
> Currently, we have three build types: build, refresh, and merge. But build 
> and refresh are in fact the same job type, and the build type could replace 
> the refresh type completely. 
> So I think the refresh type is redundant. As a first step, we can remove 
> refreshSegment inside JobService and keep the web API unchanged.





[jira] [Commented] (KYLIN-2239) Remove refreshSegment in JobService

2016-12-08 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731578#comment-15731578
 ] 

liyang commented on KYLIN-2239:
---

Agreed. If the refresh API is not used in the web GUI, I don't think it is used 
elsewhere or by third-party tools.






[jira] [Commented] (KYLIN-2238) Add query server scan threshold

2016-12-08 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731572#comment-15731572
 ] 

liyang commented on KYLIN-2238:
---

+1

> Add query server scan threshold
> ---
>
> Key: KYLIN-2238
> URL: https://issues.apache.org/jira/browse/KYLIN-2238
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v1.5.4.1
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: KYLIN-2238.patch
>
>
> Currently, we have a scan threshold in the HBase RegionServer; we should 
> also add a scan threshold in the Kylin query server.
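
To illustrate what a query-server-side scan threshold might look like, here is a rough sketch. It is not taken from KYLIN-2238.patch: ScanThresholdGuard, onRowScanned() and the choice of IllegalStateException are all hypothetical names and decisions made for this example only.

```java
/** Hypothetical guard that counts scanned rows and aborts once a limit is passed. */
final class ScanThresholdGuard {
    private final long maxScannedRows;
    private long scannedRows;

    ScanThresholdGuard(long maxScannedRows) {
        if (maxScannedRows <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.maxScannedRows = maxScannedRows;
    }

    /** Call once per row read from storage; throws when the limit is exceeded. */
    void onRowScanned() {
        scannedRows++;
        if (scannedRows > maxScannedRows) {
            throw new IllegalStateException(
                    "Scan threshold exceeded: " + scannedRows + " > " + maxScannedRows);
        }
    }

    long scannedRows() {
        return scannedRows;
    }
}
```

The idea is that the query server would create one guard per query and call onRowScanned() while iterating over results, so that a runaway query is stopped on the server side even when the storage-side limit was not reached.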





[jira] [Closed] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-12-08 Thread liyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyang closed KYLIN-1834.
-

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Assignee: liyang
>Priority: Blocker
> Fix For: v1.6.0
>
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>     /**
>      * A lower level API, return ID integer from raw value bytes. In case of not found
>      *
>      * - if roundingFlag=0, throw IllegalArgumentException;
>      * - if roundingFlag<0, the closest smaller ID integer if exist;
>      * - if roundingFlag>0, the closest bigger ID integer if exist.
>      *
>      * Bypassing the cache layer, this could be significantly slower than getIdFromValue(T value).
>      *
>      * @throws IllegalArgumentException
>      *             if value is not found in dictionary and rounding is off;
>      *             or if rounding cannot find a smaller or bigger ID
>      */
>     final public int getIdFromValueBytes(byte[] value, int offset, int len, int roundingFlag) throws IllegalArgumentException {
>         if (isNullByteForm(value, offset, len))
>             return nullId();
>         else {
>             int id = getIdFromValueBytesImpl(value, offset, len, roundingFlag);
>             if (id < 0)
>                 throw new IllegalArgumentException("Value not exists!");
>             return id;
>         }
>     }
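
The roundingFlag contract in the Javadoc above can be illustrated with a small sketch. This is not Kylin's TrieDictionary: RoundingLookupSketch is a hypothetical class that keeps values in a sorted array and uses the array index as the ID, purely to show when the "Value not exists!" exception would be raised.

```java
import java.util.Arrays;

/** Hypothetical sorted-array dictionary; IDs are positions in the sorted array. */
final class RoundingLookupSketch {
    private final String[] sortedValues;

    RoundingLookupSketch(String... values) {
        sortedValues = values.clone();
        Arrays.sort(sortedValues);
    }

    /**
     * Mirrors the documented contract: exact match returns its ID; otherwise
     * roundingFlag < 0 picks the closest smaller ID, roundingFlag > 0 the closest
     * bigger ID, and roundingFlag == 0 (or no neighbour on that side) throws.
     */
    int getIdFromValue(String value, int roundingFlag) {
        int pos = Arrays.binarySearch(sortedValues, value);
        if (pos >= 0) {
            return pos;
        }
        int insertion = -pos - 1; // index of the first element greater than value
        int id;
        if (roundingFlag < 0) {
            id = insertion - 1; // becomes -1 when no smaller value exists
        } else if (roundingFlag > 0) {
            id = insertion < sortedValues.length ? insertion : -1;
        } else {
            id = -1; // rounding is off
        }
        if (id < 0) {
            throw new IllegalArgumentException("Value not exists!");
        }
        return id;
    }
}
```

Under this reading, a lookup with rounding off for a value that is absent from the dictionary fails in the same way as the stack trace in the report, which is consistent with a snapshot row containing a value that the dictionary was not built with.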
> ==
> The Cube is big: the fact table has 110 million rows, and the largest 
> dimension (customer) has 10 million rows. I have increased the JVM -Xmx to 
> 16 GB and set kylin.table.snapshot.max_mb=2048 in kylin.properties to make 
> sure the Cube build doesn't fail (previously we were getting an exception 
> complaining about the 300 MB limit on Dimension dictionary size; approx. 
> 700 MB was required).
> ==
> Before that, we were getting an exception about a Dictionary encoding 
> problem, "Too high cardinality is not suitable for dictionary -- 
> cardinality: 10873977", which we resolved by changing the encoding of the 
> affected dimension/row key from "dict" to "int; length=8" in the Advanced 
> Settings of the Cube.
> ==
> We have 2 high-cardinality fields (one from fact table and one from the big 
> dimension (customer - see above). We need to use in 

[jira] [Resolved] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-12-08 Thread liyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyang resolved KYLIN-1834.
---
   Resolution: Fixed
Fix Version/s: (was: v1.5.4)
   v1.6.0


[jira] [Reopened] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-12-08 Thread liyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyang reopened KYLIN-1834:
---

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Assignee: liyang
>Priority: Blocker
> Fix For: v1.5.4
>
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>