[jira] [Commented] (KYLIN-2929) speed up Dump file performance

2018-01-28 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342967#comment-16342967
 ] 

fengYu commented on KYLIN-2929:
---

I have uploaded a new patch; please review it when you have time.

> speed up Dump file performance
> --
>
> Key: KYLIN-2929
> URL: https://issues.apache.org/jira/browse/KYLIN-2929
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
>Priority: Major
>  Labels: Performance
> Fix For: v2.3.0
>
> Attachments: 
> 0001-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch
>
>
> While working on KYLIN-2926, I found that the coprocessor dumps to disk as 
> soon as estimatedMemSize exceeds spillThreshold, and that the spilled data is 
> far smaller than estimatedMemSize: in my case the dump file is about 8 MB 
> while spillThreshold is set to 3 GB.
> So I keep the spilled data in memory instead of writing each file to disk 
> immediately; once the in-memory spill data reaches the threshold, all spill 
> files are written together.
> In my case, the coprocessor processing time dropped from 22s to 16s, an 
> improvement of about 30%.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KYLIN-2929) speed up Dump file performance

2018-01-28 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-2929:
--
Attachment: 0001-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch

> speed up Dump file performance
> --
>
> Key: KYLIN-2929
> URL: https://issues.apache.org/jira/browse/KYLIN-2929
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
>Priority: Major
>  Labels: Performance
> Fix For: v2.3.0
>
> Attachments: 
> 0001-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch
>
>
> While working on KYLIN-2926, I found that the coprocessor dumps to disk as 
> soon as estimatedMemSize exceeds spillThreshold, and that the spilled data is 
> far smaller than estimatedMemSize: in my case the dump file is about 8 MB 
> while spillThreshold is set to 3 GB.
> So I keep the spilled data in memory instead of writing each file to disk 
> immediately; once the in-memory spill data reaches the threshold, all spill 
> files are written together.
> In my case, the coprocessor processing time dropped from 22s to 16s, an 
> improvement of about 30%.





[jira] [Updated] (KYLIN-2929) speed up Dump file performance

2018-01-28 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-2929:
--
Attachment: (was: 
0002-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch)

> speed up Dump file performance
> --
>
> Key: KYLIN-2929
> URL: https://issues.apache.org/jira/browse/KYLIN-2929
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
>Priority: Major
>  Labels: Performance
> Fix For: v2.3.0
>
> Attachments: 
> 0001-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch
>
>
> While working on KYLIN-2926, I found that the coprocessor dumps to disk as 
> soon as estimatedMemSize exceeds spillThreshold, and that the spilled data is 
> far smaller than estimatedMemSize: in my case the dump file is about 8 MB 
> while spillThreshold is set to 3 GB.
> So I keep the spilled data in memory instead of writing each file to disk 
> immediately; once the in-memory spill data reaches the threshold, all spill 
> files are written together.
> In my case, the coprocessor processing time dropped from 22s to 16s, an 
> improvement of about 30%.





[jira] [Commented] (KYLIN-2929) speed up Dump file performance

2018-01-24 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16337544#comment-16337544
 ] 

fengYu commented on KYLIN-2929:
---

Sorry, I forgot about this; I will do the review this week.

> speed up Dump file performance
> --
>
> Key: KYLIN-2929
> URL: https://issues.apache.org/jira/browse/KYLIN-2929
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
>Priority: Major
>  Labels: Performance
> Attachments: 
> 0002-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch
>
>
> While working on KYLIN-2926, I found that the coprocessor dumps to disk as 
> soon as estimatedMemSize exceeds spillThreshold, and that the spilled data is 
> far smaller than estimatedMemSize: in my case the dump file is about 8 MB 
> while spillThreshold is set to 3 GB.
> So I keep the spilled data in memory instead of writing each file to disk 
> immediately; once the in-memory spill data reaches the threshold, all spill 
> files are written together.
> In my case, the coprocessor processing time dropped from 22s to 16s, an 
> improvement of about 30%.





[jira] [Updated] (KYLIN-2950) Change build engine smoothly

2017-10-19 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-2950:
--
Attachment: 0001-KYLIN-2950-Change-build-engine-smoothly.patch

> Change build engine smoothly
> 
>
> Key: KYLIN-2950
> URL: https://issues.apache.org/jira/browse/KYLIN-2950
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 0001-KYLIN-2950-Change-build-engine-smoothly.patch
>
>
> Currently, we cannot change the build engine without disabling the cube and 
> purging all existing segments, which is expensive.
> In my tests, building a cube with the MR engine and with the Spark engine 
> generates the same result, and the merge job always uses the MR engine. Hence, 
> I think making engineType part of the cube's signature is improper.
> I changed the code and modified the cube JSON; it works well with the MR and 
> Spark engines used alternately.
> TODO: modify the web UI.





[jira] [Created] (KYLIN-2950) Change build engine smoothly

2017-10-19 Thread fengYu (JIRA)
fengYu created KYLIN-2950:
-

 Summary: Change build engine smoothly
 Key: KYLIN-2950
 URL: https://issues.apache.org/jira/browse/KYLIN-2950
 Project: Kylin
  Issue Type: Bug
Affects Versions: v2.0.0
Reporter: fengYu
Assignee: fengYu


Currently, we cannot change the build engine without disabling the cube and 
purging all existing segments, which is expensive.

In my tests, building a cube with the MR engine and with the Spark engine 
generates the same result, and the merge job always uses the MR engine. Hence, 
I think making engineType part of the cube's signature is improper.

I changed the code and modified the cube JSON; it works well with the MR and 
Spark engines used alternately.

TODO: modify the web UI.
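
To illustrate the point about the signature, here is a minimal sketch, not Kylin's actual signature code. The class name, the field list, and the `includeEngineType` flag are all hypothetical, and a real signature would hash many more fields. It only shows that when the engine type is left out, switching engines no longer changes the signature.

```java
import java.util.StringJoiner;

// Hypothetical illustration only; this is not Kylin's CubeDesc implementation.
class CubeSignatureDemo {
    // Builds a toy signature by joining the fields that define the cube.
    // When includeEngineType is false, the engine type does not take part.
    static String signature(String cubeName, String modelName,
                            String engineType, boolean includeEngineType) {
        StringJoiner joiner = new StringJoiner("|");
        joiner.add(cubeName).add(modelName);
        if (includeEngineType) {
            joiner.add(engineType);
        }
        return joiner.toString();
    }
}
```

With the engine type excluded, a cube built with one engine keeps the same signature after it is switched to the other, so existing segments would not need to be purged on that account.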





[jira] [Commented] (KYLIN-2926) DumpMerger return incorrect results

2017-10-17 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207240#comment-16207240
 ] 

fengYu commented on KYLIN-2926:
---

OK. As a quick fix, I think the patch is useful; a new JIRA needs to be 
created. 

> DumpMerger return incorrect results
> ---
>
> Key: KYLIN-2926
> URL: https://issues.apache.org/jira/browse/KYLIN-2926
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-2926-DumpMerger-return-incorrect-results-creat.patch
>
>
> In our scenario, a cube query returns wrong results once the coprocessor needs 
> to spill to disk. Our version is 2.0.0, and I found the root cause in 
> DumpMerger.enqueueFromDump:
> DataTypeSerializer uses a ThreadLocal variable ‘current’, which leads 
> different elements of dumpCurrentValues to share the same object, so the next 
> fill of measure values overwrites the existing values.
> The affected measures are HLLC and raw, which use the ‘current’ variable in 
> deserialize.





[jira] [Commented] (KYLIN-1892) merge interval support

2017-10-16 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206955#comment-16206955
 ] 

fengYu commented on KYLIN-1892:
---

Sorry for the delay. You can finish it if you have planned to. Thanks.

> merge interval support
> --
>
> Key: KYLIN-1892
> URL: https://issues.apache.org/jira/browse/KYLIN-1892
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: Yang Hao
>
> We always have some data that needs to be amended a few days later.
> In current Kylin, once I set Auto Merge Thresholds, newly built segments are 
> merged when they reach the thresholds, so the next day's refresh has to 
> refresh the merged segment, which is unnecessary.
> So I want to add an interval configuration, meaning that auto merge only 
> merges segments outside of the interval.
> For example, with interval = 2 and Auto Merge Thresholds = 7, if 07-01 to 
> 07-07 are built, auto merge will not trigger; when 07-09 is built 
> successfully, auto merge will trigger and merge the segments from 07-01 to 
> 07-07.
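
The interval rule proposed in the issue above can be sketched as follows. This is a simplified illustration and not Kylin's auto-merge code: `MergeIntervalPlanner`, its parameters, and the one-segment-per-day model are assumptions made for the example. Segments that fall within `intervalDays` of the newest built day are protected from merging; the remaining ones are merged only when there are at least `thresholdDays` of them.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; the names and the daily-segment model are illustrative only.
class MergeIntervalPlanner {
    // Returns the daily segments to merge, or an empty list if the
    // threshold is not reached outside the protected interval.
    static List<LocalDate> segmentsToMerge(List<LocalDate> builtDays,
                                           int thresholdDays, int intervalDays) {
        if (builtDays.isEmpty()) {
            return Collections.<LocalDate>emptyList();
        }
        LocalDate latest = Collections.max(builtDays);
        // Days after the cutoff are still inside the interval and may be refreshed.
        LocalDate cutoff = latest.minusDays(intervalDays);
        List<LocalDate> eligible = new ArrayList<>();
        for (LocalDate day : builtDays) {
            if (!day.isAfter(cutoff)) {
                eligible.add(day);
            }
        }
        Collections.sort(eligible);
        if (eligible.size() >= thresholdDays) {
            return eligible;
        }
        return Collections.<LocalDate>emptyList();
    }
}
```

With interval = 2 and threshold = 7, days 07-01 to 07-07 alone leave only five eligible days, so nothing is merged; once 07-09 is built, the cutoff moves to 07-07 and all seven days become eligible, which matches the example in the issue.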





[jira] [Commented] (KYLIN-2926) DumpMerger return incorrect results

2017-10-16 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206944#comment-16206944
 ] 

fengYu commented on KYLIN-2926:
---

This is what I meant in my response above. I think creating a codec for every 
dump is a good approach too. However, to solve the problem for good, removing 
the ThreadLocal is the better way, since it avoids the trap for future 
developers.

> DumpMerger return incorrect results
> ---
>
> Key: KYLIN-2926
> URL: https://issues.apache.org/jira/browse/KYLIN-2926
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-2926-DumpMerger-return-incorrect-results-creat.patch
>
>
> In our scenario, a cube query returns wrong results once the coprocessor needs 
> to spill to disk. Our version is 2.0.0, and I found the root cause in 
> DumpMerger.enqueueFromDump:
> DataTypeSerializer uses a ThreadLocal variable ‘current’, which leads 
> different elements of dumpCurrentValues to share the same object, so the next 
> fill of measure values overwrites the existing values.
> The affected measures are HLLC and raw, which use the ‘current’ variable in 
> deserialize.





[jira] [Commented] (KYLIN-2926) DumpMerger return incorrect results

2017-10-16 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206898#comment-16206898
 ] 

fengYu commented on KYLIN-2926:
---

[~Shaofengshi] I totally agree with you, but there are more places that refer 
to HLLC and RAW, and more tests are needed to cover them. This patch is a 
quick fix. I think the bug is very serious: a big cube that contains an HLLC 
or RAW measure may return incorrect results once the coprocessor needs to dump 
data.

> DumpMerger return incorrect results
> ---
>
> Key: KYLIN-2926
> URL: https://issues.apache.org/jira/browse/KYLIN-2926
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-2926-DumpMerger-return-incorrect-results-creat.patch
>
>
> In our scenario, a cube query returns wrong results once the coprocessor needs 
> to spill to disk. Our version is 2.0.0, and I found the root cause in 
> DumpMerger.enqueueFromDump:
> DataTypeSerializer uses a ThreadLocal variable ‘current’, which leads 
> different elements of dumpCurrentValues to share the same object, so the next 
> fill of measure values overwrites the existing values.
> The affected measures are HLLC and raw, which use the ‘current’ variable in 
> deserialize.





[jira] [Updated] (KYLIN-2929) speed up Dump file performance

2017-10-13 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-2929:
--
Attachment: 0002-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch

This is my patch, along with the test results: I ran the same SQL three times 
and watched the coprocessor processing time.

Before:

2017-10-13 16:53:34,986 INFO  [kylin-coproc--pool5-t70] 
v2.CubeHBaseEndpointRPC:200 : Endpoint RPC 
returned from HTable V200_NEW_KYLIN_KIJXSDW18F Shard 
\x56\x32\x30\x30\x5F\x4E\x45\x57\x5F\x4B\x59\x4C\x49\x4E\x5F\x4B\x49\x4A\x58\x53\x44\x57\x31\x38\x46\x2C\x00\x01\x2C\x31\x35\x30\x37\x37\x30\x33\x35\x32\x33\x36\x39\x35\x2E\x65\x39\x37\x63\x64\x38\x34\x32\x62\x33\x61\x63\x37\x63\x66\x30\x32\x38\x31\x64\x36\x32\x66\x38\x31\x63\x62\x36\x61\x38\x64\x39\x2E
 on host: db-53.photo.163.org.Total scanned row: 634776. Total scanned bytes: 
134956120. Total filtered/aggred row: 527872. Time elapsed in EP: 19082(ms). 
Server CPU usage: 0.0, server physical mem left: 1.381179392E9, server swap mem 
left:2.075181056E9.Etc message: start latency: 34@57,agg done@18265,compress 
done@19081,server stats done@19081, 
debugGitTag:a08e52e24c99f312eaa63bd3f9ef4cdc53fa2a67;.Normal Complete: 
true.Compressed row size: 13954413

2017-10-13 16:55:30,633 INFO  [kylin-coproc--pool5-t72] 
v2.CubeHBaseEndpointRPC:200 : Endpoint RPC 
returned from HTable V200_NEW_KYLIN_KIJXSDW18F Shard 
\x56\x32\x30\x30\x5F\x4E\x45\x57\x5F\x4B\x59\x4C\x49\x4E\x5F\x4B\x49\x4A\x58\x53\x44\x57\x31\x38\x46\x2C\x00\x01\x2C\x31\x35\x30\x37\x37\x30\x33\x35\x32\x33\x36\x39\x35\x2E\x65\x39\x37\x63\x64\x38\x34\x32\x62\x33\x61\x63\x37\x63\x66\x30\x32\x38\x31\x64\x36\x32\x66\x38\x31\x63\x62\x36\x61\x38\x64\x39\x2E
 on host: db-53.photo.163.org.Total scanned row: 634776. Total scanned bytes: 
134956120. Total filtered/aggred row: 527872. Time elapsed in EP: 17371(ms). 
Server CPU usage: 0.08703703703703704, server physical mem left: 1.340674048E9, 
server swap mem left:2.075181056E9.Etc message: start latency: 12@3,agg 
done@16586,compress done@17371,server stats done@17371, 
debugGitTag:a08e52e24c99f312eaa63bd3f9ef4cdc53fa2a67;.Normal Complete: 
true.Compressed row size: 13954413

2017-10-13 16:56:33,382 INFO  [kylin-coproc--pool5-t74] 
v2.CubeHBaseEndpointRPC:200 : Endpoint RPC 
returned from HTable V200_NEW_KYLIN_KIJXSDW18F Shard 
\x56\x32\x30\x30\x5F\x4E\x45\x57\x5F\x4B\x59\x4C\x49\x4E\x5F\x4B\x49\x4A\x58\x53\x44\x57\x31\x38\x46\x2C\x00\x01\x2C\x31\x35\x30\x37\x37\x30\x33\x35\x32\x33\x36\x39\x35\x2E\x65\x39\x37\x63\x64\x38\x34\x32\x62\x33\x61\x63\x37\x63\x66\x30\x32\x38\x31\x64\x36\x32\x66\x38\x31\x63\x62\x36\x61\x38\x64\x39\x2E
 on host: db-53.photo.163.org.Total scanned row: 634776. Total scanned bytes: 
134956120. Total filtered/aggred row: 527872. Time elapsed in EP: 17184(ms). 
Server CPU usage: 0.0624334964886146, server physical mem left: 1.320890368E9, 
server swap mem left:2.075181056E9.Etc message: start latency: 12@1,agg 
done@16397,compress done@17184,server stats done@17184, 
debugGitTag:a08e52e24c99f312eaa63bd3f9ef4cdc53fa2a67;.Normal Complete: 
true.Compressed row size: 13954413

After:

2017-10-13 17:01:05,660 INFO  [kylin-coproc--pool5-t76] 
v2.CubeHBaseEndpointRPC:200 : Endpoint RPC 
returned from HTable V200_NEW_KYLIN_KIJXSDW18F Shard 
\x56\x32\x30\x30\x5F\x4E\x45\x57\x5F\x4B\x59\x4C\x49\x4E\x5F\x4B\x49\x4A\x58\x53\x44\x57\x31\x38\x46\x2C\x00\x01\x2C\x31\x35\x30\x37\x37\x30\x33\x35\x32\x33\x36\x39\x35\x2E\x65\x39\x37\x63\x64\x38\x34\x32\x62\x33\x61\x63\x37\x63\x66\x30\x32\x38\x31\x64\x36\x32\x66\x38\x31\x63\x62\x36\x61\x38\x64\x39\x2E
 on host: db-53.photo.163.org.Total scanned row: 634776. Total scanned bytes: 
134956120. Total filtered/aggred row: 527872. Time elapsed in EP: 12253(ms). 
Server CPU usage: 0.0900900900900901, server physical mem left: 1.328091136E9, 
server swap mem left:2.075181056E9.Etc message: start latency: 33@58,agg 
done@11463,compress done@12253,server stats done@12253, 
debugGitTag:a08e52e24c99f312eaa63bd3f9ef4cdc53fa2a67;.Normal Complete: 
true.Compressed row size: 13954413

2017-10-13 17:02:05,746 INFO  [kylin-coproc--pool5-t78] 
v2.CubeHBaseEndpointRPC:200 : Endpoint RPC 
returned from HTable V200_NEW_KYLIN_KIJXSDW18F Shard 
\x56\x32\x30\x30\x5F\x4E\x45\x57\x5F\x4B\x59\x4C\x49\x4E\x5F\x4B\x49\x4A\x58\x53\x44\x57\x31\x38\x46\x2C\x00\x01\x2C\x31\x35\x30\x37\x37\x30\x33\x35\x32\x33\x36\x39\x35\x2E\x65\x39\x37\x63\x64\x38\x34\x32\x62\x33\x61\x63\x37\x63\x66\x30\x32\x38\x31\x64\x36\x32\x66\x38\x31\x63\x62\x36\x61\x38\x64\x39\x2E
 on host: db-53.photo.163.org.Total scanned row: 634776. Total scanned bytes: 
134956120. Total filtered/aggred row: 527872. Time elapsed in EP: 11394(ms). 
Server CPU usage: 0.09580838323353294, server physical mem left: 1.10680064E9, 
server swap mem left:2.075181056E9.Etc message: start latency: 12@3,agg 
done@10605,compress done@11394,server stats done@11394, 
debugGitTag:a08e52e24c99f312eaa63bd3f9ef4cdc53fa2a67;.Normal

[jira] [Updated] (KYLIN-2926) DumpMerger return incorrect results

2017-10-12 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-2926:
--
Attachment: 0001-KYLIN-2926-DumpMerger-return-incorrect-results-creat.patch

Create a codec object for each dump to avoid the shared variable.

> DumpMerger return incorrect results
> ---
>
> Key: KYLIN-2926
> URL: https://issues.apache.org/jira/browse/KYLIN-2926
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-2926-DumpMerger-return-incorrect-results-creat.patch
>
>
> In our scenario, a cube query returns wrong results once the coprocessor needs 
> to spill to disk. Our version is 2.0.0, and I found the root cause in 
> DumpMerger.enqueueFromDump:
> DataTypeSerializer uses a ThreadLocal variable ‘current’, which leads 
> different elements of dumpCurrentValues to share the same object, so the next 
> fill of measure values overwrites the existing values.
> The affected measures are HLLC and raw, which use the ‘current’ variable in 
> deserialize.





[jira] [Updated] (KYLIN-2929) speed up Dump file performance

2017-10-11 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-2929:
--
Component/s: Query Engine

> speed up Dump file performance
> --
>
> Key: KYLIN-2929
> URL: https://issues.apache.org/jira/browse/KYLIN-2929
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
>  Labels: Performance
>
> While working on KYLIN-2926, I found that the coprocessor dumps to disk as 
> soon as estimatedMemSize exceeds spillThreshold, and that the spilled data is 
> far smaller than estimatedMemSize: in my case the dump file is about 8 MB 
> while spillThreshold is set to 3 GB.
> So I keep the spilled data in memory instead of writing each file to disk 
> immediately; once the in-memory spill data reaches the threshold, all spill 
> files are written together.
> In my case, the coprocessor processing time dropped from 22s to 16s, an 
> improvement of about 30%.





[jira] [Updated] (KYLIN-2929) speed up Dump file performance

2017-10-11 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-2929:
--
Affects Version/s: v2.0.0

> speed up Dump file performance
> --
>
> Key: KYLIN-2929
> URL: https://issues.apache.org/jira/browse/KYLIN-2929
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
>  Labels: Performance
>
> While working on KYLIN-2926, I found that the coprocessor dumps to disk as 
> soon as estimatedMemSize exceeds spillThreshold, and that the spilled data is 
> far smaller than estimatedMemSize: in my case the dump file is about 8 MB 
> while spillThreshold is set to 3 GB.
> So I keep the spilled data in memory instead of writing each file to disk 
> immediately; once the in-memory spill data reaches the threshold, all spill 
> files are written together.
> In my case, the coprocessor processing time dropped from 22s to 16s, an 
> improvement of about 30%.





[jira] [Updated] (KYLIN-2929) speed up Dump file performance

2017-10-11 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-2929:
--
Labels: Performance  (was: )

> speed up Dump file performance
> --
>
> Key: KYLIN-2929
> URL: https://issues.apache.org/jira/browse/KYLIN-2929
> Project: Kylin
>  Issue Type: Bug
>Reporter: fengYu
>Assignee: fengYu
>  Labels: Performance
>
> While working on KYLIN-2926, I found that the coprocessor dumps to disk as 
> soon as estimatedMemSize exceeds spillThreshold, and that the spilled data is 
> far smaller than estimatedMemSize: in my case the dump file is about 8 MB 
> while spillThreshold is set to 3 GB.
> So I keep the spilled data in memory instead of writing each file to disk 
> immediately; once the in-memory spill data reaches the threshold, all spill 
> files are written together.
> In my case, the coprocessor processing time dropped from 22s to 16s, an 
> improvement of about 30%.





[jira] [Created] (KYLIN-2929) speed up Dump file performance

2017-10-11 Thread fengYu (JIRA)
fengYu created KYLIN-2929:
-

 Summary: speed up Dump file performance
 Key: KYLIN-2929
 URL: https://issues.apache.org/jira/browse/KYLIN-2929
 Project: Kylin
  Issue Type: Bug
Reporter: fengYu
Assignee: fengYu


While working on KYLIN-2926, I found that the coprocessor dumps to disk as soon 
as estimatedMemSize exceeds spillThreshold, and that the spilled data is far 
smaller than estimatedMemSize: in my case the dump file is about 8 MB while 
spillThreshold is set to 3 GB.

So I keep the spilled data in memory instead of writing each file to disk 
immediately; once the in-memory spill data reaches the threshold, all spill 
files are written together.

In my case, the coprocessor processing time dropped from 22s to 16s, an 
improvement of about 30%.
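
The buffering idea can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the attached patch: `BufferedSpiller`, `flushThresholdBytes`, and the use of a plain `OutputStream` as the dump target are all hypothetical names and choices. Each serialized dump is kept in memory, and everything is written out in one go only once the buffered bytes reach the threshold.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the names are illustrative and not taken from the Kylin patch.
class BufferedSpiller {
    private final long flushThresholdBytes;
    private final List<byte[]> pendingDumps = new ArrayList<>();
    private long pendingBytes = 0;
    private int flushCount = 0;

    BufferedSpiller(long flushThresholdBytes) {
        this.flushThresholdBytes = flushThresholdBytes;
    }

    // Keep the serialized dump in memory; write to the sink only when the
    // buffered bytes reach the threshold.
    void spill(byte[] serializedDump, OutputStream sink) {
        pendingDumps.add(serializedDump);
        pendingBytes += serializedDump.length;
        if (pendingBytes >= flushThresholdBytes) {
            flush(sink);
        }
    }

    // Write all buffered dumps together and reset the buffer.
    void flush(OutputStream sink) {
        if (pendingDumps.isEmpty()) {
            return;
        }
        try {
            for (byte[] dump : pendingDumps) {
                sink.write(dump);
            }
            sink.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        pendingDumps.clear();
        pendingBytes = 0;
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }

    long pendingBytes() {
        return pendingBytes;
    }
}
```

A caller would still need to invoke `flush` once at the end so that dumps below the threshold are not lost; the trade-off is fewer, larger disk writes in exchange for holding the serialized dumps in memory.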





[jira] [Updated] (KYLIN-2926) DumpMerger return incorrect results

2017-10-11 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-2926:
--
Description: 
In our scenario, a cube query returns wrong results once the coprocessor needs 
to spill to disk. Our version is 2.0.0, and I found the root cause in 
DumpMerger.enqueueFromDump:

DataTypeSerializer uses a ThreadLocal variable ‘current’, which leads different 
elements of dumpCurrentValues to share the same object, so the next fill of 
measure values overwrites the existing values.

The affected measures are HLLC and raw, which use the ‘current’ variable in 
deserialize.

  was:
In our scenario, a cube query returns wrong results once the coprocessor needs 
to spill to disk. Our version is 2.0.0, and I found the root cause in 
DumpMerger.enqueueFromDump:

DataTypeSerializer uses a ThreadLocal variable ‘current’, which leads different 
elements of dumpCurrentValues to share the same object, so the next fill of 
measure values overwrites the existing values.

The affected measure is HLLC.


> DumpMerger return incorrect results
> ---
>
> Key: KYLIN-2926
> URL: https://issues.apache.org/jira/browse/KYLIN-2926
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
>
> In our scenario, a cube query returns wrong results once the coprocessor needs 
> to spill to disk. Our version is 2.0.0, and I found the root cause in 
> DumpMerger.enqueueFromDump:
> DataTypeSerializer uses a ThreadLocal variable ‘current’, which leads 
> different elements of dumpCurrentValues to share the same object, so the next 
> fill of measure values overwrites the existing values.
> The affected measures are HLLC and raw, which use the ‘current’ variable in 
> deserialize.





[jira] [Commented] (KYLIN-2926) DumpMerger return incorrect results

2017-10-11 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199892#comment-16199892
 ] 

fengYu commented on KYLIN-2926:
---

For a simple modification and test, I changed the function like this:

private void enqueueFromDump(int index) {
    if (dumpIterators.get(index) != null && 
            dumpIterators.get(index).hasNext()) {
        Pair pair = dumpIterators.get(index).next();
        minHeap.offer(new Pair(pair.getKey(), index));
        Object[] metricValues = new Object[metrics.trueBitCount()];
        // Create a fresh codec for each decode so that dumps do not share
        // the serializer's ThreadLocal 'current' object.
        BufferedMeasureCodec codec = request.createMeasureCodec();
        codec.decode(ByteBuffer.wrap(pair.getValue()), metricValues);
        dumpCurrentValues.set(index, metricValues);
    }
}

With this change, the result is correct.

> DumpMerger return incorrect results
> ---
>
> Key: KYLIN-2926
> URL: https://issues.apache.org/jira/browse/KYLIN-2926
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
>
> In our scenario, a cube query returns wrong results once the coprocessor needs 
> to spill to disk. Our version is 2.0.0, and I found the root cause in 
> DumpMerger.enqueueFromDump:
> DataTypeSerializer uses a ThreadLocal variable ‘current’, which leads 
> different elements of dumpCurrentValues to share the same object, so the next 
> fill of measure values overwrites the existing values.
> The affected measure is HLLC.





[jira] [Updated] (KYLIN-2926) DumpMerger return incorrect results

2017-10-11 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-2926:
--
Description: 
In our scenario, a cube query returns wrong results once the coprocessor needs 
to spill to disk. Our version is 2.0.0, and I found the root cause in 
DumpMerger.enqueueFromDump:

DataTypeSerializer uses a ThreadLocal variable ‘current’, which leads different 
elements of dumpCurrentValues to share the same object, so the next fill of 
measure values overwrites the existing values.

The affected measure is HLLC.

  was:
In our scenario, a cube query returns wrong results once the coprocessor needs 
to spill to disk. Our version is 2.0.0, and I found the root cause in 
DumpMerger.enqueueFromDump:

DataTypeSerializer uses a ThreadLocal variable ‘current’, which leads different 
elements of dumpCurrentValues to share the same object, so the next fill of 
measure values overwrites the existing values.


> DumpMerger return incorrect results
> ---
>
> Key: KYLIN-2926
> URL: https://issues.apache.org/jira/browse/KYLIN-2926
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
>
> In our scenario, a cube query returns wrong results once the coprocessor needs 
> to spill to disk. Our version is 2.0.0, and I found the root cause in 
> DumpMerger.enqueueFromDump:
> DataTypeSerializer uses a ThreadLocal variable ‘current’, which leads 
> different elements of dumpCurrentValues to share the same object, so the next 
> fill of measure values overwrites the existing values.
> The affected measure is HLLC.





[jira] [Created] (KYLIN-2926) DumpMerger return incorrect results

2017-10-11 Thread fengYu (JIRA)
fengYu created KYLIN-2926:
-

 Summary: DumpMerger return incorrect results
 Key: KYLIN-2926
 URL: https://issues.apache.org/jira/browse/KYLIN-2926
 Project: Kylin
  Issue Type: Bug
Affects Versions: v2.0.0
Reporter: fengYu
Assignee: fengYu


In our scenario, a cube query returns wrong results once the coprocessor needs 
to spill to disk. Our version is 2.0.0, and I found the root cause in 
DumpMerger.enqueueFromDump:

DataTypeSerializer uses a ThreadLocal variable ‘current’, which leads different 
elements of dumpCurrentValues to share the same object, so the next fill of 
measure values overwrites the existing values.
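
The aliasing problem can be reproduced in isolation with a small sketch. The classes below are hypothetical stand-ins and not Kylin's real serializer or codec. The sketch assumes the reused ‘current’ object is held in a ThreadLocal that belongs to the serializer instance, so a shared serializer hands back the same mutable object on every call, while one serializer per dump does not.

```java
// Hypothetical stand-ins; these are not Kylin's real classes.
class Counter {
    long value;
}

class ReusingSerializer {
    // One mutable instance per thread and per serializer, reused by every call.
    private final ThreadLocal<Counter> current = ThreadLocal.withInitial(Counter::new);

    Counter deserialize(long encoded) {
        Counter c = current.get();
        c.value = encoded; // overwrites the object earlier callers still hold
        return c;
    }
}

class DumpMergeDemo {
    // One serializer shared by all dumps: every slot aliases the same Counter.
    static long[] decodeShared(long[] encoded) {
        ReusingSerializer shared = new ReusingSerializer();
        Counter[] slots = new Counter[encoded.length];
        for (int i = 0; i < encoded.length; i++) {
            slots[i] = shared.deserialize(encoded[i]);
        }
        return values(slots);
    }

    // One serializer per dump: each slot gets its own Counter.
    static long[] decodePerDump(long[] encoded) {
        Counter[] slots = new Counter[encoded.length];
        for (int i = 0; i < encoded.length; i++) {
            slots[i] = new ReusingSerializer().deserialize(encoded[i]);
        }
        return values(slots);
    }

    private static long[] values(Counter[] slots) {
        long[] out = new long[slots.length];
        for (int i = 0; i < slots.length; i++) {
            out[i] = slots[i].value;
        }
        return out;
    }
}
```

With the shared serializer, every slot reports the value that was decoded last; with one serializer per dump, each slot keeps its own value.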





[jira] [Created] (KYLIN-2810) Kylin UDF support

2017-08-24 Thread fengYu (JIRA)
fengYu created KYLIN-2810:
-

 Summary: Kylin UDF support
 Key: KYLIN-2810
 URL: https://issues.apache.org/jira/browse/KYLIN-2810
 Project: Kylin
  Issue Type: Bug
Reporter: fengYu


Kylin does not support some functions that Calcite does not support. May I 
contribute some UDFs to Kylin? That way, some of our BI tools could use Kylin 
everywhere.





[jira] [Commented] (KYLIN-2286) global snapshot table for one cube

2017-08-24 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139716#comment-16139716
 ] 

fengYu commented on KYLIN-2286:
---

What do you think about this feature? If it meets some demand, I can share our 
implementation. Thanks a lot.

> global snapshot table for one cube 
> ---
>
> Key: KYLIN-2286
> URL: https://issues.apache.org/jira/browse/KYLIN-2286
> Project: Kylin
>  Issue Type: Improvement
>Reporter: fengYu
>Assignee: fengYu
>
> In the current version, Kylin builds a snapshot table for each segment, and 
> segments in the same cube are isolated from each other, even though some 
> segments share the same snapshot table storage.
> In some scenarios we need a global snapshot table for one cube. For example, 
> we have a cube with a snapshot table whose primary key is ID. On the first 
> day, the table looks like:
> id name
> 1   A
> 2   B
> 3   C
> The query 'select name, count(1) from fact join dimension group by name' 
> returns:
> A xx
> B xx
> C xx
> The next day (a new segment), the lookup table has been modified and looks 
> like:
> id name
> 1   A
> 2   D
> 3   E
> The same query returns:
> A xx
> B xx
> C xx
> D xx
> E xx
> However, B and D, and likewise C and E, have the same ID, and we need the 
> newest result. So a global snapshot table, shared by all segments and always 
> holding the newest values, is needed.
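
The difference between per-segment snapshots and a global snapshot can be shown with a small sketch. It is only an illustration: `SnapshotLookupDemo` and its methods are hypothetical, facts are reduced to bare dimension IDs, and the query is reduced to collecting the distinct names it would group by.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; this is not Kylin's snapshot implementation.
class SnapshotLookupDemo {
    // Each segment resolves IDs against the snapshot taken when it was built.
    static Set<String> namesPerSegment(List<int[]> factIdsBySegment,
                                       List<Map<Integer, String>> snapshotBySegment) {
        Set<String> names = new TreeSet<>();
        for (int s = 0; s < factIdsBySegment.size(); s++) {
            for (int id : factIdsBySegment.get(s)) {
                names.add(snapshotBySegment.get(s).get(id));
            }
        }
        return names;
    }

    // All segments resolve IDs against one shared, newest snapshot.
    static Set<String> namesGlobal(List<int[]> factIdsBySegment,
                                   Map<Integer, String> latestSnapshot) {
        Set<String> names = new TreeSet<>();
        for (int[] ids : factIdsBySegment) {
            for (int id : ids) {
                names.add(latestSnapshot.get(id));
            }
        }
        return names;
    }
}
```

Using the two lookup tables from the issue, the per-segment variant yields the groups A, B, C, D, E, while the global variant yields only A, D, E, which is the result the issue asks for.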





[jira] [Closed] (KYLIN-1890) support hbase table prefix configurable

2017-08-24 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu closed KYLIN-1890.
-
Resolution: Won't Fix

> support hbase table prefix configurable
> ---
>
> Key: KYLIN-1890
> URL: https://issues.apache.org/jira/browse/KYLIN-1890
> Project: Kylin
>  Issue Type: Improvement
>  Components: General
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1890-support-hbase-table-prefix-configurable.patch
>
>
> Sometimes we need to deploy two Kylin environments on the same HBase. I want 
> to make the HBase table name prefix configurable for two reasons:
> 1. Different Kylin environments will generate the same table names.
> 2. Cleaning up invalid HTables for one environment will delete all tables that 
> belong to the other environment.
> Having different Kylin environments use different namespaces is also 
> acceptable.
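
The cleanup problem in point 2 can be illustrated with a small sketch. It is 
not Kylin code: the function name and the prefixes KYLIN_A_ and KYLIN_B_ are 
hypothetical, and it only assumes that each environment creates its tables 
under its own configurable prefix.

```python
def tables_owned_by(env_prefix, all_tables):
    """Return only the HBase tables whose name starts with this
    environment's prefix (hypothetical helper, for illustration)."""
    return [name for name in all_tables if name.startswith(env_prefix)]


# Two Kylin environments sharing one HBase, each with its own prefix.
all_tables = ["KYLIN_A_0001", "KYLIN_A_0002", "KYLIN_B_0001", "other_table"]

# Cleanup for environment A may only consider its own tables.
cleanup_candidates_a = tables_owned_by("KYLIN_A_", all_tables)
cleanup_candidates_b = tables_owned_by("KYLIN_B_", all_tables)
```

The prefixes must be chosen so that none is a prefix of another; otherwise the 
cleanup of one environment would again match the other's tables.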



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (KYLIN-1172) kylin support multi-hive on different hadoop cluster

2017-08-24 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu closed KYLIN-1172.
-
Resolution: Won't Fix

> kylin support multi-hive on different hadoop cluster
> 
>
> Key: KYLIN-1172
> URL: https://issues.apache.org/jira/browse/KYLIN-1172
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.0
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 0001-kerberos.patch, 
> 0001-support-more-hives-depend-on-different-hadoop-add-co.patch, 
> 0002-hadoop-jar-files.patch, 
> 0003-git-common-package-part-patch-KYLIN-1172.patch, 
> 0004-git-cube-package-part-patch-KYLIN-1172.patch, 
> 0005-git-metadata-package-part-patch-KYLIN-1172.patch, 
> 0006-git-server-package-part-patch-KYLIN-1172.patch, 
> 0007-dictionary-package-part-patch-KYLIN-1172.patch, 
> 0008-job-package-part-patch-KYLIN-1172.patch
>
>
> Hi, I recently modified Kylin to support multiple Hives on different Hadoop 
> clusters and to take them as input sources for Kylin. We did this for the 
> following reasons:
> 1. We have more than one Hadoop cluster, and many Hives depend on them (a 
> product may have its own Hive). We cannot migrate those Hives into one, and 
> we do not want to deploy one Kylin for every Hive source.
> 2. Our Hadoop clusters are deployed in different data centers (DCs), and we 
> need to support them in one Kylin instance.
> 3. The source data in Hive is much smaller than the HFiles, so copying the 
> source files across DCs is more efficient (the fact distinct columns job and 
> the base cuboid job take data in Hive as input). We therefore deploy HBase 
> and Hadoop in one DC (on separate HDFS instances).
> So we divide the data flow into three parts: Hive is the input source, Hadoop 
> does the computing (which generates many temporary files), and HBase is the 
> output. After cube building, queries on Kylin interact only with HBase. 
> Therefore, what we need to solve is how to build cubes based on different 
> Hives and Hadoops.
> Our method is summarized below:
> 1. Deploy the Hives and Hadoops. Before starting Kylin, the user should deploy 
> all Hives and Hadoops and make sure that Hive SQL can be run with ./hive and 
> that every HDFS can be accessed with the 'hadoop fs' command (add more 
> nameservices in hdfs-site.xml).
> 2. Divide the Hives into two groups: the Hive that is used when Kylin starts 
> (we call it the default one) and the others, which are additional. We 
> allocate a name to every Hive (the default one's name is null). For 
> simplicity, we just add a config property that gives the root directory of 
> all Hive clients; every Hive client is a directory whose name is the Hive 
> name (the default one does not need to be located there).
> 3. Attach exactly one Hive to each project. When creating a project, you 
> specify a Hive name, and from it we can find the Hive client (including the 
> hive command and config files).
> 4. When loading tables in a project, find the hive-site.xml and create a 
> HiveClient using this config file.
> 5. HCatInputFormat cannot be used as the input format in 
> FactDistinctColumnsJob, so we change the job to take the intermediate Hive 
> table location as the input path and change FactDistinctColumnsMapper 
> accordingly. HiveColumnCardinalityJob will fail if an additional Hive is used.
> 6. Because we need to run MR in one Hadoop cluster while the input or output 
> is located on another HDFS, we set the input location to the real name node 
> address instead of the name service (this is a config property too).
> That is all we did. I think it makes it easier to manage more than one Hive 
> and Hadoop. We have applied it in our environment and it works well. I hope 
> it can help other people.
> Patch uploaded. Notes:
> 1. Add two config properties.
> 2. Add hivename to projectInstance and persist projectName in the cube 
> metadata in HBase.
> 3. Create HiveClient with a hive-site.xml file, or use the default one on the 
> Kylin classpath.
> 4. Modify two Hadoop jobs, FactDistinctColumnsJob and CuboidJob: take the 
> intermediate table name as input and change it to the table location in run().
> 5. Transform the nameservice to the master name node when accessing data 
> located in another Hadoop cluster, if necessary.
> The patch is based on 1.0-incubating, and we applied the patches KYLIN-1014, 
> KYLIN-1021 and KYLIN-957, in that order.
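
Steps 2 to 4 and step 6 of the method can be sketched roughly as follows. This 
is an illustration under assumptions, not the code in the attached patches: 
the function names, the example paths and host names are hypothetical, and the 
location conf/hive-site.xml inside a client directory is assumed from the 
usual Hive layout.

```python
from pathlib import PurePosixPath


def hive_client_dir(hive_root, hive_name, default_hive_home):
    """Map a project's Hive name to its client directory.

    A null/empty name means the default Hive; any other name is a
    directory directly under the configured root (hypothetical layout).
    """
    if not hive_name:
        return PurePosixPath(default_hive_home)
    if "/" in hive_name or hive_name in (".", ".."):
        raise ValueError(f"invalid hive name: {hive_name!r}")
    return PurePosixPath(hive_root) / hive_name


def hive_site_path(client_dir):
    """Config file used to create the client (assumed conf/ layout)."""
    return client_dir / "conf" / "hive-site.xml"


def to_namenode_uri(uri, namenodes):
    """Replace an HDFS nameservice with a real name node address.

    `namenodes` maps nameservice -> "host:port"; unknown authorities
    and non-HDFS URIs are returned unchanged.
    """
    prefix = "hdfs://"
    if not uri.startswith(prefix):
        return uri
    authority, sep, rest = uri[len(prefix):].partition("/")
    return prefix + namenodes.get(authority, authority) + sep + rest


client = hive_client_dir("/opt/hives", "sales", "/usr/lib/hive")
default_client = hive_client_dir("/opt/hives", None, "/usr/lib/hive")
site = hive_site_path(client)
resolved = to_namenode_uri(
    "hdfs://ns1/user/hive/warehouse/t", {"ns1": "nn1.example.com:8020"}
)
```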



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2363) Prune cuboids by capping number of dimensions

2017-04-23 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15980702#comment-15980702
 ] 

fengYu commented on KYLIN-2363:
---

[~roger.shi] Sorry for the delay. I am waiting for the release of Kylin 2.0 
and want to add this feature on top of it. I think it will be released this 
week, and then I will do this work.

> Prune cuboids by capping number of dimensions
> -
>
> Key: KYLIN-2363
> URL: https://issues.apache.org/jira/browse/KYLIN-2363
> Project: Kylin
>  Issue Type: Improvement
>Reporter: fengYu
>
> The scenario is like this:
> I have 20+ dimensions, but any query uses at most 5 of them, so a cuboid that 
> contains more than 5 dimensions (except the base cuboid) is useless.
> I think we can add a cube-level configuration that limits the maximum number 
> of dimensions a cuboid may include.
> What's more, we could configure which levels (numbers of dimensions) need to 
> be calculated. In the scenario above, we calculate only levels 1, 2, 3, 4 and 
> 5, and skip the levels above 5.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KYLIN-2363) support limit of dimensions in a cuboid

2017-01-07 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15807081#comment-15807081
 ] 

fengYu commented on KYLIN-2363:
---

Yes, setting a range or enumerating all the levels to be calculated is a more 
user-friendly solution.
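
As a rough sketch of what "a range or an enumeration of levels" could look 
like, the helper below parses a level specification into the list of cuboid 
levels to build. The spec syntax ("1-3,5") and the name parse_levels are 
assumptions made for illustration; the issue does not define a format.

```python
def parse_levels(spec):
    """Parse a level spec such as "1-3,5" into a sorted list of levels.

    Each comma-separated part is either a single level ("5") or an
    inclusive range ("1-3"). Hypothetical syntax, for illustration.
    """
    levels = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            if low > high:
                raise ValueError(f"invalid range: {part!r}")
            levels.update(range(low, high + 1))
        else:
            levels.add(int(part))
    return sorted(levels)


range_levels = parse_levels("1-5")
enumerated_levels = parse_levels("1, 2, 5")
```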

> support limit of dimensions in a cuboid
> ---
>
> Key: KYLIN-2363
> URL: https://issues.apache.org/jira/browse/KYLIN-2363
> Project: Kylin
>  Issue Type: Improvement
>Reporter: fengYu
>
> The scenario is like this:
> I have 20+ dimensions, but any query uses at most 5 of them, so a cuboid that 
> contains more than 5 dimensions (except the base cuboid) is useless.
> I think we can add a cube-level configuration that limits the maximum number 
> of dimensions a cuboid may include.
> What's more, we could configure which levels (numbers of dimensions) need to 
> be calculated. In the scenario above, we calculate only levels 1, 2, 3, 4 and 
> 5, and skip the levels above 5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-2363) support limit of dimensions in a cuboid

2017-01-06 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806993#comment-15806993
 ] 

fengYu commented on KYLIN-2363:
---

Yes, that is what we want to do. I will work on it later.

> support limit of dimensions in a cuboid
> ---
>
> Key: KYLIN-2363
> URL: https://issues.apache.org/jira/browse/KYLIN-2363
> Project: Kylin
>  Issue Type: Improvement
>Reporter: fengYu
>
> The scenario is like this:
> I have 20+ dimensions, but any query uses at most 5 of them, so a cuboid that 
> contains more than 5 dimensions (except the base cuboid) is useless.
> I think we can add a cube-level configuration that limits the maximum number 
> of dimensions a cuboid may include.
> What's more, we could configure which levels (numbers of dimensions) need to 
> be calculated. In the scenario above, we calculate only levels 1, 2, 3, 4 and 
> 5, and skip the levels above 5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-2363) support limit of dimensions in a cuboid

2017-01-06 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804025#comment-15804025
 ] 

fengYu commented on KYLIN-2363:
---

I know the configuration and what it means. In my scenario, for example, I 
have 6 dimensions, A/B/C/D/E/F, and any query needs at most 2 dimensions (in 
WHERE and GROUP BY). So I need to calculate cuboids like AB/AC/AD/AE/AF/... 
and A/B/C/D/E/F, and skip ABC/ABD/ACD/... and ABCD/ABCE...

So I need a configuration that specifies the maximum number of dimensions a 
cuboid may contain in order to be calculated.
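
The A/B/C/D/E/F example can be worked through with a short sketch. It only 
illustrates the counting, not how Kylin schedules cuboids; the function name 
capped_cuboids is hypothetical, and keeping the base cuboid follows the issue 
description.

```python
from itertools import combinations


def capped_cuboids(dimensions, max_dims):
    """Return the cuboids to build: every combination of 1..max_dims
    dimensions, plus the base cuboid (all dimensions)."""
    dims = tuple(dimensions)
    cuboids = [
        combo
        for level in range(1, min(max_dims, len(dims)) + 1)
        for combo in combinations(dims, level)
    ]
    if dims not in cuboids:
        cuboids.append(dims)  # the base cuboid is always kept
    return cuboids


names = ["".join(c) for c in capped_cuboids("ABCDEF", max_dims=2)]
full_lattice_size = 2 ** 6 - 1  # all non-empty subsets of 6 dimensions
```

With a cap of 2, this yields 6 single-dimension cuboids, 15 two-dimension 
cuboids and the base cuboid, 22 in total, instead of the 63 cuboids of the 
full lattice.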

> support limit of dimensions in a cuboid
> ---
>
> Key: KYLIN-2363
> URL: https://issues.apache.org/jira/browse/KYLIN-2363
> Project: Kylin
>  Issue Type: Improvement
>Reporter: fengYu
>
> The scenario is like this:
> I have 20+ dimensions, but any query uses at most 5 of them, so a cuboid that 
> contains more than 5 dimensions (except the base cuboid) is useless.
> I think we can add a cube-level configuration that limits the maximum number 
> of dimensions a cuboid may include.
> What's more, we could configure which levels (numbers of dimensions) need to 
> be calculated. In the scenario above, we calculate only levels 1, 2, 3, 4 and 
> 5, and skip the levels above 5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KYLIN-2363) support limit of dimensions in a cuboid

2017-01-05 Thread fengYu (JIRA)
fengYu created KYLIN-2363:
-

 Summary: support limit of dimensions in a cuboid
 Key: KYLIN-2363
 URL: https://issues.apache.org/jira/browse/KYLIN-2363
 Project: Kylin
  Issue Type: Improvement
Reporter: fengYu


The scenario is like this:

I have 20+ dimensions, but any query uses at most 5 of them, so a cuboid that 
contains more than 5 dimensions (except the base cuboid) is useless.

I think we can add a cube-level configuration that limits the maximum number 
of dimensions a cuboid may include.

What's more, we could configure which levels (numbers of dimensions) need to 
be calculated. In the scenario above, we calculate only levels 1, 2, 3, 4 and 
5, and skip the levels above 5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-2286) global snapshot table for one cube

2016-12-18 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760291#comment-15760291
 ] 

fengYu commented on KYLIN-2286:
---

I agree with the first point. However, if I always use the latest snapshot for 
derived dimensions and the lookup table shrinks, we cannot fetch results for 
the PKs that were removed. We need to keep each PK with its last appearance, 
so a merge operation is needed when building the new lookup table in Kylin.

> global snapshot table for one cube 
> ---
>
> Key: KYLIN-2286
> URL: https://issues.apache.org/jira/browse/KYLIN-2286
> Project: Kylin
>  Issue Type: Improvement
>Reporter: fengYu
>Assignee: fengYu
>
> In the current version, Kylin builds a snapshot table for each segment, and 
> the snapshots of segments in the same cube are isolated from each other, even 
> though some segments share the same snapshot table storage.
> In some scenarios we need a global snapshot table for one cube. For example, 
> we have a cube with a snapshot table whose primary key (PK) is id. On the 
> first day, the table looks like:
> id name
> 1   A
> 2   B
> 3   C
> The query 'select name, count(1) from fact join dimension group by name' 
> returns:
> A xx
> B xx
> C xx
> On the next day (a new segment), the lookup table has been modified and looks 
> like:
> id name
> 1   A
> 2   D
> 3   E
> The same query now returns:
> A xx
> B xx
> C xx
> D xx
> E xx
> However, B and D share the same id, as do C and E, and we need the newest 
> result. So a global snapshot table that is shared by all segments and always 
> holds the newest values is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-2286) global snapshot table for one cube

2016-12-18 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15759986#comment-15759986
 ] 

fengYu commented on KYLIN-2286:
---

I don't think it is user-friendly to refresh the lookup tables of all segments 
whenever a cube segment is created. Our solution is to add a cube-level 
property that enables this feature. If it is enabled, we use a snapshots 
attribute in cubeInstance (which we add) instead of the one in cubeSegment. 
Every time the dictionary is built, it checks the new snapshot from the Hive 
table and the old snapshot from HBase. We merge them by PK to ensure that the 
snapshot table only grows, and take the merged lookup table as the snapshot 
table input to rebuild the new one.

Maybe you can help us review the process. Thanks a lot.
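
The merge-by-PK step can be sketched as below. This is a simplified model, 
assuming a snapshot is a mapping from PK to row; the function name 
merge_snapshots is hypothetical and the real snapshot format in Kylin differs.

```python
def merge_snapshots(old_snapshot, new_snapshot):
    """Merge two lookup-table snapshots keyed by primary key.

    Rows from the new snapshot win, and PKs that disappeared from the
    new snapshot keep their last known row, so the merged table never
    loses a PK.
    """
    merged = dict(old_snapshot)
    merged.update(new_snapshot)
    return merged


old = {1: "A", 2: "B", 3: "C"}   # snapshot stored from the last build
new = {1: "A", 2: "D"}           # today's Hive table: id 3 was removed
merged = merge_snapshots(old, new)
```

Here id 2 takes its newest value D, while id 3 keeps its last appearance C 
even though it is no longer in the Hive table.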

> global snapshot table for one cube 
> ---
>
> Key: KYLIN-2286
> URL: https://issues.apache.org/jira/browse/KYLIN-2286
> Project: Kylin
>  Issue Type: Improvement
>Reporter: fengYu
>Assignee: fengYu
>
> In the current version, Kylin builds a snapshot table for each segment, and 
> the snapshots of segments in the same cube are isolated from each other, even 
> though some segments share the same snapshot table storage.
> In some scenarios we need a global snapshot table for one cube. For example, 
> we have a cube with a snapshot table whose primary key (PK) is id. On the 
> first day, the table looks like:
> id name
> 1   A
> 2   B
> 3   C
> The query 'select name, count(1) from fact join dimension group by name' 
> returns:
> A xx
> B xx
> C xx
> On the next day (a new segment), the lookup table has been modified and looks 
> like:
> id name
> 1   A
> 2   D
> 3   E
> The same query now returns:
> A xx
> B xx
> C xx
> D xx
> E xx
> However, B and D share the same id, as do C and E, and we need the newest 
> result. So a global snapshot table that is shared by all segments and always 
> holds the newest values is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-2286) global snapshot table for one cube

2016-12-16 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753987#comment-15753987
 ] 

fengYu commented on KYLIN-2286:
---

Yes. If you want the dimension value to be the latest modified one, you need 
to define the dimension as derived; otherwise, you can keep the dimension 
normal. In our usage, we add the normal dimensions to the fact table and leave 
the derived dimensions in the lookup table by using views, which makes the 
lookup table as small as possible.

> global snapshot table for one cube 
> ---
>
> Key: KYLIN-2286
> URL: https://issues.apache.org/jira/browse/KYLIN-2286
> Project: Kylin
>  Issue Type: Improvement
>Reporter: fengYu
>Assignee: fengYu
>
> In the current version, Kylin builds a snapshot table for each segment, and 
> the snapshots of segments in the same cube are isolated from each other, even 
> though some segments share the same snapshot table storage.
> In some scenarios we need a global snapshot table for one cube. For example, 
> we have a cube with a snapshot table whose primary key (PK) is id. On the 
> first day, the table looks like:
> id name
> 1   A
> 2   B
> 3   C
> The query 'select name, count(1) from fact join dimension group by name' 
> returns:
> A xx
> B xx
> C xx
> On the next day (a new segment), the lookup table has been modified and looks 
> like:
> id name
> 1   A
> 2   D
> 3   E
> The same query now returns:
> A xx
> B xx
> C xx
> D xx
> E xx
> However, B and D share the same id, as do C and E, and we need the newest 
> result. So a global snapshot table that is shared by all segments and always 
> holds the newest values is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KYLIN-2286) global snapshot table for one cube

2016-12-15 Thread fengYu (JIRA)
fengYu created KYLIN-2286:
-

 Summary: global snapshot table for one cube 
 Key: KYLIN-2286
 URL: https://issues.apache.org/jira/browse/KYLIN-2286
 Project: Kylin
  Issue Type: Improvement
Reporter: fengYu
Assignee: fengYu


In the current version, Kylin builds a snapshot table for each segment, and 
the snapshots of segments in the same cube are isolated from each other, even 
though some segments share the same snapshot table storage.

In some scenarios we need a global snapshot table for one cube. For example, 
we have a cube with a snapshot table whose primary key (PK) is id. On the 
first day, the table looks like:
id name
1   A
2   B
3   C
The query 'select name, count(1) from fact join dimension group by name' 
returns:
A xx
B xx
C xx
On the next day (a new segment), the lookup table has been modified and looks 
like:
id name
1   A
2   D
3   E
The same query now returns:
A xx
B xx
C xx
D xx
E xx

However, B and D share the same id, as do C and E, and we need the newest 
result. So a global snapshot table that is shared by all segments and always 
holds the newest values is needed.
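
The difference between per-segment snapshots and one global snapshot can be 
reproduced with a toy model of the example above. The data layout (one fact 
row per id per day) and the names count_by_name, per_segment and 
global_snapshot are assumptions made for illustration only.

```python
from collections import Counter


def count_by_name(segments, snapshot_for):
    """Toy version of 'select name, count(1) ... group by name'.

    `segments` maps a segment name to the fact-row ids it contains;
    `snapshot_for` returns the id -> name lookup used for a segment.
    """
    totals = Counter()
    for segment, fact_ids in segments.items():
        lookup = snapshot_for(segment)
        for pk in fact_ids:
            totals[lookup[pk]] += 1
    return dict(totals)


segments = {"day1": [1, 2, 3], "day2": [1, 2, 3]}
per_segment = {
    "day1": {1: "A", 2: "B", 3: "C"},
    "day2": {1: "A", 2: "D", 3: "E"},
}
global_snapshot = per_segment["day2"]  # always the newest values

isolated = count_by_name(segments, lambda seg: per_segment[seg])
shared = count_by_name(segments, lambda seg: global_snapshot)
```

With isolated snapshots the result contains A, B, C, D and E; with the shared 
snapshot the rows of both days are reported under A, D and E only.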



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1826) kylin support more than one hive based on different hadoop claster

2016-11-29 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15704670#comment-15704670
 ] 

fengYu commented on KYLIN-1826:
---

Thanks for pointing out those improvements; I will work on this. I am not sure 
whether different projects can have different configs now; if not, I will have 
to implement that. As I understand it, you want to make hive.home a config, so 
that this is still a kind of Hive source rather than another source type named 
external hives.

> kylin support more than one hive based on different hadoop claster
> --
>
> Key: KYLIN-1826
> URL: https://issues.apache.org/jira/browse/KYLIN-1826
> Project: Kylin
>  Issue Type: Improvement
>  Components: Environment 
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1826-add-external-hive-interface-project-table.patch, 
> 0002-KYLIN-1826-add-and-modify-cube-source-job-for-extern.patch, 
> 0003-KYLIN-1826-unify-hive-concept-forbid-modify-hive-nam.patch
>
>
> Currently, Kylin supports only one Hive, which is run by the 'hive' command. 
> However, when the source data is located in more than one Hive, we have to 
> deploy more Kylin instances and more than one metastore, which is difficult 
> to manage and may cause conflicts.
> I have been working on this recently. In our cluster there are several Hive 
> clients (with different metastores) based on different Hadoop clusters. I 
> added a new Hive source type called 'external hive' in Kylin 1.5.x.
> Thanks to the plug-in architecture of Kylin 2.x, this work was easier. 
> The main modifications are:
> 1. Add a Hive root directory to the Hive config file; the external Hive 
> clients live in this directory, and each Hive is named by its directory name.
> 2. Add the hive-site.xml file when loading Hive tables.
> 3. Store the Hive name in the project; one project can take only one Hive as 
> its source.
> 4. Change and add some jobs to support cube building.
> I will upload my patch once I finish all my tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KYLIN-2064) add non-runtime-aggregation measure support

2016-11-03 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-2064:
--
Attachment: 0001-KYLIN-2064-add-non-runtime-aggregation-measure-for-d.patch

I have formatted my patch. I implemented it with few changes to the original 
code and added two kinds of measure, called nhllc and nbitmap.

If a query needs runtime aggregation, it fails with an exception.
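
The behaviour described here can be modelled with a small sketch. It is one 
reading of the idea, not the patch itself: the class and exception names are 
hypothetical, and it assumes that only the final distinct count is stored per 
partition and group key, so counts of different partitions cannot be combined 
at query time.

```python
class RuntimeAggregationRequired(Exception):
    """Raised when a query cannot be answered from pre-computed cells."""


class PrecomputedDistinctCount:
    """Stores final distinct counts only, keyed by (partition, group key)."""

    def __init__(self):
        self._cells = {}

    def put(self, partition, group_key, distinct_count):
        self._cells[(partition, group_key)] = distinct_count

    def query(self, partitions, group_key):
        # Distinct counts are not additive, so a query that spans several
        # partitions would need runtime aggregation and is rejected.
        if len(partitions) != 1:
            raise RuntimeAggregationRequired("query spans several partitions")
        cell = (partitions[0], group_key)
        if cell not in self._cells:
            raise RuntimeAggregationRequired(f"cell not pre-computed: {cell}")
        return self._cells[cell]


measure = PrecomputedDistinctCount()
measure.put("2016-11-01", ("city=SH",), 120)
measure.put("2016-11-02", ("city=SH",), 95)

single_day = measure.query(["2016-11-01"], ("city=SH",))
try:
    measure.query(["2016-11-01", "2016-11-02"], ("city=SH",))
    rejected = False
except RuntimeAggregationRequired:
    rejected = True
```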

> add non-runtime-aggregation measure support
> ---
>
> Key: KYLIN-2064
> URL: https://issues.apache.org/jira/browse/KYLIN-2064
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-2064-add-non-runtime-aggregation-measure-for-d.patch
>
>
> Kylin is based on pre-computation and stores the results in HBase. However, 
> it also supports runtime aggregation to satisfy queries that do not match the 
> pre-computed data.
> But runtime aggregation slows down the query and needs more storage space in 
> HBase (which slows down scans), for example for distinct count and TopN 
> measures.
> So we can use more pre-computation and less runtime aggregation. In some 
> scenarios we do not need results across different partitions (dates), so we 
> add a kind of measure that stores only the result for count distinct; it 
> speeds up queries and needs less storage.
> What's more, if a query on this measure asks for a result that has not been 
> pre-computed, it throws an exception.
> I will describe our solution in my patch later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KYLIN-1826) kylin support more than one hive based on different hadoop claster

2016-11-01 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1826:
--
Attachment: 0003-KYLIN-1826-unify-hive-concept-forbid-modify-hive-nam.patch

Unify the Hive name concept, and forbid changing the Hive name when cubes 
already exist in the project.


> kylin support more than one hive based on different hadoop claster
> --
>
> Key: KYLIN-1826
> URL: https://issues.apache.org/jira/browse/KYLIN-1826
> Project: Kylin
>  Issue Type: Improvement
>  Components: Environment 
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1826-add-external-hive-interface-project-table.patch, 
> 0002-KYLIN-1826-add-and-modify-cube-source-job-for-extern.patch, 
> 0003-KYLIN-1826-unify-hive-concept-forbid-modify-hive-nam.patch
>
>
> Currently, Kylin supports only one Hive, which is run by the 'hive' command. 
> However, when the source data is located in more than one Hive, we have to 
> deploy more Kylin instances and more than one metastore, which is difficult 
> to manage and may cause conflicts.
> I have been working on this recently. In our cluster there are several Hive 
> clients (with different metastores) based on different Hadoop clusters. I 
> added a new Hive source type called 'external hive' in Kylin 1.5.x.
> Thanks to the plug-in architecture of Kylin 2.x, this work was easier. 
> The main modifications are:
> 1. Add a Hive root directory to the Hive config file; the external Hive 
> clients live in this directory, and each Hive is named by its directory name.
> 2. Add the hive-site.xml file when loading Hive tables.
> 3. Store the Hive name in the project; one project can take only one Hive as 
> its source.
> 4. Change and add some jobs to support cube building.
> I will upload my patch once I finish all my tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1826) kylin support more than one hive based on different hadoop claster

2016-11-01 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624887#comment-15624887
 ] 

fengYu commented on KYLIN-1826:
---

Thanks for the review and reply.

1. I will modify the job name and unify the naming.
2. In the early version I did not pay attention to the ID_STREAMING source. I 
will study it and try to make the patch compatible with it.
3. I have passed 'mvn test' in my cluster; I will check the test code and 
rerun it.
4. "hive" in TableDesc comes from ProjectInstance; storing it in TableDesc 
makes some things easier. To keep the two consistent, we can make "hive" in 
ProjectInstance unchangeable. Changing it there would be meaningless anyway, 
because we can just rename the Hive in the local filesystem rather than modify 
the metadata.
5. Currently we still use the CLI for all Hive sources, which is stable. We 
will consider switching to Beeline if necessary.


> kylin support more than one hive based on different hadoop claster
> --
>
> Key: KYLIN-1826
> URL: https://issues.apache.org/jira/browse/KYLIN-1826
> Project: Kylin
>  Issue Type: Improvement
>  Components: Environment 
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1826-add-external-hive-interface-project-table.patch, 
> 0002-KYLIN-1826-add-and-modify-cube-source-job-for-extern.patch
>
>
> Currently, Kylin supports only one Hive, which is run by the 'hive' command. 
> However, when the source data is located in more than one Hive, we have to 
> deploy more Kylin instances and more than one metastore, which is difficult 
> to manage and may cause conflicts.
> I have been working on this recently. In our cluster there are several Hive 
> clients (with different metastores) based on different Hadoop clusters. I 
> added a new Hive source type called 'external hive' in Kylin 1.5.x.
> Thanks to the plug-in architecture of Kylin 2.x, this work was easier. 
> The main modifications are:
> 1. Add a Hive root directory to the Hive config file; the external Hive 
> clients live in this directory, and each Hive is named by its directory name.
> 2. Add the hive-site.xml file when loading Hive tables.
> 3. Store the Hive name in the project; one project can take only one Hive as 
> its source.
> 4. Change and add some jobs to support cube building.
> I will upload my patch once I finish all my tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1826) kylin support more than one hive based on different hadoop claster

2016-11-01 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624888#comment-15624888
 ] 

fengYu commented on KYLIN-1826:
---

Thanks for the review and reply.

1. I will modify the job name and unify the naming.
2. In the early version I did not pay attention to the ID_STREAMING source. I 
will study it and try to make the patch compatible with it.
3. I have passed 'mvn test' in my cluster; I will check the test code and 
rerun it.
4. "hive" in TableDesc comes from ProjectInstance; storing it in TableDesc 
makes some things easier. To keep the two consistent, we can make "hive" in 
ProjectInstance unchangeable. Changing it there would be meaningless anyway, 
because we can just rename the Hive in the local filesystem rather than modify 
the metadata.
5. Currently we still use the CLI for all Hive sources, which is stable. We 
will consider switching to Beeline if necessary.


> kylin support more than one hive based on different hadoop claster
> --
>
> Key: KYLIN-1826
> URL: https://issues.apache.org/jira/browse/KYLIN-1826
> Project: Kylin
>  Issue Type: Improvement
>  Components: Environment 
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1826-add-external-hive-interface-project-table.patch, 
> 0002-KYLIN-1826-add-and-modify-cube-source-job-for-extern.patch
>
>
> Currently, Kylin supports only one Hive, which is run by the 'hive' command. 
> However, when the source data is located in more than one Hive, we have to 
> deploy more Kylin instances and more than one metastore, which is difficult 
> to manage and may cause conflicts.
> I have been working on this recently. In our cluster there are several Hive 
> clients (with different metastores) based on different Hadoop clusters. I 
> added a new Hive source type called 'external hive' in Kylin 1.5.x.
> Thanks to the plug-in architecture of Kylin 2.x, this work was easier. 
> The main modifications are:
> 1. Add a Hive root directory to the Hive config file; the external Hive 
> clients live in this directory, and each Hive is named by its directory name.
> 2. Add the hive-site.xml file when loading Hive tables.
> 3. Store the Hive name in the project; one project can take only one Hive as 
> its source.
> 4. Change and add some jobs to support cube building.
> I will upload my patch once I finish all my tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (KYLIN-1826) kylin support more than one hive based on different hadoop claster

2016-11-01 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1826:
--
Comment: was deleted

(was: Thanks for review and reply.  

1、I will modify the job name and unify the name.
2、In early version, I do not pay attention to ID_STREAMING source, I will learn 
this and try to make it incompatible.
3、I have pass 'mvn test' in my cluster, I have to check the test code and rerun 
it.
4、"hive" in TableDesc is comes from ProjectInstance, it make something easy to 
store it in TableDesc, to keep inconsistency, we can take "hive" in 
ProjectInstance unchangeable, which it is meaningless. because we can just 
modify hive name in local filesystem rather than modify metadata.
5、currently, we still use cli for all hive source which is stable, We will 
consider and modify it to beeline if necessary.
)

> kylin support more than one hive based on different hadoop claster
> --
>
> Key: KYLIN-1826
> URL: https://issues.apache.org/jira/browse/KYLIN-1826
> Project: Kylin
>  Issue Type: Improvement
>  Components: Environment 
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1826-add-external-hive-interface-project-table.patch, 
> 0002-KYLIN-1826-add-and-modify-cube-source-job-for-extern.patch
>
>
> Currently, Kylin supports only one Hive, which is run by the 'hive' command. 
> However, when the source data is located in more than one Hive, we have to 
> deploy more Kylin instances and more than one metastore, which is difficult 
> to manage and may cause conflicts.
> I have been working on this recently. In our cluster there are several Hive 
> clients (with different metastores) based on different Hadoop clusters. I 
> added a new Hive source type called 'external hive' in Kylin 1.5.x.
> Thanks to the plug-in architecture of Kylin 2.x, this work was easier. 
> The main modifications are:
> 1. Add a Hive root directory to the Hive config file; the external Hive 
> clients live in this directory, and each Hive is named by its directory name.
> 2. Add the hive-site.xml file when loading Hive tables.
> 3. Store the Hive name in the project; one project can take only one Hive as 
> its source.
> 4. Change and add some jobs to support cube building.
> I will upload my patch once I finish all my tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KYLIN-1826) kylin support more than one hive based on different hadoop claster

2016-10-28 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615336#comment-15615336
 ] 

fengYu edited comment on KYLIN-1826 at 10/28/16 1:05 PM:
-

Hi, I have removed the thread-local variable and split the patch into 2 parts: 
one is about the controller, such as creating a project and loading tables; 
the other is about adding or modifying some source jobs.

We have tested this in our production environment for more than 4 months, so I 
think it is stable.

The patch is based on the Kylin 1.5.4.1 release. I hope it can be reviewed and 
merged as soon as possible. Thanks a lot.


was (Author: feng_xiao_yu):
Hi, I have remove the threadlocal virable and cut the patch to 2 parts, one is 
about controller such as create project, load table. another is about add or 
modify some source jobs.

We have test this in our production environment for more than 4 months.  so I 
think it is steady.

the patch is base on kylin 1.5.4.1 release, wish it can be review and merge as 
soon as possible. Thanks a lot.

> kylin support more than one hive based on different hadoop claster
> --
>
> Key: KYLIN-1826
> URL: https://issues.apache.org/jira/browse/KYLIN-1826
> Project: Kylin
>  Issue Type: Improvement
>  Components: Environment 
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1826-add-external-hive-interface-project-table.patch, 
> 0002-KYLIN-1826-add-and-modify-cube-source-job-for-extern.patch
>
>
> Currently, Kylin supports only one Hive, which is run through the 'hive' 
> command. When source data is located in more than one Hive, we have to deploy 
> several Kylin instances and more than one metastore, which is difficult to 
> manage and may cause conflicts.
> I have been working on this recently. In our cluster there are several Hive 
> clients (with different metastores) based on different Hadoop clusters, so I 
> added a new Hive source type called 'external hive' in Kylin 1.5.x.
> Thanks to Kylin's plug-in architecture in 2.x, this work is easier.
> The main modifications are:
> 1. Add a Hive root directory to the Hive config file. External Hive clients 
> live in this directory, and each Hive is named by its directory name.
> 2. Add the hive-site.xml file while loading Hive tables.
> 3. Store the Hive name in the project; one project can use only one Hive as 
> its source.
> 4. Change and add some jobs to support job building.
> I will upload my patch once I finish all my tests.
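
The naming scheme in item 1 can be sketched roughly as follows. This is only an
illustration and is not taken from the attached patches: the class
ExternalHiveLocator, the method hiveSiteFor, and the assumed layout
<hive root>/<hive name>/conf/hive-site.xml are all hypothetical. The fallback
to a default hive-site.xml when a project stores no Hive name follows the
compatibility behaviour mentioned elsewhere in this thread.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only (not from the KYLIN-1826 patches).
final class ExternalHiveLocator {
    private final Path hiveRoot;        // the "hive root directory" from the config
    private final Path defaultHiveSite; // hive-site.xml of the default Hive

    ExternalHiveLocator(String hiveRoot, String defaultHiveSite) {
        this.hiveRoot = Paths.get(hiveRoot);
        this.defaultHiveSite = Paths.get(defaultHiveSite);
    }

    // A project stores at most one Hive name; null selects the default Hive.
    Path hiveSiteFor(String hiveName) {
        if (hiveName == null) {
            return defaultHiveSite;
        }
        // Assumed layout: <hiveRoot>/<hiveName>/conf/hive-site.xml
        return hiveRoot.resolve(hiveName).resolve("conf").resolve("hive-site.xml");
    }
}
```

Because the Hive name is just the directory name, adding another external Hive
under this assumption only means adding another directory below the root.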



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KYLIN-1826) kylin support more than one hive based on different hadoop claster

2016-10-28 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1826:
--
Attachment: 0002-KYLIN-1826-add-and-modify-cube-source-job-for-extern.patch
0001-KYLIN-1826-add-external-hive-interface-project-table.patch

Hi, I have removed the thread-local variable and split the patch into two parts: 
one covers the controller side, such as creating a project and loading tables; 
the other adds or modifies some source jobs.

We have tested this in our production environment for more than 4 months, so I 
think it is stable.

The patch is based on the Kylin 1.5.4.1 release. I hope it can be reviewed and 
merged as soon as possible. Thanks a lot.

> kylin support more than one hive based on different hadoop claster
> --
>
> Key: KYLIN-1826
> URL: https://issues.apache.org/jira/browse/KYLIN-1826
> Project: Kylin
>  Issue Type: Improvement
>  Components: Environment 
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1826-add-external-hive-interface-project-table.patch, 
> 0002-KYLIN-1826-add-and-modify-cube-source-job-for-extern.patch
>
>
> Currently, Kylin supports only one Hive, which is run through the 'hive' 
> command. When source data is located in more than one Hive, we have to deploy 
> several Kylin instances and more than one metastore, which is difficult to 
> manage and may cause conflicts.
> I have been working on this recently. In our cluster there are several Hive 
> clients (with different metastores) based on different Hadoop clusters, so I 
> added a new Hive source type called 'external hive' in Kylin 1.5.x.
> Thanks to Kylin's plug-in architecture in 2.x, this work is easier.
> The main modifications are:
> 1. Add a Hive root directory to the Hive config file. External Hive clients 
> live in this directory, and each Hive is named by its directory name.
> 2. Add the hive-site.xml file while loading Hive tables.
> 3. Store the Hive name in the project; one project can use only one Hive as 
> its source.
> 4. Change and add some jobs to support job building.
> I will upload my patch once I finish all my tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KYLIN-1826) kylin support more than one hive based on different hadoop claster

2016-10-28 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1826:
--
Attachment: (was: 
0001-KYLIN-1826-support-more-hive-based-on-different-hado.patch)

> kylin support more than one hive based on different hadoop claster
> --
>
> Key: KYLIN-1826
> URL: https://issues.apache.org/jira/browse/KYLIN-1826
> Project: Kylin
>  Issue Type: Improvement
>  Components: Environment 
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
>
> Currently, Kylin supports only one Hive, which is run through the 'hive' 
> command. When source data is located in more than one Hive, we have to deploy 
> several Kylin instances and more than one metastore, which is difficult to 
> manage and may cause conflicts.
> I have been working on this recently. In our cluster there are several Hive 
> clients (with different metastores) based on different Hadoop clusters, so I 
> added a new Hive source type called 'external hive' in Kylin 1.5.x.
> Thanks to Kylin's plug-in architecture in 2.x, this work is easier.
> The main modifications are:
> 1. Add a Hive root directory to the Hive config file. External Hive clients 
> live in this directory, and each Hive is named by its directory name.
> 2. Add the hive-site.xml file while loading Hive tables.
> 3. Store the Hive name in the project; one project can use only one Hive as 
> its source.
> 4. Change and add some jobs to support job building.
> I will upload my patch once I finish all my tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KYLIN-1839) improvement set classpath before submitting mr job

2016-10-19 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1839:
--
Attachment: 0001-KYLIN-1839-modify-kylin-config-for-extra-lib.patch

Added a description of this configuration to kylin.properties.

> improvement set classpath before submitting mr job
> --
>
> Key: KYLIN-1839
> URL: https://issues.apache.org/jira/browse/KYLIN-1839
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Fix For: v1.6.0
>
> Attachments: 0001-KYLIN-1839-modify-kylin-config-for-extra-lib.patch, 
> 0001-KYLIN-1839-support-kylin-lib-in-HDFS.patch
>
>
> In setClasspath, Kylin always looks up the Hive jars from the Hive dependency 
> using a regex. However, the result does not change during the lifetime of one 
> process, so I cache the locations of tmpjars and tmpfiles.
> In addition, the user lib setting can now point to an HDFS path rather than 
> only the local filesystem; a local path causes the jars to be uploaded every 
> time if they do not exist in the DistributedCache.
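
The caching idea described above (compute a value once per process and reuse
it) can be sketched as below. This is a generic illustration, not the code from
the patch: the class CachedLookup and the supplier passed to it are
hypothetical, and the real regex scan over the Hive dependency is replaced by
an arbitrary loader.

```java
import java.util.function.Supplier;

// Hypothetical process-level cache; names are not taken from the KYLIN-1839 patch.
final class CachedLookup<T> {
    private final Supplier<T> loader; // e.g. the expensive regex scan for jars
    private volatile T value;         // stays null until the first load succeeds

    CachedLookup(Supplier<T> loader) {
        this.loader = loader;
    }

    // Runs the loader at most once, even when called from several threads.
    T get() {
        T local = value;
        if (local == null) {
            synchronized (this) {
                local = value;
                if (local == null) {
                    local = loader.get();
                    value = local;
                }
            }
        }
        return local;
    }
}
```

A cache like this is only safe because, as the description says, the jar
locations do not change during the lifetime of the process.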



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1839) improvement set classpath before submitting mr job

2016-10-18 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587499#comment-15587499
 ] 

fengYu commented on KYLIN-1839:
---

Glad to do it; I will submit it later.

> improvement set classpath before submitting mr job
> --
>
> Key: KYLIN-1839
> URL: https://issues.apache.org/jira/browse/KYLIN-1839
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Fix For: v1.6.0
>
> Attachments: 0001-KYLIN-1839-support-kylin-lib-in-HDFS.patch
>
>
> In setClasspath, Kylin always looks up the Hive jars from the Hive dependency 
> using a regex. However, the result does not change during the lifetime of one 
> process, so I cache the locations of tmpjars and tmpfiles.
> In addition, the user lib setting can now point to an HDFS path rather than 
> only the local filesystem; a local path causes the jars to be uploaded every 
> time if they do not exist in the DistributedCache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1839) improvement set classpath before submitting mr job

2016-10-18 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587392#comment-15587392
 ] 

fengYu commented on KYLIN-1839:
---

This is a small change; it can be documented in kylin.properties.

> improvement set classpath before submitting mr job
> --
>
> Key: KYLIN-1839
> URL: https://issues.apache.org/jira/browse/KYLIN-1839
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Fix For: v1.6.0
>
> Attachments: 0001-KYLIN-1839-support-kylin-lib-in-HDFS.patch
>
>
> In setClasspath, Kylin always looks up the Hive jars from the Hive dependency 
> using a regex. However, the result does not change during the lifetime of one 
> process, so I cache the locations of tmpjars and tmpfiles.
> In addition, the user lib setting can now point to an HDFS path rather than 
> only the local filesystem; a local path causes the jars to be uploaded every 
> time if they do not exist in the DistributedCache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (KYLIN-1888) support backward cube building

2016-10-11 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu closed KYLIN-1888.
-
Resolution: Fixed

I found that this feature was already added in 1.5.4, so I am closing this issue.

> support backward cube building
> --
>
> Key: KYLIN-1888
> URL: https://issues.apache.org/jira/browse/KYLIN-1888
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 0001-forward-build-job.patch
>
>
> This is useful when a user wants to see data from the last few days first and 
> then backfill historical data from the cube start date.
> FORWARD is just the reverse of BUILD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KYLIN-1839) improvement set classpath before submitting mr job

2016-10-11 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1839:
--
Attachment: (was: 
0001-KYLIN-1839-support-extend-lib-from-HDFS-and-cache-tm.patch)

> improvement set classpath before submitting mr job
> --
>
> Key: KYLIN-1839
> URL: https://issues.apache.org/jira/browse/KYLIN-1839
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 0001-KYLIN-1839-support-kylin-lib-in-HDFS.patch
>
>
> In setClasspath, Kylin always looks up the Hive jars from the Hive dependency 
> using a regex. However, the result does not change during the lifetime of one 
> process, so I cache the locations of tmpjars and tmpfiles.
> In addition, the user lib setting can now point to an HDFS path rather than 
> only the local filesystem; a local path causes the jars to be uploaded every 
> time if they do not exist in the DistributedCache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KYLIN-1839) improvement set classpath before submitting mr job

2016-10-11 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1839:
--
Attachment: 0001-KYLIN-1839-support-kylin-lib-in-HDFS.patch

Uploaded a new patch that only adds support for an HDFS path for the MR lib.

> improvement set classpath before submitting mr job
> --
>
> Key: KYLIN-1839
> URL: https://issues.apache.org/jira/browse/KYLIN-1839
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 0001-KYLIN-1839-support-kylin-lib-in-HDFS.patch
>
>
> In setClasspath, Kylin always looks up the Hive jars from the Hive dependency 
> using a regex. However, the result does not change during the lifetime of one 
> process, so I cache the locations of tmpjars and tmpfiles.
> In addition, the user lib setting can now point to an HDFS path rather than 
> only the local filesystem; a local path causes the jars to be uploaded every 
> time if they do not exist in the DistributedCache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KYLIN-2065) kylin result cache improvement

2016-10-07 Thread fengYu (JIRA)
fengYu created KYLIN-2065:
-

 Summary: kylin result cache improvement
 Key: KYLIN-2065
 URL: https://issues.apache.org/jira/browse/KYLIN-2065
 Project: Kylin
  Issue Type: Improvement
Reporter: fengYu
Assignee: fengYu


Data stored in Kylin (HBase) is rarely modified, except when someone refreshes a 
segment, so I think a result cache is useful. Kylin supports an in-memory 
Ehcache, but it is limited and cannot be shared between query nodes. We 
developed a cache solution that is stored in HBase and expires based on the cube 
last-modify-time or the segment last-modify-time.

I will prepare the patch on 1.5.4 and upload it later. 
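
The expiry rule described above might look roughly like the sketch below. It
is an assumption-laden illustration, not the actual solution: the class
CachedResult and its fields are hypothetical, timestamps are plain epoch
milliseconds, and the HBase storage part is left out entirely.

```java
// Hypothetical cache entry, for illustration only (not from the KYLIN-2065 patch).
final class CachedResult {
    final String payload;      // stands in for the serialized query result
    final long cachedAtMillis; // when the result was written to the cache

    CachedResult(String payload, long cachedAtMillis) {
        this.payload = payload;
        this.cachedAtMillis = cachedAtMillis;
    }

    // The entry is usable only if the cube (or segment) has not been
    // modified at or after the moment the result was cached.
    boolean isFresh(long lastModifiedMillis) {
        return cachedAtMillis > lastModifiedMillis;
    }
}
```

With this rule no explicit invalidation message is needed: refreshing a segment
bumps its last-modify-time, and older entries simply stop being considered fresh.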



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KYLIN-2064) add non-runtime-aggregation measure support

2016-10-07 Thread fengYu (JIRA)
fengYu created KYLIN-2064:
-

 Summary: add non-runtime-aggregation measure support
 Key: KYLIN-2064
 URL: https://issues.apache.org/jira/browse/KYLIN-2064
 Project: Kylin
  Issue Type: Improvement
Affects Versions: v1.5.2
Reporter: fengYu
Assignee: fengYu


Kylin is based on pre-computation and stores the results in HBase. However, it 
also supports runtime aggregation to satisfy queries that cannot be matched by 
the pre-computed data.

But runtime aggregation slows down queries, and some measures, such as distinct 
count and TopN, need more storage space in HBase (which slows down scans).

So we can use more pre-computation and less runtime aggregation. In some 
scenarios we do not need results across different partitions (dates), so we 
added a kind of measure that only supports the result for count distinct. It 
speeds up queries and needs less storage.

What's more, a query on this measure for a result that has not been computed 
returns an exception.

I will prepare our solution as a patch later.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1833) union operation will cause error result

2016-09-21 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15509668#comment-15509668
 ] 

fengYu commented on KYLIN-1833:
---

I am currently working on this problem for kylin-2.x.

> union operation will cause error result
> ---
>
> Key: KYLIN-1833
> URL: https://issues.apache.org/jira/browse/KYLIN-1833
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v1.3.0, v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 0001-KYLIN-1833-union-query-get-error-result.patch
>
>
> A query like this returns a wrong result:
> select * from (
> select 'b',  count(1) from kylin_sales  where lstg_format_name >= 'Auction'
> union all
> select 'a',  count(1) from kylin_sales where lstg_format_name >= 'Others'
> union all 
> select 'c',  count(1) from kylin_sales  where lstg_format_name >= 'FP-GTC'
> ) order by 1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1839) improvement set classpath before submitting mr job

2016-09-21 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15509663#comment-15509663
 ] 

fengYu commented on KYLIN-1839:
---

I think the patch is usable. I think the tmpjars and tmpfiles used for the MR 
job are process-level, so I cache them all.

> improvement set classpath before submitting mr job
> --
>
> Key: KYLIN-1839
> URL: https://issues.apache.org/jira/browse/KYLIN-1839
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1839-support-extend-lib-from-HDFS-and-cache-tm.patch
>
>
> In setClasspath, Kylin always looks up the Hive jars from the Hive dependency 
> using a regex. However, the result does not change during the lifetime of one 
> process, so I cache the locations of tmpjars and tmpfiles.
> In addition, the user lib setting can now point to an HDFS path rather than 
> only the local filesystem; a local path causes the jars to be uploaded every 
> time if they do not exist in the DistributedCache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1014) Support kerberos authentication while getting status from RM

2016-09-13 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15489015#comment-15489015
 ] 

fengYu commented on KYLIN-1014:
---

It works on 1.5.2.1, which I am using now; I think it will work on 1.5.2+.

> Support kerberos authentication while getting status from RM
> 
>
> Key: KYLIN-1014
> URL: https://issues.apache.org/jira/browse/KYLIN-1014
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v1.0, v0.7.2, v0.7.1
>Reporter: fengYu
>Assignee: fengYu
> Fix For: v1.4.0, v1.3.0
>
> Attachments: 0001-KYLIN-1014-error-while-retry-rm-master.patch, 
> 0001-hadoop-status-checker-support-rm-with-kerberos.patch, 
> patch-for-2.0-rc.patch
>
>
> I have used kylin-0.7.2 to build cubes and run some queries, and I am trying 
> kylin-1.0 on another Hadoop cluster. I hit the problem below in both 
> kylin-0.7.2 and kylin-1.0:
> Our Hadoop cluster uses Kerberos for authentication. However, we found that 
> after submitting a MapReduce job (the second step in building a cube), Kylin 
> sends an HTTP request to the RM server at regular intervals to get the job 
> status, and we always get errors here because Kylin does nothing about 
> Kerberos. 
> Finally, we changed the source code to make it support Kerberos 
> authentication. My patch file is attached.
> I added a property named "kylin.job.status.with.kerberos", which indicates 
> whether Kerberos authentication is needed when getting the status from the RM; 
> the default value is false.
> Any good ideas or suggestions would be highly appreciated. 
> Thanks.
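
How such a flag could be read is sketched below. Only the property name
"kylin.job.status.with.kerberos" and its default of false come from the
description; the class JobStatusConfig and its method are hypothetical, and
the sketch uses plain java.util.Properties rather than Kylin's own
configuration class.

```java
import java.util.Properties;

// Hypothetical reader for the flag; not the code from the attached patch.
final class JobStatusConfig {
    static final String KERBEROS_KEY = "kylin.job.status.with.kerberos";

    // Returns false when the property is absent, matching the stated default.
    static boolean useKerberos(Properties props) {
        return Boolean.parseBoolean(props.getProperty(KERBEROS_KEY, "false"));
    }
}
```

Under this sketch, a deployment on a Kerberos-secured cluster would set
kylin.job.status.with.kerberos=true, and all other deployments keep their
current behaviour without any change.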



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1173) Can not load hive table after modify table metadata

2016-09-11 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15482757#comment-15482757
 ] 

fengYu commented on KYLIN-1173:
---

Hi, it is difficult for the Kylin server to detect a column being modified 
through the Hive CLI. So when a user needs to change a column name, we 
recommend adding a new column rather than renaming the existing one (this is an 
advantage of a Hive view compared with a Hive table).

Therefore I think this JIRA serves as a caution for Kylin users; I don't think 
it is necessary to solve it in Kylin. 

> Can not load hive table after modify table metadata
> ---
>
> Key: KYLIN-1173
> URL: https://issues.apache.org/jira/browse/KYLIN-1173
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.0
>Reporter: fengYu
>
> Hi all:
>
> When I change a column in the Hive source table and reload the table in 
> Kylin, I cannot see any columns in the table after the reload. After I restart 
> the Kylin server and reload the table, the column (with the modified name) 
> appears.
> I wrote a test program like this (Kylin does the same thing while reloading a 
> table):
> HiveClient client = new HiveClient();
> List fields = client.getHiveTableFields(database, table);
> // wait here and modify the table column name
> fields = client.getHiveTableFields(database, table);
> client.getHiveTableFields returns all columns in the table the first time. 
> After I modify one column and call client.getHiveTableFields again, it 
> returns an empty list. It returns the same list if I do not change the column 
> name in between.
> I suspect something is wrong in the Hive metastore. Any help will be 
> appreciated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1826) kylin support more than one hive based on different hadoop claster

2016-08-19 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427879#comment-15427879
 ] 

fengYu commented on KYLIN-1826:
---

Thanks for your reply. I will remove LocalThreadProject and pass the hive 
parameter through function arguments where necessary. Thank you for your advice.

> kylin support more than one hive based on different hadoop claster
> --
>
> Key: KYLIN-1826
> URL: https://issues.apache.org/jira/browse/KYLIN-1826
> Project: Kylin
>  Issue Type: Improvement
>  Components: Environment 
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1826-support-more-hive-based-on-different-hado.patch
>
>
> Currently, Kylin supports only one Hive, which is run through the 'hive' 
> command. When source data is located in more than one Hive, we have to deploy 
> several Kylin instances and more than one metastore, which is difficult to 
> manage and may cause conflicts.
> I have been working on this recently. In our cluster there are several Hive 
> clients (with different metastores) based on different Hadoop clusters, so I 
> added a new Hive source type called 'external hive' in Kylin 1.5.x.
> Thanks to Kylin's plug-in architecture in 2.x, this work is easier.
> The main modifications are:
> 1. Add a Hive root directory to the Hive config file. External Hive clients 
> live in this directory, and each Hive is named by its directory name.
> 2. Add the hive-site.xml file while loading Hive tables.
> 3. Store the Hive name in the project; one project can use only one Hive as 
> its source.
> 4. Change and add some jobs to support job building.
> I will upload my patch once I finish all my tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (KYLIN-1826) kylin support more than one hive based on different hadoop claster

2016-08-10 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1826:
--
Comment: was deleted

(was: Sorry for the delay. First of all, the hive in ProjectInstance is used 
when loading a Hive table and is then stored in TableDesc. The reason I use 
LocalThreadProject is that I want to change less code. In my solution, I need 
the project information to get the hive instance; if I passed it as a function 
parameter, I would have to modify many functions. This is tricky, but it is the 
easiest way.

About the metadata change in ProjectInstance and TableDesc, I considered 
compatibility: for old projects and TableDescs, the hive variable is set to 
null, which means the default hive (the hive-site.xml located on the Kylin 
classpath) is used. 
)

> kylin support more than one hive based on different hadoop claster
> --
>
> Key: KYLIN-1826
> URL: https://issues.apache.org/jira/browse/KYLIN-1826
> Project: Kylin
>  Issue Type: Improvement
>  Components: Environment 
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1826-support-more-hive-based-on-different-hado.patch
>
>
> Currently, Kylin supports only one Hive, which is run through the 'hive' 
> command. When source data is located in more than one Hive, we have to deploy 
> several Kylin instances and more than one metastore, which is difficult to 
> manage and may cause conflicts.
> I have been working on this recently. In our cluster there are several Hive 
> clients (with different metastores) based on different Hadoop clusters, so I 
> added a new Hive source type called 'external hive' in Kylin 1.5.x.
> Thanks to Kylin's plug-in architecture in 2.x, this work is easier.
> The main modifications are:
> 1. Add a Hive root directory to the Hive config file. External Hive clients 
> live in this directory, and each Hive is named by its directory name.
> 2. Add the hive-site.xml file while loading Hive tables.
> 3. Store the Hive name in the project; one project can use only one Hive as 
> its source.
> 4. Change and add some jobs to support job building.
> I will upload my patch once I finish all my tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1826) kylin support more than one hive based on different hadoop claster

2016-08-10 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416434#comment-15416434
 ] 

fengYu commented on KYLIN-1826:
---

Sorry for the delay. First of all, the hive in ProjectInstance is used when 
loading a Hive table and is then stored in TableDesc. The reason I use 
LocalThreadProject is that I want to change less code. In my solution, I need 
the project information to get the hive instance; if I passed it as a function 
parameter, I would have to modify many functions. This is tricky, but it is the 
easiest way.

About the metadata change in ProjectInstance and TableDesc, I considered 
compatibility: for old projects and TableDescs, the hive variable is set to 
null, which means the default hive (the hive-site.xml located on the Kylin 
classpath) is used. 


> kylin support more than one hive based on different hadoop claster
> --
>
> Key: KYLIN-1826
> URL: https://issues.apache.org/jira/browse/KYLIN-1826
> Project: Kylin
>  Issue Type: Improvement
>  Components: Environment 
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1826-support-more-hive-based-on-different-hado.patch
>
>
> Currently, Kylin supports only one Hive, which is run through the 'hive' 
> command. When source data is located in more than one Hive, we have to deploy 
> several Kylin instances and more than one metastore, which is difficult to 
> manage and may cause conflicts.
> I have been working on this recently. In our cluster there are several Hive 
> clients (with different metastores) based on different Hadoop clusters, so I 
> added a new Hive source type called 'external hive' in Kylin 1.5.x.
> Thanks to Kylin's plug-in architecture in 2.x, this work is easier.
> The main modifications are:
> 1. Add a Hive root directory to the Hive config file. External Hive clients 
> live in this directory, and each Hive is named by its directory name.
> 2. Add the hive-site.xml file while loading Hive tables.
> 3. Store the Hive name in the project; one project can use only one Hive as 
> its source.
> 4. Change and add some jobs to support job building.
> I will upload my patch once I finish all my tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1826) kylin support more than one hive based on different hadoop claster

2016-08-10 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416436#comment-15416436
 ] 

fengYu commented on KYLIN-1826:
---

Sorry for the delay. First of all, the hive in ProjectInstance is used when 
loading a Hive table and is then stored in TableDesc. The reason I use 
LocalThreadProject is that I want to change less code. In my solution, I need 
the project information to get the hive instance; if I passed it as a function 
parameter, I would have to modify many functions. This is tricky, but it is the 
easiest way.

About the metadata change in ProjectInstance and TableDesc, I considered 
compatibility: for old projects and TableDescs, the hive variable is set to 
null, which means the default hive (the hive-site.xml located on the Kylin 
classpath) is used. 


> kylin support more than one hive based on different hadoop claster
> --
>
> Key: KYLIN-1826
> URL: https://issues.apache.org/jira/browse/KYLIN-1826
> Project: Kylin
>  Issue Type: Improvement
>  Components: Environment 
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1826-support-more-hive-based-on-different-hado.patch
>
>
> Currently, Kylin supports only one Hive, which is run through the 'hive' 
> command. When source data is located in more than one Hive, we have to deploy 
> several Kylin instances and more than one metastore, which is difficult to 
> manage and may cause conflicts.
> I have been working on this recently. In our cluster there are several Hive 
> clients (with different metastores) based on different Hadoop clusters, so I 
> added a new Hive source type called 'external hive' in Kylin 1.5.x.
> Thanks to Kylin's plug-in architecture in 2.x, this work is easier.
> The main modifications are:
> 1. Add a Hive root directory to the Hive config file. External Hive clients 
> live in this directory, and each Hive is named by its directory name.
> 2. Add the hive-site.xml file while loading Hive tables.
> 3. Store the Hive name in the project; one project can use only one Hive as 
> its source.
> 4. Change and add some jobs to support job building.
> I will upload my patch once I finish all my tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1826) kylin support more than one hive based on different hadoop claster

2016-07-24 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391002#comment-15391002
 ] 

fengYu commented on KYLIN-1826:
---

That's what I'm concerned about. Please review it if anyone has free time.

> kylin support more than one hive based on different hadoop claster
> --
>
> Key: KYLIN-1826
> URL: https://issues.apache.org/jira/browse/KYLIN-1826
> Project: Kylin
>  Issue Type: Improvement
>  Components: Environment 
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1826-support-more-hive-based-on-different-hado.patch
>
>
> Currently, Kylin supports only one Hive, which is run through the 'hive' 
> command. When source data is located in more than one Hive, we have to deploy 
> several Kylin instances and more than one metastore, which is difficult to 
> manage and may cause conflicts.
> I have been working on this recently. In our cluster there are several Hive 
> clients (with different metastores) based on different Hadoop clusters, so I 
> added a new Hive source type called 'external hive' in Kylin 1.5.x.
> Thanks to Kylin's plug-in architecture in 2.x, this work is easier.
> The main modifications are:
> 1. Add a Hive root directory to the Hive config file. External Hive clients 
> live in this directory, and each Hive is named by its directory name.
> 2. Add the hive-site.xml file while loading Hive tables.
> 3. Store the Hive name in the project; one project can use only one Hive as 
> its source.
> 4. Change and add some jobs to support job building.
> I will upload my patch once I finish all my tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KYLIN-1839) improvement set classpath before submitting mr job

2016-07-24 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu resolved KYLIN-1839.
---
Resolution: Resolved

> improvement set classpath before submitting mr job
> --
>
> Key: KYLIN-1839
> URL: https://issues.apache.org/jira/browse/KYLIN-1839
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1839-support-extend-lib-from-HDFS-and-cache-tm.patch
>
>
> In setClasspath, Kylin always looks up the Hive jars from the Hive dependency 
> using a regex. However, the result does not change during the lifetime of one 
> process, so I cache the locations of tmpjars and tmpfiles.
> In addition, the user lib setting can now point to an HDFS path rather than 
> only the local filesystem; a local path causes the jars to be uploaded every 
> time if they do not exist in the DistributedCache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1890) support hbase table prefix configurable

2016-07-24 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390998#comment-15390998
 ] 

fengYu commented on KYLIN-1890:
---

I think making the HBase table prefix configurable and adding the cube name 
between the prefix and the random name is more reasonable.

With that, I can find all HTables that belong to one cube in `hbase shell` just 
from the HTable name.
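
The naming idea could be sketched as follows. Everything here is hypothetical
and only illustrates the proposal: the class HTableNames, the underscore
separator, and the sample values are assumptions, not what the attached patch
does.

```java
// Hypothetical naming helper, for illustration only (not from the KYLIN-1890 patch).
final class HTableNames {
    // Builds <prefix>_<cubeName>_<randomSuffix>; the "_" separator is an assumption.
    static String build(String prefix, String cubeName, String randomSuffix) {
        return prefix + "_" + cubeName + "_" + randomSuffix;
    }

    // True when the table name starts with the given prefix and cube name.
    static boolean belongsTo(String tableName, String prefix, String cubeName) {
        return tableName.startsWith(prefix + "_" + cubeName + "_");
    }
}
```

One caveat of this simple scheme: if one cube name is itself a prefix of
another (for example "sales" and "sales_eu"), a plain starts-with check matches
both, so a real implementation would need a separator that cannot occur in
cube names.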

> support hbase table prefix configurable
> ---
>
> Key: KYLIN-1890
> URL: https://issues.apache.org/jira/browse/KYLIN-1890
> Project: Kylin
>  Issue Type: Improvement
>  Components: General
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1890-support-hbase-table-prefix-configurable.patch
>
>
> Sometimes we need to deploy two Kylin environments on the same HBase. I want 
> to make the HBase table name prefix configurable for two reasons:
> 1. Different Kylin environments will generate the same table names.
> 2. Cleaning invalid htables for one environment will delete all tables that 
> belong to the other environment.
> Having different Kylin environments use different namespaces is also acceptable.





[jira] [Closed] (KYLIN-1891) merge interval support

2016-07-24 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu closed KYLIN-1891.
-
Resolution: Duplicate

> merge interval support
> --
>
> Key: KYLIN-1891
> URL: https://issues.apache.org/jira/browse/KYLIN-1891
> Project: Kylin
>  Issue Type: Improvement
>Reporter: fengYu
>Assignee: fengYu
>
> We always have some data that needs to be amended a few days later.
> In current Kylin, once I set Auto Merge Thresholds, newly built segments 
> are merged when they reach the thresholds, so the next day's refresh has to 
> refresh the merged segment, which is unnecessary.
> So I want to add an interval configuration, meaning that auto merge only 
> merges segments outside of the interval. 
> For example, with interval = 2 and Auto Merge Thresholds = 7, if 07-01 to 
> 07-07 are built, auto merge is not triggered; when 07-09 is built 
> successfully, auto merge is triggered and merges the segments from 07-01 to 07-07.





[jira] [Commented] (KYLIN-1892) merge interval support

2016-07-24 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390995#comment-15390995
 ] 

fengYu commented on KYLIN-1892:
---

I will be glad to submit a patch for it once I am free.

> merge interval support
> --
>
> Key: KYLIN-1892
> URL: https://issues.apache.org/jira/browse/KYLIN-1892
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
>
> We always have some data that needs to be amended a few days later.
> In current Kylin, once I set Auto Merge Thresholds, newly built segments 
> are merged when they reach the thresholds, so the next day's refresh has to 
> refresh the merged segment, which is unnecessary.
> So I want to add an interval configuration, meaning that auto merge only 
> merges segments outside of the interval. 
> For example, with interval = 2 and Auto Merge Thresholds = 7, if 07-01 to 
> 07-07 are built, auto merge is not triggered; when 07-09 is built 
> successfully, auto merge is triggered and merges the segments from 07-01 to 07-07.





[jira] [Commented] (KYLIN-1888) support backward cube building

2016-07-24 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390994#comment-15390994
 ] 

fengYu commented on KYLIN-1888:
---

Yeah, you are correct. I will change the name.

> support backward cube building
> --
>
> Key: KYLIN-1888
> URL: https://issues.apache.org/jira/browse/KYLIN-1888
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 0001-forward-build-job.patch
>
>
> This is used when a user wants to see data from the last few days first, and 
> then fill in the historical data from the cube start date.
> FORWARD is just like the reverse of BUILD.





[jira] [Updated] (KYLIN-1890) support hbase table prefix configurable

2016-07-14 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1890:
--
Attachment: 0001-KYLIN-1890-support-hbase-table-prefix-configurable.patch

> support hbase table prefix configurable
> ---
>
> Key: KYLIN-1890
> URL: https://issues.apache.org/jira/browse/KYLIN-1890
> Project: Kylin
>  Issue Type: Improvement
>  Components: General
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1890-support-hbase-table-prefix-configurable.patch
>
>
> Sometimes we need to deploy two Kylin environments on the same HBase. I want 
> to make the HBase table name prefix configurable for two reasons:
> 1. Different Kylin environments will generate the same table names.
> 2. Cleaning invalid htables for one environment will delete all tables that 
> belong to the other environment.
> Having different Kylin environments use different namespaces is also acceptable.





[jira] [Created] (KYLIN-1891) merge interval support

2016-07-13 Thread fengYu (JIRA)
fengYu created KYLIN-1891:
-

 Summary: merge interval support
 Key: KYLIN-1891
 URL: https://issues.apache.org/jira/browse/KYLIN-1891
 Project: Kylin
  Issue Type: Improvement
Reporter: fengYu
Assignee: fengYu


We always have some data that needs to be amended a few days later.

In current Kylin, once I set Auto Merge Thresholds, newly built segments 
are merged when they reach the thresholds, so the next day's refresh has to 
refresh the merged segment, which is unnecessary.

So I want to add an interval configuration, meaning that auto merge only 
merges segments outside of the interval. 

For example, with interval = 2 and Auto Merge Thresholds = 7, if 07-01 to 
07-07 are built, auto merge is not triggered; when 07-09 is built 
successfully, auto merge is triggered and merges the segments from 07-01 to 07-07.
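
One possible reading of the example above, sketched in Java: the newest `interval` segments are protected from merging, and a merge is triggered only when the remaining older segments reach the threshold. The class `MergeIntervalSketch` and its method are hypothetical, and the rule itself is an assumption derived from the 07-01 to 07-09 example, not Kylin's actual auto-merge logic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; not Kylin's actual auto-merge code. */
final class MergeIntervalSketch {
    /**
     * Returns the segments to merge, or an empty list when no merge is due.
     * The input holds daily segment labels, oldest first.
     */
    static List<String> pickSegmentsToMerge(List<String> segments, int threshold, int interval) {
        // The newest `interval` segments may still be refreshed, so they are excluded.
        int eligible = segments.size() - Math.max(interval, 0);
        if (threshold <= 0 || eligible < threshold) {
            return Collections.emptyList();
        }
        // Merge the oldest `threshold` segments.
        return new ArrayList<>(segments.subList(0, threshold));
    }
}
```

With interval = 2 and threshold = 7, the segments 07-01 to 07-07 alone give only five eligible segments, so nothing merges; once 07-08 and 07-09 exist, seven segments are eligible and 07-01 to 07-07 are returned.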





[jira] [Created] (KYLIN-1892) merge interval support

2016-07-13 Thread fengYu (JIRA)
fengYu created KYLIN-1892:
-

 Summary: merge interval support
 Key: KYLIN-1892
 URL: https://issues.apache.org/jira/browse/KYLIN-1892
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Affects Versions: v1.5.2
Reporter: fengYu
Assignee: fengYu


We always have some data that needs to be amended a few days later.

In current Kylin, once I set Auto Merge Thresholds, newly built segments 
are merged when they reach the thresholds, so the next day's refresh has to 
refresh the merged segment, which is unnecessary.

So I want to add an interval configuration, meaning that auto merge only 
merges segments outside of the interval. 

For example, with interval = 2 and Auto Merge Thresholds = 7, if 07-01 to 
07-07 are built, auto merge is not triggered; when 07-09 is built 
successfully, auto merge is triggered and merges the segments from 07-01 to 07-07.





[jira] [Resolved] (KYLIN-1808) unload non existing table cause NPE

2016-07-13 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu resolved KYLIN-1808.
---
Resolution: Fixed
  Assignee: fengYu

> unload non existing table cause NPE
> ---
>
> Key: KYLIN-1808
> URL: https://issues.apache.org/jira/browse/KYLIN-1808
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 0001-KYLIN-1808-unload-table-cause-NPE.patch
>
>
> In TableController.java, 
> private boolean unLoadHiveTable(String tableName, String project) 
> does not check whether the TableDesc object is null.





[jira] [Resolved] (KYLIN-1480) NPE throws while execute sql with more than two join.

2016-07-13 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu resolved KYLIN-1480.
---
Resolution: Fixed

The test passed in both kylin-1.3.0 and kylin-1.5.2.

> NPE throws while execute sql with more than two join.
> -
>
> Key: KYLIN-1480
> URL: https://issues.apache.org/jira/browse/KYLIN-1480
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v1.4.0, v1.2, v1.1, v1.0
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 0001-unit-test-case-for-KYLIN-1480.patch, 
> NPE-in-more-joins.patch
>
>
> Hi, I encounter an NPE when executing SQL with more than two joins, for example: 
> select A.type, A.cmd, count(1) from fact as A inner join (select type, 
> count(1) from fact group by type having count(1) > 2) as B on A.type = B.type 
> inner join (select cmd, count(1) from fact group by cmd having count(1) > 2) 
> as C on A.cmd = C.cmd group by A.type, A.cmd;
> The fact table is created like this: 
> CREATE TABLE `fact`(
>   `fname` string, 
>   `lname` string, 
>   `dt` date, 
>   `cost` int, 
>   `type` string, 
>   `cmd` string);
> Kylin throws an exception like this: 
> Caused by: java.lang.NullPointerException
>     at org.apache.kylin.query.relnode.OLAPProjectRel.implementOLAP(OLAPProjectRel.java:103)
>     at org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81)
>     at org.apache.kylin.query.relnode.OLAPAggregateRel.implementOLAP(OLAPAggregateRel.java:132)
>     at org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81)
>     at org.apache.kylin.query.relnode.OLAPLimitRel.implementOLAP(OLAPLimitRel.java:73)
>     at org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81)
>     at org.apache.kylin.query.relnode.OLAPToEnumerableConverter.implement(OLAPToEnumerableConverter.java:67)
>     at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.implementRoot(EnumerableRelImplementor.java:99)
>     at org.apache.calcite.adapter.enumerable.EnumerableInterpretable.toBindable(EnumerableInterpretable.java:92)
>     at org.apache.calcite.prepare.CalcitePrepareImpl$CalcitePreparingStmt.implement(CalcitePrepareImpl.java:1050)
>     at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:293)
>     at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:188)
>     at org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:671)
>     at org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:572)
>     at org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:541)
>     at org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:173)
>     at org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:561)
>     at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
>     at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
> I tried it in kylin-1.0 and kylin-2.x-staging; the same exception is thrown.





[jira] [Assigned] (KYLIN-1480) NPE throws while execute sql with more than two join.

2016-07-13 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu reassigned KYLIN-1480:
-

Assignee: fengYu  (was: liyang)

> NPE throws while execute sql with more than two join.
> -
>
> Key: KYLIN-1480
> URL: https://issues.apache.org/jira/browse/KYLIN-1480
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v1.4.0, v1.2, v1.1, v1.0
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 0001-unit-test-case-for-KYLIN-1480.patch, 
> NPE-in-more-joins.patch
>
>
> Hi, I encounter an NPE when executing SQL with more than two joins, for example: 
> select A.type, A.cmd, count(1) from fact as A inner join (select type, 
> count(1) from fact group by type having count(1) > 2) as B on A.type = B.type 
> inner join (select cmd, count(1) from fact group by cmd having count(1) > 2) 
> as C on A.cmd = C.cmd group by A.type, A.cmd;
> The fact table is created like this: 
> CREATE TABLE `fact`(
>   `fname` string, 
>   `lname` string, 
>   `dt` date, 
>   `cost` int, 
>   `type` string, 
>   `cmd` string);
> Kylin throws an exception like this: 
> Caused by: java.lang.NullPointerException
>     at org.apache.kylin.query.relnode.OLAPProjectRel.implementOLAP(OLAPProjectRel.java:103)
>     at org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81)
>     at org.apache.kylin.query.relnode.OLAPAggregateRel.implementOLAP(OLAPAggregateRel.java:132)
>     at org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81)
>     at org.apache.kylin.query.relnode.OLAPLimitRel.implementOLAP(OLAPLimitRel.java:73)
>     at org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81)
>     at org.apache.kylin.query.relnode.OLAPToEnumerableConverter.implement(OLAPToEnumerableConverter.java:67)
>     at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.implementRoot(EnumerableRelImplementor.java:99)
>     at org.apache.calcite.adapter.enumerable.EnumerableInterpretable.toBindable(EnumerableInterpretable.java:92)
>     at org.apache.calcite.prepare.CalcitePrepareImpl$CalcitePreparingStmt.implement(CalcitePrepareImpl.java:1050)
>     at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:293)
>     at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:188)
>     at org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:671)
>     at org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:572)
>     at org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:541)
>     at org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:173)
>     at org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:561)
>     at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
>     at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
> I tried it in kylin-1.0 and kylin-2.x-staging; the same exception is thrown.





[jira] [Created] (KYLIN-1890) support hbase table prefix configurable

2016-07-13 Thread fengYu (JIRA)
fengYu created KYLIN-1890:
-

 Summary: support hbase table prefix configurable
 Key: KYLIN-1890
 URL: https://issues.apache.org/jira/browse/KYLIN-1890
 Project: Kylin
  Issue Type: Improvement
  Components: General
Affects Versions: v1.5.2
Reporter: fengYu


Sometimes we need to deploy two Kylin environments on the same HBase. I want 
to make the HBase table name prefix configurable for two reasons:
1. Different Kylin environments will generate the same table names.
2. Cleaning invalid htables for one environment will delete all tables that 
belong to the other environment.

Having different Kylin environments use different namespaces is also acceptable.





[jira] [Commented] (KYLIN-1480) NPE throws while execute sql with more than two join.

2016-07-13 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376255#comment-15376255
 ] 

fengYu commented on KYLIN-1480:
---

Why was this patch not merged into 1.5.2?

> NPE throws while execute sql with more than two join.
> -
>
> Key: KYLIN-1480
> URL: https://issues.apache.org/jira/browse/KYLIN-1480
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v1.4.0, v1.2, v1.1, v1.0
>Reporter: fengYu
>Assignee: liyang
> Attachments: 0001-unit-test-case-for-KYLIN-1480.patch, 
> NPE-in-more-joins.patch
>
>
> Hi, I encounter an NPE when executing SQL with more than two joins, for example: 
> select A.type, A.cmd, count(1) from fact as A inner join (select type, 
> count(1) from fact group by type having count(1) > 2) as B on A.type = B.type 
> inner join (select cmd, count(1) from fact group by cmd having count(1) > 2) 
> as C on A.cmd = C.cmd group by A.type, A.cmd;
> The fact table is created like this: 
> CREATE TABLE `fact`(
>   `fname` string, 
>   `lname` string, 
>   `dt` date, 
>   `cost` int, 
>   `type` string, 
>   `cmd` string);
> Kylin throws an exception like this: 
> Caused by: java.lang.NullPointerException
>     at org.apache.kylin.query.relnode.OLAPProjectRel.implementOLAP(OLAPProjectRel.java:103)
>     at org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81)
>     at org.apache.kylin.query.relnode.OLAPAggregateRel.implementOLAP(OLAPAggregateRel.java:132)
>     at org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81)
>     at org.apache.kylin.query.relnode.OLAPLimitRel.implementOLAP(OLAPLimitRel.java:73)
>     at org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81)
>     at org.apache.kylin.query.relnode.OLAPToEnumerableConverter.implement(OLAPToEnumerableConverter.java:67)
>     at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.implementRoot(EnumerableRelImplementor.java:99)
>     at org.apache.calcite.adapter.enumerable.EnumerableInterpretable.toBindable(EnumerableInterpretable.java:92)
>     at org.apache.calcite.prepare.CalcitePrepareImpl$CalcitePreparingStmt.implement(CalcitePrepareImpl.java:1050)
>     at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:293)
>     at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:188)
>     at org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:671)
>     at org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:572)
>     at org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:541)
>     at org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:173)
>     at org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:561)
>     at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
>     at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
> I tried it in kylin-1.0 and kylin-2.x-staging; the same exception is thrown.





[jira] [Commented] (KYLIN-1808) unload non existing table cause NPE

2016-07-13 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376254#comment-15376254
 ] 

fengYu commented on KYLIN-1808:
---

I uploaded my patch; it just adds a null check.
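
A null check of this kind might look roughly like the sketch below. It is only an illustration: `UnloadTableSketch`, `TableDescStub`, and the in-memory map are hypothetical stand-ins for Kylin's metadata classes, and the attached patch may handle the case differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; TableDescStub and the map stand in for Kylin's metadata classes. */
final class UnloadTableSketch {
    static final class TableDescStub {
        final String name;
        TableDescStub(String name) { this.name = name; }
    }

    private final Map<String, TableDescStub> tables = new HashMap<>();

    void load(String tableName) {
        tables.put(tableName, new TableDescStub(tableName));
    }

    /** Returns false instead of throwing an NPE when the table was never loaded. */
    boolean unLoadHiveTable(String tableName, String project) {
        // `project` is unused in this sketch; it only mirrors the signature quoted in the issue.
        TableDescStub desc = tables.get(tableName);
        if (desc == null) {
            return false; // nothing to unload
        }
        tables.remove(desc.name);
        return true;
    }
}
```

Without the `desc == null` guard, dereferencing `desc.name` for a table that does not exist is what would raise the NullPointerException.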

> unload non existing table cause NPE
> ---
>
> Key: KYLIN-1808
> URL: https://issues.apache.org/jira/browse/KYLIN-1808
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2
>Reporter: fengYu
> Attachments: 0001-KYLIN-1808-unload-table-cause-NPE.patch
>
>
> In TableController.java, 
> private boolean unLoadHiveTable(String tableName, String project) 
> does not check whether the TableDesc object is null.





[jira] [Updated] (KYLIN-1808) unload non existing table cause NPE

2016-07-13 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1808:
--
Attachment: 0001-KYLIN-1808-unload-table-cause-NPE.patch

> unload non existing table cause NPE
> ---
>
> Key: KYLIN-1808
> URL: https://issues.apache.org/jira/browse/KYLIN-1808
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2
>Reporter: fengYu
> Attachments: 0001-KYLIN-1808-unload-table-cause-NPE.patch
>
>
> In TableController.java, 
> private boolean unLoadHiveTable(String tableName, String project) 
> does not check whether the TableDesc object is null.





[jira] [Updated] (KYLIN-1888) support forward cube building

2016-07-13 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1888:
--
Attachment: 0001-forward-build-job.patch

> support forward cube building
> -
>
> Key: KYLIN-1888
> URL: https://issues.apache.org/jira/browse/KYLIN-1888
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 0001-forward-build-job.patch
>
>
> This is used when a user wants to see data from the last few days first, and 
> then fill in the historical data from the cube start date.
> FORWARD is just like the reverse of BUILD.





[jira] [Created] (KYLIN-1888) support forward cube building

2016-07-13 Thread fengYu (JIRA)
fengYu created KYLIN-1888:
-

 Summary: support forward cube building
 Key: KYLIN-1888
 URL: https://issues.apache.org/jira/browse/KYLIN-1888
 Project: Kylin
  Issue Type: Improvement
Affects Versions: v1.5.2
Reporter: fengYu
Assignee: fengYu


This is used when a user wants to see data from the last few days first, and 
then fill in the historical data from the cube start date.

FORWARD is just like the reverse of BUILD.





[jira] [Updated] (KYLIN-1826) kylin support more than one hive based on different hadoop claster

2016-07-07 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1826:
--
Attachment: 0001-KYLIN-1826-support-more-hive-based-on-different-hado.patch

I'm sorry for keeping you waiting. I have finished my patch and added a source 
named ID_EXTERNAL_HIVE. One project can use only one Hive source. To modify 
less code, I added a thread-local variable that tells which project is in use 
in the current thread.
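
The thread-local approach mentioned above can be sketched as follows. The class `ProjectContextSketch` and its method names are assumptions made for illustration; they are not the names used in the attached patch.

```java
/** Hypothetical sketch; class and method names are assumptions, not the patch's code. */
final class ProjectContextSketch {
    // Each thread sees only the project name that it stored itself.
    private static final ThreadLocal<String> CURRENT_PROJECT = new ThreadLocal<>();

    static void setProject(String project) {
        CURRENT_PROJECT.set(project);
    }

    /** Returns the project set by the current thread, or null if none was set. */
    static String getProject() {
        return CURRENT_PROJECT.get();
    }

    /** Should be called when the request or job finishes, especially with pooled threads. */
    static void clear() {
        CURRENT_PROJECT.remove();
    }
}
```

Code deep in the call stack can then look up the current project, and from it the Hive source, without an extra parameter on every method. The trade-off is that the value has to be cleared reliably, typically in a `finally` block, or a reused worker thread may see a stale project.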

However, the patch is a little big. It can be applied or merged to 
kylin-1.5.2.1. I have tested it in our environment many times, and it works 
for multiple Hive clients based on two Hadoop clusters. What is more, our 
Hadoop engine (the Kylin calculation engine) is yet another cluster, and it 
works fine for me.

I hope you can do more testing in other environments. I look forward to your feedback. 

> kylin support more than one hive based on different hadoop claster
> --
>
> Key: KYLIN-1826
> URL: https://issues.apache.org/jira/browse/KYLIN-1826
> Project: Kylin
>  Issue Type: Improvement
>  Components: Environment 
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1826-support-more-hive-based-on-different-hado.patch
>
>
> Currently, Kylin supports only one Hive, which must be runnable via the 'hive' 
> command. However, when source data is located in more than one Hive, we have 
> to deploy more Kylin instances and more than one metastore, which is 
> difficult to manage and may cause conflicts.
> I have been working on this recently. In our cluster, there are several Hive 
> clients (with different metastores) based on different Hadoop clusters. I 
> added a new Hive source type called 'external hive' in Kylin 1.5.x.
> Thanks to the Kylin plug-in architecture in 2.x, this work is easier. 
> The main modifications are:
> 1. Add a Hive root directory to the Hive config file; the external Hive 
> clients live in this directory. Each Hive is named by its directory name.
> 2. Add the hive-site.xml file when loading Hive tables.
> 3. Store the Hive name in the project; one project can use only one Hive as its source.
> 4. Change and add some jobs to support job building.
> I will upload my patch once I finish all my tests.





[jira] [Commented] (KYLIN-1833) union operation will cause error result

2016-07-05 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362481#comment-15362481
 ] 

fengYu commented on KYLIN-1833:
---

1.5.2.1 has this problem too. What is more, making a change like the one in 
the patch does not resolve it in 1.5. I will analyze it in the new version 
once I am free.

> union operation will cause error result
> ---
>
> Key: KYLIN-1833
> URL: https://issues.apache.org/jira/browse/KYLIN-1833
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v1.3.0, v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 0001-KYLIN-1833-union-query-get-error-result.patch
>
>
> A query like this returns a wrong result:
> select * from (
> select 'b',  count(1) from kylin_sales  where lstg_format_name >= 'Auction'
> union all
> select 'a',  count(1) from kylin_sales where lstg_format_name >= 'Others'
> union all 
> select 'c',  count(1) from kylin_sales  where lstg_format_name >= 'FP-GTC'
> ) order by 1





[jira] [Commented] (KYLIN-1808) unload non existing table cause NPE

2016-07-05 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362338#comment-15362338
 ] 

fengYu commented on KYLIN-1808:
---

OK, I will upload my patch later.

> unload non existing table cause NPE
> ---
>
> Key: KYLIN-1808
> URL: https://issues.apache.org/jira/browse/KYLIN-1808
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2
>Reporter: fengYu
>
> In TableController.java, 
> private boolean unLoadHiveTable(String tableName, String project) 
> does not check whether the TableDesc object is null.





[jira] [Closed] (KYLIN-1280) Convert Cuboid Data to HFile failed when hbase in different HDFS

2016-07-05 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu closed KYLIN-1280.
-
Resolution: Fixed

> Convert Cuboid Data to HFile failed when hbase in different HDFS
> 
>
> Key: KYLIN-1280
> URL: https://issues.apache.org/jira/browse/KYLIN-1280
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.4.0
>Reporter: fengYu
> Attachments: 
> 0001-transform-path-in-other-HDFS-to-real-name-node-path.patch
>
>
> I deployed kylin-2.0 with an HBase that relies on a different HDFS from the 
> Hadoop cluster, so I set the property 'kylin.hbase.cluster.fs' = hdfs://A. 
> This name service is different from 'fs.defaultFS' in the Hadoop cluster, 
> which is hdfs://B.
> The step 'Convert Cuboid Data to HFile' failed; the error log is:
> java.io.IOException: Failed to run job : Unable to map logical nameservice 
> URI 'hdfs://A' to a NameNode. Local configuration does not have a failover 
> proxy provider configured.
>     at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:300)
>     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
>     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
>     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
>     at org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:129)
>     at org.apache.kylin.storage.hbase.steps.CubeHFileJob.run(CubeHFileJob.java:93)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>     at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:119)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> I think it is because the node managers in the Hadoop cluster cannot 
> recognize hdfs://A in their config. So I have to transform the path 
> hdfs://A/path/to/hfile to hdfs://namenode_ip:port/path/to/hfile before 
> executing this step, and it works for me.  
> Here is my patch.
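
The path transformation described above can be illustrated with plain `java.net.URI`. This is a sketch under assumptions, not the attached patch: the class `HdfsPathRewriteSketch` is hypothetical, and the NameNode host and port are passed in by the caller, whereas a real implementation would have to resolve the active NameNode from the HBase cluster's HDFS configuration.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical sketch; host and port are supplied by the caller, not resolved from Hadoop config. */
final class HdfsPathRewriteSketch {
    /** Replaces the logical nameservice in an hdfs:// path with a concrete host and port. */
    static String toNameNodePath(String logicalPath, String nameNodeHost, int port) {
        URI in = URI.create(logicalPath);
        try {
            // Keep the scheme and path; swap only the authority part.
            URI out = new URI(in.getScheme(), null, nameNodeHost, port, in.getPath(), null, null);
            return out.toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot rewrite " + logicalPath, e);
        }
    }
}
```

For example, `toNameNodePath("hdfs://A/path/to/hfile", "10.1.2.3", 8020)` yields a path that the node managers can resolve without knowing the nameservice A. Pinning one NameNode address gives up HA failover for that path, which is a limitation of this approach.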





[jira] [Commented] (KYLIN-1280) Convert Cuboid Data to HFile failed when hbase in different HDFS

2016-07-05 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362335#comment-15362335
 ] 

fengYu commented on KYLIN-1280:
---

I forgot about it. We will redeploy the Hadoop cluster too, so this is no 
longer needed. I will close it.

> Convert Cuboid Data to HFile failed when hbase in different HDFS
> 
>
> Key: KYLIN-1280
> URL: https://issues.apache.org/jira/browse/KYLIN-1280
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.4.0
>Reporter: fengYu
> Attachments: 
> 0001-transform-path-in-other-HDFS-to-real-name-node-path.patch
>
>
> I deployed kylin-2.0 with an HBase that relies on a different HDFS from the 
> Hadoop cluster, so I set the property 'kylin.hbase.cluster.fs' = hdfs://A. 
> This name service is different from 'fs.defaultFS' in the Hadoop cluster, 
> which is hdfs://B.
> The step 'Convert Cuboid Data to HFile' failed; the error log is:
> java.io.IOException: Failed to run job : Unable to map logical nameservice 
> URI 'hdfs://A' to a NameNode. Local configuration does not have a failover 
> proxy provider configured.
>     at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:300)
>     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
>     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
>     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
>     at org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:129)
>     at org.apache.kylin.storage.hbase.steps.CubeHFileJob.run(CubeHFileJob.java:93)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>     at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:119)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> I think it is because the node managers in the Hadoop cluster cannot 
> recognize hdfs://A in their config. So I have to transform the path 
> hdfs://A/path/to/hfile to hdfs://namenode_ip:port/path/to/hfile before 
> executing this step, and it works for me.  
> Here is my patch.





[jira] [Commented] (KYLIN-1840) project admin should has right to load table

2016-07-03 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360783#comment-15360783
 ] 

fengYu commented on KYLIN-1840:
---

I have some other questions:
1. Only admin has the right to use Advanced Settings while building a cube.
2. Everyone can see the System tab, which I think should be open only to Admin.
3. Slow Queries is open to everyone; however, in the backend interface, users 
other than Admin get "Access is denied".


> project admin should has right to load table
> 
>
> Key: KYLIN-1840
> URL: https://issues.apache.org/jira/browse/KYLIN-1840
> Project: Kylin
>  Issue Type: Bug
>  Components: REST Service
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: Zhong,Jason
>
> Only admin has the right to load tables. I am trying to find out whether there 
> is other logic like this.





[jira] [Created] (KYLIN-1840) project admin should has right to load table

2016-06-30 Thread fengYu (JIRA)
fengYu created KYLIN-1840:
-

 Summary: project admin should has right to load table
 Key: KYLIN-1840
 URL: https://issues.apache.org/jira/browse/KYLIN-1840
 Project: Kylin
  Issue Type: Bug
  Components: REST Service
Affects Versions: v1.5.2
Reporter: fengYu
Assignee: Zhong,Jason


Only admin has the right to load tables. I am trying to find out whether there 
is other logic like this.





[jira] [Updated] (KYLIN-1839) improvement set classpath before submitting mr job

2016-06-29 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1839:
--
Attachment: 0001-KYLIN-1839-support-extend-lib-from-HDFS-and-cache-tm.patch

Patch attached. It supports setting kylin.job.mr.lib.dir to an HDFS path and 
caches the temp jars.

> improvement set classpath before submitting mr job
> --
>
> Key: KYLIN-1839
> URL: https://issues.apache.org/jira/browse/KYLIN-1839
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-1839-support-extend-lib-from-HDFS-and-cache-tm.patch
>
>
> In setClasspath, Kylin always finds the Hive jars from the Hive dependency using 
> a regex. However, the result does not change within one process lifetime, so I 
> cache the location of tmpjars and tmpfiles.
> What is more, this supports setting the user extension lib to an HDFS path 
> rather than only the local filesystem, which causes the jars to be uploaded 
> every time if they do not exist in DistributedCache.





[jira] [Created] (KYLIN-1839) improvement set classpath before submitting mr job

2016-06-29 Thread fengYu (JIRA)
fengYu created KYLIN-1839:
-

 Summary: improvement set classpath before submitting mr job
 Key: KYLIN-1839
 URL: https://issues.apache.org/jira/browse/KYLIN-1839
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Affects Versions: v1.5.2
Reporter: fengYu
Assignee: fengYu


In setClasspath, Kylin always finds the Hive jars from the Hive dependency using 
a regex. However, the result does not change within one process lifetime, so I 
cache the location of tmpjars and tmpfiles.

What is more, this supports setting the user extension lib to an HDFS path 
rather than only the local filesystem, which causes the jars to be uploaded 
every time if they do not exist in DistributedCache.
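
The caching idea above (compute the jar locations once per process, then reuse them) could look roughly like this. It is a generic sketch under my own assumptions, not the code in the patch; CachedValue is a hypothetical name and the loader stands in for the regex scan of the Hive dependency.

```java
import java.util.function.Supplier;

// Hypothetical helper: computes a value on first use and reuses it afterwards.
final class CachedValue<T> implements Supplier<T> {
    private final Supplier<T> loader;
    private volatile T value;

    CachedValue(Supplier<T> loader) {
        this.loader = loader;
    }

    @Override
    public T get() {
        T v = value;
        if (v == null) {
            synchronized (this) {
                v = value;
                if (v == null) {
                    // The expensive lookup runs once, unless the loader returns null.
                    v = loader.get();
                    value = v;
                }
            }
        }
        return v;
    }
}
```

A caller would wrap the expensive lookup once, for example new CachedValue<>(() -> findHiveJars()), where findHiveJars is a placeholder for the real scan, and call get() on every job submission.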





[jira] [Updated] (KYLIN-1833) union operation will cause error result

2016-06-28 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1833:
--
Attachment: 0001-KYLIN-1833-union-query-get-error-result.patch

Patch for kylin-1.3.0: do not allocate an OLAPRel.OLAPImplementor every time in 
OLAPToEnumerableConverter.implement; move it to a ThreadLocal variable.

> union operation will cause error result
> ---
>
> Key: KYLIN-1833
> URL: https://issues.apache.org/jira/browse/KYLIN-1833
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v1.3.0, v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 0001-KYLIN-1833-union-query-get-error-result.patch
>
>
> A query like this returns a wrong result:
> select * from (
> select 'b',  count(1) from kylin_sales  where lstg_format_name >= 'Auction'
> union all
> select 'a',  count(1) from kylin_sales where lstg_format_name >= 'Others'
> union all 
> select 'c',  count(1) from kylin_sales  where lstg_format_name >= 'FP-GTC'
> ) order by 1





[jira] [Commented] (KYLIN-1833) union operation will cause error result

2016-06-28 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15354387#comment-15354387
 ] 

fengYu commented on KYLIN-1833:
---

The result is wrong. In kylin-2.x, the result is:
a  1987
b  5844
c  5844

and the correct result is:
a  2021
b  7987
c  5946

In kylin-1.x, the id of OLAPContext is incorrect because, for every subquery, 
Kylin allocates a new OLAPRel.OLAPImplementor, so OLAPContext.id always equals 0, 
which makes the generated code wrong. I moved OLAPRel.OLAPImplementor 
to a ThreadLocal variable and got the right result.

However, in kylin-2.x there seem to be more factors involved. I am trying to find 
the reason in kylin-2.x.
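
To illustrate the kylin-1.x problem described above: if the object that hands out ids is created anew for each subquery, every context gets id 0, while a per-thread instance keeps counting. The sketch below uses a hypothetical ContextIdAllocator class as a stand-in; it is not OLAPRel.OLAPImplementor and not the patch itself.

```java
// Hypothetical stand-in for an object that assigns ids to query contexts.
final class ContextIdAllocator {
    private static final ThreadLocal<ContextIdAllocator> CURRENT =
            ThreadLocal.withInitial(ContextIdAllocator::new);

    private int next = 0;

    /** Returns the next id from this allocator instance. */
    int allocate() {
        return next++;
    }

    /** One allocator per thread, shared by all subqueries planned on that thread. */
    static ContextIdAllocator current() {
        return CURRENT.get();
    }

    /** Clears the per-thread state, e.g. once a query has been planned. */
    static void reset() {
        CURRENT.remove();
    }
}
```

Calling new ContextIdAllocator().allocate() for each branch of the union returns 0 every time, whereas ContextIdAllocator.current().allocate() returns 0, 1, 2 for three branches planned on the same thread.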

> union operation will cause error result
> ---
>
> Key: KYLIN-1833
> URL: https://issues.apache.org/jira/browse/KYLIN-1833
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v1.3.0, v1.5.2
>Reporter: fengYu
>Assignee: fengYu
>
> A query like this returns a wrong result:
> select * from (
> select 'b',  count(1) from kylin_sales  where lstg_format_name >= 'Auction'
> union all
> select 'a',  count(1) from kylin_sales where lstg_format_name >= 'Others'
> union all 
> select 'c',  count(1) from kylin_sales  where lstg_format_name >= 'FP-GTC'
> ) order by 1





[jira] [Comment Edited] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode

2016-06-28 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352646#comment-15352646
 ] 

fengYu edited comment on KYLIN-1832 at 6/28/16 9:19 AM:


The max size of biggerIndexSet is half of m, and I will test it later. If 
this uses too much memory, a bitmap is a better choice.



was (Author: feng_xiao_yu):
The max size of biggerIndexSet is half of m, and I will test it later.


> HyperLogLog speed is too slow in encode and decode
> --
>
> Key: KYLIN-1832
> URL: https://issues.apache.org/jira/browse/KYLIN-1832
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: v1.3.0, v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: HyperLogLogPlusCounter.java
>
>
> We have a cube with more than ten distinct-count measures that uses hll15 to 
> store the values, and we found HyperLogLogPlusCounter too slow. Three 
> methods are called frequently: merge/writeRegisters/readRegisters.
> I found that kylin-1.5.x adds a parameter 'singleBucket' to store the only 
> bucket, which can optimize the base cuboid.
> However, the other steps of cuboid building are still slow. I have modified 
> the code to speed up these three operations.





[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode

2016-06-28 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352646#comment-15352646
 ] 

fengYu commented on KYLIN-1832:
---

The max size of biggerIndexSet is half of m, and I will test it later.


> HyperLogLog speed is too slow in encode and decode
> --
>
> Key: KYLIN-1832
> URL: https://issues.apache.org/jira/browse/KYLIN-1832
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: v1.3.0, v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: HyperLogLogPlusCounter.java
>
>
> We have a cube with more than ten distinct-count measures that uses hll15 to 
> store the values, and we found HyperLogLogPlusCounter too slow. Three 
> methods are called frequently: merge/writeRegisters/readRegisters.
> I found that kylin-1.5.x adds a parameter 'singleBucket' to store the only 
> bucket, which can optimize the base cuboid.
> However, the other steps of cuboid building are still slow. I have modified 
> the code to speed up these three operations.





[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode

2016-06-28 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352592#comment-15352592
 ] 

fengYu commented on KYLIN-1832:
---

I will upload patches for 1.x and 2.x, but replacing the whole file is also fine 
if you can accept this kind of implementation.

> HyperLogLog speed is too slow in encode and decode
> --
>
> Key: KYLIN-1832
> URL: https://issues.apache.org/jira/browse/KYLIN-1832
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: v1.3.0, v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: HyperLogLogPlusCounter.java
>
>
> We have a cube with more than ten distinct-count measures that uses hll15 to 
> store the values, and we found HyperLogLogPlusCounter too slow. Three 
> methods are called frequently: merge/writeRegisters/readRegisters.
> I found that kylin-1.5.x adds a parameter 'singleBucket' to store the only 
> bucket, which can optimize the base cuboid.
> However, the other steps of cuboid building are still slow. I have modified 
> the code to speed up these three operations.





[jira] [Created] (KYLIN-1833) union operation will cause error result

2016-06-28 Thread fengYu (JIRA)
fengYu created KYLIN-1833:
-

 Summary: union operation will cause error result
 Key: KYLIN-1833
 URL: https://issues.apache.org/jira/browse/KYLIN-1833
 Project: Kylin
  Issue Type: Improvement
  Components: Query Engine
Affects Versions: v1.5.2, v1.3.0
Reporter: fengYu
Assignee: fengYu


A query like this returns a wrong result:

select * from (
select 'b',  count(1) from kylin_sales  where lstg_format_name >= 'Auction'
union all
select 'a',  count(1) from kylin_sales where lstg_format_name >= 'Others'
union all 
select 'c',  count(1) from kylin_sales  where lstg_format_name >= 'FP-GTC'
) order by 1

I will work on it and upload my patch later.





[jira] [Updated] (KYLIN-1833) union operation will cause error result

2016-06-28 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1833:
--
Description: 
A query like this returns a wrong result:

select * from (
select 'b',  count(1) from kylin_sales  where lstg_format_name >= 'Auction'
union all
select 'a',  count(1) from kylin_sales where lstg_format_name >= 'Others'
union all 
select 'c',  count(1) from kylin_sales  where lstg_format_name >= 'FP-GTC'
) order by 1


  was:
A query like this returns a wrong result:

select * from (
select 'b',  count(1) from kylin_sales  where lstg_format_name >= 'Auction'
union all
select 'a',  count(1) from kylin_sales where lstg_format_name >= 'Others'
union all 
select 'c',  count(1) from kylin_sales  where lstg_format_name >= 'FP-GTC'
) order by 1

I will work on it and upload my patch later.


> union operation will cause error result
> ---
>
> Key: KYLIN-1833
> URL: https://issues.apache.org/jira/browse/KYLIN-1833
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v1.3.0, v1.5.2
>Reporter: fengYu
>Assignee: fengYu
>
> A query like this returns a wrong result:
> select * from (
> select 'b',  count(1) from kylin_sales  where lstg_format_name >= 'Auction'
> union all
> select 'a',  count(1) from kylin_sales where lstg_format_name >= 'Others'
> union all 
> select 'c',  count(1) from kylin_sales  where lstg_format_name >= 'FP-GTC'
> ) order by 1





[jira] [Updated] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode

2016-06-28 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1832:
--
Attachment: HyperLogLogPlusCounter.java

> HyperLogLog speed is too slow in encode and decode
> --
>
> Key: KYLIN-1832
> URL: https://issues.apache.org/jira/browse/KYLIN-1832
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: v1.3.0, v1.5.2
>Reporter: fengYu
>Assignee: fengYu
> Attachments: HyperLogLogPlusCounter.java
>
>
> We have a cube with more than ten distinct-count measures that uses hll15 to 
> store the values, and we found HyperLogLogPlusCounter too slow. Three 
> methods are called frequently: merge/writeRegisters/readRegisters.
> I found that kylin-1.5.x adds a parameter 'singleBucket' to store the only 
> bucket, which can optimize the base cuboid.
> However, the other steps of cuboid building are still slow. I have modified 
> the code to speed up these three operations.





[jira] [Created] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode

2016-06-28 Thread fengYu (JIRA)
fengYu created KYLIN-1832:
-

 Summary: HyperLogLog speed is too slow in encode and decode
 Key: KYLIN-1832
 URL: https://issues.apache.org/jira/browse/KYLIN-1832
 Project: Kylin
  Issue Type: Improvement
  Components: Metadata
Affects Versions: v1.5.2, v1.3.0
Reporter: fengYu
Assignee: fengYu


We have a cube with more than ten distinct-count measures that uses hll15 to 
store the values, and we found HyperLogLogPlusCounter too slow. Three 
methods are called frequently: merge/writeRegisters/readRegisters.

I found that kylin-1.5.x adds a parameter 'singleBucket' to store the only 
bucket, which can optimize the base cuboid.

However, the other steps of cuboid building are still slow. I have modified the 
code to speed up these three operations.
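
For readers unfamiliar with why merge is a hot path: the standard HyperLogLog merge takes the per-register maximum over all m registers, so its cost grows with the register count (presumably 2^15 for hll15). The sketch below shows only that textbook rule. It is not the attached HyperLogLogPlusCounter.java, and HllRegisters and its methods are hypothetical names.

```java
// Illustrative only: a dense HyperLogLog register array.
final class HllRegisters {
    final byte[] registers;

    /** p is the precision, so the array holds 2^p registers. */
    HllRegisters(int p) {
        registers = new byte[1 << p];
    }

    /** Records a rank for one register, keeping the larger value. */
    void set(int index, int rank) {
        if (rank > registers[index]) {
            registers[index] = (byte) rank;
        }
    }

    /** Standard HLL merge: each register becomes the max of the two inputs. */
    void merge(HllRegisters other) {
        if (other.registers.length != registers.length) {
            throw new IllegalArgumentException("precision mismatch");
        }
        for (int i = 0; i < registers.length; i++) {
            if (other.registers[i] > registers[i]) {
                registers[i] = other.registers[i];
            }
        }
    }

    /** Counts non-zero registers; a small count suggests a sparse form is cheaper. */
    int nonZeroCount() {
        int n = 0;
        for (byte r : registers) {
            if (r != 0) {
                n++;
            }
        }
        return n;
    }
}
```

When only a few registers are non-zero, tracking their indexes separately lets merge and serialization skip the empty ones, which is one general way to speed up these operations.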





[jira] [Created] (KYLIN-1826) kylin support more than one hive based on different hadoop claster

2016-06-27 Thread fengYu (JIRA)
fengYu created KYLIN-1826:
-

 Summary: kylin support more than one hive based on different 
hadoop claster
 Key: KYLIN-1826
 URL: https://issues.apache.org/jira/browse/KYLIN-1826
 Project: Kylin
  Issue Type: Improvement
  Components: Environment 
Affects Versions: v1.5.2
Reporter: fengYu
Assignee: fengYu


Currently, Kylin supports only one Hive, which must be runnable via the 'hive' 
command. However, when the source data is located in more than one Hive, we have 
to deploy several Kylin instances and more than one metastore, which is 
difficult to manage and may cause conflicts.

I have been working on this recently. In our cluster, there are several Hive 
clients (with different metastores) based on different Hadoop clusters. I added a 
new Hive source type called 'external hive' in kylin 1.5.x.

Thanks to Kylin's plug-in architecture in 2.x, this work is easier. The 
main modifications are:
1. Add a Hive root directory to the Hive config file. External Hive clients live 
in this directory, and each Hive is named by its directory name.
2. Add the hive-site.xml file while loading Hive tables.
3. Store the Hive name in the project; one project can only take one Hive as its 
source.
4. Change and add some jobs to support job building.

I will upload my patch once I finish all my tests.





[jira] [Created] (KYLIN-1808) unload non existing table cause NPE

2016-06-20 Thread fengYu (JIRA)
fengYu created KYLIN-1808:
-

 Summary: unload non existing table cause NPE
 Key: KYLIN-1808
 URL: https://issues.apache.org/jira/browse/KYLIN-1808
 Project: Kylin
  Issue Type: Bug
Affects Versions: v1.5.2
Reporter: fengYu


In TableController.java, 
private boolean unLoadHiveTable(String tableName, String project) 

does not check whether the TableDesc object is null.
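
A guard of the kind suggested above might look like the sketch below. It uses a hypothetical map-backed TableRegistry instead of Kylin's metadata classes, so the names and the String descriptor are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the table metadata store.
final class TableRegistry {
    private final Map<String, String> tables = new HashMap<>();

    void load(String tableName, String descriptor) {
        tables.put(tableName, descriptor);
    }

    /** Returns false instead of failing when the table was never loaded. */
    boolean unload(String tableName) {
        String descriptor = tables.get(tableName);
        if (descriptor == null) {
            return false; // nothing to unload; avoids dereferencing null later
        }
        tables.remove(tableName);
        return true;
    }
}
```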





[jira] [Updated] (KYLIN-1685) error happens while execute a sql contains '?' using Statement

2016-05-12 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1685:
--
Attachment: 0003-KYLIN-1685-error-happens-while-execute-a-sql-contain.patch

Added my patch and test case.

> error happens while execute a sql contains '?' using Statement
> --
>
> Key: KYLIN-1685
> URL: https://issues.apache.org/jira/browse/KYLIN-1685
> Project: Kylin
>  Issue Type: Bug
>  Components: Driver - JDBC
>Affects Versions: v1.2, v1.5.1
>Reporter: fengYu
> Attachments: 
> 0003-KYLIN-1685-error-happens-while-execute-a-sql-contain.patch
>
>
> An exception happens:
> java.sql.SQLException: Error while executing SQL "select * from test_table 
> where url not in ('http://a.b.com/?a=b')": 
> org.apache.kylin.jdbc.KylinStatement cannot be cast to 
> org.apache.kylin.jdbc.KylinPreparedStatement
>   at org.apache.kylin.jdbc.KylinResultSet.execute(KylinResultSet.java:54)
>   at 
> org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:566)
>   at org.apache.kylin.jdbc.KylinMeta.prepareAndExecute(KylinMeta.java:79)
>   at 
> org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:571)
>   at 
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:135)
>   at 
> org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:186)
>   at 
> org.apache.kylin.jdbc.DriverTest.testStatementWithQuestionMask(DriverTest.java:79)
> This is caused by the Kylin JDBC driver treating any SQL containing '?' as a 
> PreparedStatement and casting the statement accordingly.





[jira] [Updated] (KYLIN-1685) error happens while execute a sql contains '?' using Statement

2016-05-12 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1685:
--
Attachment: (was: 
0003-KYLIN-1685-error-happens-while-execute-a-sql-contain.patch)

> error happens while execute a sql contains '?' using Statement
> --
>
> Key: KYLIN-1685
> URL: https://issues.apache.org/jira/browse/KYLIN-1685
> Project: Kylin
>  Issue Type: Bug
>  Components: Driver - JDBC
>Affects Versions: v1.2, v1.5.1
>Reporter: fengYu
>
> An exception happens:
> java.sql.SQLException: Error while executing SQL "select * from test_table 
> where url not in ('http://a.b.com/?a=b')": 
> org.apache.kylin.jdbc.KylinStatement cannot be cast to 
> org.apache.kylin.jdbc.KylinPreparedStatement
>   at org.apache.kylin.jdbc.KylinResultSet.execute(KylinResultSet.java:54)
>   at 
> org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:566)
>   at org.apache.kylin.jdbc.KylinMeta.prepareAndExecute(KylinMeta.java:79)
>   at 
> org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:571)
>   at 
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:135)
>   at 
> org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:186)
>   at 
> org.apache.kylin.jdbc.DriverTest.testStatementWithQuestionMask(DriverTest.java:79)
> This is caused by the Kylin JDBC driver treating any SQL containing '?' as a 
> PreparedStatement and casting the statement accordingly.





[jira] [Updated] (KYLIN-1685) error happens while execute a sql contains '?' using Statement

2016-05-12 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-1685:
--
Attachment: 0003-KYLIN-1685-error-happens-while-execute-a-sql-contain.patch

> error happens while execute a sql contains '?' using Statement
> --
>
> Key: KYLIN-1685
> URL: https://issues.apache.org/jira/browse/KYLIN-1685
> Project: Kylin
>  Issue Type: Bug
>  Components: Driver - JDBC
>Affects Versions: v1.2, v1.5.1
>Reporter: fengYu
> Attachments: 
> 0003-KYLIN-1685-error-happens-while-execute-a-sql-contain.patch
>
>
> An exception happens:
> java.sql.SQLException: Error while executing SQL "select * from test_table 
> where url not in ('http://a.b.com/?a=b')": 
> org.apache.kylin.jdbc.KylinStatement cannot be cast to 
> org.apache.kylin.jdbc.KylinPreparedStatement
>   at org.apache.kylin.jdbc.KylinResultSet.execute(KylinResultSet.java:54)
>   at 
> org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:566)
>   at org.apache.kylin.jdbc.KylinMeta.prepareAndExecute(KylinMeta.java:79)
>   at 
> org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:571)
>   at 
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:135)
>   at 
> org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:186)
>   at 
> org.apache.kylin.jdbc.DriverTest.testStatementWithQuestionMask(DriverTest.java:79)
> This is caused by the Kylin JDBC driver treating any SQL containing '?' as a 
> PreparedStatement and casting the statement accordingly.





[jira] [Created] (KYLIN-1685) error happens while execute a sql contains '?' using Statement

2016-05-12 Thread fengYu (JIRA)
fengYu created KYLIN-1685:
-

 Summary: error happens while execute a sql contains '?' using 
Statement
 Key: KYLIN-1685
 URL: https://issues.apache.org/jira/browse/KYLIN-1685
 Project: Kylin
  Issue Type: Bug
  Components: Driver - JDBC
Affects Versions: v1.5.1, v1.2
Reporter: fengYu


An exception happens:
java.sql.SQLException: Error while executing SQL "select * from test_table 
where url not in ('http://a.b.com/?a=b')": org.apache.kylin.jdbc.KylinStatement 
cannot be cast to org.apache.kylin.jdbc.KylinPreparedStatement
at org.apache.kylin.jdbc.KylinResultSet.execute(KylinResultSet.java:54)
at 
org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:566)
at org.apache.kylin.jdbc.KylinMeta.prepareAndExecute(KylinMeta.java:79)
at 
org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:571)
at 
org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:135)
at 
org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:186)
at 
org.apache.kylin.jdbc.DriverTest.testStatementWithQuestionMask(DriverTest.java:79)

This is caused by the Kylin JDBC driver treating any SQL containing '?' as a 
PreparedStatement and casting the statement accordingly.
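
To show the distinction at stake, the sketch below counts '?' characters only when they appear outside single-quoted string literals, so the URL in the failing query is not mistaken for a parameter. This is an illustration under my own assumptions, not how the driver or the patch decides; SqlPlaceholders is a hypothetical name, and comments and double-quoted identifiers are not handled.

```java
// Hypothetical helper: counts parameter markers outside string literals.
final class SqlPlaceholders {
    private SqlPlaceholders() {}

    static int count(String sql) {
        int markers = 0;
        boolean inLiteral = false;
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (c == '\'') {
                // A doubled quote ('') toggles twice, so the state is unchanged.
                inLiteral = !inLiteral;
            } else if (c == '?' && !inLiteral) {
                markers++;
            }
        }
        return markers;
    }
}
```

With this check, the query from the stack trace has zero parameter markers, while "select * from t where a = ?" has one.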




