[jira] [Commented] (KYLIN-2929) speed up Dump file performance
[ https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342967#comment-16342967 ]

fengYu commented on KYLIN-2929:
-------------------------------

I have uploaded a new patch; please review it when you have time.

> speed up Dump file performance
> ------------------------------
>
>                 Key: KYLIN-2929
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2929
>             Project: Kylin
>          Issue Type: Bug
>          Components: Query Engine
>    Affects Versions: v2.0.0
>            Reporter: fengYu
>            Assignee: fengYu
>            Priority: Major
>              Labels: Performance
>             Fix For: v2.3.0
>
>         Attachments: 0001-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch
>
>
> While working on KYLIN-2926, I found that the coprocessor dumps to disk once
> estimatedMemSize is bigger than spillThreshold, and that the spilled data is
> far smaller than estimatedMemSize: in my case the dump file is about 8 MB
> while spillThreshold is set to 3 GB.
> So I try to keep the spill data in memory rather than writing the file to disk
> immediately; when the in-memory spill data reaches the threshold, all spill
> files are written together.
> In my case, the coprocessor processing time dropped from 22s to 16s, an
> improvement of about 30%.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KYLIN-2929) speed up Dump file performance
[ https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-2929:
--------------------------
    Attachment: 0001-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch
[jira] [Updated] (KYLIN-2929) speed up Dump file performance
[ https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-2929:
--------------------------
    Attachment:     (was: 0002-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch)
[jira] [Commented] (KYLIN-2929) speed up Dump file performance
[ https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16337544#comment-16337544 ]

fengYu commented on KYLIN-2929:
-------------------------------

Sorry, I forgot about this. I will do the review this week.
[jira] [Updated] (KYLIN-2950) Change build engine smoothly
[ https://issues.apache.org/jira/browse/KYLIN-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-2950:
--------------------------
    Attachment: 0001-KYLIN-2950-Change-build-engine-smoothly.patch

> Change build engine smoothly
> ----------------------------
>
>                 Key: KYLIN-2950
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2950
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v2.0.0
>            Reporter: fengYu
>            Assignee: fengYu
>         Attachments: 0001-KYLIN-2950-Change-build-engine-smoothly.patch
>
>
> Currently, we cannot change the build engine without disabling the cube and
> purging all existing segments, which is expensive.
> In my tests, building a cube with the MR engine and with the Spark engine
> generates the same result, and the merge job always uses the MR engine. Hence,
> I think treating engineType as part of the cube's signature is improper.
> I changed the code and modified the cube JSON. It works well with the MR and
> Spark engines used alternately.
> TO BE DONE: modify the web UI.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KYLIN-2950) Change build engine smoothly
fengYu created KYLIN-2950:
-----------------------------

             Summary: Change build engine smoothly
                 Key: KYLIN-2950
                 URL: https://issues.apache.org/jira/browse/KYLIN-2950
             Project: Kylin
          Issue Type: Bug
    Affects Versions: v2.0.0
            Reporter: fengYu
            Assignee: fengYu

Currently, we cannot change the build engine without disabling the cube and purging all existing segments, which is expensive.
In my tests, building a cube with the MR engine and with the Spark engine generates the same result, and the merge job always uses the MR engine. Hence, I think treating engineType as part of the cube's signature is improper.
I changed the code and modified the cube JSON. It works well with the MR and Spark engines used alternately.
TO BE DONE: modify the web UI.
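The point about engineType and the cube signature can be illustrated with a small Java sketch. This is not Kylin's actual signature code: the class, the method names, the field values, and the choice of MD5 over a few concatenated strings are all assumptions made for illustration. It only shows why a signature that covers the engine type changes when the engine is switched, while one that leaves it out does not.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class SignatureDemo {
    /** Hashes the given fields in order; a stand-in for a descriptor signature (assumed scheme). */
    static String digest(String... fields) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String field : fields) {
                md.update(field.getBytes(StandardCharsets.UTF_8));
                md.update((byte) '|'); // separator so adjacent fields cannot run together
            }
            return Base64.getEncoder().encodeToString(md.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Signature that covers the engine type: switching engines invalidates it. */
    static String signatureWithEngine(String name, String dims, String measures, String engine) {
        return digest(name, dims, measures, engine);
    }

    /** Signature that ignores the engine type: switching engines leaves it unchanged. */
    static String signatureWithoutEngine(String name, String dims, String measures, String engine) {
        return digest(name, dims, measures);
    }

    public static void main(String[] args) {
        // "MR" and "SPARK" are illustrative labels, not Kylin's engine type values.
        System.out.println(signatureWithEngine("cube_a", "d1,d2", "m1", "MR")
                .equals(signatureWithEngine("cube_a", "d1,d2", "m1", "SPARK")));    // prints false
        System.out.println(signatureWithoutEngine("cube_a", "d1,d2", "m1", "MR")
                .equals(signatureWithoutEngine("cube_a", "d1,d2", "m1", "SPARK"))); // prints true
    }
}
```

Leaving the engine out of the signature is only safe if both engines really produce identical segments, which is what the tests described in the report are meant to establish.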
[jira] [Commented] (KYLIN-2926) DumpMerger return incorrect results
[ https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207240#comment-16207240 ]

fengYu commented on KYLIN-2926:
-------------------------------

OK. As a quick fix, I think the patch is useful; a new JIRA needs to be created.

> DumpMerger return incorrect results
> -----------------------------------
>
>                 Key: KYLIN-2926
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2926
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v2.0.0
>            Reporter: fengYu
>            Assignee: fengYu
>         Attachments: 0001-KYLIN-2926-DumpMerger-return-incorrect-results-creat.patch
>
>
> In our scenario, a cube query returns wrong results once the coprocessor
> needs to spill to disk. Our version is 2.0.0, and I found that the root cause
> is in DumpMerger.enqueueFromDump: DataTypeSerializer uses a ThreadLocal
> variable 'current', which leads to different elements in dumpCurrentValues
> sharing the same object, so the next fill of measure values changes the
> existing values.
> The incorrect measures are HLLC and raw, which use the current variable in
> deserialize.
[jira] [Commented] (KYLIN-1892) merge interval support
[ https://issues.apache.org/jira/browse/KYLIN-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206955#comment-16206955 ]

fengYu commented on KYLIN-1892:
-------------------------------

Sorry for the delay. You can finish it if you have planned to. Thanks.

> merge interval support
> ----------------------
>
>                 Key: KYLIN-1892
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1892
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v1.5.2
>            Reporter: fengYu
>            Assignee: Yang Hao
>
> We always have some data that needs to be amended a few days later.
> In current Kylin, once I set Auto Merge Thresholds, newly built segments are
> merged when they reach the thresholds, and the next day's refresh then
> refreshes the merged segment, which is unnecessary.
> So I want to add an interval configuration, meaning that auto merge only
> merges segments outside of the interval.
> For example, with interval = 2 and Auto Merge Thresholds = 7: if 07-01 to
> 07-07 are built, auto merge will not trigger; when 07-09 is built
> successfully, auto merge will trigger and merge the segments from 07-01 to
> 07-07.
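The example in the description can be checked with a short Java sketch. It is an illustration only, not Kylin's merge logic: the class and method names are hypothetical, it assumes one segment per day, and it reads "outside of the interval" as "excluding the newest `interval` segments".

```java
public class MergeIntervalDemo {
    /** Segments old enough to merge: everything except the newest `interval` segments. */
    static int eligibleSegments(int builtSegments, int interval) {
        return Math.max(0, builtSegments - interval);
    }

    /** Auto merge triggers once the eligible segments reach the merge threshold. */
    static boolean shouldAutoMerge(int builtSegments, int interval, int mergeThreshold) {
        return eligibleSegments(builtSegments, interval) >= mergeThreshold;
    }

    public static void main(String[] args) {
        // 07-01 to 07-07 built: only 5 segments lie outside the 2-day interval.
        System.out.println(shouldAutoMerge(7, 2, 7)); // prints false
        // 07-01 to 07-09 built: 07-01 to 07-07 (7 segments) lie outside the interval.
        System.out.println(shouldAutoMerge(9, 2, 7)); // prints true
    }
}
```

With this reading, the two newest daily segments stay unmerged, so a refresh of recent days never has to rebuild a merged segment.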
[jira] [Commented] (KYLIN-2926) DumpMerger return incorrect results
[ https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206944#comment-16206944 ]

fengYu commented on KYLIN-2926:
-------------------------------

This is what I meant in my response above. I think creating a codec for every dump is a good approach too. However, to finally solve the problem, removing the ThreadLocal is a better way, as it avoids the trap for future developers.
[jira] [Commented] (KYLIN-2926) DumpMerger return incorrect results
[ https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206898#comment-16206898 ]

fengYu commented on KYLIN-2926:
-------------------------------

[~Shaofengshi] I totally agree with you, but there are more places that refer to HLLC and RAW, and more tests are needed to cover them. I fixed it quickly with this patch because I think the bug is very serious: a big cube that contains an HLLC or RAW measure may return incorrect results once the coprocessor needs to dump data.
[jira] [Updated] (KYLIN-2929) speed up Dump file performance
[ https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-2929:
--------------------------
    Attachment: 0002-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch

This is my patch and the test result: I ran the same SQL three times and watched the coprocessor processing time.

Before:

2017-10-13 16:53:34,986 INFO [kylin-coproc--pool5-t70] v2.CubeHBaseEndpointRPC:200 : Endpoint RPC returned from HTable V200_NEW_KYLIN_KIJXSDW18F Shard \x56\x32\x30\x30\x5F\x4E\x45\x57\x5F\x4B\x59\x4C\x49\x4E\x5F\x4B\x49\x4A\x58\x53\x44\x57\x31\x38\x46\x2C\x00\x01\x2C\x31\x35\x30\x37\x37\x30\x33\x35\x32\x33\x36\x39\x35\x2E\x65\x39\x37\x63\x64\x38\x34\x32\x62\x33\x61\x63\x37\x63\x66\x30\x32\x38\x31\x64\x36\x32\x66\x38\x31\x63\x62\x36\x61\x38\x64\x39\x2E on host: db-53.photo.163.org.Total scanned row: 634776. Total scanned bytes: 134956120. Total filtered/aggred row: 527872. Time elapsed in EP: 19082(ms). Server CPU usage: 0.0, server physical mem left: 1.381179392E9, server swap mem left:2.075181056E9.Etc message: start latency: 34@57,agg done@18265,compress done@19081,server stats done@19081, debugGitTag:a08e52e24c99f312eaa63bd3f9ef4cdc53fa2a67;.Normal Complete: true.Compressed row size: 13954413

2017-10-13 16:55:30,633 INFO [kylin-coproc--pool5-t72] v2.CubeHBaseEndpointRPC:200 : Endpoint RPC returned from HTable V200_NEW_KYLIN_KIJXSDW18F Shard \x56\x32\x30\x30\x5F\x4E\x45\x57\x5F\x4B\x59\x4C\x49\x4E\x5F\x4B\x49\x4A\x58\x53\x44\x57\x31\x38\x46\x2C\x00\x01\x2C\x31\x35\x30\x37\x37\x30\x33\x35\x32\x33\x36\x39\x35\x2E\x65\x39\x37\x63\x64\x38\x34\x32\x62\x33\x61\x63\x37\x63\x66\x30\x32\x38\x31\x64\x36\x32\x66\x38\x31\x63\x62\x36\x61\x38\x64\x39\x2E on host: db-53.photo.163.org.Total scanned row: 634776. Total scanned bytes: 134956120. Total filtered/aggred row: 527872. Time elapsed in EP: 17371(ms). Server CPU usage: 0.08703703703703704, server physical mem left: 1.340674048E9, server swap mem left:2.075181056E9.Etc message: start latency: 12@3,agg done@16586,compress done@17371,server stats done@17371, debugGitTag:a08e52e24c99f312eaa63bd3f9ef4cdc53fa2a67;.Normal Complete: true.Compressed row size: 13954413

2017-10-13 16:56:33,382 INFO [kylin-coproc--pool5-t74] v2.CubeHBaseEndpointRPC:200 : Endpoint RPC returned from HTable V200_NEW_KYLIN_KIJXSDW18F Shard \x56\x32\x30\x30\x5F\x4E\x45\x57\x5F\x4B\x59\x4C\x49\x4E\x5F\x4B\x49\x4A\x58\x53\x44\x57\x31\x38\x46\x2C\x00\x01\x2C\x31\x35\x30\x37\x37\x30\x33\x35\x32\x33\x36\x39\x35\x2E\x65\x39\x37\x63\x64\x38\x34\x32\x62\x33\x61\x63\x37\x63\x66\x30\x32\x38\x31\x64\x36\x32\x66\x38\x31\x63\x62\x36\x61\x38\x64\x39\x2E on host: db-53.photo.163.org.Total scanned row: 634776. Total scanned bytes: 134956120. Total filtered/aggred row: 527872. Time elapsed in EP: 17184(ms). Server CPU usage: 0.0624334964886146, server physical mem left: 1.320890368E9, server swap mem left:2.075181056E9.Etc message: start latency: 12@1,agg done@16397,compress done@17184,server stats done@17184, debugGitTag:a08e52e24c99f312eaa63bd3f9ef4cdc53fa2a67;.Normal Complete: true.Compressed row size: 13954413

After:

2017-10-13 17:01:05,660 INFO [kylin-coproc--pool5-t76] v2.CubeHBaseEndpointRPC:200 : Endpoint RPC returned from HTable V200_NEW_KYLIN_KIJXSDW18F Shard \x56\x32\x30\x30\x5F\x4E\x45\x57\x5F\x4B\x59\x4C\x49\x4E\x5F\x4B\x49\x4A\x58\x53\x44\x57\x31\x38\x46\x2C\x00\x01\x2C\x31\x35\x30\x37\x37\x30\x33\x35\x32\x33\x36\x39\x35\x2E\x65\x39\x37\x63\x64\x38\x34\x32\x62\x33\x61\x63\x37\x63\x66\x30\x32\x38\x31\x64\x36\x32\x66\x38\x31\x63\x62\x36\x61\x38\x64\x39\x2E on host: db-53.photo.163.org.Total scanned row: 634776. Total scanned bytes: 134956120. Total filtered/aggred row: 527872. Time elapsed in EP: 12253(ms). Server CPU usage: 0.0900900900900901, server physical mem left: 1.328091136E9, server swap mem left:2.075181056E9.Etc message: start latency: 33@58,agg done@11463,compress done@12253,server stats done@12253, debugGitTag:a08e52e24c99f312eaa63bd3f9ef4cdc53fa2a67;.Normal Complete: true.Compressed row size: 13954413

2017-10-13 17:02:05,746 INFO [kylin-coproc--pool5-t78] v2.CubeHBaseEndpointRPC:200 : Endpoint RPC returned from HTable V200_NEW_KYLIN_KIJXSDW18F Shard \x56\x32\x30\x30\x5F\x4E\x45\x57\x5F\x4B\x59\x4C\x49\x4E\x5F\x4B\x49\x4A\x58\x53\x44\x57\x31\x38\x46\x2C\x00\x01\x2C\x31\x35\x30\x37\x37\x30\x33\x35\x32\x33\x36\x39\x35\x2E\x65\x39\x37\x63\x64\x38\x34\x32\x62\x33\x61\x63\x37\x63\x66\x30\x32\x38\x31\x64\x36\x32\x66\x38\x31\x63\x62\x36\x61\x38\x64\x39\x2E on host: db-53.photo.163.org.Total scanned row: 634776. Total scanned bytes: 134956120. Total filtered/aggred row: 527872. Time elapsed in EP: 11394(ms). Server CPU usage: 0.09580838323353294, server physical mem left: 1.10680064E9, server swap mem left:2.075181056E9.Etc message: start latency: 12@3,agg done@10605,compress done@11394,server stats done@11394, debugGitTag:a08e52e24c99f312eaa63bd3f9ef4cdc53fa2a67;.Normal
[jira] [Updated] (KYLIN-2926) DumpMerger return incorrect results
[ https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-2926:
--------------------------
    Attachment: 0001-KYLIN-2926-DumpMerger-return-incorrect-results-creat.patch

Create a codec object for each dump to avoid the shared variable.
[jira] [Updated] (KYLIN-2929) speed up Dump file performance
[ https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-2929:
--------------------------
    Component/s: Query Engine
[jira] [Updated] (KYLIN-2929) speed up Dump file performance
[ https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-2929:
--------------------------
    Affects Version/s: v2.0.0
[jira] [Updated] (KYLIN-2929) speed up Dump file performance
[ https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-2929:
--------------------------
    Labels: Performance  (was: )
[jira] [Created] (KYLIN-2929) speed up Dump file performance
fengYu created KYLIN-2929:
-----------------------------

             Summary: speed up Dump file performance
                 Key: KYLIN-2929
                 URL: https://issues.apache.org/jira/browse/KYLIN-2929
             Project: Kylin
          Issue Type: Bug
            Reporter: fengYu
            Assignee: fengYu

While working on KYLIN-2926, I found that the coprocessor dumps to disk once estimatedMemSize is bigger than spillThreshold, and that the spilled data is far smaller than estimatedMemSize: in my case the dump file is about 8 MB while spillThreshold is set to 3 GB.
So I try to keep the spill data in memory rather than writing the file to disk immediately; when the in-memory spill data reaches the threshold, all spill files are written together.
In my case, the coprocessor processing time dropped from 22s to 16s, an improvement of about 30%.
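The buffering idea in this report can be sketched in a few lines of Java. This is a minimal illustration and not the attached patch: the class `BufferedSpiller`, its methods, and the byte-count threshold are hypothetical names chosen for the example, and each dump is treated as an already serialized byte array.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: hold small dumps in memory and write them to disk in one batch. */
public class BufferedSpiller {
    private final Path dir;
    private final long flushThresholdBytes;
    private final List<byte[]> pending = new ArrayList<>();
    private final List<Path> files = new ArrayList<>();
    private long pendingBytes = 0;

    public BufferedSpiller(Path dir, long flushThresholdBytes) {
        this.dir = dir;
        this.flushThresholdBytes = flushThresholdBytes;
    }

    /** Called at the point where a dump would otherwise be written to disk immediately. */
    public void spill(byte[] serializedDump) throws IOException {
        pending.add(serializedDump);
        pendingBytes += serializedDump.length;
        if (pendingBytes >= flushThresholdBytes) {
            flush();
        }
    }

    /** Writes every buffered dump to its own file, then clears the buffer. */
    public void flush() throws IOException {
        for (byte[] dump : pending) {
            Path file = Files.createTempFile(dir, "dump-", ".bin");
            Files.write(file, dump);
            files.add(file);
        }
        pending.clear();
        pendingBytes = 0;
    }

    public int pendingCount() { return pending.size(); }

    public int fileCount() { return files.size(); }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("spill-demo");
        BufferedSpiller spiller = new BufferedSpiller(dir, 20);
        spiller.spill(new byte[8]);
        spiller.spill(new byte[8]);
        System.out.println(spiller.pendingCount() + " " + spiller.fileCount()); // prints 2 0
        spiller.spill(new byte[8]); // 24 buffered bytes reach the 20-byte threshold
        System.out.println(spiller.pendingCount() + " " + spiller.fileCount()); // prints 0 3
    }
}
```

The trade-off is extra heap held by the buffered dumps, so the flush threshold has to stay well below the memory budget that triggered the spill in the first place.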
[jira] [Updated] (KYLIN-2926) DumpMerger return incorrect results
[ https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-2926:
--------------------------
    Description:
In our scenario, a cube query returns wrong results once the coprocessor needs to spill to disk. Our version is 2.0.0, and I found that the root cause is in DumpMerger.enqueueFromDump: DataTypeSerializer uses a ThreadLocal variable 'current', which leads to different elements in dumpCurrentValues sharing the same object, so the next fill of measure values changes the existing values.
The incorrect measures are HLLC and raw, which use the current variable in deserialize.

  was:
In our scenario, a cube query returns wrong results once the coprocessor needs to spill to disk. Our version is 2.0.0, and I found that the root cause is in DumpMerger.enqueueFromDump: DataTypeSerializer uses a ThreadLocal variable 'current', which leads to different elements in dumpCurrentValues sharing the same object, so the next fill of measure values changes the existing values.
The incorrect measure is HLLC.
[jira] [Commented] (KYLIN-2926) DumpMerger return incorrect results
[ https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199892#comment-16199892 ]

fengYu commented on KYLIN-2926:
-------------------------------

To keep the change simple to make and test, I modified the function like this:

    private void enqueueFromDump(int index) {
        if (dumpIterators.get(index) != null && dumpIterators.get(index).hasNext()) {
            Pair<byte[], byte[]> pair = dumpIterators.get(index).next();
            minHeap.offer(new Pair<>(pair.getKey(), index));
            Object[] metricValues = new Object[metrics.trueBitCount()];
            // Create a new codec for each decode so the decoded values do not share one object.
            BufferedMeasureCodec codec = request.createMeasureCodec();
            codec.decode(ByteBuffer.wrap(pair.getValue()), metricValues);
            dumpCurrentValues.set(index, metricValues);
        }
    }

With this change, the result is correct.
[jira] [Updated] (KYLIN-2926) DumpMerger return incorrect results
[ https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-2926:
--------------------------
    Description:
In our scenario, a cube query returns wrong results once the coprocessor needs to spill to disk. Our version is 2.0.0, and I found that the root cause is in DumpMerger.enqueueFromDump: DataTypeSerializer uses a ThreadLocal variable 'current', which leads to different elements in dumpCurrentValues sharing the same object, so the next fill of measure values changes the existing values.
The incorrect measure is HLLC.

  was:
In our scenario, a cube query returns wrong results once the coprocessor needs to spill to disk. Our version is 2.0.0, and I found that the root cause is in DumpMerger.enqueueFromDump: DataTypeSerializer uses a ThreadLocal variable 'current', which leads to different elements in dumpCurrentValues sharing the same object, so the next fill of measure values changes the existing values.
[jira] [Created] (KYLIN-2926) DumpMerger return incorrect results
fengYu created KYLIN-2926:
-----------------------------

             Summary: DumpMerger return incorrect results
                 Key: KYLIN-2926
                 URL: https://issues.apache.org/jira/browse/KYLIN-2926
             Project: Kylin
          Issue Type: Bug
    Affects Versions: v2.0.0
            Reporter: fengYu
            Assignee: fengYu

In our scenario, a cube query returns wrong results once the coprocessor needs to spill to disk. Our version is 2.0.0, and I found that the root cause is in DumpMerger.enqueueFromDump: DataTypeSerializer uses a ThreadLocal variable 'current', which leads to different elements in dumpCurrentValues sharing the same object, so the next fill of measure values changes the existing values.
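The aliasing described in this report can be reproduced outside Kylin with a small Java sketch. The classes below are stand-ins invented for illustration (`Measure`, `ReusingSerializer`), not Kylin's DataTypeSerializer or codec. They only assume what the report states: deserialize fills and returns one per-thread 'current' object instead of allocating a new one.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedCurrentDemo {
    /** Mutable measure value; stands in for a decoded HLLC or raw measure. */
    static final class Measure {
        long value;
    }

    /** Hypothetical serializer that reuses one per-thread object on every deserialize call. */
    static final class ReusingSerializer {
        private final ThreadLocal<Measure> current = ThreadLocal.withInitial(Measure::new);

        Measure deserialize(long encoded) {
            Measure m = current.get(); // same object on every call from this thread
            m.value = encoded;
            return m;
        }
    }

    /** One shared serializer: every slot ends up pointing at the same Measure. */
    static List<Long> decodeWithSharedSerializer(long[] dumps) {
        ReusingSerializer shared = new ReusingSerializer();
        List<Measure> currentValues = new ArrayList<>();
        for (long encoded : dumps) {
            currentValues.add(shared.deserialize(encoded));
        }
        return values(currentValues);
    }

    /** One serializer per dump: each slot owns its own Measure. */
    static List<Long> decodeWithSerializerPerDump(long[] dumps) {
        List<Measure> currentValues = new ArrayList<>();
        for (long encoded : dumps) {
            currentValues.add(new ReusingSerializer().deserialize(encoded));
        }
        return values(currentValues);
    }

    private static List<Long> values(List<Measure> measures) {
        List<Long> out = new ArrayList<>();
        for (Measure m : measures) {
            out.add(m.value);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] dumps = {10L, 20L, 30L};
        System.out.println(decodeWithSharedSerializer(dumps));  // prints [30, 30, 30]
        System.out.println(decodeWithSerializerPerDump(dumps)); // prints [10, 20, 30]
    }
}
```

In the shared case, each later decode overwrites the values that earlier slots still reference, which is the symptom reported for dumpCurrentValues. Giving every dump its own serializer instance avoids the sharing at the cost of extra allocations.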
[jira] [Created] (KYLIN-2810) Kylin UDF support
fengYu created KYLIN-2810:
-----------------------------

             Summary: Kylin UDF support
                 Key: KYLIN-2810
                 URL: https://issues.apache.org/jira/browse/KYLIN-2810
             Project: Kylin
          Issue Type: Bug
            Reporter: fengYu

Kylin does not support some functions that Calcite does not support. May I contribute some UDFs to Kylin? That way, some of our BI tools can use Kylin everywhere.
[jira] [Commented] (KYLIN-2286) global snapshot table for one cube
[ https://issues.apache.org/jira/browse/KYLIN-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139716#comment-16139716 ]

fengYu commented on KYLIN-2286:
-------------------------------

What do you think about this feature? If it meets a demand, I can share our implementation. Thanks a lot.

> global snapshot table for one cube
> ----------------------------------
>
>                 Key: KYLIN-2286
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2286
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: fengYu
>            Assignee: fengYu
>
> In the current version, Kylin builds a snapshot table for each segment, and
> the snapshots are isolated from each other within the same cube, even though
> some segments share the same snapshot table storage.
> In some scenarios we need a global snapshot table for one cube. For example,
> we have a cube with a snapshot table whose primary key is ID. On the first
> day, the table looks like:
> id  name
> 1   A
> 2   B
> 3   C
> The query 'select name, count(1) from fact join dimension group by name'
> returns:
> A  xx
> B  xx
> C  xx
> The next day (a new segment), the lookup table is modified and looks like:
> id  name
> 1   A
> 2   D
> 3   E
> The same query returns:
> A  xx
> B  xx
> C  xx
> D  xx
> E  xx
> However, B and D, and C and E, have the same ID, and we need the newest
> result. So a global snapshot table, shared by all segments and always holding
> the newest values, is needed.
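The difference between per-segment snapshots and a global snapshot can be sketched in Java with plain maps. This is an illustration of the example above, not Kylin code: the class and method names are hypothetical, and each segment is assumed to hold one fact row per id.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SnapshotDemo {
    /** Builds a lookup table that maps ids 1..n to the given names. */
    static Map<Integer, String> lookup(String... names) {
        Map<Integer, String> table = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            table.put(i + 1, names[i]);
        }
        return table;
    }

    /** Each segment joins against the snapshot taken when that segment was built. */
    static Map<String, Integer> countWithSegmentSnapshots(
            List<List<Integer>> segments, List<Map<Integer, String>> snapshots) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < segments.size(); i++) {
            for (int id : segments.get(i)) {
                counts.merge(snapshots.get(i).get(id), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Every segment joins against one shared snapshot that holds the newest names. */
    static Map<String, Integer> countWithGlobalSnapshot(
            List<List<Integer>> segments, Map<Integer, String> global) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<Integer> segment : segments) {
            for (int id : segment) {
                counts.merge(global.get(id), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<Integer>> segments = Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(1, 2, 3));
        Map<Integer, String> day1 = lookup("A", "B", "C");
        Map<Integer, String> day2 = lookup("A", "D", "E");
        System.out.println(countWithSegmentSnapshots(segments, Arrays.asList(day1, day2)));
        // prints {A=2, B=1, C=1, D=1, E=1}
        System.out.println(countWithGlobalSnapshot(segments, day2));
        // prints {A=2, D=2, E=2}
    }
}
```

With the global snapshot, rows for ids 2 and 3 from the first day are reported under their newest names, which is the result the description asks for.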
[jira] [Closed] (KYLIN-1890) support hbase table prefix configurable
[ https://issues.apache.org/jira/browse/KYLIN-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu closed KYLIN-1890.
-------------------------
    Resolution: Won't Fix

> support hbase table prefix configurable
> ---------------------------------------
>
>                 Key: KYLIN-1890
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1890
>             Project: Kylin
>          Issue Type: Improvement
>          Components: General
>    Affects Versions: v1.5.2
>            Reporter: fengYu
>            Assignee: fengYu
>         Attachments: 0001-KYLIN-1890-support-hbase-table-prefix-configurable.patch
>
>
> Sometimes we need to deploy two Kylin environments on the same HBase. I want
> to make the HBase table name prefix configurable for two reasons:
> 1. Different Kylin environments will generate the same table names.
> 2. Cleaning invalid HTables for one environment will delete all tables that
> belong to the other environment.
> Having different Kylin environments use different namespaces is acceptable
> too.
[jira] [Closed] (KYLIN-1172) kylin support multi-hive on different hadoop cluster
[ https://issues.apache.org/jira/browse/KYLIN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fengYu closed KYLIN-1172. - Resolution: Won't Fix > kylin support multi-hive on different hadoop cluster > > > Key: KYLIN-1172 > URL: https://issues.apache.org/jira/browse/KYLIN-1172 > Project: Kylin > Issue Type: Improvement >Affects Versions: v1.0 >Reporter: fengYu >Assignee: fengYu > Attachments: 0001-kerberos.patch, > 0001-support-more-hives-depend-on-different-hadoop-add-co.patch, > 0002-hadoop-jar-files.patch, > 0003-git-common-package-part-patch-KYLIN-1172.patch, > 0004-git-cube-package-part-patch-KYLIN-1172.patch, > 0005-git-metadata-package-part-patch-KYLIN-1172.patch, > 0006-git-server-package-part-patch-KYLIN-1172.patch, > 0007-dictionary-package-part-patch-KYLIN-1172.patch, > 0008-job-package-part-patch-KYLIN-1172.patch > > > Hi, I recently modify kylin to support multi-hive on different hadoop > cluster and take them as input source to kylin, we do this since the > following reasons: > 1、we have more than one hadoop cluster and many hive depend on them(products > may has its own hive), we cannot migrate those hives to one and don't want to > deploy one kylin for every hive source. > 2、our hadoop cluster deploy in different DC, we need to support them in one > kylin instance. > 3、source data in hive is much less than hfile, so copy those files cross > different different is more efficient(fact distinct column job and base > cuboid job need take datas at hive as input), so we deploy hbase and hadoop > in one DC (separated in different HDFS). > So, we divide data flow into 3 parts, hive is input source, hadoop do > computing which will generate many temporary files, hbase is output. After > cube building, queries on kylin just interactive with hbase. therefore, what > we need to do is how to build cube base on differnet hives and hadoops. 
> Our method are summarized below : > 1、Deploy hive and hadoops, before start kylin, user should deploy all hives > and hadoop, and ensure you can run hive sql in ./hive. and access every HDFS > with 'hadoop fs 'command(add more nameservice in hdfs-site.xml). > 2、Divide hives into two part: the hive that used when kylin start(we call it > default one) and others are additional, we should allocate a name for every > hive (default one is null), For simplicity, we just add a config property > that tells root directory of all hive client, and every hive client is a > directory whose name is the hive name(default one do not need locate in). > 3、Attach only a hive to one project , so when creating a project, you should > specify a hive name, and according to it we can find the hive client(include > hive command and config files). > 4、when load table in one project, find the hive-site.xml and create a > HiveClient using this config file. > 5、can not take HCatInputFormat as inputFormat in FactDistinctColumnsJob, so > we change the job and take the intermediate hive table location as input file > and change FactDistinctColumnsMapper. HiveColumnCardinalityJob will fail if > we use additional hive. > 6、Because we need to run MR in one hadoop cluster and input or output located > at other HDFS, so when we set input location to real name node address > instead of name service(this is a config property too). > That is all we do, I think it can make things easy to manage more > than one hives and hadoops. we have apply it in our env and it works well. I > hope it can help other people... > patch uploaded, illustrations: > 1、add two config property, > 2、add hivename to projectInstance and make projectName in cube persistence in > hbase. 
> 3、create HiveClient with a hive-site.xml file or use default one that in > kylin classpath > 4、modify two hadoop job: FactDistinctColumnsJob and CuboidJob, take > Intermediate table name as input and change to table location in run() > 5、transform nameservice to master name node while access data located in > other hadoop if necessary. > the patch is based on 1.0-incubating and we add patchs KYLIN-1014、KYLIN-1021 > and KYLIN-957 in order .. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
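Two of the steps in the description above (finding a hive client by name, and replacing a nameservice with a concrete name node address) can be sketched as follows. This is only an illustration in Python, not the code in the attached patches: the function names, the `conf/` subdirectory and the example hosts are all assumptions made for this sketch.

```python
from pathlib import PurePosixPath
from typing import Mapping, Optional
from urllib.parse import urlsplit, urlunsplit


def hive_site_path(hive_root: str, hive_name: Optional[str],
                   default_conf_dir: str) -> PurePosixPath:
    """Locate hive-site.xml for a project's hive (hypothetical layout)."""
    if not hive_name:
        # The default hive has no name and lives outside the root directory.
        return PurePosixPath(default_conf_dir) / "hive-site.xml"
    # Assumption: each hive client directory is named after the hive and
    # keeps its configuration under conf/.
    return PurePosixPath(hive_root) / hive_name / "conf" / "hive-site.xml"


def to_namenode_uri(uri: str, namenodes: Mapping[str, str]) -> str:
    """Replace a nameservice authority with a concrete name node address."""
    parts = urlsplit(uri)
    address = namenodes.get(parts.netloc)
    if address is None:
        return uri  # not a known nameservice, leave the URI untouched
    return urlunsplit(parts._replace(netloc=address))


# Hypothetical values, for illustration only.
print(hive_site_path("/opt/hives", "hive_dc2", "/etc/hive/conf"))
print(to_namenode_uri("hdfs://ns-dc2/warehouse/t",
                      {"ns-dc2": "nn1.example.com:8020"}))
```

The mapping from nameservice to name node address is passed in explicitly here; the description says the real address comes from a config property.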
[jira] [Commented] (KYLIN-2363) Prune cuboids by capping number of dimensions
[ https://issues.apache.org/jira/browse/KYLIN-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15980702#comment-15980702 ]

fengYu commented on KYLIN-2363:
---
[~roger.shi] Sorry for the delay. I am waiting for the release of Kylin 2.0 and want to add this feature on top of it. I think it will be released this week, and I will do this work then.

> Prune cuboids by capping number of dimensions
> -
>
> Key: KYLIN-2363
> URL: https://issues.apache.org/jira/browse/KYLIN-2363
> Project: Kylin
> Issue Type: Improvement
> Reporter: fengYu
>
> The scenario is this:
> I have 20+ dimensions, but a query uses at most 5 of them, so any cuboid that contains more than 5 dimensions (except the base cuboid) is useless.
> I think we can add a cube-level configuration that limits the maximum number of dimensions a cuboid may include.
> What's more, we could configure which levels (numbers of dimensions) need to be calculated. In the scenario above, we would calculate only levels 1, 2, 3, 4 and 5, and skip every level above 5.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KYLIN-2363) support limit of dimensions in a cuboid
[ https://issues.apache.org/jira/browse/KYLIN-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15807081#comment-15807081 ] fengYu commented on KYLIN-2363: --- yes, set a range or enumerate all levels to be calculated is a more user-friendly solution. > support limit of dimensions in a cuboid > --- > > Key: KYLIN-2363 > URL: https://issues.apache.org/jira/browse/KYLIN-2363 > Project: Kylin > Issue Type: Improvement >Reporter: fengYu > > the scene like this: > I have 20+ dimensions, However the query will only use at most 5 dimensions > in all dimensions, so cuboid that contains 5+ dimensions(except base cuboid) > is useless. > I think we can add a configuration in cube, which limit the max dimensions > that cuboid includes. > What's more, we can config which level(number of dimension) need to > calculate. in above scene, we only calculate leve 1,2,3,4,5. and skip level 5+ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-2363) support limit of dimensions in a cuboid
[ https://issues.apache.org/jira/browse/KYLIN-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806993#comment-15806993 ]

fengYu commented on KYLIN-2363:
---
Yes, that is what we want to do. I will work on it later.

> support limit of dimensions in a cuboid
> ---
>
> Key: KYLIN-2363
> URL: https://issues.apache.org/jira/browse/KYLIN-2363
> Project: Kylin
> Issue Type: Improvement
> Reporter: fengYu
>
> The scenario is this:
> I have 20+ dimensions, but a query uses at most 5 of them, so any cuboid that contains more than 5 dimensions (except the base cuboid) is useless.
> I think we can add a cube-level configuration that limits the maximum number of dimensions a cuboid may include.
> What's more, we could configure which levels (numbers of dimensions) need to be calculated. In the scenario above, we would calculate only levels 1, 2, 3, 4 and 5, and skip every level above 5.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-2363) support limit of dimensions in a cuboid
[ https://issues.apache.org/jira/browse/KYLIN-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804025#comment-15804025 ]

fengYu commented on KYLIN-2363:
---
I know the configuration and what it means. In my scenario, for example, I have 6 dimensions: A/B/C/D/E/F. Any query needs at most 2 dimensions (in WHERE and GROUP BY), so I need to calculate cuboids like AB/AC/AD/AE/AF/... and A/B/C/D/E/F, and skip ABC/ABD/ACD/... and ABCD/ABCE/... So I need a configuration that specifies the maximum number of dimensions a cuboid may contain in order to be calculated.

> support limit of dimensions in a cuboid
> ---
>
> Key: KYLIN-2363
> URL: https://issues.apache.org/jira/browse/KYLIN-2363
> Project: Kylin
> Issue Type: Improvement
> Reporter: fengYu
>
> The scenario is this:
> I have 20+ dimensions, but a query uses at most 5 of them, so any cuboid that contains more than 5 dimensions (except the base cuboid) is useless.
> I think we can add a cube-level configuration that limits the maximum number of dimensions a cuboid may include.
> What's more, we could configure which levels (numbers of dimensions) need to be calculated. In the scenario above, we would calculate only levels 1, 2, 3, 4 and 5, and skip every level above 5.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
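The A/B/C/D/E/F example in the comment above can be worked through with a small Python sketch. It only illustrates the counting: the function name `capped_cuboids` is made up here, and Kylin's real cuboid scheduling (cuboid IDs, aggregation groups) is not modelled.

```python
from itertools import combinations


def capped_cuboids(dimensions, max_dims):
    """Cuboids with at most max_dims dimensions, plus the base cuboid."""
    dims = tuple(dimensions)
    cuboids = []
    for level in range(1, min(max_dims, len(dims)) + 1):
        # Every combination of `level` dimensions is one cuboid.
        cuboids.extend(combinations(dims, level))
    if len(dims) > max_dims:
        # The base cuboid (all dimensions) is always kept.
        cuboids.append(dims)
    return cuboids


cuboids = capped_cuboids("ABCDEF", 2)
print(len(cuboids))  # 6 single-dimension + 15 pairs + 1 base cuboid = 22
```

Without the cap, 6 dimensions give 2^6 - 1 = 63 non-empty cuboids, so a cap of 2 leaves 22 of them to build.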
[jira] [Created] (KYLIN-2363) support limit of dimensions in a cuboid
fengYu created KYLIN-2363: - Summary: support limit of dimensions in a cuboid Key: KYLIN-2363 URL: https://issues.apache.org/jira/browse/KYLIN-2363 Project: Kylin Issue Type: Improvement Reporter: fengYu the scene like this: I have 20+ dimensions, However the query will only use at most 5 dimensions in all dimensions, so cuboid that contains 5+ dimensions(except base cuboid) is useless. I think we can add a configuration in cube, which limit the max dimensions that cuboid includes. What's more, we can config which level(number of dimension) need to calculate. in above scene, we only calculate leve 1,2,3,4,5. and skip level 5+ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-2286) global snapshot table for one cube
[ https://issues.apache.org/jira/browse/KYLIN-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760291#comment-15760291 ]

fengYu commented on KYLIN-2286:
---
I agree with the first point. However, if we always use the latest snapshot for derived dimensions and the lookup table shrinks, we can no longer fetch results for the PKs that were removed. We need to keep each PK with its last seen values, so a merge operation is needed when Kylin builds a new lookup table.

> global snapshot table for one cube
> ---
>
> Key: KYLIN-2286
> URL: https://issues.apache.org/jira/browse/KYLIN-2286
> Project: Kylin
> Issue Type: Improvement
> Reporter: fengYu
> Assignee: fengYu
>
> In the current version, Kylin builds a snapshot table per segment, and the snapshots of segments in the same cube are isolated from each other, even though some segments share the same snapshot table storage.
> In some scenarios we need a global snapshot table for one cube. For example, we have a cube with a snapshot table whose PK is ID. On the first day, the table looks like:
> id name
> 1 A
> 2 B
> 3 C
> The query 'select name, count(1) from fact join dimension group by name' returns:
> A xx
> B xx
> C xx
> The next day (a new segment), the lookup table has been modified and looks like:
> id name
> 1 A
> 2 D
> 3 E
> The same query returns:
> A xx
> B xx
> C xx
> D xx
> E xx
> However, B and D share the same ID, as do C and E, and we need the newest result. So a global snapshot table that is shared by all segments and always holds the newest values is needed.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-2286) global snapshot table for one cube
[ https://issues.apache.org/jira/browse/KYLIN-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15759986#comment-15759986 ]

fengYu commented on KYLIN-2286:
---
I don't think refreshing the lookup table of every segment whenever a cube segment is created is user-friendly. Our solution is to add a cube-level property that enables this behaviour. When it is enabled, we use a snapshots attribute on cubeInstance (which we add) instead of the one on cubeSegment. Every time the dictionary is built, we read the new snapshot from the Hive table and the old snapshot from HBase, and merge them by PK so that the snapshot table only ever grows. The merged lookup table is then used as the input for rebuilding the snapshot table. Maybe you can help us review the process. Thanks a lot.

> global snapshot table for one cube
> ---
>
> Key: KYLIN-2286
> URL: https://issues.apache.org/jira/browse/KYLIN-2286
> Project: Kylin
> Issue Type: Improvement
> Reporter: fengYu
> Assignee: fengYu
>
> In the current version, Kylin builds a snapshot table per segment, and the snapshots of segments in the same cube are isolated from each other, even though some segments share the same snapshot table storage.
> In some scenarios we need a global snapshot table for one cube. For example, we have a cube with a snapshot table whose PK is ID. On the first day, the table looks like:
> id name
> 1 A
> 2 B
> 3 C
> The query 'select name, count(1) from fact join dimension group by name' returns:
> A xx
> B xx
> C xx
> The next day (a new segment), the lookup table has been modified and looks like:
> id name
> 1 A
> 2 D
> 3 E
> The same query returns:
> A xx
> B xx
> C xx
> D xx
> E xx
> However, B and D share the same ID, as do C and E, and we need the newest result. So a global snapshot table that is shared by all segments and always holds the newest values is needed.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
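The merge-by-PK step described in the comment above can be sketched in a few lines of Python. This is an illustration of the idea only, using in-memory rows; `merge_snapshots` is a made-up name, and the real implementation would read the old snapshot from HBase and the new one from Hive.

```python
def merge_snapshots(old_rows, new_rows, pk="id"):
    """Merge two lookup-table snapshots by primary key.

    Rows from the new snapshot win; PKs that disappeared from the new
    snapshot keep their last known row, so the result never shrinks.
    """
    merged = {row[pk]: row for row in old_rows}
    merged.update({row[pk]: row for row in new_rows})
    return [merged[key] for key in sorted(merged)]


day1 = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}, {"id": 3, "name": "C"}]
day2 = [{"id": 1, "name": "A"}, {"id": 2, "name": "D"}]  # id 3 was removed

for row in merge_snapshots(day1, day2):
    print(row["id"], row["name"])
```

Here id 2 takes its newest name (D), while id 3, which is missing from the second snapshot, keeps its last seen name (C).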
[jira] [Commented] (KYLIN-2286) global snapshot table for one cube
[ https://issues.apache.org/jira/browse/KYLIN-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753987#comment-15753987 ] fengYu commented on KYLIN-2286: --- Yes, If you want to keep the dimension value be the latest modified one, you need define the dimension derived, otherwise, you can keep the dimension normal. in our usage, we add the normal dimension to fact table and leave derived dimensions in lookup table by views. which can make the smallest lookup table. > global snapshot table for one cube > --- > > Key: KYLIN-2286 > URL: https://issues.apache.org/jira/browse/KYLIN-2286 > Project: Kylin > Issue Type: Improvement >Reporter: fengYu >Assignee: fengYu > > I current version, Kylin build a snapshot table for a segment and isolate > with each other in the same cube, even though some segments share the same > snapshot table storage . > I some scene, we need global snapshot table for one cube, such as we has a > cube with snapshot table,ID is PK,the first day, the table look like: > id name > 1 A > 2 B > 3 C > the query 'select name, count(1) from fact join dimension group by name' get > result: > A xx > B xx > C xx > the next day(segment), lookup table modified, it looks like : > id name > 1 A > 2 D > 3 E > the same query return : > A xx > B xx > C xx > D xx > E xx > However B and D, C and E has the same ID, we need the newest result. so a > global snapshot table shared by all segments which has always the newest > values is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-2286) global snapshot table for one cube
fengYu created KYLIN-2286: - Summary: global snapshot table for one cube Key: KYLIN-2286 URL: https://issues.apache.org/jira/browse/KYLIN-2286 Project: Kylin Issue Type: Improvement Reporter: fengYu Assignee: fengYu I current version, Kylin build a snapshot table for a segment and isolate with each other in the same cube, even though some segments share the same snapshot table storage . I some scene, we need global snapshot table for one cube, such as we has a cube with snapshot table,ID is PK,the first day, the table look like: id name 1 A 2 B 3 C the query 'select name, count(1) from fact join dimension group by name' get result: A xx B xx C xx the next day(segment), lookup table modified, it looks like : id name 1 A 2 D 3 E the same query return : A xx B xx C xx D xx E xx However B and D, C and E has the same ID, we need the newest result. so a global snapshot table shared by all segments which has always the newest values is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1826) kylin support more than one hive based on different hadoop claster
[ https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15704670#comment-15704670 ] fengYu commented on KYLIN-1826: --- Thanks for point out those improvements, I will work on this, I doubt that whether can I have different config at different project now, if not I have to do this job, According to my understanding, you want to make hive.home as a config which also is a kind of hive source rather than another source named external hives. > kylin support more than one hive based on different hadoop claster > -- > > Key: KYLIN-1826 > URL: https://issues.apache.org/jira/browse/KYLIN-1826 > Project: Kylin > Issue Type: Improvement > Components: Environment >Affects Versions: v1.5.2 >Reporter: fengYu >Assignee: fengYu > Attachments: > 0001-KYLIN-1826-add-external-hive-interface-project-table.patch, > 0002-KYLIN-1826-add-and-modify-cube-source-job-for-extern.patch, > 0003-KYLIN-1826-unify-hive-concept-forbid-modify-hive-nam.patch > > > Currently, kylin only support one hive which should run by 'hive' command, > However, when source data located in more than one hive we should deploy more > kylin instance and more than one metastore. which is difficult to manager and > may cause some conflict. > I has been working on it Recently, In our cluster, there are some hive > client(different metastore) which based on different hadoop cluster, I add a > new hive source type which called 'external hive' in kylin 1.5.x > Thanks to kylin Plug-in architecture in 2.x, which make this work easiler. > the main modification are: > 1. add hive root directory in hive config file, external hive client exist in > this directory. hive named by directory name. > 2. add hive-site.xml file while loading hive tables. > 3. store hive name into project, one project can only take one hive as source. > 4. change and add some job to support job building. > I will upload my patch if I finish all my tests. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-2064) add non-runtime-aggregation measure support
[ https://issues.apache.org/jira/browse/KYLIN-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-2064:
--
Attachment: 0001-KYLIN-2064-add-non-runtime-aggregation-measure-for-d.patch

I have reformatted my patch. I implemented the feature with few changes to the original code and added two kinds of measure, called nhllc and nbitmap. If a query needs runtime aggregation, it fails with an exception.

> add non-runtime-aggregation measure support
> ---
>
> Key: KYLIN-2064
> URL: https://issues.apache.org/jira/browse/KYLIN-2064
> Project: Kylin
> Issue Type: Improvement
> Affects Versions: v1.5.2
> Reporter: fengYu
> Assignee: fengYu
> Attachments: 0001-KYLIN-2064-add-non-runtime-aggregation-measure-for-d.patch
>
> Kylin is based on pre-computation and stores the results in HBase. However, it also supports runtime aggregation to answer queries that do not match the pre-computed data.
> Runtime aggregation slows down the query and, for measures such as distinct count and TopN, needs more storage space in HBase (which slows down scans).
> So we can use more pre-computation and less runtime aggregation. In some scenarios we do not need results across different partitions (dates), so we add a kind of measure that stores only the result of count distinct. It speeds up queries and needs less storage.
> What's more, if a query on this measure needs a result that was not pre-computed, it returns an exception.
> I will describe our solution in my patch later.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1826) kylin support more than one hive based on different hadoop claster
[ https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fengYu updated KYLIN-1826: -- Attachment: 0003-KYLIN-1826-unify-hive-concept-forbid-modify-hive-nam.patch unify hive name concept,and forbid changing hive name when existing some cube in old project. > kylin support more than one hive based on different hadoop claster > -- > > Key: KYLIN-1826 > URL: https://issues.apache.org/jira/browse/KYLIN-1826 > Project: Kylin > Issue Type: Improvement > Components: Environment >Affects Versions: v1.5.2 >Reporter: fengYu >Assignee: fengYu > Attachments: > 0001-KYLIN-1826-add-external-hive-interface-project-table.patch, > 0002-KYLIN-1826-add-and-modify-cube-source-job-for-extern.patch, > 0003-KYLIN-1826-unify-hive-concept-forbid-modify-hive-nam.patch > > > Currently, kylin only support one hive which should run by 'hive' command, > However, when source data located in more than one hive we should deploy more > kylin instance and more than one metastore. which is difficult to manager and > may cause some conflict. > I has been working on it Recently, In our cluster, there are some hive > client(different metastore) which based on different hadoop cluster, I add a > new hive source type which called 'external hive' in kylin 1.5.x > Thanks to kylin Plug-in architecture in 2.x, which make this work easiler. > the main modification are: > 1. add hive root directory in hive config file, external hive client exist in > this directory. hive named by directory name. > 2. add hive-site.xml file while loading hive tables. > 3. store hive name into project, one project can only take one hive as source. > 4. change and add some job to support job building. > I will upload my patch if I finish all my tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1826) kylin support more than one hive based on different hadoop claster
[ https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624887#comment-15624887 ] fengYu commented on KYLIN-1826: --- Thanks for review and reply. 1、I will modify the job name and unify the name. 2、In early version, I do not pay attention to ID_STREAMING source, I will learn this and try to make it incompatible. 3、I have pass 'mvn test' in my cluster, I have to check the test code and rerun it. 4、"hive" in TableDesc is comes from ProjectInstance, it make something easy to store it in TableDesc, to keep inconsistency, we can take "hive" in ProjectInstance unchangeable, which it is meaningless. because we can just modify hive name in local filesystem rather than modify metadata. 5、currently, we still use cli for all hive source which is stable, We will consider and modify it to beeline if necessary. > kylin support more than one hive based on different hadoop claster > -- > > Key: KYLIN-1826 > URL: https://issues.apache.org/jira/browse/KYLIN-1826 > Project: Kylin > Issue Type: Improvement > Components: Environment >Affects Versions: v1.5.2 >Reporter: fengYu >Assignee: fengYu > Attachments: > 0001-KYLIN-1826-add-external-hive-interface-project-table.patch, > 0002-KYLIN-1826-add-and-modify-cube-source-job-for-extern.patch > > > Currently, kylin only support one hive which should run by 'hive' command, > However, when source data located in more than one hive we should deploy more > kylin instance and more than one metastore. which is difficult to manager and > may cause some conflict. > I has been working on it Recently, In our cluster, there are some hive > client(different metastore) which based on different hadoop cluster, I add a > new hive source type which called 'external hive' in kylin 1.5.x > Thanks to kylin Plug-in architecture in 2.x, which make this work easiler. > the main modification are: > 1. 
add hive root directory in hive config file, external hive client exist in > this directory. hive named by directory name. > 2. add hive-site.xml file while loading hive tables. > 3. store hive name into project, one project can only take one hive as source. > 4. change and add some job to support job building. > I will upload my patch if I finish all my tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (KYLIN-1826) kylin support more than one hive based on different hadoop claster
[ https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fengYu updated KYLIN-1826: -- Comment: was deleted (was: Thanks for review and reply. 1、I will modify the job name and unify the name. 2、In early version, I do not pay attention to ID_STREAMING source, I will learn this and try to make it incompatible. 3、I have pass 'mvn test' in my cluster, I have to check the test code and rerun it. 4、"hive" in TableDesc is comes from ProjectInstance, it make something easy to store it in TableDesc, to keep inconsistency, we can take "hive" in ProjectInstance unchangeable, which it is meaningless. because we can just modify hive name in local filesystem rather than modify metadata. 5、currently, we still use cli for all hive source which is stable, We will consider and modify it to beeline if necessary. ) > kylin support more than one hive based on different hadoop claster > -- > > Key: KYLIN-1826 > URL: https://issues.apache.org/jira/browse/KYLIN-1826 > Project: Kylin > Issue Type: Improvement > Components: Environment >Affects Versions: v1.5.2 >Reporter: fengYu >Assignee: fengYu > Attachments: > 0001-KYLIN-1826-add-external-hive-interface-project-table.patch, > 0002-KYLIN-1826-add-and-modify-cube-source-job-for-extern.patch > > > Currently, kylin only support one hive which should run by 'hive' command, > However, when source data located in more than one hive we should deploy more > kylin instance and more than one metastore. which is difficult to manager and > may cause some conflict. > I has been working on it Recently, In our cluster, there are some hive > client(different metastore) which based on different hadoop cluster, I add a > new hive source type which called 'external hive' in kylin 1.5.x > Thanks to kylin Plug-in architecture in 2.x, which make this work easiler. > the main modification are: > 1. add hive root directory in hive config file, external hive client exist in > this directory. 
hive named by directory name. > 2. add hive-site.xml file while loading hive tables. > 3. store hive name into project, one project can only take one hive as source. > 4. change and add some job to support job building. > I will upload my patch if I finish all my tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KYLIN-1826) kylin support more than one hive based on different hadoop claster
[ https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615336#comment-15615336 ] fengYu edited comment on KYLIN-1826 at 10/28/16 1:05 PM: - Hi, I have remove the threadlocal variable and cut the patch to 2 parts, one is about controller such as create project, load table. another is about add or modify some source jobs. We have test this in our production environment for more than 4 months. so I think it is steady. the patch is base on kylin 1.5.4.1 release, wish it can be review and merge as soon as possible. Thanks a lot. was (Author: feng_xiao_yu): Hi, I have remove the threadlocal virable and cut the patch to 2 parts, one is about controller such as create project, load table. another is about add or modify some source jobs. We have test this in our production environment for more than 4 months. so I think it is steady. the patch is base on kylin 1.5.4.1 release, wish it can be review and merge as soon as possible. Thanks a lot. > kylin support more than one hive based on different hadoop claster > -- > > Key: KYLIN-1826 > URL: https://issues.apache.org/jira/browse/KYLIN-1826 > Project: Kylin > Issue Type: Improvement > Components: Environment >Affects Versions: v1.5.2 >Reporter: fengYu >Assignee: fengYu > Attachments: > 0001-KYLIN-1826-add-external-hive-interface-project-table.patch, > 0002-KYLIN-1826-add-and-modify-cube-source-job-for-extern.patch > > > Currently, kylin only support one hive which should run by 'hive' command, > However, when source data located in more than one hive we should deploy more > kylin instance and more than one metastore. which is difficult to manager and > may cause some conflict. 
> I has been working on it Recently, In our cluster, there are some hive > client(different metastore) which based on different hadoop cluster, I add a > new hive source type which called 'external hive' in kylin 1.5.x > Thanks to kylin Plug-in architecture in 2.x, which make this work easiler. > the main modification are: > 1. add hive root directory in hive config file, external hive client exist in > this directory. hive named by directory name. > 2. add hive-site.xml file while loading hive tables. > 3. store hive name into project, one project can only take one hive as source. > 4. change and add some job to support job building. > I will upload my patch if I finish all my tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1826) kylin support more than one hive based on different hadoop claster
[ https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fengYu updated KYLIN-1826: -- Attachment: 0002-KYLIN-1826-add-and-modify-cube-source-job-for-extern.patch 0001-KYLIN-1826-add-external-hive-interface-project-table.patch Hi, I have remove the threadlocal virable and cut the patch to 2 parts, one is about controller such as create project, load table. another is about add or modify some source jobs. We have test this in our production environment for more than 4 months. so I think it is steady. the patch is base on kylin 1.5.4.1 release, wish it can be review and merge as soon as possible. Thanks a lot. > kylin support more than one hive based on different hadoop claster > -- > > Key: KYLIN-1826 > URL: https://issues.apache.org/jira/browse/KYLIN-1826 > Project: Kylin > Issue Type: Improvement > Components: Environment >Affects Versions: v1.5.2 >Reporter: fengYu >Assignee: fengYu > Attachments: > 0001-KYLIN-1826-add-external-hive-interface-project-table.patch, > 0002-KYLIN-1826-add-and-modify-cube-source-job-for-extern.patch > > > Currently, kylin only support one hive which should run by 'hive' command, > However, when source data located in more than one hive we should deploy more > kylin instance and more than one metastore. which is difficult to manager and > may cause some conflict. > I has been working on it Recently, In our cluster, there are some hive > client(different metastore) which based on different hadoop cluster, I add a > new hive source type which called 'external hive' in kylin 1.5.x > Thanks to kylin Plug-in architecture in 2.x, which make this work easiler. > the main modification are: > 1. add hive root directory in hive config file, external hive client exist in > this directory. hive named by directory name. > 2. add hive-site.xml file while loading hive tables. > 3. store hive name into project, one project can only take one hive as source. > 4. 
change and add some job to support job building. > I will upload my patch if I finish all my tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1826) kylin support more than one hive based on different hadoop claster
[ https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fengYu updated KYLIN-1826: -- Attachment: (was: 0001-KYLIN-1826-support-more-hive-based-on-different-hado.patch) > kylin support more than one hive based on different hadoop claster > -- > > Key: KYLIN-1826 > URL: https://issues.apache.org/jira/browse/KYLIN-1826 > Project: Kylin > Issue Type: Improvement > Components: Environment >Affects Versions: v1.5.2 >Reporter: fengYu >Assignee: fengYu > > Currently, kylin only support one hive which should run by 'hive' command, > However, when source data located in more than one hive we should deploy more > kylin instance and more than one metastore. which is difficult to manager and > may cause some conflict. > I has been working on it Recently, In our cluster, there are some hive > client(different metastore) which based on different hadoop cluster, I add a > new hive source type which called 'external hive' in kylin 1.5.x > Thanks to kylin Plug-in architecture in 2.x, which make this work easiler. > the main modification are: > 1. add hive root directory in hive config file, external hive client exist in > this directory. hive named by directory name. > 2. add hive-site.xml file while loading hive tables. > 3. store hive name into project, one project can only take one hive as source. > 4. change and add some job to support job building. > I will upload my patch if I finish all my tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1839) improvement set classpath before submitting mr job
[ https://issues.apache.org/jira/browse/KYLIN-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fengYu updated KYLIN-1839: -- Attachment: 0001-KYLIN-1839-modify-kylin-config-for-extra-lib.patch add kylin.properties illustration for this configuration. > improvement set classpath before submitting mr job > -- > > Key: KYLIN-1839 > URL: https://issues.apache.org/jira/browse/KYLIN-1839 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v1.5.2 >Reporter: fengYu >Assignee: fengYu > Fix For: v1.6.0 > > Attachments: 0001-KYLIN-1839-modify-kylin-config-for-extra-lib.patch, > 0001-KYLIN-1839-support-kylin-lib-in-HDFS.patch > > > in setClasspath, kylin will alway find hive jars from hive dependency using > regex, however, this will not change in one process lifetime, so I cache the > location of tmpjars and tmpfiles. > What is more, support extends user lib setting to hdfs path rather than only > support local filesystem which will cause upload jars every time if > DistributedCache do not exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1839) improvement set classpath before submitting mr job
[ https://issues.apache.org/jira/browse/KYLIN-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587499#comment-15587499 ] fengYu commented on KYLIN-1839: --- glad to do it, submit it latter. > improvement set classpath before submitting mr job > -- > > Key: KYLIN-1839 > URL: https://issues.apache.org/jira/browse/KYLIN-1839 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v1.5.2 >Reporter: fengYu >Assignee: fengYu > Fix For: v1.6.0 > > Attachments: 0001-KYLIN-1839-support-kylin-lib-in-HDFS.patch > > > in setClasspath, kylin will alway find hive jars from hive dependency using > regex, however, this will not change in one process lifetime, so I cache the > location of tmpjars and tmpfiles. > What is more, support extends user lib setting to hdfs path rather than only > support local filesystem which will cause upload jars every time if > DistributedCache do not exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1839) improvement set classpath before submitting mr job
[ https://issues.apache.org/jira/browse/KYLIN-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587392#comment-15587392 ] fengYu commented on KYLIN-1839: --- This is a small change, it can be illustrated in kylin.properties. > improvement set classpath before submitting mr job > -- > > Key: KYLIN-1839 > URL: https://issues.apache.org/jira/browse/KYLIN-1839 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v1.5.2 >Reporter: fengYu >Assignee: fengYu > Fix For: v1.6.0 > > Attachments: 0001-KYLIN-1839-support-kylin-lib-in-HDFS.patch > > > in setClasspath, kylin will alway find hive jars from hive dependency using > regex, however, this will not change in one process lifetime, so I cache the > location of tmpjars and tmpfiles. > What is more, support extends user lib setting to hdfs path rather than only > support local filesystem which will cause upload jars every time if > DistributedCache do not exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (KYLIN-1888) support backward cube building
[ https://issues.apache.org/jira/browse/KYLIN-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fengYu closed KYLIN-1888. - Resolution: Fixed I found that this feature was already added in 1.5.4, so I am closing this issue. > support backward cube building > -- > > Key: KYLIN-1888 > URL: https://issues.apache.org/jira/browse/KYLIN-1888 > Project: Kylin > Issue Type: Improvement > Affects Versions: v1.5.2 > Reporter: fengYu > Assignee: fengYu > Attachments: 0001-forward-build-job.patch > > > This is used when a user wants to see data from the last few days first, and then fill in > historical data back to the cube start date. > FORWARD is just like the reverse of BUILD.
[jira] [Updated] (KYLIN-1839) improvement set classpath before submitting mr job
[ https://issues.apache.org/jira/browse/KYLIN-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fengYu updated KYLIN-1839: -- Attachment: (was: 0001-KYLIN-1839-support-extend-lib-from-HDFS-and-cache-tm.patch) > improvement set classpath before submitting mr job > -- > > Key: KYLIN-1839 > URL: https://issues.apache.org/jira/browse/KYLIN-1839 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v1.5.2 >Reporter: fengYu >Assignee: fengYu > Attachments: 0001-KYLIN-1839-support-kylin-lib-in-HDFS.patch > > > in setClasspath, kylin will alway find hive jars from hive dependency using > regex, however, this will not change in one process lifetime, so I cache the > location of tmpjars and tmpfiles. > What is more, support extends user lib setting to hdfs path rather than only > support local filesystem which will cause upload jars every time if > DistributedCache do not exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1839) improvement set classpath before submitting mr job
[ https://issues.apache.org/jira/browse/KYLIN-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fengYu updated KYLIN-1839: -- Attachment: 0001-KYLIN-1839-support-kylin-lib-in-HDFS.patch upload new patch which just support HDFS path for mr lib. > improvement set classpath before submitting mr job > -- > > Key: KYLIN-1839 > URL: https://issues.apache.org/jira/browse/KYLIN-1839 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v1.5.2 >Reporter: fengYu >Assignee: fengYu > Attachments: 0001-KYLIN-1839-support-kylin-lib-in-HDFS.patch > > > in setClasspath, kylin will alway find hive jars from hive dependency using > regex, however, this will not change in one process lifetime, so I cache the > location of tmpjars and tmpfiles. > What is more, support extends user lib setting to hdfs path rather than only > support local filesystem which will cause upload jars every time if > DistributedCache do not exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-2065) kylin result cache improvement
fengYu created KYLIN-2065: - Summary: kylin result cache improvement Key: KYLIN-2065 URL: https://issues.apache.org/jira/browse/KYLIN-2065 Project: Kylin Issue Type: Improvement Reporter: fengYu Assignee: fengYu Data stored in Kylin (HBase) is rarely modified, except when someone refreshes a segment, so I think a result cache is useful. Kylin supports an in-memory Ehcache, but it is limited and cannot be shared between query nodes. We developed a cache solution that stores results in HBase and expires them based on the cube's last-modified time or the segment's last-modified time. I will prepare the patch against 1.5.4 and upload it later.
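The expiry rule described in KYLIN-2065 can be illustrated with a minimal sketch. It is not the code from the patch: the class `ResultCacheSketch`, its methods, and the in-memory `HashMap` (standing in for the HBase table the issue mentions) are all hypothetical names chosen for illustration. The only idea taken from the issue is that a cached result becomes stale once the cube's last-modified time moves past the time recorded with the entry.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: cached query results expire when the cube is modified. */
public class ResultCacheSketch {

    /** A cached result plus the cube last-modified time seen when it was stored. */
    private static final class Entry {
        final String result;
        final long cubeLastModified;

        Entry(String result, long cubeLastModified) {
            this.result = result;
            this.cubeLastModified = cubeLastModified;
        }
    }

    // Assumption: a plain map stands in for the shared HBase-backed store.
    private final Map<String, Entry> store = new HashMap<>();

    /** Stores a result together with the cube's last-modified time at query time. */
    public void put(String sql, String result, long cubeLastModified) {
        store.put(sql, new Entry(result, cubeLastModified));
    }

    /** Returns the cached result, or null if it is absent or older than the cube. */
    public String get(String sql, long currentCubeLastModified) {
        Entry entry = store.get(sql);
        if (entry == null) {
            return null;
        }
        if (entry.cubeLastModified < currentCubeLastModified) {
            store.remove(sql); // the cube changed after caching, so drop the entry
            return null;
        }
        return entry.result;
    }
}
```

The same comparison could be made against a segment's last-modified time instead of the cube's, which is the finer-grained variant the issue also mentions.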
[jira] [Created] (KYLIN-2064) add non-runtime-aggregation measure support
fengYu created KYLIN-2064: - Summary: add non-runtime-aggregation measure support Key: KYLIN-2064 URL: https://issues.apache.org/jira/browse/KYLIN-2064 Project: Kylin Issue Type: Improvement Affects Versions: v1.5.2 Reporter: fengYu Assignee: fengYu Kylin is based on pre-computation and stores the results in HBase. However, it also supports runtime aggregation to satisfy queries that cannot be matched by the precomputed data. Runtime aggregation slows down the query and needs more storage space in HBase (which in turn slows down scans), for example for distinct-count and TopN measures. So we can use more pre-computation and less runtime aggregation: in some scenarios we do not need results across different partitions (dates), so we added a kind of measure that only supports the result for count distinct. It speeds up queries and needs less storage. What's more, a query on this measure that has not been precomputed returns an exception. I will organize our solution into a patch later.
[jira] [Commented] (KYLIN-1833) union operation will cause error result
[ https://issues.apache.org/jira/browse/KYLIN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15509668#comment-15509668 ] fengYu commented on KYLIN-1833: --- I am working for this problem currently for kylin-2.x > union operation will cause error result > --- > > Key: KYLIN-1833 > URL: https://issues.apache.org/jira/browse/KYLIN-1833 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Affects Versions: v1.3.0, v1.5.2 >Reporter: fengYu >Assignee: fengYu > Attachments: 0001-KYLIN-1833-union-query-get-error-result.patch > > > query like this will get error result : > select * from ( > select 'b', count(1) from kylin_sales where lstg_format_name >= 'Auction' > union all > select 'a', count(1) from kylin_sales where lstg_format_name >= 'Others' > union all > select 'c', count(1) from kylin_sales where lstg_format_name >= 'FP-GTC' > ) order by 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1839) improvement set classpath before submitting mr job
[ https://issues.apache.org/jira/browse/KYLIN-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15509663#comment-15509663 ] fengYu commented on KYLIN-1839: --- I think the patch is usable. The tmpjars and tmpfiles used for MR jobs are process-level, so I cache them all. > improvement set classpath before submitting mr job > -- > > Key: KYLIN-1839 > URL: https://issues.apache.org/jira/browse/KYLIN-1839 > Project: Kylin > Issue Type: Improvement > Components: Job Engine > Affects Versions: v1.5.2 > Reporter: fengYu > Assignee: fengYu > Attachments: > 0001-KYLIN-1839-support-extend-lib-from-HDFS-and-cache-tm.patch > > > In setClasspath, Kylin always finds the Hive jars from the Hive dependency using a > regex. However, the result does not change within one process lifetime, so I cache the > locations of tmpjars and tmpfiles. > What is more, this extends the user lib setting to accept an HDFS path, rather than > supporting only the local filesystem, which causes the jars to be uploaded every time if > they are not already in the DistributedCache.
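The process-level caching described in this comment can be sketched as a compute-once holder. This is only an illustration under assumptions: `CachedClasspathSketch` and its `resolver` argument are hypothetical and do not reproduce Kylin's `setClasspath`; the resolver stands in for the regex-based lookup of Hive jars, which the issue says yields the same answer for the lifetime of the process.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: resolve the tmpjars value once per process and reuse it. */
public class CachedClasspathSketch {

    // Assumption: the supplier wraps the expensive regex-based jar lookup.
    private final Supplier<String> resolver;
    private String cached;
    private int resolveCount = 0;

    public CachedClasspathSketch(Supplier<String> resolver) {
        this.resolver = resolver;
    }

    /** Returns the cached value, running the expensive lookup only on first use. */
    public synchronized String get() {
        if (cached == null) {
            cached = resolver.get();
            resolveCount++;
        }
        return cached;
    }

    /** Number of times the underlying lookup actually ran. */
    public synchronized int resolveCount() {
        return resolveCount;
    }
}
```

Synchronizing `get()` keeps the sketch safe when several job-submitting threads ask for the classpath at the same time; a real implementation might prefer a static holder or a concurrent map keyed by configuration.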
[jira] [Commented] (KYLIN-1014) Support kerberos authentication while getting status from RM
[ https://issues.apache.org/jira/browse/KYLIN-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15489015#comment-15489015 ] fengYu commented on KYLIN-1014: --- It works in 1.5.2.1, which I am using now; I think it will work in 1.5.2+. > Support kerberos authentication while getting status from RM > > > Key: KYLIN-1014 > URL: https://issues.apache.org/jira/browse/KYLIN-1014 > Project: Kylin > Issue Type: Improvement > Components: Job Engine > Affects Versions: v1.0, v0.7.2, v0.7.1 > Reporter: fengYu > Assignee: fengYu > Fix For: v1.4.0, v1.3.0 > > Attachments: 0001-KYLIN-1014-error-while-retry-rm-master.patch, > 0001-hadoop-status-checker-support-rm-with-kerberos.patch, > patch-for-2.0-rc.patch > > > I have used kylin-0.7.2 to build cubes and run some queries, and I am trying > kylin-1.0 in another Hadoop cluster. I get the problem below in both kylin-0.7.2 > and kylin-1.0: > Our Hadoop cluster handles authentication with Kerberos. However, we find that > after submitting a MapReduce job (the second step in building a cube), Kylin > sends an HTTP request to the RM server at regular intervals to get the job status, > but we always get errors here because Kylin does nothing about Kerberos. > Finally, we changed the source code to make it support Kerberos > authentication; my patch file is attached. > I added a property named "kylin.job.status.with.kerberos", which indicates whether we > need to authenticate with Kerberos when getting the status from the RM; the default > value is false. > Any good ideas or suggestions would be highly appreciated. > Thanks.
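As a small illustration of the property described above, the following sketch reads "kylin.job.status.with.kerberos" with a default of false. The property name and default come from the issue; the class `KerberosFlagSketch` and the use of `java.util.Properties` are assumptions made for the example and are not how Kylin's own configuration class is necessarily written.

```java
import java.util.Properties;

/** Hypothetical sketch: read the Kerberos status-check flag with a false default. */
public class KerberosFlagSketch {

    // Property name taken from the issue text.
    static final String KEY = "kylin.job.status.with.kerberos";

    /** Returns true only when the property is explicitly set to "true". */
    public static boolean statusCheckUsesKerberos(Properties props) {
        return Boolean.parseBoolean(props.getProperty(KEY, "false"));
    }
}
```

With this shape, clusters without Kerberos need no configuration change, since an absent property behaves like `false`.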
[jira] [Commented] (KYLIN-1173) Can not load hive table after modify table metadata
[ https://issues.apache.org/jira/browse/KYLIN-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15482757#comment-15482757 ] fengYu commented on KYLIN-1173: --- Hi, since it is difficult for the Kylin server to detect a column modified through the Hive CLI, when users need to change a column name we recommend adding a column rather than renaming it (which is an advantage of a Hive view compared to a Hive table). Therefore I think this JIRA serves as a caution for Kylin users; I don't think it is necessary to solve it in Kylin. > Can not load hive table after modify table metadata > --- > > Key: KYLIN-1173 > URL: https://issues.apache.org/jira/browse/KYLIN-1173 > Project: Kylin > Issue Type: Bug > Affects Versions: v1.0 > Reporter: fengYu > > Hi all: > > When I change a column in the Hive source table and reload the table in > Kylin, I cannot see any columns in the table after the reload. After I restart the Kylin > server and reload the table, the column (with the modified name) appears. > I wrote a test program like this (Kylin does the same thing while reloading a > table): > HiveClient client = new HiveClient(); > List fields = client.getHiveTableFields(database, table); > /* wait here and modify the table column name */ > fields = client.getHiveTableFields(database, table); > client.getHiveTableFields returns all columns in the table the first time, > but after I modify one column and call client.getHiveTableFields again, > it returns an empty list. It returns the same list if I do not change the > column name in between. > I suspect something may be wrong in the Hive metastore; any help will be > appreciated.
[jira] [Commented] (KYLIN-1826) kylin support more than one hive based on different hadoop claster
[ https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427879#comment-15427879 ] fengYu commented on KYLIN-1826: --- Thanks for your reply. I will remove LocalThreadProject and pass the hive parameter through function arguments where necessary. Thank you for your advice. > kylin support more than one hive based on different hadoop claster > -- > > Key: KYLIN-1826 > URL: https://issues.apache.org/jira/browse/KYLIN-1826 > Project: Kylin > Issue Type: Improvement > Components: Environment > Affects Versions: v1.5.2 > Reporter: fengYu > Assignee: fengYu > Attachments: > 0001-KYLIN-1826-support-more-hive-based-on-different-hado.patch > > > Currently, Kylin supports only one Hive, which must be runnable via the 'hive' command. > However, when the source data is located in more than one Hive, we have to deploy more > Kylin instances and more than one metastore, which is difficult to manage and > may cause conflicts. > I have been working on this recently. In our cluster there are several Hive > clients (with different metastores) based on different Hadoop clusters, so I added a > new Hive source type called 'external hive' in Kylin 1.5.x. > Thanks to Kylin's plug-in architecture in 2.x, this work is easier. > The main modifications are: > 1. Add a Hive root directory in the Hive config file; the external Hive clients live in > this directory, and each Hive is named by its directory name. > 2. Add the hive-site.xml file while loading Hive tables. > 3. Store the Hive name in the project; one project can take only one Hive as its source. > 4. Change and add some jobs to support job building. > I will upload my patch once I finish all my tests.
[jira] [Issue Comment Deleted] (KYLIN-1826) kylin support more than one hive based on different hadoop claster
[ https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fengYu updated KYLIN-1826: -- Comment: was deleted (was: Sorry for delay, first of all, the hive in ProjectInstance is used in load hive table, then store the hive to TableDesc, the reason I use LocalThreadProject is I want do less code change, In my solution, I need project infomation to get hive instance, If pass this parameter in function, I have to modify so much functions. This is tricky but is the easiest way. About the metadata changed in projectInstance and TableDesc, I consider the compatibility, for old project and tableDesc, the hive variable is set to null which means use default hive(hive-site.xml located in kylin classpath). ) > kylin support more than one hive based on different hadoop claster > -- > > Key: KYLIN-1826 > URL: https://issues.apache.org/jira/browse/KYLIN-1826 > Project: Kylin > Issue Type: Improvement > Components: Environment >Affects Versions: v1.5.2 >Reporter: fengYu >Assignee: fengYu > Attachments: > 0001-KYLIN-1826-support-more-hive-based-on-different-hado.patch > > > Currently, kylin only support one hive which should run by 'hive' command, > However, when source data located in more than one hive we should deploy more > kylin instance and more than one metastore. which is difficult to manager and > may cause some conflict. > I has been working on it Recently, In our cluster, there are some hive > client(different metastore) which based on different hadoop cluster, I add a > new hive source type which called 'external hive' in kylin 1.5.x > Thanks to kylin Plug-in architecture in 2.x, which make this work easiler. > the main modification are: > 1. add hive root directory in hive config file, external hive client exist in > this directory. hive named by directory name. > 2. add hive-site.xml file while loading hive tables. > 3. store hive name into project, one project can only take one hive as source. > 4. 
change and add some job to support job building. > I will upload my patch if I finish all my tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1826) kylin support more than one hive based on different hadoop claster
[ https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416434#comment-15416434 ] fengYu commented on KYLIN-1826: --- Sorry for the delay. First of all, the hive in ProjectInstance is used when loading a Hive table and is then stored in TableDesc. The reason I use LocalThreadProject is that I want to change less code: in my solution I need the project information to get the Hive instance, and if I passed this parameter through function arguments I would have to modify a great many functions. This is tricky, but it is the easiest way. As for the metadata changes in ProjectInstance and TableDesc, I considered compatibility: for old projects and TableDescs, the hive variable is set to null, which means the default Hive is used (the hive-site.xml located on the Kylin classpath). > kylin support more than one hive based on different hadoop claster > -- > > Key: KYLIN-1826 > URL: https://issues.apache.org/jira/browse/KYLIN-1826 > Project: Kylin > Issue Type: Improvement > Components: Environment > Affects Versions: v1.5.2 > Reporter: fengYu > Assignee: fengYu > Attachments: > 0001-KYLIN-1826-support-more-hive-based-on-different-hado.patch > > > Currently, Kylin supports only one Hive, which must be runnable via the 'hive' command. > However, when the source data is located in more than one Hive, we have to deploy more > Kylin instances and more than one metastore, which is difficult to manage and > may cause conflicts. > I have been working on this recently. In our cluster there are several Hive > clients (with different metastores) based on different Hadoop clusters, so I added a > new Hive source type called 'external hive' in Kylin 1.5.x. > Thanks to Kylin's plug-in architecture in 2.x, this work is easier. > The main modifications are: > 1. Add a Hive root directory in the Hive config file; the external Hive clients live in > this directory, and each Hive is named by its directory name. > 2. Add the hive-site.xml file while loading Hive tables. > 3. Store the Hive name in the project; one project can take only one Hive as its source. > 4. Change and add some jobs to support job building. > I will upload my patch once I finish all my tests.
[jira] [Commented] (KYLIN-1826) kylin support more than one hive based on different hadoop claster
[ https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391002#comment-15391002 ] fengYu commented on KYLIN-1826: --- That's what I'm concerned about. please review it if someone has free time. > kylin support more than one hive based on different hadoop claster > -- > > Key: KYLIN-1826 > URL: https://issues.apache.org/jira/browse/KYLIN-1826 > Project: Kylin > Issue Type: Improvement > Components: Environment >Affects Versions: v1.5.2 >Reporter: fengYu >Assignee: fengYu > Attachments: > 0001-KYLIN-1826-support-more-hive-based-on-different-hado.patch > > > Currently, kylin only support one hive which should run by 'hive' command, > However, when source data located in more than one hive we should deploy more > kylin instance and more than one metastore. which is difficult to manager and > may cause some conflict. > I has been working on it Recently, In our cluster, there are some hive > client(different metastore) which based on different hadoop cluster, I add a > new hive source type which called 'external hive' in kylin 1.5.x > Thanks to kylin Plug-in architecture in 2.x, which make this work easiler. > the main modification are: > 1. add hive root directory in hive config file, external hive client exist in > this directory. hive named by directory name. > 2. add hive-site.xml file while loading hive tables. > 3. store hive name into project, one project can only take one hive as source. > 4. change and add some job to support job building. > I will upload my patch if I finish all my tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KYLIN-1839) improvement set classpath before submitting mr job
[ https://issues.apache.org/jira/browse/KYLIN-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fengYu resolved KYLIN-1839. --- Resolution: Resolved > improvement set classpath before submitting mr job > -- > > Key: KYLIN-1839 > URL: https://issues.apache.org/jira/browse/KYLIN-1839 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v1.5.2 >Reporter: fengYu >Assignee: fengYu > Attachments: > 0001-KYLIN-1839-support-extend-lib-from-HDFS-and-cache-tm.patch > > > in setClasspath, kylin will alway find hive jars from hive dependency using > regex, however, this will not change in one process lifetime, so I cache the > location of tmpjars and tmpfiles. > What is more, support extends user lib setting to hdfs path rather than only > support local filesystem which will cause upload jars every time if > DistributedCache do not exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1890) support hbase table prefix configurable
[ https://issues.apache.org/jira/browse/KYLIN-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390998#comment-15390998 ] fengYu commented on KYLIN-1890: --- I think it is more reasonable to make the HBase table prefix configurable and to add the cube name between the prefix and the random name. With that, I can find all HTables belonging to one cube in `hbase shell` just from the HTable name. > support hbase table prefix configurable > --- > > Key: KYLIN-1890 > URL: https://issues.apache.org/jira/browse/KYLIN-1890 > Project: Kylin > Issue Type: Improvement > Components: General > Affects Versions: v1.5.2 > Reporter: fengYu > Assignee: fengYu > Attachments: > 0001-KYLIN-1890-support-hbase-table-prefix-configurable.patch > > > Sometimes we need to deploy two Kylin environments on the same HBase. I want to change > the HBase table name prefix for two reasons: > 1. Different Kylin environments will generate the same table names. > 2. Cleaning up invalid HTables for one environment will delete all tables belonging > to the other environment. > Having different Kylin environments use different namespaces is also acceptable.
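The naming idea in this comment (configurable prefix, then the cube name, then the random part) can be sketched as follows. The exact format is an assumption: the underscore separator, the class `HTableNameSketch`, and both method names are hypothetical and are not taken from the attached patch.

```java
/** Hypothetical sketch: build HTable names as prefix + cube name + random suffix. */
public class HTableNameSketch {

    /** Joins the three parts with underscores (the separator is an assumption). */
    public static String tableName(String prefix, String cubeName, String randomSuffix) {
        return prefix + "_" + cubeName + "_" + randomSuffix;
    }

    /** Checks by name alone whether a table was created for the given cube. */
    public static boolean belongsToCube(String tableName, String prefix, String cubeName) {
        return tableName.startsWith(prefix + "_" + cubeName + "_");
    }
}
```

A per-environment prefix keeps cleanup in one environment from matching tables of the other. One limitation of a plain prefix match: a cube named `sales` would also match tables of a cube named `sales_daily`, so a real scheme would need a separator that cannot occur in cube names.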
[jira] [Closed] (KYLIN-1891) merge interval support
[ https://issues.apache.org/jira/browse/KYLIN-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fengYu closed KYLIN-1891. - Resolution: Duplicate > merge interval support > -- > > Key: KYLIN-1891 > URL: https://issues.apache.org/jira/browse/KYLIN-1891 > Project: Kylin > Issue Type: Improvement >Reporter: fengYu >Assignee: fengYu > > We always has some data need to be amended some days later > in current kylin, once I set Auto Merge Thresholds, the segment newly build > will merge if reach Thresholds, the next day refresh will refresh merged > segemnt, which is unnecessary. > So I want to add a interval configuration means auto merge will merge > segments outside of the interval. > for example, if interval = 2, Auto Merge Thresholds=7, if 07-01 to 07-07 is > built, auto merge will not trigger, when 07-09 built success, auto merge will > trigger and merge segments from 07-01 to 07-07. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1892) merge interval support
[ https://issues.apache.org/jira/browse/KYLIN-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390995#comment-15390995 ] fengYu commented on KYLIN-1892: --- I am glad to commit a patch for it once I am free.. > merge interval support > -- > > Key: KYLIN-1892 > URL: https://issues.apache.org/jira/browse/KYLIN-1892 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v1.5.2 >Reporter: fengYu >Assignee: fengYu > > We always has some data need to be amended some days later > in current kylin, once I set Auto Merge Thresholds, the segment newly build > will merge if reach Thresholds, the next day refresh will refresh merged > segemnt, which is unnecessary. > So I want to add a interval configuration means auto merge will merge > segments outside of the interval. > for example, if interval = 2, Auto Merge Thresholds=7, if 07-01 to 07-07 is > built, auto merge will not trigger, when 07-09 built success, auto merge will > trigger and merge segments from 07-01 to 07-07. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1888) support backward cube building
[ https://issues.apache.org/jira/browse/KYLIN-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390994#comment-15390994 ] fengYu commented on KYLIN-1888: --- Yeah, you are correct, I will do some change about the name. > support backward cube building > -- > > Key: KYLIN-1888 > URL: https://issues.apache.org/jira/browse/KYLIN-1888 > Project: Kylin > Issue Type: Improvement >Affects Versions: v1.5.2 >Reporter: fengYu >Assignee: fengYu > Attachments: 0001-forward-build-job.patch > > > This is used when user want to see data from last some days, and then fill up > history data from cube start date. > FORWARD just like reverse side of BUILD -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1890) support hbase table prefix configurable
[ https://issues.apache.org/jira/browse/KYLIN-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fengYu updated KYLIN-1890: -- Attachment: 0001-KYLIN-1890-support-hbase-table-prefix-configurable.patch > support hbase table prefix configurable > --- > > Key: KYLIN-1890 > URL: https://issues.apache.org/jira/browse/KYLIN-1890 > Project: Kylin > Issue Type: Improvement > Components: General >Affects Versions: v1.5.2 >Reporter: fengYu >Assignee: fengYu > Attachments: > 0001-KYLIN-1890-support-hbase-table-prefix-configurable.patch > > > some times we need deploy two kylin env based on same hbase, I want to change > hbase table name prefix based two reasons: > 1、different kylin env will generate the same table name > 2、while clean invalid htable for one env will cause delete all tables belong > to another env. > different kylin env use different namespace is acceptable either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-1891) merge interval support
fengYu created KYLIN-1891: - Summary: merge interval support Key: KYLIN-1891 URL: https://issues.apache.org/jira/browse/KYLIN-1891 Project: Kylin Issue Type: Improvement Reporter: fengYu Assignee: fengYu We always has some data need to be amended some days later in current kylin, once I set Auto Merge Thresholds, the segment newly build will merge if reach Thresholds, the next day refresh will refresh merged segemnt, which is unnecessary. So I want to add a interval configuration means auto merge will merge segments outside of the interval. for example, if interval = 2, Auto Merge Thresholds=7, if 07-01 to 07-07 is built, auto merge will not trigger, when 07-09 built success, auto merge will trigger and merge segments from 07-01 to 07-07. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-1892) merge interval support
fengYu created KYLIN-1892: - Summary: merge interval support Key: KYLIN-1892 URL: https://issues.apache.org/jira/browse/KYLIN-1892 Project: Kylin Issue Type: Improvement Components: Job Engine Affects Versions: v1.5.2 Reporter: fengYu Assignee: fengYu We often have data that needs to be amended a few days later. In current Kylin, once I set Auto Merge Thresholds, newly built segments are merged as soon as they reach a threshold, so the next day's refresh has to refresh the merged segment, which is unnecessary. So I want to add an interval configuration, meaning that auto merge only merges segments outside of the interval. For example, with interval = 2 and Auto Merge Thresholds = 7: when 07-01 to 07-07 are built, auto merge is not triggered; when 07-09 is built successfully, auto merge is triggered and merges the segments from 07-01 to 07-07.
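The worked example in the issue can be expressed as a short sketch. It is one possible reading of the proposal, not the eventual implementation: it assumes one segment per day, segments listed oldest first, a single merge threshold, and that the newest `interval` segments are always excluded from merging. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: auto merge that leaves the newest `interval` segments alone. */
public class MergeIntervalSketch {

    /**
     * Returns the segments to merge (oldest first), or an empty list when
     * fewer than `threshold` segments lie outside the protected interval.
     */
    public static List<String> segmentsToMerge(List<String> builtSegments,
                                               int threshold, int interval) {
        // Segments still inside the interval may be refreshed, so skip them.
        int eligible = builtSegments.size() - interval;
        if (eligible < threshold) {
            return new ArrayList<>();
        }
        return new ArrayList<>(builtSegments.subList(0, threshold));
    }
}
```

With interval = 2 and threshold = 7, seven built days leave only five eligible segments, so nothing is merged; once 07-09 is built (nine days in total), seven segments are eligible and 07-01 to 07-07 are merged, which matches the example in the issue.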
[jira] [Resolved] (KYLIN-1808) unload non existing table cause NPE
[ https://issues.apache.org/jira/browse/KYLIN-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu resolved KYLIN-1808.
---------------------------
    Resolution: Fixed
      Assignee: fengYu

> unload non existing table cause NPE
> -----------------------------------
>
>                 Key: KYLIN-1808
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1808
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v1.5.2
>            Reporter: fengYu
>            Assignee: fengYu
>         Attachments: 0001-KYLIN-1808-unload-table-cause-NPE.patch
>
>
> In TableController.java,
> private boolean unLoadHiveTable(String tableName, String project)
> does not check whether the TableDesc object is null.
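A minimal sketch of the kind of null check described above. This is not the attached patch and not Kylin's TableController: the TableDesc class and the in-memory map here are hypothetical stand-ins, and only the method name and signature are taken from the issue text.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the controller; shows only the null guard.
class UnloadTableSketch {

    /** Stand-in for Kylin's TableDesc (assumption: only an identity is needed here). */
    static class TableDesc {
        final String identity;

        TableDesc(String identity) {
            this.identity = identity;
        }
    }

    private final Map<String, TableDesc> loadedTables = new HashMap<>();

    void load(String tableName) {
        loadedTables.put(tableName, new TableDesc(tableName));
    }

    /** The project argument is kept to mirror the signature in the issue; it is unused here. */
    boolean unLoadHiveTable(String tableName, String project) {
        TableDesc desc = loadedTables.get(tableName);
        if (desc == null) {
            // Unknown table: report failure instead of dereferencing null.
            return false;
        }
        loadedTables.remove(desc.identity);
        return true;
    }
}
```

Unloading a table that was never loaded then returns false rather than throwing a NullPointerException.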
[jira] [Resolved] (KYLIN-1480) NPE throws while execute sql with more than two join.
[ https://issues.apache.org/jira/browse/KYLIN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fengYu resolved KYLIN-1480. --- Resolution: Fixed test passed both in kylin-1.3.0 and kylin-1.5.2 > NPE throws while execute sql with more than two join. > - > > Key: KYLIN-1480 > URL: https://issues.apache.org/jira/browse/KYLIN-1480 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v1.4.0, v1.2, v1.1, v1.0 >Reporter: fengYu >Assignee: fengYu > Attachments: 0001-unit-test-case-for-KYLIN-1480.patch, > NPE-in-more-joins.patch > > > Hi, I encounter NPE while execute sql more than two join, for example : > select A.type, A.cmd, count(1) from fact as A inner join (select type, > count(1) from fact group by type having count(1) > 2) as B on A.type = B.type > inner join (select cmd, count(1) from fact group by cmd having count(1) > 2) > as C on A.cmd = C.cmd group by A.type, A.cmd; > the fact table is create like this : > CREATE TABLE `fact`( > `fname` string, > `lname` string, > `dt` date, > `cost` int, > `type` string, > `cmd` string); > Kylin throws exception like this : > Caused by: java.lang.NullPointerException > at > org.apache.kylin.query.relnode.OLAPProjectRel.implementOLAP(OLAPProjectRel.java:103) > at > org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81) > at > org.apache.kylin.query.relnode.OLAPAggregateRel.implementOLAP(OLAPAggregateRel.java:132) > at > org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81) > at > org.apache.kylin.query.relnode.OLAPLimitRel.implementOLAP(OLAPLimitRel.java:73) > at > org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81) > at > org.apache.kylin.query.relnode.OLAPToEnumerableConverter.implement(OLAPToEnumerableConverter.java:67) > at > org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.implementRoot(EnumerableRelImplementor.java:99) > at > 
org.apache.calcite.adapter.enumerable.EnumerableInterpretable.toBindable(EnumerableInterpretable.java:92) > at > org.apache.calcite.prepare.CalcitePrepareImpl$CalcitePreparingStmt.implement(CalcitePrepareImpl.java:1050) > at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:293) > at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:188) > at > org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:671) > at > org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:572) > at > org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:541) > at > org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:173) > at > org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:561) > at > org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477) > at > org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109) > I try it in kylin-1.0 and kylin-2.x-staging, the same exception throws. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (KYLIN-1480) NPE throws while execute sql with more than two join.
[ https://issues.apache.org/jira/browse/KYLIN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fengYu reassigned KYLIN-1480: - Assignee: fengYu (was: liyang) > NPE throws while execute sql with more than two join. > - > > Key: KYLIN-1480 > URL: https://issues.apache.org/jira/browse/KYLIN-1480 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v1.4.0, v1.2, v1.1, v1.0 >Reporter: fengYu >Assignee: fengYu > Attachments: 0001-unit-test-case-for-KYLIN-1480.patch, > NPE-in-more-joins.patch > > > Hi, I encounter NPE while execute sql more than two join, for example : > select A.type, A.cmd, count(1) from fact as A inner join (select type, > count(1) from fact group by type having count(1) > 2) as B on A.type = B.type > inner join (select cmd, count(1) from fact group by cmd having count(1) > 2) > as C on A.cmd = C.cmd group by A.type, A.cmd; > the fact table is create like this : > CREATE TABLE `fact`( > `fname` string, > `lname` string, > `dt` date, > `cost` int, > `type` string, > `cmd` string); > Kylin throws exception like this : > Caused by: java.lang.NullPointerException > at > org.apache.kylin.query.relnode.OLAPProjectRel.implementOLAP(OLAPProjectRel.java:103) > at > org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81) > at > org.apache.kylin.query.relnode.OLAPAggregateRel.implementOLAP(OLAPAggregateRel.java:132) > at > org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81) > at > org.apache.kylin.query.relnode.OLAPLimitRel.implementOLAP(OLAPLimitRel.java:73) > at > org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81) > at > org.apache.kylin.query.relnode.OLAPToEnumerableConverter.implement(OLAPToEnumerableConverter.java:67) > at > org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.implementRoot(EnumerableRelImplementor.java:99) > at > 
org.apache.calcite.adapter.enumerable.EnumerableInterpretable.toBindable(EnumerableInterpretable.java:92) > at > org.apache.calcite.prepare.CalcitePrepareImpl$CalcitePreparingStmt.implement(CalcitePrepareImpl.java:1050) > at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:293) > at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:188) > at > org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:671) > at > org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:572) > at > org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:541) > at > org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:173) > at > org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:561) > at > org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477) > at > org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109) > I try it in kylin-1.0 and kylin-2.x-staging, the same exception throws. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-1890) support hbase table prefix configurable
fengYu created KYLIN-1890:
-----------------------------

             Summary: support hbase table prefix configurable
                 Key: KYLIN-1890
                 URL: https://issues.apache.org/jira/browse/KYLIN-1890
             Project: Kylin
          Issue Type: Improvement
          Components: General
    Affects Versions: v1.5.2
            Reporter: fengYu


Sometimes we need to deploy two Kylin environments on the same HBase. I want to make the HBase table name prefix configurable, for two reasons:
1. Different Kylin environments can generate the same table name.
2. Cleaning up invalid HTables for one environment deletes all tables that belong to the other environment.

Having different Kylin environments use different namespaces would also be acceptable.
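The two reasons above can be illustrated with a small sketch. Everything here is an assumption for illustration: the property key `example.hbase.table.prefix` is hypothetical and is not a real Kylin setting, and the default prefix `KYLIN_` is assumed rather than taken from the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch of a per-environment table prefix; not Kylin code.
class TablePrefixSketch {
    static final String PREFIX_KEY = "example.hbase.table.prefix"; // hypothetical key
    static final String DEFAULT_PREFIX = "KYLIN_";                 // assumed default

    static String prefix(Properties config) {
        return config.getProperty(PREFIX_KEY, DEFAULT_PREFIX);
    }

    /** Builds a table name that is unique to this environment's prefix. */
    static String tableName(Properties config, String suffix) {
        return prefix(config) + suffix;
    }

    /** Cleanup only considers tables that carry this environment's prefix. */
    static List<String> cleanupCandidates(Properties config, List<String> allTables) {
        String p = prefix(config);
        List<String> candidates = new ArrayList<>();
        for (String table : allTables) {
            if (table.startsWith(p)) {
                candidates.add(table);
            }
        }
        return candidates;
    }
}
```

For this to isolate environments, no environment's prefix may itself be a prefix of another's (for example `KYLIN_` and `KYLIN_B_` would still collide during cleanup).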
[jira] [Commented] (KYLIN-1480) NPE throws while execute sql with more than two join.
[ https://issues.apache.org/jira/browse/KYLIN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376255#comment-15376255 ] fengYu commented on KYLIN-1480: --- Why this patch do not merge to 1.5.2? > NPE throws while execute sql with more than two join. > - > > Key: KYLIN-1480 > URL: https://issues.apache.org/jira/browse/KYLIN-1480 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v1.4.0, v1.2, v1.1, v1.0 >Reporter: fengYu >Assignee: liyang > Attachments: 0001-unit-test-case-for-KYLIN-1480.patch, > NPE-in-more-joins.patch > > > Hi, I encounter NPE while execute sql more than two join, for example : > select A.type, A.cmd, count(1) from fact as A inner join (select type, > count(1) from fact group by type having count(1) > 2) as B on A.type = B.type > inner join (select cmd, count(1) from fact group by cmd having count(1) > 2) > as C on A.cmd = C.cmd group by A.type, A.cmd; > the fact table is create like this : > CREATE TABLE `fact`( > `fname` string, > `lname` string, > `dt` date, > `cost` int, > `type` string, > `cmd` string); > Kylin throws exception like this : > Caused by: java.lang.NullPointerException > at > org.apache.kylin.query.relnode.OLAPProjectRel.implementOLAP(OLAPProjectRel.java:103) > at > org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81) > at > org.apache.kylin.query.relnode.OLAPAggregateRel.implementOLAP(OLAPAggregateRel.java:132) > at > org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81) > at > org.apache.kylin.query.relnode.OLAPLimitRel.implementOLAP(OLAPLimitRel.java:73) > at > org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81) > at > org.apache.kylin.query.relnode.OLAPToEnumerableConverter.implement(OLAPToEnumerableConverter.java:67) > at > org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.implementRoot(EnumerableRelImplementor.java:99) > at > 
org.apache.calcite.adapter.enumerable.EnumerableInterpretable.toBindable(EnumerableInterpretable.java:92) > at > org.apache.calcite.prepare.CalcitePrepareImpl$CalcitePreparingStmt.implement(CalcitePrepareImpl.java:1050) > at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:293) > at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:188) > at > org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:671) > at > org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:572) > at > org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:541) > at > org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:173) > at > org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:561) > at > org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477) > at > org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109) > I try it in kylin-1.0 and kylin-2.x-staging, the same exception throws. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1808) unload non existing table cause NPE
[ https://issues.apache.org/jira/browse/KYLIN-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376254#comment-15376254 ]

fengYu commented on KYLIN-1808:
-------------------------------

I have uploaded my patch; it just adds a null check.

> unload non existing table cause NPE
> -----------------------------------
>
>                 Key: KYLIN-1808
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1808
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v1.5.2
>            Reporter: fengYu
>         Attachments: 0001-KYLIN-1808-unload-table-cause-NPE.patch
>
>
> In TableController.java,
> private boolean unLoadHiveTable(String tableName, String project)
> does not check whether the TableDesc object is null.
[jira] [Updated] (KYLIN-1808) unload non existing table cause NPE
[ https://issues.apache.org/jira/browse/KYLIN-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-1808:
--------------------------
    Attachment: 0001-KYLIN-1808-unload-table-cause-NPE.patch

> unload non existing table cause NPE
> -----------------------------------
>
>                 Key: KYLIN-1808
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1808
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v1.5.2
>            Reporter: fengYu
>         Attachments: 0001-KYLIN-1808-unload-table-cause-NPE.patch
>
>
> In TableController.java,
> private boolean unLoadHiveTable(String tableName, String project)
> does not check whether the TableDesc object is null.
[jira] [Updated] (KYLIN-1888) support forward cube building
[ https://issues.apache.org/jira/browse/KYLIN-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-1888:
--------------------------
    Attachment: 0001-forward-build-job.patch

> support forward cube building
> -----------------------------
>
>                 Key: KYLIN-1888
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1888
>             Project: Kylin
>          Issue Type: Improvement
>    Affects Versions: v1.5.2
>            Reporter: fengYu
>            Assignee: fengYu
>         Attachments: 0001-forward-build-job.patch
>
>
> This is for the case where a user wants to see the data from the last few days
> first, and then fill in the historical data from the cube start date.
> FORWARD is like the reverse of BUILD.
[jira] [Created] (KYLIN-1888) support forward cube building
fengYu created KYLIN-1888:
-----------------------------

             Summary: support forward cube building
                 Key: KYLIN-1888
                 URL: https://issues.apache.org/jira/browse/KYLIN-1888
             Project: Kylin
          Issue Type: Improvement
    Affects Versions: v1.5.2
            Reporter: fengYu
            Assignee: fengYu


This is for the case where a user wants to see the data from the last few days first, and then fill in the historical data from the cube start date.
FORWARD is like the reverse of BUILD.
[jira] [Updated] (KYLIN-1826) kylin support more than one hive based on different hadoop claster
[ https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-1826:
--------------------------
    Attachment: 0001-KYLIN-1826-support-more-hive-based-on-different-hado.patch

I'm sorry for keeping you waiting.

I have finished my patch, which adds a source named ID_EXTERNAL_HIVE. One project can use only one Hive source. To keep the code changes small, I added a thread-local variable that records which project the current thread is working on.

The patch is a little big, but it can be applied or merged to kylin-1.5.2.1. I have tested it many times in our environment, and it works with multiple Hive clients based on two Hadoop clusters. What is more, our Hadoop engine (Kylin's compute engine) runs on yet another cluster, and it works fine for me.

I hope you can do more testing in other environments. I look forward to your feedback.

> kylin support more than one hive based on different hadoop claster
> ------------------------------------------------------------------
>
>                 Key: KYLIN-1826
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1826
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Environment
>    Affects Versions: v1.5.2
>            Reporter: fengYu
>            Assignee: fengYu
>         Attachments: 0001-KYLIN-1826-support-more-hive-based-on-different-hado.patch
>
>
> Currently, Kylin supports only one Hive, which must be runnable through the 'hive'
> command. When the source data is located in more than one Hive, we have to deploy
> several Kylin instances and more than one metastore, which is difficult to manage
> and may cause conflicts.
> I have been working on this recently. In our cluster there are several Hive
> clients (with different metastores) based on different Hadoop clusters. I added a
> new Hive source type called 'external hive' in Kylin 1.5.x.
> Thanks to the Kylin plug-in architecture in 2.x, this work was easier.
> The main modifications are:
> 1. Add a Hive root directory to the hive config file. The external Hive clients
>    live in this directory, and each Hive is named after its directory.
> 2. Add the hive-site.xml file while loading Hive tables.
> 3. Store the Hive name in the project; one project can take only one Hive as its source.
> 4. Change and add some jobs to support job building.
> I will upload my patch once I finish all my tests.
[jira] [Commented] (KYLIN-1833) union operation will cause error result
[ https://issues.apache.org/jira/browse/KYLIN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362481#comment-15362481 ]

fengYu commented on KYLIN-1833:
-------------------------------

1.5.2.1 has this problem too. What is more, making the same change as the patch does not resolve it in 1.5. I will analyze it in the new version once I am free.

> union operation will cause error result
> ---------------------------------------
>
>                 Key: KYLIN-1833
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1833
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine
>    Affects Versions: v1.3.0, v1.5.2
>            Reporter: fengYu
>            Assignee: fengYu
>         Attachments: 0001-KYLIN-1833-union-query-get-error-result.patch
>
>
> A query like this returns a wrong result:
> select * from (
> select 'b', count(1) from kylin_sales where lstg_format_name >= 'Auction'
> union all
> select 'a', count(1) from kylin_sales where lstg_format_name >= 'Others'
> union all
> select 'c', count(1) from kylin_sales where lstg_format_name >= 'FP-GTC'
> ) order by 1
[jira] [Commented] (KYLIN-1808) unload non existing table cause NPE
[ https://issues.apache.org/jira/browse/KYLIN-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362338#comment-15362338 ]

fengYu commented on KYLIN-1808:
-------------------------------

OK, I will upload my patch later.

> unload non existing table cause NPE
> -----------------------------------
>
>                 Key: KYLIN-1808
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1808
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v1.5.2
>            Reporter: fengYu
>
> In TableController.java,
> private boolean unLoadHiveTable(String tableName, String project)
> does not check whether the TableDesc object is null.
[jira] [Closed] (KYLIN-1280) Convert Cuboid Data to HFile failed when hbase in different HDFS
[ https://issues.apache.org/jira/browse/KYLIN-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu closed KYLIN-1280.
-------------------------
    Resolution: Fixed

> Convert Cuboid Data to HFile failed when hbase in different HDFS
> ----------------------------------------------------------------
>
>                 Key: KYLIN-1280
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1280
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v1.4.0
>            Reporter: fengYu
>         Attachments: 0001-transform-path-in-other-HDFS-to-real-name-node-path.patch
>
>
> I deployed kylin-2.0 with an HBase that relies on a different HDFS from the Hadoop
> cluster, so I set the property 'kylin.hbase.cluster.fs' = hdfs://A. This name
> service is different from 'fs.defaultFS' in the Hadoop cluster, which is hdfs://B.
> The step 'Convert Cuboid Data to HFile' failed to execute. The error log is:
> java.io.IOException: Failed to run job : Unable to map logical nameservice URI 'hdfs://A' to a NameNode. Local configuration does not have a failover proxy provider configured.
>         at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:300)
>         at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
>         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
>         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
>         at org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:129)
>         at org.apache.kylin.storage.hbase.steps.CubeHFileJob.run(CubeHFileJob.java:93)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>         at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:119)
>         at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>         at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>         at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>         at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> I think this is because the node managers in the Hadoop cluster cannot resolve
> hdfs://A from their configuration. So I have to transform the path
> hdfs://A/path/to/hfile to hdfs://namenode_ip:port/path/to/hfile before
> executing this step, and that works for me.
> Here is my patch.
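The path transformation described in the issue (hdfs://A/path/to/hfile to hdfs://namenode_ip:port/path/to/hfile) can be sketched with plain string and URI handling. This is not the attached patch: the class and method names are hypothetical, and the NameNode address is assumed to be supplied by the caller rather than looked up from Hadoop configuration.

```java
import java.net.URI;

// Hypothetical sketch of rewriting a logical nameservice path; not Kylin code.
class HdfsPathSketch {

    /**
     * Rewrites hdfs://<nameservice>/<path> to hdfs://<nameNodeAddress>/<path>.
     * Paths on another nameservice, or without an hdfs scheme, are returned unchanged.
     *
     * @param nameNodeAddress host:port of a NameNode, supplied by the caller (assumption)
     */
    static String toNameNodePath(String path, String nameservice, String nameNodeAddress) {
        URI uri = URI.create(path);
        boolean sameService = "hdfs".equals(uri.getScheme())
                && nameservice.equals(uri.getAuthority());
        if (!sameService) {
            return path;
        }
        return "hdfs://" + nameNodeAddress + uri.getRawPath();
    }
}
```

Pinning a path to one NameNode address gives up HA failover for that path, so a sketch like this only makes sense where the address is resolved right before the step runs.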
[jira] [Commented] (KYLIN-1280) Convert Cuboid Data to HFile failed when hbase in different HDFS
[ https://issues.apache.org/jira/browse/KYLIN-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362335#comment-15362335 ] fengYu commented on KYLIN-1280: --- I forget it, We will redeploy hadoop cluster too, so it is useless. I will close it. > Convert Cuboid Data to HFile failed when hbase in different HDFS > > > Key: KYLIN-1280 > URL: https://issues.apache.org/jira/browse/KYLIN-1280 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.4.0 >Reporter: fengYu > Attachments: > 0001-transform-path-in-other-HDFS-to-real-name-node-path.patch > > > I deploy kylin-2.0 with hbase which rely on a different HDFS with hadoop > cluster, so I config this property 'kylin.hbase.cluster.fs' = hdfs://A, the > name service is different with 'fs.defaultFS' in hadoop cluster which is > hdfs://B. > In the step 'Convert Cuboid Data to HFile' execute failed, error log is : > java.io.IOException: Failed to run job : Unable to map logical nameservice > URI 'hdfs://A' to a NameNode. Local configuration does not have a failover > proxy provide > r configured. 
> at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:300) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) > at > org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:129) > at > org.apache.kylin.storage.hbase.steps.CubeHFileJob.run(CubeHFileJob.java:93) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:119) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > I think it is because node manager in hadoop cluster can not recognition > hdfs://A in they config. So, I have to tranform the path > hdfs://A/path/to/hfile to hdfs://namenode_ip:port/path/to/hfile before > execute this step. and it works for me. > Here is my patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1840) project admin should has right to load table
[ https://issues.apache.org/jira/browse/KYLIN-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360783#comment-15360783 ]

fengYu commented on KYLIN-1840:
-------------------------------

I have some other questions:
1. Only admin has the right to use Advanced Settings while building a cube.
2. Everyone can see the System tab, which I think should be open only to Admin.
3. Slow Queries is open to everyone; however, in the backend interface, users other than Admin get "Access is denied".

> project admin should has right to load table
> --------------------------------------------
>
>                 Key: KYLIN-1840
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1840
>             Project: Kylin
>          Issue Type: Bug
>          Components: REST Service
>    Affects Versions: v1.5.2
>            Reporter: fengYu
>            Assignee: Zhong,Jason
>
> Only admin has the right to load tables. I am trying to find out whether there
> is other logic like this.
[jira] [Created] (KYLIN-1840) project admin should has right to load table
fengYu created KYLIN-1840:
-----------------------------

             Summary: project admin should has right to load table
                 Key: KYLIN-1840
                 URL: https://issues.apache.org/jira/browse/KYLIN-1840
             Project: Kylin
          Issue Type: Bug
          Components: REST Service
    Affects Versions: v1.5.2
            Reporter: fengYu
            Assignee: Zhong,Jason


Only admin has the right to load tables. I am trying to find out whether there is other logic like this.
[jira] [Updated] (KYLIN-1839) improvement set classpath before submitting mr job
[ https://issues.apache.org/jira/browse/KYLIN-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-1839:
--------------------------
    Attachment: 0001-KYLIN-1839-support-extend-lib-from-HDFS-and-cache-tm.patch

Patch attached. It supports setting kylin.job.mr.lib.dir to an HDFS path and caches the temp jars.

> improvement set classpath before submitting mr job
> --------------------------------------------------
>
>                 Key: KYLIN-1839
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1839
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v1.5.2
>            Reporter: fengYu
>            Assignee: fengYu
>         Attachments: 0001-KYLIN-1839-support-extend-lib-from-HDFS-and-cache-tm.patch
>
>
> In setClasspath, Kylin always finds the Hive jars from the Hive dependency using
> a regex. However, the result does not change within one process lifetime, so I
> cache the locations of tmpjars and tmpfiles.
> What is more, the patch extends the user lib setting to accept an HDFS path
> rather than only the local filesystem, which causes the jars to be uploaded
> every time if they are not already in the DistributedCache.
[jira] [Created] (KYLIN-1839) improvement set classpath before submitting mr job
fengYu created KYLIN-1839:
-----------------------------

             Summary: improvement set classpath before submitting mr job
                 Key: KYLIN-1839
                 URL: https://issues.apache.org/jira/browse/KYLIN-1839
             Project: Kylin
          Issue Type: Improvement
          Components: Job Engine
    Affects Versions: v1.5.2
            Reporter: fengYu
            Assignee: fengYu


In setClasspath, Kylin always finds the Hive jars from the Hive dependency using a regex. However, the result does not change within one process lifetime, so I cache the locations of tmpjars and tmpfiles.

What is more, the patch extends the user lib setting to accept an HDFS path rather than only the local filesystem, which causes the jars to be uploaded every time if they are not already in the DistributedCache.
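The caching idea above (compute the jar locations once per process, then reuse them) can be sketched as a generic compute-once wrapper. This is an illustration only, not the attached patch: the Cached class is hypothetical, and the loader passed to it stands in for whatever regex scan setClasspath performs.

```java
import java.util.function.Supplier;

// Hypothetical sketch of caching a per-process lookup; not Kylin code.
class ClasspathCacheSketch {

    /** Wraps an expensive lookup so that it runs at most once per process. */
    static final class Cached<T> implements Supplier<T> {
        private final Supplier<T> loader;
        private volatile T value; // null means "not computed yet"

        Cached(Supplier<T> loader) {
            this.loader = loader;
        }

        @Override
        public T get() {
            T v = value;
            if (v == null) {
                synchronized (this) {
                    v = value;
                    if (v == null) {
                        // First caller pays for the scan; later callers reuse the result.
                        // A loader that returns null is simply retried on the next call.
                        v = loader.get();
                        value = v;
                    }
                }
            }
            return v;
        }
    }
}
```

A job-submission path could hold one such Cached instance in a static field and call get() for every job, so the dependency scan happens only on the first submission.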
[jira] [Updated] (KYLIN-1833) union operation will cause error result
[ https://issues.apache.org/jira/browse/KYLIN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-1833:
--------------------------
    Attachment: 0001-KYLIN-1833-union-query-get-error-result.patch

Patch for kylin-1.3.0: do not allocate an OLAPRel.OLAPImplementor every time in OLAPToEnumerableConverter.implement; move it to a thread-local variable instead.

> union operation will cause error result
> ---------------------------------------
>
>                 Key: KYLIN-1833
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1833
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine
>    Affects Versions: v1.3.0, v1.5.2
>            Reporter: fengYu
>            Assignee: fengYu
>         Attachments: 0001-KYLIN-1833-union-query-get-error-result.patch
>
>
> A query like this returns a wrong result:
> select * from (
> select 'b', count(1) from kylin_sales where lstg_format_name >= 'Auction'
> union all
> select 'a', count(1) from kylin_sales where lstg_format_name >= 'Others'
> union all
> select 'c', count(1) from kylin_sales where lstg_format_name >= 'FP-GTC'
> ) order by 1
[jira] [Commented] (KYLIN-1833) union operation will cause error result
[ https://issues.apache.org/jira/browse/KYLIN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15354387#comment-15354387 ]

fengYu commented on KYLIN-1833:
-------------------------------

The result is wrong. In kylin-2.x, the result is:
a 1987
b 5844
c 5844
and the right result is:
a 2021
b 7987
c 5946

In kylin-1.x, the id of OLAPContext is incorrect: for every subquery, Kylin allocates a new OLAPRel.OLAPImplementor, so OLAPContext.id always equals 0, which causes errors in the generated code. I moved OLAPRel.OLAPImplementor to a ThreadLocal variable and got the right result.

However, in kylin-2.x there seem to be more factors involved. I am trying to find the reason in kylin-2.x.

> union operation will cause error result
> ---------------------------------------
>
>                 Key: KYLIN-1833
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1833
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine
>    Affects Versions: v1.3.0, v1.5.2
>            Reporter: fengYu
>            Assignee: fengYu
>
> A query like this returns a wrong result:
> select * from (
> select 'b', count(1) from kylin_sales where lstg_format_name >= 'Auction'
> union all
> select 'a', count(1) from kylin_sales where lstg_format_name >= 'Others'
> union all
> select 'c', count(1) from kylin_sales where lstg_format_name >= 'FP-GTC'
> ) order by 1
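The id collision described in the comment above can be reproduced in miniature. The classes below are hypothetical stand-ins, not Kylin's OLAPRel.OLAPImplementor or OLAPContext; they only assume that the implementor hands out context ids from a counter that starts at 0.

```java
// Hypothetical sketch of the id collision and the ThreadLocal fix; not Kylin code.
class ContextIdSketch {

    /** Stand-in for an implementor that numbers the contexts it creates. */
    static class Implementor {
        private int nextId = 0;

        int allocateContextId() {
            return nextId++;
        }
    }

    // One implementor per thread, shared by all subqueries handled on that thread.
    static final ThreadLocal<Implementor> CURRENT = ThreadLocal.withInitial(Implementor::new);

    /** Behaviour described for kylin-1.x: a new implementor per subquery, so the id is always 0. */
    static int idWithFreshImplementor() {
        return new Implementor().allocateContextId();
    }

    /** Behaviour after the change: subqueries on one thread get distinct ids. */
    static int idWithSharedImplementor() {
        return CURRENT.get().allocateContextId();
    }

    /** Should be called when a query finishes, so the next query starts again at 0. */
    static void reset() {
        CURRENT.remove();
    }
}
```

For the three-branch union in the issue, the fresh-implementor variant would hand out id 0 three times, while the shared variant hands out 0, 1 and 2. The reset step matters on pooled threads, where a stale counter would otherwise leak into the next query.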
[jira] [Comment Edited] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode
[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352646#comment-15352646 ]

fengYu edited comment on KYLIN-1832 at 6/28/16 9:19 AM:
--------------------------------------------------------

The max size of biggerIndexSet is half of m. I will test it later; if this uses too much memory, a bitmap is a better choice.

was (Author: feng_xiao_yu):
The max size of biggerIndexSet is half of m, and I will test it later.

> HyperLogLog speed is too slow in encode and decode
> --------------------------------------------------
>
>                 Key: KYLIN-1832
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1832
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Metadata
>    Affects Versions: v1.3.0, v1.5.2
>            Reporter: fengYu
>            Assignee: fengYu
>         Attachments: HyperLogLogPlusCounter.java
>
>
> We have a cube with more than ten distinct count measures that use hll15 to store
> the values. We found that HyperLogLogPlusCounter is too slow. Three methods are
> called frequently: merge/writeRegisters/readRegisters.
> I found that kylin-1.5.x adds a parameter 'singleBucket' to store the single
> bucket, which optimizes the base cuboid.
> However, the other steps of cuboid building are still slow. I have modified
> the code to speed up these three operations.
[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode
[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352646#comment-15352646 ]

fengYu commented on KYLIN-1832:
-------------------------------

The max size of biggerIndexSet is half of m, and I will test it later.

> HyperLogLog speed is too slow in encode and decode
> --------------------------------------------------
>
>                 Key: KYLIN-1832
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1832
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Metadata
>    Affects Versions: v1.3.0, v1.5.2
>            Reporter: fengYu
>            Assignee: fengYu
>         Attachments: HyperLogLogPlusCounter.java
>
>
> We have a cube with more than ten distinct count measures that use hll15 to store
> the values. We found that HyperLogLogPlusCounter is too slow. Three methods are
> called frequently: merge/writeRegisters/readRegisters.
> I found that kylin-1.5.x adds a parameter 'singleBucket' to store the single
> bucket, which optimizes the base cuboid.
> However, the other steps of cuboid building are still slow. I have modified
> the code to speed up these three operations.
[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode
[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352592#comment-15352592 ]

fengYu commented on KYLIN-1832:
-------------------------------

I will upload patches for 1.x and 2.x, but replacing the whole file is fine if you can accept this kind of implementation.

> HyperLogLog speed is too slow in encode and decode
> --------------------------------------------------
>
> Key: KYLIN-1832
> URL: https://issues.apache.org/jira/browse/KYLIN-1832
> Project: Kylin
> Issue Type: Improvement
> Components: Metadata
> Affects Versions: v1.3.0, v1.5.2
> Reporter: fengYu
> Assignee: fengYu
> Attachments: HyperLogLogPlusCounter.java
>
> We have a cube with more than ten distinct-count measures and use hll15 to store
> the values. We found HyperLogLogPlusCounter too slow; three methods are
> called frequently: merge/writeRegisters/readRegisters.
> I found that kylin-1.5.x adds a parameter 'singleBucket' to store a single
> bucket, which can optimize the base cuboid.
> However, in the other steps of cuboid building, it slows down. I have modified
> the code to speed up these three operations.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-1833) union operation will cause error result
fengYu created KYLIN-1833:
-----------------------------

Summary: union operation will cause error result
Key: KYLIN-1833
URL: https://issues.apache.org/jira/browse/KYLIN-1833
Project: Kylin
Issue Type: Improvement
Components: Query Engine
Affects Versions: v1.5.2, v1.3.0
Reporter: fengYu
Assignee: fengYu

A query like this returns an incorrect result:

select * from (
select 'b', count(1) from kylin_sales where lstg_format_name >= 'Auction'
union all
select 'a', count(1) from kylin_sales where lstg_format_name >= 'Others'
union all
select 'c', count(1) from kylin_sales where lstg_format_name >= 'FP-GTC'
) order by 1

I will work on it and upload my patch later.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
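For anyone who wants to reproduce this over JDBC, the following is a minimal harness, not part of the report. It uses only the standard java.sql API; the class name UnionAllRepro and the command-line arguments are made up for illustration, and the JDBC URL and credentials have to be supplied for your own Kylin instance. By SQL semantics, "order by 1" should return exactly three rows, ordered by their label: a, b, c.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical reproduction harness for the query quoted in KYLIN-1833. */
public class UnionAllRepro {
    // The query from the issue, joined into a single statement.
    static final String QUERY =
        "select * from ("
        + " select 'b', count(1) from kylin_sales where lstg_format_name >= 'Auction'"
        + " union all"
        + " select 'a', count(1) from kylin_sales where lstg_format_name >= 'Others'"
        + " union all"
        + " select 'c', count(1) from kylin_sales where lstg_format_name >= 'FP-GTC'"
        + " ) order by 1";

    /** Runs the query and prints one "label<TAB>count" line per row. */
    static void printRows(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(QUERY)) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.err.println("usage: UnionAllRepro <jdbc-url> [user] [password]");
            return;
        }
        String user = args.length > 1 ? args[1] : null;
        String password = args.length > 2 ? args[2] : null;
        try (Connection conn = DriverManager.getConnection(args[0], user, password)) {
            printRows(conn);
        }
    }
}
```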
[jira] [Updated] (KYLIN-1833) union operation will cause error result
[ https://issues.apache.org/jira/browse/KYLIN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-1833:
--------------------------
Description:
A query like this returns an incorrect result:

select * from (
select 'b', count(1) from kylin_sales where lstg_format_name >= 'Auction'
union all
select 'a', count(1) from kylin_sales where lstg_format_name >= 'Others'
union all
select 'c', count(1) from kylin_sales where lstg_format_name >= 'FP-GTC'
) order by 1

was:
query like this will get error result :
select * from (
select 'b', count(1) from kylin_sales where lstg_format_name >= 'Auction'
union all
select 'a', count(1) from kylin_sales where lstg_format_name >= 'Others'
union all
select 'c', count(1) from kylin_sales where lstg_format_name >= 'FP-GTC'
) order by 1
I will work with it and upload my patch later.

> union operation will cause error result
> ---------------------------------------
>
> Key: KYLIN-1833
> URL: https://issues.apache.org/jira/browse/KYLIN-1833
> Project: Kylin
> Issue Type: Improvement
> Components: Query Engine
> Affects Versions: v1.3.0, v1.5.2
> Reporter: fengYu
> Assignee: fengYu
>
> A query like this returns an incorrect result:
> select * from (
> select 'b', count(1) from kylin_sales where lstg_format_name >= 'Auction'
> union all
> select 'a', count(1) from kylin_sales where lstg_format_name >= 'Others'
> union all
> select 'c', count(1) from kylin_sales where lstg_format_name >= 'FP-GTC'
> ) order by 1

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode
[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-1832:
--------------------------
Attachment: HyperLogLogPlusCounter.java

> HyperLogLog speed is too slow in encode and decode
> --------------------------------------------------
>
> Key: KYLIN-1832
> URL: https://issues.apache.org/jira/browse/KYLIN-1832
> Project: Kylin
> Issue Type: Improvement
> Components: Metadata
> Affects Versions: v1.3.0, v1.5.2
> Reporter: fengYu
> Assignee: fengYu
> Attachments: HyperLogLogPlusCounter.java
>
> We have a cube with more than ten distinct-count measures and use hll15 to store
> the values. We found HyperLogLogPlusCounter too slow; three methods are
> called frequently: merge/writeRegisters/readRegisters.
> I found that kylin-1.5.x adds a parameter 'singleBucket' to store a single
> bucket, which can optimize the base cuboid.
> However, in the other steps of cuboid building, it slows down. I have modified
> the code to speed up these three operations.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode
fengYu created KYLIN-1832:
-----------------------------

Summary: HyperLogLog speed is too slow in encode and decode
Key: KYLIN-1832
URL: https://issues.apache.org/jira/browse/KYLIN-1832
Project: Kylin
Issue Type: Improvement
Components: Metadata
Affects Versions: v1.5.2, v1.3.0
Reporter: fengYu
Assignee: fengYu

We have a cube with more than ten distinct-count measures and use hll15 to store the values. We found HyperLogLogPlusCounter too slow; three methods are called frequently: merge/writeRegisters/readRegisters.

I found that kylin-1.5.x adds a parameter 'singleBucket' to store a single bucket, which can optimize the base cuboid.

However, in the other steps of cuboid building, it slows down. I have modified the code to speed up these three operations.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
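To make the three hot methods concrete, here is a minimal sketch of what merge, writeRegisters and readRegisters do in a HyperLogLog counter. It is not the attached HyperLogLogPlusCounter.java and not Kylin's serialization format: the class name HllRegistersSketch, the one-byte scheme flag and the sparse/dense cutoff are all assumptions made for illustration. The only details taken from the issue are the three method names and hll15, i.e. 2^15 registers.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Illustrative only: not Kylin's HyperLogLogPlusCounter or its wire format. */
public class HllRegistersSketch {
    static final int P = 15;       // hll15, as in the issue
    static final int M = 1 << P;   // 32768 registers

    final byte[] registers = new byte[M];

    /** Merging two HLL counters keeps the larger value of each register. */
    void merge(HllRegistersSketch other) {
        for (int i = 0; i < M; i++) {
            if (other.registers[i] > registers[i]) {
                registers[i] = other.registers[i];
            }
        }
    }

    /** Writes a sparse form when few registers are set, otherwise all M bytes. */
    void writeRegisters(ByteBuffer out) {
        int nonZero = 0;
        for (byte r : registers) {
            if (r != 0) {
                nonZero++;
            }
        }
        // Sparse costs 3 bytes per entry (2-byte index + 1-byte value); dense costs M bytes.
        if (nonZero * 3 < M) {
            out.put((byte) 0);     // assumed flag: 0 = sparse
            out.putInt(nonZero);
            for (int i = 0; i < M; i++) {
                if (registers[i] != 0) {
                    out.putShort((short) i);
                    out.put(registers[i]);
                }
            }
        } else {
            out.put((byte) 1);     // assumed flag: 1 = dense
            out.put(registers);
        }
    }

    /** Reads back either form produced by writeRegisters. */
    void readRegisters(ByteBuffer in) {
        Arrays.fill(registers, (byte) 0);
        if (in.get() == 0) {
            int nonZero = in.getInt();
            for (int k = 0; k < nonZero; k++) {
                int index = in.getShort() & 0xFFFF; // undo sign extension
                registers[index] = in.get();
            }
        } else {
            in.get(registers);
        }
    }

    public static void main(String[] args) {
        HllRegistersSketch a = new HllRegistersSketch();
        HllRegistersSketch b = new HllRegistersSketch();
        a.registers[3] = 5;
        b.registers[3] = 2;
        b.registers[100] = 7;
        a.merge(b);

        ByteBuffer buf = ByteBuffer.allocate(M + 5);
        a.writeRegisters(buf);
        System.out.println("bytes written: " + buf.position()); // 1 flag + 4 count + 2 * 3 = 11

        buf.flip();
        HllRegistersSketch c = new HllRegistersSketch();
        c.readRegisters(buf);
        System.out.println("round trip ok: " + Arrays.equals(a.registers, c.registers));
    }
}
```

The sketch shows why these methods dominate: the dense path touches all 32768 registers on every call, so any scheme that tracks which registers are set (such as the index set or bitmap mentioned in the comments on this issue) can skip most of that work for sparse counters.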
[jira] [Created] (KYLIN-1826) kylin support more than one hive based on different hadoop claster
fengYu created KYLIN-1826:
-----------------------------

Summary: kylin support more than one hive based on different hadoop claster
Key: KYLIN-1826
URL: https://issues.apache.org/jira/browse/KYLIN-1826
Project: Kylin
Issue Type: Improvement
Components: Environment
Affects Versions: v1.5.2
Reporter: fengYu
Assignee: fengYu

Currently, Kylin supports only one Hive, which is accessed through the 'hive' command. When the source data is located in more than one Hive, we have to deploy several Kylin instances and more than one metastore, which is difficult to manage and may cause conflicts.

I have been working on this recently. In our cluster there are several Hive clients (with different metastores) based on different Hadoop clusters, so I added a new Hive source type called 'external hive' in Kylin 1.5.x. Kylin's plug-in architecture in 2.x makes this work easier.

The main modifications are:
1. Add a Hive root directory to the Hive config file. The external Hive clients live in this directory, and each Hive is named after its directory.
2. Add the hive-site.xml file when loading Hive tables.
3. Store the Hive name in the project; one project can use only one Hive as its source.
4. Change and add some jobs to support job building.

I will upload my patch once I have finished all my tests.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
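The first two modifications can be sketched roughly as follows. This is not the patch: the class name ExternalHiveDiscoverySketch, the discover method and especially the conf/hive-site.xml layout inside each client directory are assumptions for illustration. The description only states that the clients sit under one root directory and are named by directory name.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical discovery of external Hive clients under a common root directory. */
public class ExternalHiveDiscoverySketch {

    /** Maps each sub-directory name under hiveRoot to its hive-site.xml, if one exists. */
    static Map<String, Path> discover(Path hiveRoot) throws IOException {
        Map<String, Path> clients = new TreeMap<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(hiveRoot)) {
            for (Path dir : dirs) {
                // Assumed layout: <root>/<hiveName>/conf/hive-site.xml
                Path site = dir.resolve("conf").resolve("hive-site.xml");
                if (Files.isDirectory(dir) && Files.isRegularFile(site)) {
                    clients.put(dir.getFileName().toString(), site);
                }
            }
        }
        return clients;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: ExternalHiveDiscoverySketch <hive-root-dir>");
            return;
        }
        // Print "hiveName -> path to hive-site.xml" for every client found.
        discover(Paths.get(args[0])).forEach(
            (name, site) -> System.out.println(name + " -> " + site));
    }
}
```

A project would then store just one of the discovered names, which matches the rule that one project can use only one Hive as its source.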
[jira] [Created] (KYLIN-1808) unload non existing table cause NPE
fengYu created KYLIN-1808:
-----------------------------

Summary: unload non existing table cause NPE
Key: KYLIN-1808
URL: https://issues.apache.org/jira/browse/KYLIN-1808
Project: Kylin
Issue Type: Bug
Affects Versions: v1.5.2
Reporter: fengYu

In TableController.java, private boolean unLoadHiveTable(String tableName, String project) does not check whether the TableDesc object is null.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
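The missing guard could look roughly like the sketch below. It is a stand-alone illustration rather than Kylin's TableController: the nested TableDesc class and the TABLES map are stand-ins invented so the example compiles, and only the method signature is taken from the report.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative null guard; TableDesc and TABLES are stand-ins, not Kylin classes. */
public class UnloadTableSketch {

    /** Minimal stand-in for Kylin's TableDesc. */
    static class TableDesc {
        final String name;

        TableDesc(String name) {
            this.name = name;
        }
    }

    // Hypothetical in-memory metadata store keyed by upper-cased table name.
    static final Map<String, TableDesc> TABLES = new HashMap<>();

    /** Returns false instead of throwing when the table was never loaded. */
    static boolean unLoadHiveTable(String tableName, String project) {
        // 'project' is kept only to mirror the signature quoted in the issue.
        String key = tableName.toUpperCase(Locale.ROOT);
        TableDesc desc = TABLES.get(key);
        if (desc == null) {
            return false; // unknown table: nothing to unload, nothing to dereference
        }
        TABLES.remove(key);
        return true;
    }

    public static void main(String[] args) {
        TABLES.put("DEFAULT.KYLIN_SALES", new TableDesc("DEFAULT.KYLIN_SALES"));
        System.out.println(unLoadHiveTable("default.kylin_sales", "demo")); // true
        System.out.println(unLoadHiveTable("default.no_such_table", "demo")); // false
    }
}
```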
[jira] [Updated] (KYLIN-1685) error happens while execute a sql contains '?' using Statement
[ https://issues.apache.org/jira/browse/KYLIN-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-1685:
--------------------------
Attachment: 0003-KYLIN-1685-error-happens-while-execute-a-sql-contain.patch

Added my patch and a test case.

> error happens while execute a sql contains '?' using Statement
> --------------------------------------------------------------
>
> Key: KYLIN-1685
> URL: https://issues.apache.org/jira/browse/KYLIN-1685
> Project: Kylin
> Issue Type: Bug
> Components: Driver - JDBC
> Affects Versions: v1.2, v1.5.1
> Reporter: fengYu
> Attachments: 0003-KYLIN-1685-error-happens-while-execute-a-sql-contain.patch
>
> The following exception occurs:
> java.sql.SQLException: Error while executing SQL "select * from test_table where url not in ('http://a.b.com/?a=b')": org.apache.kylin.jdbc.KylinStatement cannot be cast to org.apache.kylin.jdbc.KylinPreparedStatement
>     at org.apache.kylin.jdbc.KylinResultSet.execute(KylinResultSet.java:54)
>     at org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:566)
>     at org.apache.kylin.jdbc.KylinMeta.prepareAndExecute(KylinMeta.java:79)
>     at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:571)
>     at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:135)
>     at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:186)
>     at org.apache.kylin.jdbc.DriverTest.testStatementWithQuestionMask(DriverTest.java:79)
> This is caused by the Kylin JDBC driver treating any SQL that contains '?' as a
> PreparedStatement and casting the statement accordingly.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1685) error happens while execute a sql contains '?' using Statement
[ https://issues.apache.org/jira/browse/KYLIN-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-1685:
--------------------------
Attachment: (was: 0003-KYLIN-1685-error-happens-while-execute-a-sql-contain.patch)

> error happens while execute a sql contains '?' using Statement
> --------------------------------------------------------------
>
> Key: KYLIN-1685
> URL: https://issues.apache.org/jira/browse/KYLIN-1685
> Project: Kylin
> Issue Type: Bug
> Components: Driver - JDBC
> Affects Versions: v1.2, v1.5.1
> Reporter: fengYu
>
> The following exception occurs:
> java.sql.SQLException: Error while executing SQL "select * from test_table where url not in ('http://a.b.com/?a=b')": org.apache.kylin.jdbc.KylinStatement cannot be cast to org.apache.kylin.jdbc.KylinPreparedStatement
>     at org.apache.kylin.jdbc.KylinResultSet.execute(KylinResultSet.java:54)
>     at org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:566)
>     at org.apache.kylin.jdbc.KylinMeta.prepareAndExecute(KylinMeta.java:79)
>     at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:571)
>     at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:135)
>     at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:186)
>     at org.apache.kylin.jdbc.DriverTest.testStatementWithQuestionMask(DriverTest.java:79)
> This is caused by the Kylin JDBC driver treating any SQL that contains '?' as a
> PreparedStatement and casting the statement accordingly.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1685) error happens while execute a sql contains '?' using Statement
[ https://issues.apache.org/jira/browse/KYLIN-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fengYu updated KYLIN-1685:
--------------------------
Attachment: 0003-KYLIN-1685-error-happens-while-execute-a-sql-contain.patch

> error happens while execute a sql contains '?' using Statement
> --------------------------------------------------------------
>
> Key: KYLIN-1685
> URL: https://issues.apache.org/jira/browse/KYLIN-1685
> Project: Kylin
> Issue Type: Bug
> Components: Driver - JDBC
> Affects Versions: v1.2, v1.5.1
> Reporter: fengYu
> Attachments: 0003-KYLIN-1685-error-happens-while-execute-a-sql-contain.patch
>
> The following exception occurs:
> java.sql.SQLException: Error while executing SQL "select * from test_table where url not in ('http://a.b.com/?a=b')": org.apache.kylin.jdbc.KylinStatement cannot be cast to org.apache.kylin.jdbc.KylinPreparedStatement
>     at org.apache.kylin.jdbc.KylinResultSet.execute(KylinResultSet.java:54)
>     at org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:566)
>     at org.apache.kylin.jdbc.KylinMeta.prepareAndExecute(KylinMeta.java:79)
>     at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:571)
>     at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:135)
>     at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:186)
>     at org.apache.kylin.jdbc.DriverTest.testStatementWithQuestionMask(DriverTest.java:79)
> This is caused by the Kylin JDBC driver treating any SQL that contains '?' as a
> PreparedStatement and casting the statement accordingly.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-1685) error happens while execute a sql contains '?' using Statement
fengYu created KYLIN-1685:
-----------------------------

Summary: error happens while execute a sql contains '?' using Statement
Key: KYLIN-1685
URL: https://issues.apache.org/jira/browse/KYLIN-1685
Project: Kylin
Issue Type: Bug
Components: Driver - JDBC
Affects Versions: v1.5.1, v1.2
Reporter: fengYu

The following exception occurs:

java.sql.SQLException: Error while executing SQL "select * from test_table where url not in ('http://a.b.com/?a=b')": org.apache.kylin.jdbc.KylinStatement cannot be cast to org.apache.kylin.jdbc.KylinPreparedStatement
    at org.apache.kylin.jdbc.KylinResultSet.execute(KylinResultSet.java:54)
    at org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:566)
    at org.apache.kylin.jdbc.KylinMeta.prepareAndExecute(KylinMeta.java:79)
    at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:571)
    at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:135)
    at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:186)
    at org.apache.kylin.jdbc.DriverTest.testStatementWithQuestionMask(DriverTest.java:79)

This is caused by the Kylin JDBC driver treating any SQL that contains '?' as a PreparedStatement and casting the statement accordingly.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
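The failing query only contains '?' inside a string literal, so it has no parameters at all. One way to tell the two cases apart is to count '?' characters outside single-quoted literals, as in the sketch below. This is only an illustration of the distinction and is not claimed to be what the attached patch does; the class and method names are made up, and the scan deliberately ignores SQL comments and double-quoted identifiers.

```java
/** Illustrative scan for JDBC-style '?' placeholders; not the attached patch. */
public class PlaceholderScanSketch {

    /** Counts '?' characters that sit outside single-quoted SQL string literals. */
    static int countPlaceholders(String sql) {
        int count = 0;
        boolean inLiteral = false;
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (c == '\'') {
                // An escaped quote ('') toggles twice, leaving the state unchanged.
                inLiteral = !inLiteral;
            } else if (c == '?' && !inLiteral) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String fromIssue =
            "select * from test_table where url not in ('http://a.b.com/?a=b')";
        System.out.println(countPlaceholders(fromIssue)); // 0
        System.out.println(countPlaceholders("select * from t where a = ? and b = 'x?y'")); // 1
    }
}
```

With a check like this, a plain Statement whose SQL has zero real placeholders would never need to be treated as a PreparedStatement.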