[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2018-01-01 Thread Billy Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307614#comment-16307614
 ] 

Billy Liu commented on KYLIN-2764:
--

Thanks [~kangkaisen] and [~liyang.g...@gmail.com].

To enable this feature, set 
kylin.engine.mr.build-uhc-dict-in-additional-step=true

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.3.0
>
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-12-31 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307336#comment-16307336
 ] 

liyang commented on KYLIN-2764:
---

Note I have renamed the config to 
"kylin.engine.mr.build-uhc-dict-in-additional-step" for better readability. FYI.

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.3.0
>
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-10-22 Thread kangkaisen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214609#comment-16214609
 ] 

kangkaisen commented on KYLIN-2764:
---

OK. Thanks you, shaofeng!

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.3.0
>
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-10-22 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214604#comment-16214604
 ] 

Shaofeng SHI commented on KYLIN-2764:
-

The changes in 2764 branch (together with your commit for KYLIN-2800) has 
passed the integration test. I have merged it to master branch. Thanks kaisen!

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.3.0
>
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-10-21 Thread kangkaisen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214197#comment-16214197
 ] 

kangkaisen commented on KYLIN-2764:
---

Thanks you very much, liyang and shaofeng.

Shaofeng, you should let me do the merge work,thanks you. 

I don't have further change, but there is a issue in 2764 branch:
After KYLIN-2800 
https://github.com/apache/kylin/commit/ac77008ee81d4dcc2956b1a2cfd6eaa7ae9fc5d9
There isn't the first point I had pointed in the comment:
{quote}
1. The FK column in fact table could be UHC column.
{quote}
So the latest commit in 2764 branch coube be simplify, This is the commit to 
apply KYLIN-2800:
https://github.com/apache/kylin/commit/48f3fb1953a413acfdd405539a7cfd211a5e85de.

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.3.0
>
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-10-21 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213934#comment-16213934
 ] 

Shaofeng SHI commented on KYLIN-2764:
-

Merged in branch '2764'. Will run full integration test on it.
Kaisen, your '2622-2764' is too old, please sync it with the master branch if 
you have further change on it. I spent lots effort to resolve the conflicts.

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-10-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213631#comment-16213631
 ] 

liyang commented on KYLIN-2764:
---

Understood that this is 2-level reducer pattern, to control the total records 
one reducer receives and improves performance. Let's move on this patch.

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-10-17 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16207216#comment-16207216
 ] 

Shaofeng SHI commented on KYLIN-2764:
-

I believe the performance would be better to use multiple reducers in the 
FactDistinctColumns step, than only using 1 reducers. If use multiple reducers, 
then need another step to build the dictionary. 

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-10-07 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195957#comment-16195957
 ] 

liyang commented on KYLIN-2764:
---

> If the UHC columns have Hundreds of billions of rows and we use one reducer 
> to handle it , the FactDistinctColumnsReducer will be very very slow.

Correct me if I'm wrong. I see UHCDictionaryJob is also building global dict in 
one reducer. It is not different than FactDistinctColumnsReducer. Then why 
adding a new job? We can just do it in FactDistinctColumnsReducer, the result 
is same.

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-09-23 Thread kangkaisen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178081#comment-16178081
 ] 

kangkaisen commented on KYLIN-2764:
---

If the UHC columns have Hundreds of billions of rows and we use one reducer to 
handle it , the {{FactDistinctColumnsReducer}} will be very very slow.  In 
other words,if we could use multiple reducers to build one global dict, we will 
needn't add a new UHCDictionaryJob, but it is very hard build global dict with 
multi reducers.

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-09-23 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178077#comment-16178077
 ] 

liyang commented on KYLIN-2764:
---

For a UHC column, if all values must gather to one reducer to build Global 
dictionary (like the UHCDictionaryReducer.java below), then why not just assign 
one reducer per column in the  "Extract Fact Table Distinct Columns" job and 
build the dictionary there? That's my confusion. I don't see the reason to add 
a new UHCDictionaryJob.
https://github.com/apache/kylin/commit/ccd076861415a40926ab42e6f9b71cd5258d4daf

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-09-23 Thread kangkaisen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16177724#comment-16177724
 ] 

kangkaisen commented on KYLIN-2764:
---

liyang, Thanks very much for your review.

In KYLIN-2135, we use multiple reducers to speed up  "Extract Fact Table 
Distinct Columns"  for UHC column.  This is the reason why I couldn't build 
global dict in {{FactDistinctColumnsReducer}}.

In addition to this point, do you have any other suggestions? 

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-09-23 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16177707#comment-16177707
 ] 

liyang commented on KYLIN-2764:
---

[~kangkaisen], KYLIN-2622 has been merged.

However KYLIN-2764 we need to discuss. I don't see why an additional 
{{UHCDictionaryJob}} is required. Could the UHC global dictionary be built in 
{{FactDistinctColumnsReducer}} just like normal dictionaries?

Also the KYLIN-2764 patch no longer applies to master (sorry that because of my 
late review). It needs a rebase.

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-09-10 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160264#comment-16160264
 ] 

liyang commented on KYLIN-2764:
---

OK, let me find time to review.

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-09-04 Thread kangkaisen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152630#comment-16152630
 ] 

kangkaisen commented on KYLIN-2764:
---

I have rebased KYLIN-2622 and KYLIN-2764 on master branch. KYLIN-2622 and 
KYLIN-2764 are both about global dict, So I put those two commit on one branch 
2622-2764 and run IT together.

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-09-03 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151786#comment-16151786
 ] 

liyang commented on KYLIN-2764:
---

Sounds great. [~kangkaisen], could you rebase the commit on master? It cannot 
apply at the moment as it's from a private branch.

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-09-02 Thread Billy Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151505#comment-16151505
 ] 

Billy Liu commented on KYLIN-2764:
--

+1. The improvement is quite impressive. 

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-09-02 Thread kangkaisen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151478#comment-16151478
 ] 

kangkaisen commented on KYLIN-2764:
---

This is the commit: 
https://github.com/apache/kylin/commit/2607e18b5e17d2a68f4079a76b8c990f144cbbd6.

The core idea is easy, but there are four special points we should note:

1. The FK column in fact table could be UHC column.
2. we could not get correct HDFS working dir from KylinConfig in MR.
3. The one or all UHC columns maybe NULL.
4. There maybe timeout in setup phase of Reducer because of global dict copy 
and lock.

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-08-03 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112386#comment-16112386
 ] 

liyang commented on KYLIN-2764:
---

Could simplify FactDistinctColumns a bit along the way.  :-)

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>
> KYLIN-2217 has built dict for  normal column with MR,  but the UHC column 
> still build dict in JobServer. Like KYLIN-2217, we also could use MR build 
> dict for UHC column. which could thoroughly release the memory pressure and  
> improve job concurrent for JobServer  as well as speed up multi UHC columns 
> procedure.
> The MR input is the output of  "Extract Fact Table Distinct Columns", the MR 
> output is the UHC column dict. Because it is very hard build global dict with 
> multi reducers, I use one reducer handle one UHC column and allocate enough 
> memory to the reducer. According to my test, 8G memory is enough.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)