[ 
https://issues.apache.org/jira/browse/KYLIN-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15740954#comment-15740954
 ] 

kangkaisen commented on KYLIN-2269:
-----------------------------------

To resolve the issue, we could use {{CLUSTER BY}} to make the mapper input 
of {{Build Base Cuboid Data}} sequential. Since the input is sequential, the 
mapper needs only the default memory size, because it can load the global dict 
slices one at a time, in turn. 

Of course, this method handles only one ultra-high-cardinality column well, 
but that covers most scenarios.
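
The effect of sequential input on slice loading can be illustrated with a 
small sketch. This is not Kylin code: {{SlicedDict}}, {{count_loads}} and the 
sample data are hypothetical names made up for the illustration, and the 
"load" is only a counter standing in for reading a slice from storage. It 
assumes the dictionary is split into slices ordered by key range and that only 
one slice is held in memory at a time.

```python
from bisect import bisect_right


class SlicedDict:
    """Hypothetical stand-in for a global dict split into key-ordered slices.

    Only one slice is kept "in memory" at a time; load_count records how
    often a different slice had to be brought in.
    """

    def __init__(self, slices):
        # slices: list of {value: id} dicts, ordered by key range.
        self._slices = slices
        self._first_keys = [min(s) for s in slices]
        self._loaded_index = None
        self._loaded = None
        self.load_count = 0

    def _load(self, index):
        # A real implementation would read the slice from disk or HDFS here.
        self._loaded_index = index
        self._loaded = self._slices[index]
        self.load_count += 1

    def encode(self, value):
        # Find the slice whose key range contains the value.
        index = bisect_right(self._first_keys, value) - 1
        if index < 0:
            raise KeyError(value)
        if index != self._loaded_index:
            self._load(index)
        return self._loaded[value]


def count_loads(slices, values):
    """Encode every value in order and return the number of slice loads."""
    sliced = SlicedDict(slices)
    for value in values:
        sliced.encode(value)
    return sliced.load_count


# Made-up sample data: three slices, six input values.
SLICES = [{"a": 0, "b": 1}, {"c": 2, "d": 3}, {"e": 4, "f": 5}]
UNSORTED_INPUT = ["e", "a", "c", "b", "f", "d"]

if __name__ == "__main__":
    print("unsorted input:", count_loads(SLICES, UNSORTED_INPUT), "slice loads")
    print("sorted input:  ", count_loads(SLICES, sorted(UNSORTED_INPUT)), "slice loads")
```

With this sample data the unsorted input forces a slice load on every value 
(6 loads), while the sorted input loads each slice exactly once (3 loads). 
That is the behaviour the {{CLUSTER BY}} approach relies on, under the 
assumption that each mapper sees its rows ordered by the dict column.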



> Reduce MR memory usage for global dict
> --------------------------------------
>
>                 Key: KYLIN-2269
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2269
>             Project: Kylin
>          Issue Type: Improvement
>    Affects Versions: v1.6.0
>            Reporter: kangkaisen
>            Assignee: kangkaisen
>
> Currently, in {{Build Base Cuboid Data}}, if the user uses the global dict and 
> the global dict is significantly larger than the mapper memory size, the 
> {{CachedTreeMap}} will load as many values as possible, and the soft 
> reference objects will stick around for a while during GC. This makes 
> the {{Build Base Cuboid Data}} mapper pause for a long time, or even fail to 
> finish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
