[ https://issues.apache.org/jira/browse/KYLIN-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shaofeng SHI updated KYLIN-980:
-------------------------------
    Fix Version/s:     (was: 2.0)
                   v2.1

Just noticed the change wasn't included in v2.0. Set the fixVersion to v2.1.

> FactDistinctColumnsJob to support high cardinality columns
> ----------------------------------------------------------
>
>                 Key: KYLIN-980
>                 URL: https://issues.apache.org/jira/browse/KYLIN-980
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v0.7.2
>            Reporter: Shaofeng SHI
>            Assignee: Shaofeng SHI
>              Labels: newbie
>             Fix For: v2.1
>
>
> FactDistinctColumnsJob's combiner and reducer use a HashSet to remove
> duplicated values. If a column's cardinality is very high, say more than
> 10 million, the HashSet may cause an OutOfMemory error.
> The job should be enhanced to support such cases.
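
One possible way to avoid the in-memory HashSet is to rely on sorted input: if the column values arrive in sorted order (as reducer keys do after the MapReduce shuffle), duplicates are adjacent and only the previous value needs to be remembered. The sketch below illustrates that idea in plain Java. It is not the fix that was committed for this issue; the class `SortedDistinct` and the method `emitDistinct` are hypothetical names, and the input is assumed to be sorted and free of nulls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class SortedDistinct {

    // Emits each distinct value once. Assumes the input is already sorted
    // and contains no nulls. Only the previous value is kept in memory, so
    // the footprint does not grow with the column's cardinality.
    public static void emitDistinct(Iterator<String> sortedValues, Consumer<String> out) {
        String previous = null;
        boolean first = true;
        while (sortedValues.hasNext()) {
            String current = sortedValues.next();
            // In sorted input, a value is new exactly when it differs
            // from the one immediately before it.
            if (first || !current.equals(previous)) {
                out.accept(current);
            }
            previous = current;
            first = false;
        }
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("a", "a", "b", "c", "c", "c");
        List<String> distinct = new ArrayList<>();
        emitDistinct(sorted.iterator(), distinct::add);
        System.out.println(distinct); // prints [a, b, c]
    }
}
```

The trade-off is that deduplication moves from memory to the sort, which MapReduce already performs on keys during the shuffle; whether that fits FactDistinctColumnsJob depends on how the job emits its keys, which this message does not describe.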