[ https://issues.apache.org/jira/browse/KYLIN-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liyang updated KYLIN-3423:
--------------------------
    Description:
Currently, FactDistinctColumnsMapper writes every cell to the mapper output. Even with the mapper-side Combiner, we could de-dup better using available mapper memory.

The situation became worse after KYLIN-3370, because now every dimension column, not only the dictionary columns, is written as mapper output.

Suggest
 * For non-dictionary dimension columns, only write the min/max value to the mapper output.

  was:
Currently, FactDistinctColumnsMapper writes every cell to the mapper output. Even with the mapper-side Combiner, we could de-dup better using available mapper memory.

The situation became worse after KYLIN-3370, because now every dimension column, not only the dictionary columns, is written as mapper output.

Suggest
 * Use available mapper memory to de-dup before writing to the mapper output.
 * For non-dictionary dimension columns, only write the min/max value to the mapper output.

> Performance improvement in FactDistinctColumnsMapper
> ----------------------------------------------------
>
>                 Key: KYLIN-3423
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3423
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: liyang
>            Assignee: Shaoxiong Zhan
>            Priority: Major
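
To make the two ideas in the description concrete, here is a minimal sketch in plain Java with no Hadoop or Kylin dependencies. It is not the actual FactDistinctColumnsMapper code: the class name ColumnValueReducer, the maxBuffered limit, and the Consumer standing in for the mapper output are all hypothetical, and comparing values as strings is an assumption (a real implementation would presumably compare according to the column's data type). Dictionary columns are de-duplicated in a bounded in-memory set, while non-dictionary columns keep only a running min and max.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical per-column helper; illustrates the idea only, not Kylin's API. */
final class ColumnValueReducer {
    private final boolean dictionaryColumn;
    private final int maxBuffered;            // assumed cap on values held in memory
    private final Consumer<String> output;    // stand-in for the mapper output
    private final Set<String> seen = new LinkedHashSet<>();
    private String min;
    private String max;

    ColumnValueReducer(boolean dictionaryColumn, int maxBuffered, Consumer<String> output) {
        this.dictionaryColumn = dictionaryColumn;
        this.maxBuffered = maxBuffered;
        this.output = output;
    }

    /** Called once per cell of this column. */
    void accept(String value) {
        if (value == null) {
            return; // nulls are skipped in this sketch
        }
        if (dictionaryColumn) {
            // De-dup in memory; flush when the buffer is full so memory stays bounded.
            if (seen.add(value) && seen.size() >= maxBuffered) {
                flush();
            }
        } else {
            // Non-dictionary column: track only the extremes (string order assumed).
            if (min == null || value.compareTo(min) < 0) {
                min = value;
            }
            if (max == null || value.compareTo(max) > 0) {
                max = value;
            }
        }
    }

    /** Called once at the end of the mapper (the role of cleanup()). */
    void close() {
        if (dictionaryColumn) {
            flush();
        } else if (min != null) {
            output.accept(min);
            if (!min.equals(max)) {
                output.accept(max);
            }
        }
    }

    private void flush() {
        for (String v : seen) {
            output.accept(v);
        }
        seen.clear();
    }
}
```

After a flush the same value can be emitted again, so a Combiner and the reducer would still have to handle duplicates; the buffer only reduces how many reach them.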