[ https://issues.apache.org/jira/browse/KYLIN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16805072#comment-16805072 ]
Shaofeng SHI commented on KYLIN-3920:
-------------------------------------

Hi yuzhang, you're correct; the duplicated dict doesn't need to be merged. But merging it should be very fast and won't add much overhead. Did you observe a noticeable performance degradation in this case?

> Don't merge same dictionaries when merge dictionary
> ---------------------------------------------------
>
>                 Key: KYLIN-3920
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3920
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Others
>    Affects Versions: v2.5.2
>            Reporter: Yuzhang QIU
>            Priority: Minor
>
> Hi team:
> I found that DictionaryManager passes the dictionaries to
> DictionaryGenerator for merging whenever one of them differs from the others.
> But if there are 3 dictionaries {Dic1, Dic1, Dic2} in 3 segments, Kylin may
> not need to merge Dic1 with Dic1, since the same value won't be added to the
> new dictionary twice.
> If I misunderstand the merge job logic, please feel free to correct me!
> Here is the code snapshot at DictionaryManager.java:251
> ```
>         boolean identicalSourceDicts = true;
>         for (int i = 1; i < dicts.size(); ++i) {
>             if (!dicts.get(0).getDictionaryObject().equals(dicts.get(i).getDictionaryObject())) {
>                 identicalSourceDicts = false;
>                 break;
>             }
>         }
>         if (identicalSourceDicts) {
>             logger.info("Use one of the merging dictionaries directly");
>             return dicts.get(0);
>         } else {
>             Dictionary<String> newDict = DictionaryGenerator.mergeDictionaries(DataType.getType(newDictInfo.getDataType()), dicts);
>             return trySaveNewDict(newDict, newDictInfo);
>         }
> ```
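
For illustration only, here is a minimal sketch of the deduplication step proposed in the issue. It is not Kylin's actual code: `DedupSketch` and `distinctDicts` are hypothetical names, and plain `Set<String>` values stand in for Kylin's dictionary objects. It assumes the dictionaries implement `hashCode` consistently with the `equals` comparison used in the quoted snippet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {

    // Hypothetical helper: drops every dictionary that is equal (equals/hashCode)
    // to an earlier one, keeping the first occurrence and the original order.
    public static <D> List<D> distinctDicts(List<D> dicts) {
        return new ArrayList<>(new LinkedHashSet<>(dicts));
    }

    public static void main(String[] args) {
        // Stand-ins for the dictionaries of 3 segments: {Dic1, Dic1, Dic2}.
        Set<String> dic1 = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> dic1Copy = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> dic2 = new HashSet<>(Arrays.asList("b", "c"));

        List<Set<String>> toMerge = distinctDicts(Arrays.asList(dic1, dic1Copy, dic2));

        // Only the distinct dictionaries would then be handed to the merge step.
        System.out.println("dictionaries left to merge: " + toMerge.size());
    }
}
```

If only one dictionary remains after this step, it could be returned directly, as the quoted code already does for the all-identical case; otherwise only the distinct ones would be passed on to the merge.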