KKcorps commented on code in PR #8398:
URL: https://github.com/apache/pinot/pull/8398#discussion_r842521362
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -260,6 +260,26 @@ private boolean createDictionaryForColumn(ColumnIndexCreationInfo info, SegmentG
.containsKey(column)) {
return false;
}
+
+ // Do not create dictionary if size with dictionary is going to be larger than size without dictionary
+ // This is done to reduce the cost of dictionary for high cardinality columns
+ // Off by default and needs optimizeDictionaryEnabled to be set to true
+ if (config.isOptimizeDictionaryEnabled() && spec.getFieldType() == FieldType.METRIC
+ && spec.isSingleValueField() && spec.getDataType().isFixedWidth()) {
+ long dictionarySize = info.getDistinctValueCount() * spec.getDataType().size();
+ long forwardIndexSize =
+ ((long) info.getTotalNumberOfEntries() * PinotDataBitSet.getNumBitsPerValue(info.getDistinctValueCount() - 1)
+ + Byte.SIZE - 1) / Byte.SIZE;
+
+ double indexWithDictSize = dictionarySize + forwardIndexSize;
+ double indexWithoutDictSize = info.getTotalNumberOfEntries() * spec.getDataType().size();
+
+ double storageSaved = (indexWithDictSize - indexWithoutDictSize) / indexWithDictSize;
Review Comment:
I think it is the opposite: we want to compute the storage saved by not
creating the dictionary (hence returning 'false'). Creating the dictionary is
the default behaviour of this function.
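
To make the sign convention concrete, here is a minimal standalone sketch of
the same arithmetic. It is an illustration under stated assumptions, not the
code in the PR: the class name, method names, and sample numbers are
hypothetical, and numBitsPerValue is a local stand-in that is only assumed to
behave like PinotDataBitSet.getNumBitsPerValue.

```java
public class DictionarySizeSketch {

  // Hypothetical stand-in for PinotDataBitSet.getNumBitsPerValue (assumed behaviour):
  // number of bits needed to store values in [0, maxValue], with a minimum of 1 bit.
  static int numBitsPerValue(int maxValue) {
    if (maxValue <= 1) {
      return 1;
    }
    return Integer.SIZE - Integer.numberOfLeadingZeros(maxValue);
  }

  // Bytes used when a dictionary is created: the dictionary itself plus a
  // bit-packed forward index of dictionary ids, rounded up to whole bytes.
  static long sizeWithDictionary(int cardinality, int totalEntries, int valueSizeBytes) {
    long dictionarySize = (long) cardinality * valueSizeBytes;
    long forwardIndexSize =
        ((long) totalEntries * numBitsPerValue(cardinality - 1) + Byte.SIZE - 1) / Byte.SIZE;
    return dictionarySize + forwardIndexSize;
  }

  // Bytes used when raw fixed-width values are stored without a dictionary.
  static long sizeWithoutDictionary(int totalEntries, int valueSizeBytes) {
    return (long) totalEntries * valueSizeBytes;
  }

  // Fraction of the with-dictionary size saved by skipping the dictionary.
  // Positive means the raw index is smaller; negative means the dictionary wins.
  static double storageSaved(int cardinality, int totalEntries, int valueSizeBytes) {
    double withDict = sizeWithDictionary(cardinality, totalEntries, valueSizeBytes);
    double withoutDict = sizeWithoutDictionary(totalEntries, valueSizeBytes);
    return (withDict - withoutDict) / withDict;
  }

  public static void main(String[] args) {
    // Made-up example columns: 1,000,000 rows of 8-byte values.
    System.out.println("high cardinality: " + storageSaved(900_000, 1_000_000, 8));
    System.out.println("low cardinality:  " + storageSaved(10, 1_000_000, 8));
  }
}
```

With (indexWithDictSize - indexWithoutDictSize) in the numerator, a positive
result means that skipping the dictionary saves space, which matches the
'return false' branch.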
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]