Jackie-Jiang commented on code in PR #18364:
URL: https://github.com/apache/pinot/pull/18364#discussion_r3177724574
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/BaseSegmentCreator.java:
##########
@@ -531,9 +545,16 @@ protected void writeMetadata()
String column = entry.getKey();
ColumnStatistics columnStatistics = entry.getValue();
SegmentDictionaryCreator dictionaryCreator =
_colIndexes.get(column).getDictionaryCreator();
- int dictionaryElementSize = (dictionaryCreator != null) ?
dictionaryCreator.getNumBytesPerEntry() : 0;
+ boolean hasDictionary = dictionaryCreator != null;
+ int dictionaryElementSize = hasDictionary ?
dictionaryCreator.getNumBytesPerEntry() : 0;
+ ForwardIndexConfig fwdConfig =
Review Comment:
(MAJOR) This config is from the original config, not the adapted one (line
177). We need to pass the adapted one. You may pass the adapted
`FieldIndexConfigs` through `ColumnIndexCreators`
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/forward/ForwardIndexCreatorFactory.java:
##########
@@ -52,25 +52,8 @@ public static ForwardIndexCreator
createIndexCreator(IndexCreationContext contex
String columnName = fieldSpec.getName();
int numTotalDocs = context.getTotalDocs();
- if (context.hasDictionary()) {
- // Dictionary enabled columns
- int cardinality = context.getCardinality();
- if (fieldSpec.isSingleValueField()) {
- if (context.isSorted()) {
- return new SingleValueSortedForwardIndexCreator(indexDir,
columnName, cardinality);
- } else {
- return new SingleValueUnsortedForwardIndexCreator(indexDir,
columnName, cardinality, numTotalDocs);
- }
- } else {
- if (indexConfig.getDictIdCompressionType() ==
DictIdCompressionType.MV_ENTRY_DICT) {
- return new MultiValueEntryDictForwardIndexCreator(indexDir,
columnName, cardinality, numTotalDocs);
- } else {
- return new MultiValueUnsortedForwardIndexCreator(indexDir,
columnName, cardinality, numTotalDocs,
- context.getTotalNumberOfEntries());
- }
- }
- } else {
- // Dictionary disabled columns
+ if (indexConfig.getEncodingType() == FieldConfig.EncodingType.RAW) {
Review Comment:
(minor) Keep the order as the original code by checking
`FieldConfig.EncodingType.DICTIONARY` here
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/forward/ForwardIndexType.java:
##########
@@ -176,21 +220,40 @@ protected ColumnConfigDeserializer<ForwardIndexConfig>
createDeserializerForLega
Maps.newHashMapWithExpectedSize(Math.max(dictConfigs.size(),
schema.size()));
Collection<FieldConfig> fieldConfigs = tableConfig.getFieldConfigList();
+ Set<String> columnsWithExplicitForwardIndex = new HashSet<>();
if (fieldConfigs != null) {
for (FieldConfig fieldConfig : fieldConfigs) {
+ if (fieldConfig.getIndexes().has(INDEX_DISPLAY_NAME)) {
Review Comment:
This is incorrect. We want to detect conflicting config
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/forward/ForwardIndexType.java:
##########
@@ -166,6 +179,37 @@ public String getPrettyName() {
return INDEX_DISPLAY_NAME;
}
+ @Override
+ protected ColumnConfigDeserializer<ForwardIndexConfig> createDeserializer() {
+ // Override the default `fromIndexes` deserializer so that legacy
`indexes.forward` JSON blocks that lack
Review Comment:
(MAJOR) This part of the code has become very messy, with a lot of cross
reference between `createDeserializer()` and
`createDeserializerForLegacyConfigs()`, and I cannot really tell if the logic
is correct.
We don't need to follow the way of
`fromIndexes.withExclusiveAlternative(createDeserializerForLegacyConfigs())`.
For forward index, it is much more clear if we directly implement
`createDeserializer`. We need to check the following properties:
- IndexingConfig.getNoDictionaryColumns()
- IndexingConfig.getNoDictionaryConfig()
- FieldConfig.getEncodingType()
- FieldConfig.getCompressionCodec()
- FieldConfig.getIndexes()
Within the deserializer, we want to detect conflicts for:
- No dictionary column, but DICT encoding
- Different encoding in FieldConfig vs indexes JSON
- Different compression in FieldConfig vs indexes
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/forward/ForwardIndexType.java:
##########
@@ -166,6 +179,37 @@ public String getPrettyName() {
return INDEX_DISPLAY_NAME;
}
+ @Override
+ protected ColumnConfigDeserializer<ForwardIndexConfig> createDeserializer() {
+ // Override the default `fromIndexes` deserializer so that legacy
`indexes.forward` JSON blocks that lack
Review Comment:
Please add a test for these different conflicting scenarios
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/forward/ForwardIndexType.java:
##########
@@ -166,6 +179,37 @@ public String getPrettyName() {
return INDEX_DISPLAY_NAME;
}
+ @Override
+ protected ColumnConfigDeserializer<ForwardIndexConfig> createDeserializer() {
+ // Override the default `fromIndexes` deserializer so that legacy
`indexes.forward` JSON blocks that lack
+ // `encodingType` inherit it from `FieldConfig.encodingType` instead of
silently defaulting to DICTIONARY.
+ ColumnConfigDeserializer<ForwardIndexConfig> fromIndexes = (tableConfig,
schema) -> {
+ Map<String, ForwardIndexConfig> result = new HashMap<>();
+ Collection<FieldConfig> fieldConfigs = tableConfig.getFieldConfigList();
+ if (fieldConfigs != null) {
+ for (FieldConfig fieldConfig : fieldConfigs) {
+ JsonNode jsonNode = fieldConfig.getIndexes().get(INDEX_DISPLAY_NAME);
+ if (jsonNode == null) {
+ continue;
+ }
+ Preconditions.checkState(jsonNode.isObject(), "Invalid forward index
config for column: %s",
+ fieldConfig.getName());
+ ObjectNode configNode = ((ObjectNode) jsonNode).deepCopy();
Review Comment:
(nit) redundant cast
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/BaseSegmentCreator.java:
##########
@@ -307,14 +309,26 @@ private boolean isNullable(FieldSpec fieldSpec) {
* Adapts field index configs based on column properties.
*/
private FieldIndexConfigs adaptConfig(String columnName, FieldIndexConfigs
config,
- ColumnStatistics columnStatistics, SegmentGeneratorConfig
segmentCreationSpec) {
+ ColumnStatistics columnStatistics, SegmentGeneratorConfig
segmentCreationSpec, boolean dictEnabledColumn) {
FieldIndexConfigs.Builder builder = new FieldIndexConfigs.Builder(config);
// Sorted columns treat the 'forwardIndexDisabled' flag as a no-op
ForwardIndexConfig fwdConfig = config.getConfig(StandardIndexes.forward());
if (!fwdConfig.isEnabled() && columnStatistics.isSorted()) {
builder.add(StandardIndexes.forward(),
new
ForwardIndexConfig.Builder(fwdConfig).withLegacyProperties(segmentCreationSpec.getColumnProperties(),
columnName).build());
+ } else if (fwdConfig.isEnabled() && !dictEnabledColumn
+ && fwdConfig.getEncodingType() == FieldConfig.EncodingType.DICTIONARY)
{
+ // Segment-time dictionary optimizers (`optimizeDictionary` /
`optimizeDictionaryForMetrics`) can flip
+ // dictEnabledColumn=false for a column whose configured encoding is
DICTIONARY. Without a dictionary the
+ // forward index must be RAW, so reconcile here before the creator
factory branches on encoding.
+ builder.add(StandardIndexes.forward(), new
ForwardIndexConfig.Builder(FieldConfig.EncodingType.RAW)
Review Comment:
Builder can directly take the config, which simplifies the code
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]