Jackie-Jiang commented on code in PR #18364:
URL: https://github.com/apache/pinot/pull/18364#discussion_r3177724574


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/BaseSegmentCreator.java:
##########
@@ -531,9 +545,16 @@ protected void writeMetadata()
       String column = entry.getKey();
       ColumnStatistics columnStatistics = entry.getValue();
       SegmentDictionaryCreator dictionaryCreator = 
_colIndexes.get(column).getDictionaryCreator();
-      int dictionaryElementSize = (dictionaryCreator != null) ? 
dictionaryCreator.getNumBytesPerEntry() : 0;
+      boolean hasDictionary = dictionaryCreator != null;
+      int dictionaryElementSize = hasDictionary ? 
dictionaryCreator.getNumBytesPerEntry() : 0;
+      ForwardIndexConfig fwdConfig =

Review Comment:
   (MAJOR) This config is from the original config, not the adapted one (line 
177). We need to pass the adapted one. You may pass the adapted 
`FieldIndexConfigs` through `ColumnIndexCreators`



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/forward/ForwardIndexCreatorFactory.java:
##########
@@ -52,25 +52,8 @@ public static ForwardIndexCreator 
createIndexCreator(IndexCreationContext contex
     String columnName = fieldSpec.getName();
     int numTotalDocs = context.getTotalDocs();
 
-    if (context.hasDictionary()) {
-      // Dictionary enabled columns
-      int cardinality = context.getCardinality();
-      if (fieldSpec.isSingleValueField()) {
-        if (context.isSorted()) {
-          return new SingleValueSortedForwardIndexCreator(indexDir, 
columnName, cardinality);
-        } else {
-          return new SingleValueUnsortedForwardIndexCreator(indexDir, 
columnName, cardinality, numTotalDocs);
-        }
-      } else {
-        if (indexConfig.getDictIdCompressionType() == 
DictIdCompressionType.MV_ENTRY_DICT) {
-          return new MultiValueEntryDictForwardIndexCreator(indexDir, 
columnName, cardinality, numTotalDocs);
-        } else {
-          return new MultiValueUnsortedForwardIndexCreator(indexDir, 
columnName, cardinality, numTotalDocs,
-              context.getTotalNumberOfEntries());
-        }
-      }
-    } else {
-      // Dictionary disabled columns
+    if (indexConfig.getEncodingType() == FieldConfig.EncodingType.RAW) {

Review Comment:
   (minor) Keep the order as the original code by checking 
`FieldConfig.EncodingType.DICTIONARY` here



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/forward/ForwardIndexType.java:
##########
@@ -176,21 +220,40 @@ protected ColumnConfigDeserializer<ForwardIndexConfig> 
createDeserializerForLega
           Maps.newHashMapWithExpectedSize(Math.max(dictConfigs.size(), 
schema.size()));
 
       Collection<FieldConfig> fieldConfigs = tableConfig.getFieldConfigList();
+      Set<String> columnsWithExplicitForwardIndex = new HashSet<>();
       if (fieldConfigs != null) {
         for (FieldConfig fieldConfig : fieldConfigs) {
+          if (fieldConfig.getIndexes().has(INDEX_DISPLAY_NAME)) {

Review Comment:
   This is incorrect. We want to detect conflicting config



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/forward/ForwardIndexType.java:
##########
@@ -166,6 +179,37 @@ public String getPrettyName() {
     return INDEX_DISPLAY_NAME;
   }
 
+  @Override
+  protected ColumnConfigDeserializer<ForwardIndexConfig> createDeserializer() {
+    // Override the default `fromIndexes` deserializer so that legacy 
`indexes.forward` JSON blocks that lack

Review Comment:
   (MAJOR) This part of the code has become very messy, with a lot of cross 
reference between `createDeserializer()` and 
`createDeserializerForLegacyConfigs()`, and I cannot really tell if the logic 
is correct.
   We don't need to follow the way of 
`fromIndexes.withExclusiveAlternative(createDeserializerForLegacyConfigs())`. 
For forward index, it is much more clear if we directly implement 
`createDeserializer`. We need to check the following properties:
   - IndexingConfig.getNoDictionaryColumns()
   - IndexingConfig.getNoDictionaryConfig()
   - FieldConfig.getEncodingType()
   - FieldConfig.getCompressionCodec()
   - FieldConfig.getIndexes()
   
   Within the deserializer, we want to detect conflicts for:
   - No dictionary column, but DICT encoding
   - Different encoding in FieldConfig vs indexes JSON
   - Different compression in FieldConfig vs indexes



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/forward/ForwardIndexType.java:
##########
@@ -166,6 +179,37 @@ public String getPrettyName() {
     return INDEX_DISPLAY_NAME;
   }
 
+  @Override
+  protected ColumnConfigDeserializer<ForwardIndexConfig> createDeserializer() {
+    // Override the default `fromIndexes` deserializer so that legacy 
`indexes.forward` JSON blocks that lack

Review Comment:
   Please add a test for these different conflicting scenarios



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/forward/ForwardIndexType.java:
##########
@@ -166,6 +179,37 @@ public String getPrettyName() {
     return INDEX_DISPLAY_NAME;
   }
 
+  @Override
+  protected ColumnConfigDeserializer<ForwardIndexConfig> createDeserializer() {
+    // Override the default `fromIndexes` deserializer so that legacy 
`indexes.forward` JSON blocks that lack
+    // `encodingType` inherit it from `FieldConfig.encodingType` instead of 
silently defaulting to DICTIONARY.
+    ColumnConfigDeserializer<ForwardIndexConfig> fromIndexes = (tableConfig, 
schema) -> {
+      Map<String, ForwardIndexConfig> result = new HashMap<>();
+      Collection<FieldConfig> fieldConfigs = tableConfig.getFieldConfigList();
+      if (fieldConfigs != null) {
+        for (FieldConfig fieldConfig : fieldConfigs) {
+          JsonNode jsonNode = fieldConfig.getIndexes().get(INDEX_DISPLAY_NAME);
+          if (jsonNode == null) {
+            continue;
+          }
+          Preconditions.checkState(jsonNode.isObject(), "Invalid forward index 
config for column: %s",
+              fieldConfig.getName());
+          ObjectNode configNode = ((ObjectNode) jsonNode).deepCopy();

Review Comment:
   (nit) redundant cast



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/BaseSegmentCreator.java:
##########
@@ -307,14 +309,26 @@ private boolean isNullable(FieldSpec fieldSpec) {
    * Adapts field index configs based on column properties.
    */
   private FieldIndexConfigs adaptConfig(String columnName, FieldIndexConfigs 
config,
-      ColumnStatistics columnStatistics, SegmentGeneratorConfig 
segmentCreationSpec) {
+      ColumnStatistics columnStatistics, SegmentGeneratorConfig 
segmentCreationSpec, boolean dictEnabledColumn) {
     FieldIndexConfigs.Builder builder = new FieldIndexConfigs.Builder(config);
     // Sorted columns treat the 'forwardIndexDisabled' flag as a no-op
     ForwardIndexConfig fwdConfig = config.getConfig(StandardIndexes.forward());
     if (!fwdConfig.isEnabled() && columnStatistics.isSorted()) {
       builder.add(StandardIndexes.forward(),
           new 
ForwardIndexConfig.Builder(fwdConfig).withLegacyProperties(segmentCreationSpec.getColumnProperties(),
               columnName).build());
+    } else if (fwdConfig.isEnabled() && !dictEnabledColumn
+        && fwdConfig.getEncodingType() == FieldConfig.EncodingType.DICTIONARY) 
{
+      // Segment-time dictionary optimizers (`optimizeDictionary` / 
`optimizeDictionaryForMetrics`) can flip
+      // dictEnabledColumn=false for a column whose configured encoding is 
DICTIONARY. Without a dictionary the
+      // forward index must be RAW, so reconcile here before the creator 
factory branches on encoding.
+      builder.add(StandardIndexes.forward(), new 
ForwardIndexConfig.Builder(FieldConfig.EncodingType.RAW)

Review Comment:
   Builder can directly take the config, which simplifies the code



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to