Jackie-Jiang commented on code in PR #18496:
URL: https://github.com/apache/pinot/pull/18496#discussion_r3243273664


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java:
##########
@@ -144,6 +142,11 @@ public void updateIndices(SegmentDirectory.Writer 
segmentWriter)
       String column = entry.getKey();
       List<Operation> operations = entry.getValue();
 
+      // Backfill missing 1.6.0-era stats for this column before dispatching 
ops so the per-op handlers can read
+      // them off `ColumnMetadata` when they construct their forward-index 
creator. No-op when the column is
+      // fixed-width, already has the stats, or has no forward index on disk.
+      backfillMissingStats(column, segmentWriter);

Review Comment:
   Good catch. Added a `needsBackfill` gate in `updateIndices`: the pre-pass 
only runs when the column has an op that actually constructs a new 
forward-index creator reading length stats — `CHANGE_INDEX_COMPRESSION_TYPE`, 
`DISABLE_DICTIONARY` on a dict-encoded forward index that's staying on, and 
`ENABLE_RAW_FORWARD_INDEX` when the forward index already exists and is 
dict-encoded. `DISABLE_FORWARD_INDEX` (alone or with `DISABLE_DICTIONARY`), 
`ENABLE_DICT_FORWARD_INDEX` / `ENABLE_RAW_FORWARD_INDEX` rebuilds (which go 
through the inv→fwd creator with its own inline backfill), and 
`ENABLE_DICTIONARY` (which has its own stats collector) all skip the pre-pass 
now.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/InvertedIndexAndDictionaryBasedForwardIndexCreator.java:
##########
@@ -116,32 +118,32 @@ public class 
InvertedIndexAndDictionaryBasedForwardIndexCreator implements AutoC
   private PinotDataBuffer _forwardIndexMaxSizeBuffer;
 
   public InvertedIndexAndDictionaryBasedForwardIndexCreator(SegmentDirectory 
segmentDirectory,
-      SegmentDirectory.Writer segmentWriter, TableConfig tableConfig, String 
columnName, boolean dictionaryPresent,
-      boolean dictionaryBasedForwardIndex, ForwardIndexConfig 
forwardIndexConfig, boolean isTemporaryForwardIndex)
+      SegmentDirectory.Writer segmentWriter, TableConfig tableConfig, String 
columnName,
+      FieldIndexConfigs fieldIndexConfigs, boolean isTemporaryForwardIndex)
       throws IOException {
     _segmentDirectory = segmentDirectory;
     _segmentWriter = segmentWriter;
     _tableConfig = tableConfig;
     _columnName = columnName;
-    _dictionaryPresent = dictionaryPresent;
-    _forwardIndexConfig = forwardIndexConfig;
+    _keepDictionary = 
fieldIndexConfigs.getConfig(StandardIndexes.dictionary()).isEnabled();
+    _forwardIndexConfig = 
fieldIndexConfigs.getConfig(StandardIndexes.forward());
     _isTemporaryForwardIndex = isTemporaryForwardIndex;
 
     _segmentMetadata = segmentDirectory.getSegmentMetadata();
     _columnMetadata = _segmentMetadata.getColumnMetadataFor(columnName);
     _singleValue = _columnMetadata.isSingleValue();
-    _cardinality = _columnMetadata.getCardinality();
     _numDocs = _columnMetadata.getTotalDocs();
+    _cardinality = _columnMetadata.getCardinality();
+    assert _numDocs > 0 && _cardinality > 0;
     _totalNumberOfEntries = _columnMetadata.getTotalNumberOfEntries();
     _maxNumberOfMultiValues = _columnMetadata.getMaxNumberOfMultiValues();
-    int numValues = _singleValue ? _numDocs : _totalNumberOfEntries;
-    _useMMapBuffer = numValues > NUM_VALUES_THRESHOLD_FOR_MMAP_BUFFER;
+    _useMMapBuffer = _totalNumberOfEntries > 
NUM_VALUES_THRESHOLD_FOR_MMAP_BUFFER;

Review Comment:
   No code change — the reported failure can't happen. 
`ColumnMetadataImpl.Builder.build()` at lines 634-635 unconditionally sets 
`_totalNumberOfEntries = _totalDocs` for SV columns, regardless of whether the 
`totalNumberOfEntries` key was present in `metadata.properties`:
   
   ```java
   if (_fieldSpec.isSingleValueField()) {
     _totalNumberOfEntries = _totalDocs;
     ...
   }
   ```
   
   So for any SV column (legacy or modern), 
`_columnMetadata.getTotalNumberOfEntries()` returns `_totalDocs > 0` (the 
empty-segment case is gated out by `BaseIndexHandler`'s 
`Preconditions.checkState`). For MV columns, `TOTAL_NUMBER_OF_ENTRIES` is 
mandatory writer-side and is always persisted. The buffer sizing is safe.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to