Copilot commented on code in PR #18751:
URL: https://github.com/apache/pinot/pull/18751#discussion_r3417370742
##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/utils/TableConfigUtilsTest.java:
##########
@@ -1437,7 +1437,6 @@ public void testValidateFieldConfig() {
tableConfig = new
TableConfigBuilder(TableType.OFFLINE).setTableName(TABLE_NAME)
.setNoDictionaryColumns(List.of("myCol2"))
.setInvertedIndexColumns(List.of("myCol1"))
- .setSortedColumn("myCol1")
.build();
try {
// Enable forward index disabled flag for a column with inverted index
and is sorted
Review Comment:
The comment says the column "is sorted", but this TableConfig no longer sets
any sorted column (the .setSortedColumn("myCol1") call was removed). This makes
the test misleading and harder to maintain.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/text/TextIndexType.java:
##########
@@ -86,6 +86,20 @@ public void validate(FieldIndexConfigs indexConfigs,
FieldSpec fieldSpec, TableC
if (textIndexConfig.isEnabled()) {
Preconditions.checkState(fieldSpec.getDataType().getStoredType() ==
FieldSpec.DataType.STRING,
"Cannot create TEXT index on column: %s of stored type other than
STRING", fieldSpec.getName());
+ for (IndexType indexType : List.of(
+ StandardIndexes.bloomFilter(),
+ StandardIndexes.dictionary(),
+ StandardIndexes.inverted(),
+ StandardIndexes.vector(),
+ StandardIndexes.range(),
+ StandardIndexes.json(),
+ StandardIndexes.fst(),
+ StandardIndexes.h3(),
+ StandardIndexes.ifst())) {
+
Preconditions.checkState(indexConfigs.getConfig(indexType).isDisabled(),
+ "Anti pattern to enable both text index and %s on column: %s",
+ indexType.getPrettyName(), fieldSpec.getName());
+ }
Review Comment:
Hard-failing when other indexes are enabled alongside TEXT (including
dictionary/inverted/range/fst) is a breaking behavior change and will
invalidate configurations that are already used in the repo (e.g.
EncodingType.DICTIONARY + TEXT, and INVERTED + TEXT in segment preprocessor
tests). If the intent is to discourage these combinations, prefer logging a
warning instead of rejecting the table config at validation time.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/json/JsonIndexType.java:
##########
@@ -85,6 +84,22 @@ public void validate(FieldIndexConfigs indexConfigs,
FieldSpec fieldSpec, TableC
DataType storedType = fieldSpec.getDataType().getStoredType();
Preconditions.checkState(storedType == DataType.STRING || storedType ==
DataType.MAP,
"Cannot create JSON index on column: %s of stored type other than
STRING or MAP", column);
+ Preconditions.checkState(fieldSpec.isSingleValueField(), "Cannot create
JSON index on multi-value column: %s",
+ column);
Review Comment:
Duplicate single-value validation: isSingleValueField() is checked twice for
the same enabled JSON index. This is redundant and should be removed to avoid
confusion and keep validation logic DRY.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/range/RangeIndexType.java:
##########
@@ -97,6 +96,19 @@ public void validate(FieldIndexConfigs indexConfigs,
FieldSpec fieldSpec, TableC
"Cannot create range index version %s on column: %s with RAW forward
index and dictionary; "
+ "use BitSliced range index version %s instead",
RangeIndexCreator.VERSION, column,
BitSlicedRangeIndexCreator.VERSION);
+ for (IndexType indexType : List.of(
+ StandardIndexes.bloomFilter(),
+ StandardIndexes.inverted(),
+ StandardIndexes.vector(),
+ StandardIndexes.json(),
+ StandardIndexes.text(),
+ StandardIndexes.fst(),
+ StandardIndexes.h3(),
+ StandardIndexes.ifst())) {
Review Comment:
Inverted index + range index is used in existing tests/configs (e.g.
TableConfigUtilsTest builds configs with both on the same column). This new
validation treats that as an anti-pattern and hard-fails, which will break
those tests and existing table configs.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java:
##########
@@ -1674,6 +1668,12 @@ private static void
validateIndexingConfigAndFieldConfigList(TableConfig tableCo
FieldSpec fieldSpec = schema.getFieldSpecFor(column);
Preconditions.checkState(fieldSpec != null, "Failed to find sorted
column: %s in schema", column);
Preconditions.checkState(fieldSpec.isSingleValueField(), "Cannot sort
on multi-value column: %s", column);
+ FieldIndexConfigs indexConfigsForSortedColumn =
indexConfigsMap.get(column);
+ for (IndexType indexType : List.of(StandardIndexes.bloomFilter(),
StandardIndexes.inverted(),
+ StandardIndexes.range())) {
+
Preconditions.checkState(indexConfigsForSortedColumn.getConfig(indexType).isDisabled(),
+ "Redundant to enable %s on a sorted column: %s",
indexType.getPrettyName(), fieldSpec.getName());
+ }
Review Comment:
This adds a hard validation error for bloomFilter/inverted/range on sorted
columns. Bloom filters on sorted columns are currently used in-repo (e.g.
pinot-sql-ddl/src/test/.../RoundTripTest builds a config with
sortedColumn("country") and bloomFilterColumns(["country"]). If sorted+inverted
is redundant, consider restricting this to inverted only (or warning) to avoid
rejecting existing configs.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/h3/H3IndexType.java:
##########
@@ -156,7 +169,7 @@ private ReaderFactory() {
@Override
protected H3IndexReader createIndexReader(PinotDataBuffer dataBuffer,
ColumnMetadata metadata,
- H3IndexConfig indexConfig) {
+ H3IndexConfig indexConfig) {
return new ImmutableH3IndexReader(dataBuffer);
Review Comment:
Wrapped parameter indentation in this method signature is also likely to
violate Checkstyle Indentation (lineWrappingIndentation=4). Re-indent the
wrapped parameter line with a 4-space continuation indent for consistency and
to avoid CI failures.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryIndexType.java:
##########
@@ -110,11 +109,22 @@ public DictionaryIndexConfig getDefaultConfig() {
@Override
public void validate(FieldIndexConfigs indexConfigs, FieldSpec fieldSpec,
TableConfig tableConfig) {
DictionaryIndexConfig dictionaryConfig =
indexConfigs.getConfig(StandardIndexes.dictionary());
- if (dictionaryConfig.isEnabled() &&
dictionaryConfig.isUseVarLengthDictionary()) {
- DataType storedType = fieldSpec.getDataType().getStoredType();
- Preconditions.checkState(!storedType.isFixedWidth(),
- "Cannot create var-length dictionary on column: %s of fixed-width
stored type: %s", fieldSpec.getName(),
- storedType);
+ if (dictionaryConfig.isEnabled()) {
+ if (dictionaryConfig.isUseVarLengthDictionary()) {
+ DataType storedType = fieldSpec.getDataType().getStoredType();
+ Preconditions.checkState(!storedType.isFixedWidth(),
+ "Cannot create var-length dictionary on column: %s of fixed-width
stored type: %s", fieldSpec.getName(),
+ storedType);
+ }
+ for (IndexType indexType : List.of(
+ StandardIndexes.vector(),
+ StandardIndexes.json(),
+ StandardIndexes.text(),
+ StandardIndexes.h3())) {
Review Comment:
This new validation rejects TEXT index columns that still have the
dictionary index enabled. That breaks existing, currently-supported
configurations where TEXT is used with DICTIONARY encoding (e.g. tests in
pinot-core and pinot-segment-local use EncodingType.DICTIONARY with
FieldConfig.IndexType.TEXT, and TextIndexConfigBuilder supports
noRawDataForTextIndex). Unless this is an intentional breaking change with
broad repo updates, TEXT should not be treated as incompatible with dictionary.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/h3/H3IndexType.java:
##########
@@ -121,7 +134,7 @@ protected IndexReaderFactory<H3IndexReader>
createReaderFactory() {
@Override
public IndexHandler createIndexHandler(SegmentDirectory segmentDirectory,
Map<String, FieldIndexConfigs> configsByCol,
- Schema schema, TableConfig tableConfig) {
+ Schema schema, TableConfig
tableConfig) {
return new H3IndexHandler(segmentDirectory, configsByCol, tableConfig,
schema);
Review Comment:
Indentation for wrapped method parameters is far beyond the repo Checkstyle
settings (basicOffset=2, lineWrappingIndentation=4 in config/checkstyle.xml).
This will likely fail Checkstyle; align wrapped params with a 4-space
continuation indent (consistent with other IndexType implementations).
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/inverted/InvertedIndexType.java:
##########
@@ -87,6 +87,19 @@ public void validate(FieldIndexConfigs indexConfigs,
FieldSpec fieldSpec, TableC
"Cannot create inverted index on column: %s without dictionary",
column);
Preconditions.checkState(fieldSpec.getDataType() != DataType.MAP,
"Cannot create inverted index on MAP column: %s", column);
+ for (IndexType indexType : List.of(
+ StandardIndexes.bloomFilter(),
+ StandardIndexes.vector(),
+ StandardIndexes.range(),
+ StandardIndexes.json(),
+ StandardIndexes.text(),
+ StandardIndexes.fst(),
+ StandardIndexes.h3(),
+ StandardIndexes.ifst())) {
Review Comment:
This mutual-exclusion list makes `inverted + bloom` (and `inverted + range`)
invalid, but those combinations are already asserted as valid in tests (e.g.
IndexCombinationValidationTest#testDictEncodedWithInvertedAndTextAndBloomPasses
expects inverted + bloom to pass; TableConfigUtilsTest builds configs with
inverted + range). This validation should not hard-fail for these combinations.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]