vvivekiyer opened a new issue, #12254:
URL: https://github.com/apache/pinot/issues/12254
### Issue
**Table Config**
```
{
  "tableName": "customer_OFFLINE",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "replication": "1",
    "segmentPushType": "REFRESH",
    "schemaName": "customer",
    "minimizeDataMovement": false
  },
  "tenants": {},
  "tableIndexConfig": {
    "autoGeneratedInvertedIndex": false,
    "enableDynamicStarTreeCreation": false,
    "columnMajorSegmentBuilderEnabled": false,
    "optimizeDictionaryForMetrics": false,
    "noDictionarySizeRatioThreshold": 0.85,
    "rangeIndexVersion": 2,
    "sortedColumn": [
      "C_NAME"
    ],
    "loadMode": "HEAP",
    "enableDefaultStarTree": false,
    "aggregateMetrics": false,
    "nullHandlingEnabled": false,
    "optimizeDictionary": false,
    "createInvertedIndexDuringSegmentGeneration": false
  },
  "metadata": {
    "customConfigs": {}
  },
  "fieldConfigList": [
    {
      "name": "C_NAME",
      "encodingType": "DICTIONARY",
      "indexTypes": [],
      "indexes": {
        "dictionary": {
          "disabled": false,
          "onHeap": true,
          "useVarLengthDictionary": true
        }
      },
      "tierOverwrites": null
    }
  ],
  "ingestionConfig": {
    "transformConfigs": [],
    "continueOnError": false,
    "rowTimeValueCheck": false,
    "segmentTimeValueCheck": true
  },
  "isDimTable": false
}
```
- Based on this config (`"onHeap": true` under `fieldConfigList`), we expect
the dictionary for the `C_NAME` column to be loaded on-heap. However, the
dictionary is not loaded on-heap.
### RCA
- `IndexLoadingConfig` loads the configs for each column by calling
`createIndexConfigsByColName(tableConfig, schema, this::getDeserializer);` at
[link](https://github.com/apache/pinot/blob/7132a2203f13478f450cbf8f0524ba72bdc267b7/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfig.java#L288),
which uses the deserializer defined
[here](https://github.com/apache/pinot/blob/7132a2203f13478f450cbf8f0524ba72bdc267b7/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfig.java#L291).
- This deserializer (`onConflict.PICK_FIRST`) gives priority to
`DictionaryIndexType.fromIndexLoadingConfig()` over `fieldConfigs`. Because
`DictionaryIndexType.fromIndexLoadingConfig()` returns a `DictionaryConfig`
with `onHeap` set to false, the `fieldConfig` settings are never considered.
### Questions/Solution
- As per my understanding of the Index-SPI changes, `fieldConfigList` should
take priority over `tableIndexConfig`. Given this, shouldn't the code
[here](https://github.com/apache/pinot/blob/7132a2203f13478f450cbf8f0524ba72bdc267b7/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfig.java#L305)
look as follows:
```
deserializer =
    stdDeserializer.withFallbackAlternative(
        IndexConfigDeserializer.fromMap(table -> fromIndexLoadingConfig));
```
instead of
```
deserializer =
    IndexConfigDeserializer.fromMap(table -> fromIndexLoadingConfig)
        .withFallbackAlternative(stdDeserializer);
```
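To make the precedence difference between the two orderings concrete, here is a minimal, self-contained sketch. It is not Pinot code: the `Deserializer` interface, its `onHeap()` method, and the class name are hypothetical stand-ins, and `withFallbackAlternative` is modeled as a simple "first source wins if it yields a config" rule for a single column, which is my reading of the `PICK_FIRST` behavior described above.

```java
import java.util.Optional;

public class DeserializerPrecedenceSketch {

  // Hypothetical stand-in for a per-column config deserializer.
  // Only the dictionary's onHeap flag is modeled; empty means "no config from this source".
  interface Deserializer {
    Optional<Boolean> onHeap();

    // Assumed semantics: use this source first, consult the fallback only if this one is empty.
    default Deserializer withFallbackAlternative(Deserializer fallback) {
      return () -> {
        Optional<Boolean> first = this.onHeap();
        return first.isPresent() ? first : fallback.onHeap();
      };
    }
  }

  // Models the legacy path: it always yields a config, with onHeap = false.
  static final Deserializer FROM_INDEX_LOADING_CONFIG = () -> Optional.of(false);

  // Models the fieldConfigList path for C_NAME: onHeap = true.
  static final Deserializer STD_DESERIALIZER = () -> Optional.of(true);

  // Current order: legacy path first, so the fieldConfigList value is never consulted.
  static boolean currentOrder() {
    return FROM_INDEX_LOADING_CONFIG
        .withFallbackAlternative(STD_DESERIALIZER)
        .onHeap()
        .orElse(false);
  }

  // Proposed order: fieldConfigList first, legacy path only as a fallback.
  static boolean proposedOrder() {
    return STD_DESERIALIZER
        .withFallbackAlternative(FROM_INDEX_LOADING_CONFIG)
        .onHeap()
        .orElse(false);
  }

  public static void main(String[] args) {
    System.out.println("current order  onHeap = " + currentOrder());
    System.out.println("proposed order onHeap = " + proposedOrder());
  }
}
```

In this model the current order yields `onHeap = false` and the proposed order yields `onHeap = true`. With the proposed order, the legacy path still applies to columns for which `fieldConfigList` provides no dictionary config.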
@gortiz eager to know your thoughts on this. Do you see any issues with
making the above change?