Copilot commented on code in PR #17719:
URL: https://github.com/apache/pinot/pull/17719#discussion_r2820830410
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/MapColumnPreIndexStatsCollector.java:
##########
@@ -276,12 +310,48 @@ private AbstractColumnStatisticsCollector
createKeyStatsCollector(String key, Ob
case BOOLEAN:
case STRING:
case MAP:
- case OBJECT:
return new StringColumnPreIndexStatsCollector(key, config);
default:
- LOGGER.warn("Unknown data type {} for key {} and value type {}", type,
key, value.getClass().getName());
+ LOGGER.warn("Unknown data type {} for key {}", dataType, key);
return new StringColumnPreIndexStatsCollector(key, config);
+ }
+ }
+
+ private AbstractColumnStatisticsCollector
promoteNumericKeyStatsToStringCollector(String key,
+ AbstractColumnStatisticsCollector keyStats, Object value) {
+ AbstractColumnStatisticsCollector stringKeyStats =
createStringKeyStatsCollector(key);
+ if (keyStats instanceof IntColumnPreIndexStatsCollector) {
+ for (int intValue : ((IntColumnPreIndexStatsCollector)
keyStats).getValues()) {
+ stringKeyStats.collect(String.valueOf(intValue));
+ }
+ } else if (keyStats instanceof LongColumnPreIndexStatsCollector) {
+ for (long longValue : ((LongColumnPreIndexStatsCollector)
keyStats).getValues()) {
+ stringKeyStats.collect(String.valueOf(longValue));
+ }
+ } else if (keyStats instanceof FloatColumnPreIndexStatsCollector) {
+ for (float floatValue : ((FloatColumnPreIndexStatsCollector)
keyStats).getValues()) {
+ stringKeyStats.collect(String.valueOf(floatValue));
+ }
+ } else if (keyStats instanceof DoubleColumnPreIndexStatsCollector) {
+ for (double doubleValue : ((DoubleColumnPreIndexStatsCollector)
keyStats).getValues()) {
+ stringKeyStats.collect(String.valueOf(doubleValue));
+ }
+ } else if (keyStats instanceof BigDecimalColumnPreIndexStatsCollector) {
+ for (BigDecimal decimalValue : ((BigDecimalColumnPreIndexStatsCollector)
keyStats).getValues()) {
+ stringKeyStats.collect(String.valueOf(decimalValue));
+ }
}
+ _keyStats.put(key, stringKeyStats);
+ stringKeyStats.collect(String.valueOf(value));
+ return stringKeyStats;
+ }
+
+ private StringColumnPreIndexStatsCollector
createStringKeyStatsCollector(String key) {
+ TableConfig tableConfig = new
TableConfigBuilder(TableType.OFFLINE).setTableName(key).build();
+ Schema keySchema = new Schema.SchemaBuilder().setSchemaName(key)
+ .addField(new DimensionFieldSpec(key, FieldSpec.DataType.STRING,
false)).build();
+ StatsCollectorConfig config = new StatsCollectorConfig(tableConfig,
keySchema, null);
+ return new StringColumnPreIndexStatsCollector(key, config);
Review Comment:
`createStringKeyStatsCollector()` duplicates the collector-creation logic
and bypasses `_createNoDictCollectorsForKeys`. If the parent column is using
the no-dict optimized stats collector, promotion will currently force a
dictionary-based `StringColumnPreIndexStatsCollector` instead of respecting
that setting. Consider reusing `createKeyStatsCollector(key,
FieldSpec.DataType.STRING)` (or otherwise honoring the no-dict setting) to keep
behavior consistent with the original collector configuration.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/MapColumnPreIndexStatsCollector.java:
##########
@@ -276,12 +310,48 @@ private AbstractColumnStatisticsCollector
createKeyStatsCollector(String key, Ob
case BOOLEAN:
case STRING:
case MAP:
- case OBJECT:
return new StringColumnPreIndexStatsCollector(key, config);
default:
- LOGGER.warn("Unknown data type {} for key {} and value type {}", type,
key, value.getClass().getName());
+ LOGGER.warn("Unknown data type {} for key {}", dataType, key);
return new StringColumnPreIndexStatsCollector(key, config);
+ }
+ }
+
+ private AbstractColumnStatisticsCollector
promoteNumericKeyStatsToStringCollector(String key,
+ AbstractColumnStatisticsCollector keyStats, Object value) {
+ AbstractColumnStatisticsCollector stringKeyStats =
createStringKeyStatsCollector(key);
+ if (keyStats instanceof IntColumnPreIndexStatsCollector) {
+ for (int intValue : ((IntColumnPreIndexStatsCollector)
keyStats).getValues()) {
+ stringKeyStats.collect(String.valueOf(intValue));
+ }
Review Comment:
`promoteNumericKeyStatsToStringCollector()` replays prior values via
`getValues()`, which returns the collectors' *unique* sets. This means the
promoted STRING collector’s `totalNumberOfEntries` (and potentially `isSorted`
/ multi-value counters) will no longer reflect what was collected before
promotion when duplicates were present. Consider copying relevant counters from
the original collector (e.g., `getTotalNumberOfEntries()` /
`getMaxNumberOfMultiValues()` / sorted flag) instead of letting them be
recomputed from the unique set replay.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]