Copilot commented on code in PR #18724:
URL: https://github.com/apache/pinot/pull/18724#discussion_r3385505323
##########
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/utils/SegmentProcessorUtils.java:
##########
@@ -45,6 +69,23 @@ private SegmentProcessorUtils() {
*/
public static Pair<List<FieldSpec>, Integer> getFieldSpecs(Schema schema,
MergeType mergeType,
@Nullable List<String> sortOrder) {
+ return getFieldSpecs(schema, mergeType, sortOrder, false);
+ }
+
+ /**
+ * Returns the field specs (physical only) and number of sort fields based
on the merge type and sort order.
+ * <p>When {@code includeOriginalTimeField} is {@code true} (only allowed
for ROLLUP), a hidden LONG column
+ * ({@link TimeHandler#ORIGINAL_TIME_MS_COLUMN}) is appended as the last
sort field. It carries the original
+ * (pre-rounding) time value in millis so that rows within the same rollup
group are sorted by the original time,
+ * which is required by order sensitive aggregations
(FIRSTWITHTIME/LASTWITHTIME). The hidden column is not part of
+ * the rollup group key, and is stripped by the reducer before the output
segments are created.
+ */
+ public static Pair<List<FieldSpec>, Integer> getFieldSpecs(Schema schema,
MergeType mergeType,
+ @Nullable List<String> sortOrder, boolean includeOriginalTimeField) {
+ Preconditions.checkArgument(!includeOriginalTimeField || mergeType ==
MergeType.ROLLUP,
+ "Original time field can only be included for ROLLUP merge type");
+
Preconditions.checkArgument(!schema.hasColumn(TimeHandler.ORIGINAL_TIME_MS_COLUMN),
+ "Schema must not contain the reserved column: %s",
TimeHandler.ORIGINAL_TIME_MS_COLUMN);
Review Comment:
`SegmentProcessorUtils.getFieldSpecs(...)` unconditionally rejects schemas
containing the reserved `$originalTimeMs$` column even when
`includeOriginalTimeField` is `false`. That makes unrelated CONCAT/DEDUP/ROLLUP
flows fail for tables that never use the hidden column. The reserved-name check
only needs to run when we actually plan to add the hidden field.
##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/realtimetoofflinesegments/RealtimeToOfflineSegmentsTaskGenerator.java:
##########
@@ -337,12 +339,14 @@ public void validateTaskConfigs(TableConfig tableConfig,
Schema schema, Map<Stri
Set<String> columnNames = schema.getColumnNames();
for (Map.Entry<String, String> entry : taskConfigs.entrySet()) {
if (entry.getKey().endsWith(".aggregationType")) {
- Preconditions.checkState(columnNames.contains(
- StringUtils.removeEnd(entry.getKey(),
RealtimeToOfflineSegmentsTask.AGGREGATION_TYPE_KEY_SUFFIX)),
+ String column =
+ StringUtils.removeEnd(entry.getKey(),
RealtimeToOfflineSegmentsTask.AGGREGATION_TYPE_KEY_SUFFIX);
+ Preconditions.checkState(columnNames.contains(column),
String.format("Column \"%s\" not found in schema!",
entry.getKey()));
Review Comment:
The schema-missing-column error message currently prints the task config key
(e.g. `myCol.aggregationType`) instead of the actual column name, which is
confusing given you already derive `column` just above.
##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/realtimetoofflinesegments/RealtimeToOfflineSegmentsTaskGenerator.java:
##########
@@ -352,6 +356,12 @@ public void validateTaskConfigs(TableConfig tableConfig,
Schema schema, Map<Stri
String.format("Column \"%s\" has invalid aggregate type: %s",
entry.getKey(), entry.getValue());
throw new IllegalStateException(err);
Review Comment:
The invalid-aggregation-type exception drops the original
`IllegalArgumentException` cause and also formats the error as if
`entry.getKey()` were the column name (it includes the `.aggregationType`
suffix). Keeping the cause and using `column` makes validation failures much
easier to debug.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]