This is an automated email from the ASF dual-hosted git repository.
xiangfu0 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pinot.git
The following commit(s) were added to refs/heads/master by this push:
new 7542d229674 fix(transformer): preserve existing BYTES value when
transform yields null (#18503)
7542d229674 is described below
commit 7542d2296742f62c76fa1429cf94400f37af62aa
Author: Xiang Fu <[email protected]>
AuthorDate: Thu May 14 22:11:35 2026 -0700
fix(transformer): preserve existing BYTES value when transform yields null
(#18503)
* fix(transformer): preserve existing BYTES value when transform yields null
`ExpressionTransformer` was overwriting an existing non-null `byte[]` value
with null when a transform expression evaluated to null. `byte[]` is
logically
a scalar but is caught by the `isArray()` branch intended for nested fields,
so the null-result path applied `applyTransformedValue(record, column,
null)`
and cleared the BYTES column.
Extract a `shouldPreserveExistingValue` helper that skips the apply step in
two cases when the transform yields null and an existing non-null value is
present:
1. Implicit map transforms (`mapField__KEYS`/`__VALUES`) — generalizes the
prior guard so it also covers the `shouldApplyTransform` branch (e.g.
columns flagged null via `putDefaultNullValue`).
2. BYTES single-value columns — treat `byte[]` as the scalar it represents.
Guard applies in both the populate-missing/overwrite branch and the existing
array/Collection/Map branch, including the post-upsert overwrite path
(`_overwriteExistingValues=true`).
Adds three regression tests covering implicit-map + BYTES, explicit
null-returning transform on BYTES, and the post-upsert overwrite variant.
Co-Authored-By: Claude Opus 4.7 <[email protected]>
* Scope BYTES preservation to the existing-value branch only
The `shouldApplyTransform` branch is reached when `existingValue == null`,
the column is null-flagged, or `_overwriteExistingValues == true`. In all
three cases the transform is meant to apply — for the post-upsert overwrite
path in particular, a null-returning transform is the configured way to
clear a derived column. Keeping the preservation guard there blocks that
intent.
Restrict the `shouldPreserveExistingValue` guard to the array/Collection/Map
branch, which only fires when the existing value is genuinely non-null and
not null-flagged. Drop the two tests that exercised the misplaced guard
(implicit-map + putDefaultNullValue, and post-upsert overwrite).
Co-Authored-By: Claude Opus 4.7 <[email protected]>
* Exclude byte[] from the array/Collection/Map override branch
Cleaner shape: filter `byte[]` out of the else-if predicate itself so BYTES
single-value columns fall through as scalars (matching STRING/INT/etc.).
The transform is never evaluated when the column already has a value, which
makes the rule self-evident at the branch test. Drop the
`shouldPreserveExistingValue` helper since it's no longer needed.
Multi-value BYTES (`byte[][]`) is still caught by `isArray()` and continues
to be treated as a nested array. The implicit-map null guard inside the
branch is unchanged.
Co-Authored-By: Claude Opus 4.7 <[email protected]>
---------
Co-authored-by: Claude Opus 4.7 <[email protected]>
---
.../recordtransformer/ExpressionTransformer.java | 8 +++++--
.../ExpressionTransformerTest.java | 25 ++++++++++++++++++++++
2 files changed, 31 insertions(+), 2 deletions(-)
diff --git
a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/ExpressionTransformer.java
b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/ExpressionTransformer.java
index 3e154039242..b7ac7ecbf41 100644
---
a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/ExpressionTransformer.java
+++
b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/ExpressionTransformer.java
@@ -197,8 +197,12 @@ public class ExpressionTransformer implements
RecordTransformer {
_throttledLogger.warn("Caught exception while evaluation transform
function for column: " + column, e);
record.markIncomplete();
}
- } else if (existingValue.getClass().isArray() || existingValue
instanceof Collection
- || existingValue instanceof Map) {
+ } else if ((existingValue.getClass().isArray() && !(existingValue
instanceof byte[]))
+ || existingValue instanceof Collection || existingValue instanceof
Map) {
+ // BYTES single-value columns are intentionally excluded from this
branch. `byte[]` is logically a scalar
+ // and is preserved like other scalar types; without the exclusion, a
transform yielding null (e.g. when
+ // the source field is absent) would clobber the existing `byte[]`
value. Multi-value BYTES (`byte[][]`)
+ // is still treated as a nested array.
try {
Object transformedValue =
transformFunctionEvaluator.evaluate(record);
if (transformedValue == null &&
_implicitMapTransformColumns.contains(column)) {
diff --git
a/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/ExpressionTransformerTest.java
b/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/ExpressionTransformerTest.java
index 30256b88f96..5f00033421e 100644
---
a/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/ExpressionTransformerTest.java
+++
b/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/ExpressionTransformerTest.java
@@ -221,6 +221,31 @@ public class ExpressionTransformerTest {
Assert.assertFalse(row.isNullValue("mapDim1__VALUES"));
}
+ @Test
+ public void testTransformReturningNullDoesNotOverrideExistingBytesValue() {
+ // A BYTES column with an explicit transform that yields null should not
clobber the existing byte[] value.
+ // BYTES is a scalar type even though byte[] is technically an array.
+ Schema schema = new Schema.SchemaBuilder()
+ .addSingleValueDimension("payload", FieldSpec.DataType.BYTES)
+ .build();
+ IngestionConfig ingestionConfig = new IngestionConfig();
+ ingestionConfig.setTransformConfigs(Collections.singletonList(new
TransformConfig("payload", "Groovy({null})")));
+ TableConfig tableConfig = new TableConfigBuilder(TableType.OFFLINE)
+ .setTableName("testBytesNullTransform")
+ .setIngestionConfig(ingestionConfig)
+ .build();
+ ExpressionTransformer expressionTransformer = new
ExpressionTransformer(tableConfig, schema);
+
+ GenericRow row = new GenericRow();
+ byte[] existing = new byte[]{7, 8, 9};
+ row.putValue("payload", existing);
+
+ expressionTransformer.transform(row);
+
+ Assert.assertSame(row.getValue("payload"), existing);
+ Assert.assertFalse(row.isNullValue("payload"));
+ }
+
/**
* If destination field already exists in the row, do not execute transform
function
*/
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]