This is an automated email from the ASF dual-hosted git repository.

xiangfu0 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pinot.git


The following commit(s) were added to refs/heads/master by this push:
     new 7542d229674 fix(transformer): preserve existing BYTES value when 
transform yields null (#18503)
7542d229674 is described below

commit 7542d2296742f62c76fa1429cf94400f37af62aa
Author: Xiang Fu <[email protected]>
AuthorDate: Thu May 14 22:11:35 2026 -0700

    fix(transformer): preserve existing BYTES value when transform yields null 
(#18503)
    
    * fix(transformer): preserve existing BYTES value when transform yields null
    
    `ExpressionTransformer` was overwriting an existing non-null `byte[]` value
    with null when a transform expression evaluated to null. `byte[]` is 
logically
    a scalar but is caught by the `isArray()` branch intended for nested fields,
    so the null-result path applied `applyTransformedValue(record, column, 
null)`
    and cleared the BYTES column.
    
    Extract a `shouldPreserveExistingValue` helper that skips the apply step in
    two cases when the transform yields null and an existing non-null value is
    present:
    1. Implicit map transforms (`mapField__KEYS`/`__VALUES`) — generalizes the
       prior guard so it also covers the `shouldApplyTransform` branch (e.g.
       columns flagged null via `putDefaultNullValue`).
    2. BYTES single-value columns — treat `byte[]` as the scalar it represents.
    
    Guard applies in both the populate-missing/overwrite branch and the existing
    array/Collection/Map branch, including the post-upsert overwrite path
    (`_overwriteExistingValues=true`).
    
    Adds three regression tests covering implicit-map + BYTES, explicit
    null-returning transform on BYTES, and the post-upsert overwrite variant.
    
    Co-Authored-By: Claude Opus 4.7 <[email protected]>
    
    * Scope BYTES preservation to the existing-value branch only
    
    The `shouldApplyTransform` branch is reached when `existingValue == null`,
    the column is null-flagged, or `_overwriteExistingValues == true`. In all
    three cases the transform is meant to apply — for the post-upsert overwrite
    path in particular, a null-returning transform is the configured way to
    clear a derived column. Keeping the preservation guard there blocks that
    intent.
    
    Restrict the `shouldPreserveExistingValue` guard to the array/Collection/Map
    branch, which only fires when the existing value is genuinely non-null and
    not null-flagged. Drop the two tests that exercised the misplaced guard
    (implicit-map + putDefaultNullValue, and post-upsert overwrite).
    
    Co-Authored-By: Claude Opus 4.7 <[email protected]>
    
    * Exclude byte[] from the array/Collection/Map override branch
    
    Cleaner shape: filter `byte[]` out of the else-if predicate itself so BYTES
    single-value columns fall through as scalars (matching STRING/INT/etc.).
    The transform is never evaluated when the column already has a value, which
    makes the rule self-evident at the branch test. Drop the
    `shouldPreserveExistingValue` helper since it's no longer needed.
    
    Multi-value BYTES (`byte[][]`) is still caught by `isArray()` and continues
    to be treated as a nested array. The implicit-map null guard inside the
    branch is unchanged.
    
    Co-Authored-By: Claude Opus 4.7 <[email protected]>
    
    ---------
    
    Co-authored-by: Claude Opus 4.7 <[email protected]>
---
 .../recordtransformer/ExpressionTransformer.java   |  8 +++++--
 .../ExpressionTransformerTest.java                 | 25 ++++++++++++++++++++++
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git 
a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/ExpressionTransformer.java
 
b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/ExpressionTransformer.java
index 3e154039242..b7ac7ecbf41 100644
--- 
a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/ExpressionTransformer.java
+++ 
b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/ExpressionTransformer.java
@@ -197,8 +197,12 @@ public class ExpressionTransformer implements 
RecordTransformer {
           _throttledLogger.warn("Caught exception while evaluation transform 
function for column: " + column, e);
           record.markIncomplete();
         }
-      } else if (existingValue.getClass().isArray() || existingValue 
instanceof Collection
-          || existingValue instanceof Map) {
+      } else if ((existingValue.getClass().isArray() && !(existingValue 
instanceof byte[]))
+          || existingValue instanceof Collection || existingValue instanceof 
Map) {
+        // BYTES single-value columns are intentionally excluded from this 
branch. `byte[]` is logically a scalar
+        // and is preserved like other scalar types; without the exclusion, a 
transform yielding null (e.g. when
+        // the source field is absent) would clobber the existing `byte[]` 
value. Multi-value BYTES (`byte[][]`)
+        // is still treated as a nested array.
         try {
           Object transformedValue = 
transformFunctionEvaluator.evaluate(record);
           if (transformedValue == null && 
_implicitMapTransformColumns.contains(column)) {
diff --git 
a/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/ExpressionTransformerTest.java
 
b/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/ExpressionTransformerTest.java
index 30256b88f96..5f00033421e 100644
--- 
a/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/ExpressionTransformerTest.java
+++ 
b/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/ExpressionTransformerTest.java
@@ -221,6 +221,31 @@ public class ExpressionTransformerTest {
     Assert.assertFalse(row.isNullValue("mapDim1__VALUES"));
   }
 
+  @Test
+  public void testTransformReturningNullDoesNotOverrideExistingBytesValue() {
+    // A BYTES column with an explicit transform that yields null should not 
clobber the existing byte[] value.
+    // BYTES is a scalar type even though byte[] is technically an array.
+    Schema schema = new Schema.SchemaBuilder()
+        .addSingleValueDimension("payload", FieldSpec.DataType.BYTES)
+        .build();
+    IngestionConfig ingestionConfig = new IngestionConfig();
+    ingestionConfig.setTransformConfigs(Collections.singletonList(new 
TransformConfig("payload", "Groovy({null})")));
+    TableConfig tableConfig = new TableConfigBuilder(TableType.OFFLINE)
+        .setTableName("testBytesNullTransform")
+        .setIngestionConfig(ingestionConfig)
+        .build();
+    ExpressionTransformer expressionTransformer = new 
ExpressionTransformer(tableConfig, schema);
+
+    GenericRow row = new GenericRow();
+    byte[] existing = new byte[]{7, 8, 9};
+    row.putValue("payload", existing);
+
+    expressionTransformer.transform(row);
+
+    Assert.assertSame(row.getValue("payload"), existing);
+    Assert.assertFalse(row.isNullValue("payload"));
+  }
+
   /**
    * If destination field already exists in the row, do not execute transform 
function
    */


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to