james-willis opened a new pull request, #55990:
URL: https://github.com/apache/spark/pull/55990

   Backport of #54701 to branch-4.0.
   
   ### What changes were proposed in this pull request?
   
   `ColumnarRow.get()`, `ColumnarBatchRow.get()`, and `ColumnarArray.get()` 
throw `SparkUnsupportedOperationException` when called with a `UserDefinedType` 
because they have no branch to handle UDTs.
   
   This PR adds UDT handling to all three methods:
   - **ColumnarRow** and **ColumnarBatchRow**: Add an `instanceof 
UserDefinedType` branch that recurses with `udt.sqlType()`, matching the 
pattern already used in `SpecializedGettersReader.read()`.
   - **ColumnarArray**: Change the `handleUserDefinedType` flag from `false` to 
`true` in the existing call to `SpecializedGettersReader.read()`.
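   
The recursion described for `ColumnarRow` and `ColumnarBatchRow` can be sketched with a toy model. This is only an illustration under stated assumptions: the `DataType`, `UserDefinedType`, and `Row` classes below are hypothetical stand-ins defined in the block itself, not Spark's classes, and the real methods handle many more types.

```java
public class UdtGetSketch {
    /** Toy stand-in for a SQL data type (hypothetical; not Spark's DataType). */
    interface DataType {}
    static final class IntegerType implements DataType {}
    static final class StringType implements DataType {}
    static final class BinaryType implements DataType {} // deliberately left unsupported

    /** Toy UDT: a user-facing type that is stored as an underlying SQL type. */
    static final class UserDefinedType implements DataType {
        private final DataType sqlType;
        UserDefinedType(DataType sqlType) { this.sqlType = sqlType; }
        DataType sqlType() { return sqlType; }
    }

    /** Toy row holding one boxed value per ordinal. */
    static final class Row {
        private final Object[] values;
        Row(Object... values) { this.values = values; }

        Object get(int ordinal, DataType dataType) {
            if (dataType instanceof IntegerType) {
                return (Integer) values[ordinal];
            } else if (dataType instanceof StringType) {
                return (String) values[ordinal];
            } else if (dataType instanceof UserDefinedType) {
                // The kind of branch this PR adds: unwrap the UDT and
                // retry with its storage type.
                return get(ordinal, ((UserDefinedType) dataType).sqlType());
            }
            throw new UnsupportedOperationException(
                "Unsupported type: " + dataType.getClass().getSimpleName());
        }
    }

    public static void main(String[] args) {
        Row row = new Row(42, "spark");
        DataType intUdt = new UserDefinedType(new IntegerType());
        DataType nestedUdt = new UserDefinedType(new UserDefinedType(new StringType()));
        System.out.println(row.get(0, intUdt));
        System.out.println(row.get(1, nestedUdt));
    }
}
```

Because the branch recurses rather than unwrapping once, a UDT whose storage type is itself a UDT also resolves in this sketch; types with no branch still reach the final `throw`.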
   
   ### Why are the changes needed?
   
   The codegen path (`CodeGenerator.getValue()`) unwraps `udt.sqlType()` before 
generating accessor calls, so UDT columns work when whole-stage codegen is 
active. However, on the interpreted eval path (when codegen is disabled, when 
it falls back, or when the number of fields exceeds 
`spark.sql.codegen.maxFields`), 
`GetStructField.nullSafeEval` calls `ColumnarRow.get(ordinal, udtType)` 
directly, which hits the unhandled branch and throws.
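   
To make the difference between the two paths concrete, here is a minimal sketch that models a getter with no UDT branch and two callers. All names (`strictGet`, `codegenStyleRead`, `interpretedStyleRead`, and the toy type classes) are hypothetical and defined only for this illustration; they are not Spark APIs.

```java
public class UdtEvalPathSketch {
    /** Toy type model (hypothetical; not Spark's classes). */
    interface DataType {}
    static final class IntegerType implements DataType {}
    static final class UserDefinedType implements DataType {
        private final DataType sqlType;
        UserDefinedType(DataType sqlType) { this.sqlType = sqlType; }
        DataType sqlType() { return sqlType; }
    }

    /** Getter with no UDT branch, modeling the behavior before this change. */
    static Object strictGet(Object[] values, int ordinal, DataType dataType) {
        if (dataType instanceof IntegerType) {
            return (Integer) values[ordinal];
        }
        throw new UnsupportedOperationException(
            "Unsupported type: " + dataType.getClass().getSimpleName());
    }

    /** Codegen-style caller: resolves the UDT to its storage type first. */
    static Object codegenStyleRead(Object[] values, int ordinal, DataType dataType) {
        DataType resolved = dataType;
        while (resolved instanceof UserDefinedType) {
            resolved = ((UserDefinedType) resolved).sqlType();
        }
        return strictGet(values, ordinal, resolved);
    }

    /** Interpreted-style caller: hands the declared type straight to the getter. */
    static Object interpretedStyleRead(Object[] values, int ordinal, DataType dataType) {
        return strictGet(values, ordinal, dataType);
    }

    public static void main(String[] args) {
        Object[] values = {7};
        DataType udt = new UserDefinedType(new IntegerType());
        System.out.println(codegenStyleRead(values, 0, udt));
        try {
            interpretedStyleRead(values, 0, udt);
        } catch (UnsupportedOperationException e) {
            System.out.println("interpreted-style read failed: " + e.getMessage());
        }
    }
}
```

In this model the same getter succeeds or fails depending only on whether the caller unwraps the UDT first, which is why handling the UDT inside the getter covers both callers.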
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. UDT columns in columnar data sources (e.g., Parquet) now work correctly 
on the interpreted evaluation path. Previously they would throw 
`SparkUnsupportedOperationException`.
   
   ### How was this patch tested?
   
   Added 6 new tests in `ColumnarBatchSuite` covering all 3 methods x 2 UDT 
backing types (primitive `IntegerType` and complex `StructType`). Each test 
creates columnar vectors with UDT data and verifies that `get()` returns the 
correct value. Two helper UDT classes (`TestIntUDT`, `TestStructWrapperUDT`) 
are defined for the tests.
   
   Cherry-picked from 472735cefef on master. The cherry-pick had a trivial 
conflict in `ColumnarBatchSuite.scala`: the neighboring `[SPARK-55552] Variant` 
test exists on branch-4.1+ but not on branch-4.0, so the context around the 
insertion point did not match. Resolved by keeping only the SPARK-55897 tests (the Variant test is 
unrelated).
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Yes. Opus 4.6


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

