James Willis created SPARK-55897:
------------------------------------
Summary: ColumnarRow.get() and ColumnarBatchRow.get() throw on
UserDefinedType
Key: SPARK-55897
URL: https://issues.apache.org/jira/browse/SPARK-55897
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.1.1
Environment: I don't think this is hardware-dependent, but I discovered
this on an M3 MacBook Pro.
Reporter: James Willis
{{ColumnarRow.get()}} and {{ColumnarBatchRow.get()}} do not handle
{{UserDefinedType}}, throwing
{{SparkUnsupportedOperationException("_LEGACY_ERROR_TEMP_3155")}} when a UDT
field is accessed via the interpreted eval path (e.g.,
{{GetStructField.nullSafeEval}} on a nested struct from the vectorized Parquet
reader).
{code:java}
org.apache.spark.SparkException: [INTERNAL_ERROR] Undefined error message parameter for error class: '_LEGACY_ERROR_TEMP_3155', MessageTemplate: Datatype not supported <dataType>, Parameters: Map()
    at org.apache.spark.sql.vectorized.ColumnarRow.get(ColumnarRow.java:221)
    at org.apache.spark.sql.catalyst.expressions.GetStructField.nullSafeEval(complexTypeExtractors.scala:207)
{code}
h3. Root Cause
{{ColumnarRow.get()}} and {{ColumnarBatchRow.get()}} dispatch on {{dataType}}
via {{instanceof}} checks for all concrete Spark types but have no branch for
{{UserDefinedType}}. When {{GetStructField.nullSafeEval()}} passes a UDT
type to {{get()}}, the call falls through to the default error branch.
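The fall-through described above can be sketched with a toy model. This is not the Spark source: {{UdtDispatchSketch}}, its nested type classes, {{getBuggy}}, and {{getUnwrapping}} are hypothetical stand-ins, and unwrapping {{sqlType()}} before dispatch is only one possible fix, assumed here for illustration.

```java
import java.util.List;

// Toy model only: these types stand in for Spark's DataType hierarchy and are
// NOT the real org.apache.spark.sql.types classes.
class UdtDispatchSketch {
    interface DataType {}
    static final class IntegerType implements DataType {}
    static final class StringType implements DataType {}

    // Hypothetical stand-in for UserDefinedType: wraps an underlying SQL type.
    static final class UserDefinedType implements DataType {
        private final DataType sqlType;
        UserDefinedType(DataType sqlType) { this.sqlType = sqlType; }
        DataType sqlType() { return sqlType; }
    }

    // Models the reported behaviour: there is no UDT branch, so a UDT reaches
    // the default error branch.
    static Object getBuggy(List<Object> row, int ordinal, DataType dataType) {
        if (dataType instanceof IntegerType) {
            return (Integer) row.get(ordinal);
        } else if (dataType instanceof StringType) {
            return (String) row.get(ordinal);
        } else {
            throw new UnsupportedOperationException(
                "Datatype not supported " + dataType.getClass().getSimpleName());
        }
    }

    // One possible fix (an assumption, not the committed patch): unwrap the UDT
    // to its underlying SQL type, then reuse the existing dispatch.
    static Object getUnwrapping(List<Object> row, int ordinal, DataType dataType) {
        if (dataType instanceof UserDefinedType) {
            return getUnwrapping(row, ordinal, ((UserDefinedType) dataType).sqlType());
        }
        return getBuggy(row, ordinal, dataType);
    }

    public static void main(String[] args) {
        List<Object> row = List.of(42, "point");
        DataType udt = new UserDefinedType(new IntegerType());
        try {
            getBuggy(row, 0, udt);
        } catch (UnsupportedOperationException e) {
            System.out.println("buggy: " + e.getMessage());
        }
        System.out.println("unwrapping: " + getUnwrapping(row, 0, udt));
    }
}
```

In this model the first call throws and the second returns the stored integer. Because the unwrapping is recursive, a UDT that wraps another UDT is also handled.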
The codegen path is unaffected because {{CodeGenerator.getValue()}} unwraps
{{udt.sqlType()}} before generating type-specific accessor calls
({{getInt}}, {{getStruct}}, etc.), bypassing {{get()}} entirely. This
is why the existing SPARK-39086 tests pass: they run through whole-stage
codegen.
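As a rough illustration of why codegen never reaches {{get()}} for a UDT, the sketch below picks an accessor call string after unwrapping the UDT. It is a simplified model, not {{CodeGenerator.getValue()}} itself: {{CodegenAccessorSketch}}, its nested type classes, and {{accessorCall}} are hypothetical names, and the emitted strings are only meant to resemble the generated calls.

```java
// Toy model only: a simplified imitation of accessor selection during codegen.
// None of these classes are the real Spark types.
class CodegenAccessorSketch {
    interface DataType {}
    static final class IntegerType implements DataType {}

    // Hypothetical stand-in for a struct type with a known field count.
    static final class StructType implements DataType {
        final int numFields;
        StructType(int numFields) { this.numFields = numFields; }
    }

    // Hypothetical stand-in for UserDefinedType: wraps an underlying SQL type.
    static final class UserDefinedType implements DataType {
        private final DataType sqlType;
        UserDefinedType(DataType sqlType) { this.sqlType = sqlType; }
        DataType sqlType() { return sqlType; }
    }

    // Returns the source text of the accessor call that would be generated.
    static String accessorCall(String input, DataType dataType, int ordinal) {
        if (dataType instanceof UserDefinedType) {
            // Unwrap first, so a UDT resolves to a type-specific accessor.
            return accessorCall(input, ((UserDefinedType) dataType).sqlType(), ordinal);
        } else if (dataType instanceof IntegerType) {
            return input + ".getInt(" + ordinal + ")";
        } else if (dataType instanceof StructType) {
            return input + ".getStruct(" + ordinal + ", "
                + ((StructType) dataType).numFields + ")";
        }
        // Only types without a dedicated accessor fall back to the generic get().
        return input + ".get(" + ordinal + ", null)";
    }

    public static void main(String[] args) {
        DataType udt = new UserDefinedType(new StructType(2));
        System.out.println(accessorCall("row", udt, 1));
    }
}
```

In this model a UDT over a struct resolves to a {{getStruct}} call, so the generic {{get()}} is never emitted for it.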
The bug surfaces when the interpreted path is used (codegen disabled, codegen
fallback, or a field count exceeding {{spark.sql.codegen.maxFields}}).
h3. Affected Code
* {{ColumnarRow.java:184-223}} — {{get(int ordinal, DataType dataType)}}
* {{ColumnarBatchRow.java:179-222}} — {{get(int ordinal, DataType dataType)}}