[ 
https://issues.apache.org/jira/browse/SPARK-48921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao updated SPARK-48921:
-----------------------------
    Fix Version/s: 3.5.2
                       (was: 3.5.3)

> ScalaUDF in subquery should run through analyzer
> ------------------------------------------------
>
>                 Key: SPARK-48921
>                 URL: https://issues.apache.org/jira/browse/SPARK-48921
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0, 3.5.1, 3.4.3
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0, 3.5.2
>
>
> We got a customer issue that a `MergeInto` query on Iceberg table works 
> earlier but cannot work after upgrading to Spark 3.4.
> The error looks like
> ```
> Caused by: org.apache.spark.SparkRuntimeException: Error while decoding: 
> org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
> nullable on unresolved object
> upcast(getcolumnbyordinal(0, StringType), StringType, - root class: 
> java.lang.String).toString.
> ```
> The source table of `MergeInto` uses `ScalaUDF`. The error happens when Spark 
> invokes the deserializer of input encoder of the `ScalaUDF` and the 
> deserializer is not resolved yet.
> The encoders of ScalaUDF are resolved by the rule `ResolveEncodersInUDF` 
> which will be applied at the end of analysis phase.
> During rewriting `MergeInto` to `ReplaceData` query, Spark creates an 
> `Exists` subquery and `ScalaUDF` is part of the plan of the subquery. Note 
> that the `ScalaUDF` is already resolved by the analyzer.
> Then, in `ResolveSubquery` rule which resolves the subquery, it will resolve 
> the subquery plan if it is not resolved yet. Because the subquery containing 
> `ScalaUDF` is resolved, the rule skips it so `ResolveEncodersInUDF` won't be 
> applied on it. So the analyzed `ReplaceData` query contains a `ScalaUDF` with 
> encoders unresolved that cause the error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to