[ https://issues.apache.org/jira/browse/SPARK-50114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18040396#comment-18040396 ]

Abinaya Jayaprakasam commented on SPARK-50114:
----------------------------------------------

I am working on this.
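
A possible interim workaround (a sketch only, not verified against the affected versions): move the explode into a subquery, so that item is an ordinary resolved column rather than a lateral column alias by the time array_contains references it:

{code:java}
-- Workaround sketch: materialize the exploded element in a subquery so the
-- outer array_contains(a, item) references a real column, not a lateral alias.
SELECT item, array_contains(a, item) AS is_contain
FROM (SELECT a, explode(a) AS item FROM foo);
{code}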

> Lateral column alias doesn't work with array_contains
> -----------------------------------------------------
>
>                 Key: SPARK-50114
>                 URL: https://issues.apache.org/jira/browse/SPARK-50114
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.3
>            Reporter: Chao Sun
>            Priority: Major
>
> This appears to be a regression introduced between Spark 3.5.1 and 3.5.2. The following example:
> {code:java}
> CREATE TABLE foo (a ARRAY<STRING>);
> INSERT INTO foo VALUES (array('apple', 'banana', 'cherry')), (array('orange', 'grape')), (array('kiwi', 'mango', 'pineapple'));
> SELECT explode(a) AS item, array_contains(a, item) AS is_contain FROM foo;
> {code}
> used to work in Spark 3.5.0 and 3.5.1, producing:
> {code:java}
> +---------+----------+
> |     item|is_contain|
> +---------+----------+
> |     kiwi|      true|
> |    mango|      true|
> |pineapple|      true|
> |   orange|      true|
> |    grape|      true|
> |    apple|      true|
> |   banana|      true|
> |   cherry|      true|
> +---------+----------+
> {code}
> However, starting with Spark 3.5.2 it fails with the following error:
> {code:java}
> org.apache.spark.SparkException: [INTERNAL_ERROR] Found the unresolved operator: 'Project [explode(a#2) AS item#0, 'array_contains(a#2, lateralAliasReference(item)) AS is_contain#1]
> == SQL(line 1, position 1) ==
> SELECT explode(a) AS item, array_contains(a, item) AS is_contain FROM foo
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
>   at org.apache.spark.SparkException$.internalError(SparkException.scala:79)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$61(CheckAnalysis.scala:828)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$61$adapted(CheckAnalysis.scala:823)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:244)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0(CheckAnalysis.scala:823)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0$(CheckAnalysis.scala:197)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis0(Analyzer.scala:202)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:193)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:171)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:202)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:225)
>   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:222)
>   at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:77)
>   at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:138)
>   at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:219)
>   at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
>   at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
>   at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
>   at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:77)
>   at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
>   at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
>   at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
>   at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:713)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:744)
>   ... 47 elided
> {code}
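> For context, item in the failing expression is a lateral column alias: an alias defined earlier in the same SELECT list and referenced by a later expression in that list (it appears as lateralAliasReference(item) in the unresolved plan above). A minimal illustration of the feature, assuming lateral column alias resolution is enabled (the default since Spark 3.4):
> {code:java}
> -- `b` refers to the lateral column alias `a` from the same SELECT list;
> -- returns a = 1, b = 2.
> SELECT 1 AS a, a + 1 AS b;
> {code}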



