[ https://issues.apache.org/jira/browse/SPARK-50114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18040396#comment-18040396 ]
Abinaya Jayaprakasam commented on SPARK-50114:
----------------------------------------------
I am working on this.
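In the meantime, a possible workaround (a sketch only, not verified against 3.5.2/3.5.3) is to avoid the lateral column alias altogether: resolve the exploded item in a subquery, so that array_contains references an ordinary column rather than an alias defined in the same SELECT list:
{code:sql}
-- Sketch of a workaround: compute item in a subquery so the outer
-- array_contains no longer depends on lateral column alias resolution.
SELECT item, array_contains(a, item) AS is_contain
FROM (SELECT a, explode(a) AS item FROM foo);
{code}
A LATERAL VIEW explode(a) rewrite should serve the same purpose; both forms sidestep the analyzer path that raises the INTERNAL_ERROR quoted below.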
> Lateral column alias doesn't work with array_contains
> -----------------------------------------------------
>
> Key: SPARK-50114
> URL: https://issues.apache.org/jira/browse/SPARK-50114
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.3
> Reporter: Chao Sun
> Priority: Major
>
> It seems there is a regression relative to Spark 3.5.1: array_contains(a, item) refers to the lateral column alias item defined earlier in the same SELECT list, and this no longer resolves. The following example:
> {code:sql}
> CREATE TABLE foo (a ARRAY<STRING>);
> INSERT INTO foo VALUES
>   (array('apple', 'banana', 'cherry')),
>   (array('orange', 'grape')),
>   (array('kiwi', 'mango', 'pineapple'));
> SELECT explode(a) AS item, array_contains(a, item) AS is_contain FROM foo;
> {code}
> This used to work in Spark 3.5.0 and 3.5.1, producing:
> {code:java}
> +---------+----------+
> | item|is_contain|
> +---------+----------+
> | kiwi| true|
> | mango| true|
> |pineapple| true|
> | orange| true|
> | grape| true|
> | apple| true|
> | banana| true|
> | cherry| true|
> +---------+----------+
> {code}
> However, starting from Spark 3.5.2 it gives the following error:
> {code:java}
> org.apache.spark.SparkException: [INTERNAL_ERROR] Found the unresolved operator: 'Project [explode(a#2) AS item#0, 'array_contains(a#2, lateralAliasReference(item)) AS is_contain#1]
> == SQL(line 1, position 1) ==
> SELECT explode(a) AS item, array_contains(a, item) AS is_contain FROM foo
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
> at org.apache.spark.SparkException$.internalError(SparkException.scala:79)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$61(CheckAnalysis.scala:828)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$61$adapted(CheckAnalysis.scala:823)
> at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:244)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0(CheckAnalysis.scala:823)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0$(CheckAnalysis.scala:197)
> at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis0(Analyzer.scala:202)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:193)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:171)
> at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:202)
> at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:225)
> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
> at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:222)
> at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:77)
> at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:138)
> at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:219)
> at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
> at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
> at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
> at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:77)
> at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
> at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
> at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
> at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:713)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:744)
> ... 47 elided
> {code}