[ 
https://issues.apache.org/jira/browse/SPARK-47525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47525:
-----------------------------------

    Assignee: Jack Chen

> Support subquery correlation joining on map attributes
> ------------------------------------------------------
>
>                 Key: SPARK-47525
>                 URL: https://issues.apache.org/jira/browse/SPARK-47525
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Jack Chen
>            Assignee: Jack Chen
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, when a subquery is correlated on a condition like `outer_map[1] = 
> inner_map[1]`, DecorrelateInnerQuery generates a join on the map itself,
> which is unsupported, so the query cannot run - for example:
>  
> {code:java}
> scala> Seq(Map(0 -> 0)).toDF.createOrReplaceTempView("v")scala> sql("select 
> v1.value[0] from v v1 where v1.value[0] > (select avg(v2.value[0]) from v v2 
> where v1.value[1] = v2.value[1])").explain
> org.apache.spark.sql.AnalysisException: 
> [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE]
>  Unsupported subquery expression: Correlated column reference 'v1.value' 
> cannot be map type. SQLSTATE: 0A000; line 1 pos 49
> at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.unsupportedCorrelatedReferenceDataTypeError(QueryCompilationErrors.scala:2463)
> ... {code}
> However, if we rewrite the query to pull out the map access `outer_map[1]` 
> into the outer plan, it succeeds:
>  
> {code:java}
> scala> sql("""with tmp as (
> select value[0] as value0, value[1] as value1 from v
> )
> select v1.value0 from tmp v1 where v1.value0 > (select avg(v2.value0) from 
> tmp v2 where v1.value1 = v2.value1)""").explain{code}
> Another point that can be improved is that, even if the data type supports 
> join, we still don’t need to join on the full attribute, and we can get a 
> better plan by doing the same rewrite to pull out the extract expression.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to