[ https://issues.apache.org/jira/browse/SPARK-47525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-47525: ----------------------------------- Assignee: Jack Chen > Support subquery correlation joining on map attributes > ------------------------------------------------------ > > Key: SPARK-47525 > URL: https://issues.apache.org/jira/browse/SPARK-47525 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.5.0 > Reporter: Jack Chen > Assignee: Jack Chen > Priority: Major > Labels: pull-request-available > > Currently, when a subquery is correlated on a condition like `outer_map[1] = > inner_map[1]`, DecorrelateInnerQuery generates a join on the map itself, > which is unsupported, so the query cannot run - for example: > > {code:java} > scala> Seq(Map(0 -> 0)).toDF.createOrReplaceTempView("v")scala> sql("select > v1.value[0] from v v1 where v1.value[0] > (select avg(v2.value[0]) from v v2 > where v1.value[1] = v2.value[1])").explain > org.apache.spark.sql.AnalysisException: > [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE] > Unsupported subquery expression: Correlated column reference 'v1.value' > cannot be map type. SQLSTATE: 0A000; line 1 pos 49 > at > org.apache.spark.sql.errors.QueryCompilationErrors$.unsupportedCorrelatedReferenceDataTypeError(QueryCompilationErrors.scala:2463) > ... {code} > However, if we rewrite the query to pull out the map access `outer_map[1]` > into the outer plan, it succeeds: > > {code:java} > scala> sql("""with tmp as ( > select value[0] as value0, value[1] as value1 from v > ) > select v1.value0 from tmp v1 where v1.value0 > (select avg(v2.value0) from > tmp v2 where v1.value1 = v2.value1)""").explain{code} > Another point that can be improved is that, even if the data type supports > join, we still don’t need to join on the full attribute, and we can get a > better plan by doing the same rewrite to pull out the extract expression. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org