[ https://issues.apache.org/jira/browse/SPARK-45657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Zhuge resolved SPARK-45657.
--------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

The issue is fixed in 3.5.0

> Caching SQL UNION of different column data types does not work inside Dataset.union
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-45657
>                 URL: https://issues.apache.org/jira/browse/SPARK-45657
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.2, 3.4.0, 3.4.1
>            Reporter: John Zhuge
>            Priority: Major
>             Fix For: 3.5.0
>
>
> Cache SQL UNION of 2 sides with different column data types
> {code:java}
> scala> spark.sql("select 1 id union select 's2' id").cache()
> {code}
> Dataset.union does not leverage the cache
> {code:java}
> scala> spark.sql("select 1 id union select 's2' id").union(spark.sql("select 's3'")).queryExecution.optimizedPlan
> res15: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
> Union false, false
> :- Aggregate [id#109], [id#109]
> :  +- Union false, false
> :     :- Project [1 AS id#109]
> :     :  +- OneRowRelation
> :     +- Project [s2 AS id#108]
> :        +- OneRowRelation
> +- Project [s3 AS s3#111]
>    +- OneRowRelation
> {code}
> SQL UNION of the cached SQL UNION does use the cache! Please note `InMemoryRelation` used.
> {code:java}
> scala> spark.sql("(select 1 id union select 's2' id) union select 's3'").queryExecution.optimizedPlan
> res16: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
> Aggregate [id#117], [id#117]
> +- Union false, false
>    :- InMemoryRelation [id#117], StorageLevel(disk, memory, deserialized, 1 replicas)
>    :     +- *(4) HashAggregate(keys=[id#100], functions=[], output=[id#100])
>    :        +- Exchange hashpartitioning(id#100, 500), ENSURE_REQUIREMENTS, [plan_id=241]
>    :           +- *(3) HashAggregate(keys=[id#100], functions=[], output=[id#100])
>    :              +- Union
>    :                 :- *(1) Project [1 AS id#100]
>    :                 :  +- *(1) Scan OneRowRelation[]
>    :                 +- *(2) Project [s2 AS id#99]
>    :                    +- *(2) Scan OneRowRelation[]
>    +- Project [s3 AS s3#116]
>       +- OneRowRelation
> {code}
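The two plans in the description differ in whether an `InMemoryRelation` node appears. As a rough sketch, the same comparison can be made programmatically instead of by reading the plan output. The object name `CacheUsageCheck`, the helper `usesCache`, and the local-mode session settings are hypothetical and not part of Spark or of the original report; the queries are the ones from the description. Which result each query gives depends on the Spark version in use.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.columnar.InMemoryRelation

object CacheUsageCheck {
  // Hypothetical helper (not a Spark API): true when the optimized logical
  // plan contains at least one InMemoryRelation, i.e. it reads cached data.
  def usesCache(df: DataFrame): Boolean =
    df.queryExecution.optimizedPlan
      .collectFirst { case r: InMemoryRelation => r }
      .isDefined

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session; spark-shell already provides `spark`.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-45657-check")
      .getOrCreate()
    try {
      // Cache the SQL UNION whose two sides have different column data types.
      spark.sql("select 1 id union select 's2' id").cache()

      // Same cached query, combined once through Dataset.union and once in SQL.
      val viaDataset = spark.sql("select 1 id union select 's2' id")
        .union(spark.sql("select 's3'"))
      val viaSql = spark.sql("(select 1 id union select 's2' id) union select 's3'")

      println(s"Dataset.union uses cache: ${usesCache(viaDataset)}")
      println(s"SQL UNION uses cache:     ${usesCache(viaSql)}")
    } finally {
      spark.stop()
    }
  }
}
```

In `spark-shell`, only `usesCache` and the three `spark.sql` statements are needed, since the session already exists.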