[ https://issues.apache.org/jira/browse/SPARK-32168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151118#comment-17151118 ]
Apache Spark commented on SPARK-32168:
--------------------------------------

User 'rdblue' has created a pull request for this issue:
https://github.com/apache/spark/pull/28993

> DSv2 SQL overwrite incorrectly uses static plan with hidden partitions
> ----------------------------------------------------------------------
>
>                 Key: SPARK-32168
>                 URL: https://issues.apache.org/jira/browse/SPARK-32168
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Ryan Blue
>            Priority: Blocker
>              Labels: correctness
>
> The v2 analyzer rule {{ResolveInsertInto}} tries to detect when a static
> overwrite and a dynamic overwrite would produce the same result, and chooses
> static overwrite in that case. It uses a dynamic overwrite only if there is a
> partition column without a static value and the SQL partition overwrite mode
> is set to dynamic:
> {code:lang=scala}
> val dynamicPartitionOverwrite = partCols.size > staticPartitions.size &&
>   conf.partitionOverwriteMode == PartitionOverwriteMode.DYNAMIC
> {code}
> The problem is that {{partCols}} contains only the partition columns that
> appear in the column list (identity partitions) and does not include hidden
> partitions, like {{days(ts)}}. As a result, the check fails to detect hidden
> partitions and does not select dynamic overwrite. Static overwrite is used
> instead, and when a table has only hidden partitions, the static filter drops
> all table data.
> This is a correctness bug because Spark will overwrite more data than just
> the set of partitions being written to in dynamic mode. The impact is limited
> because this rule is only used for SQL queries (not for plans built by
> DataFrameWriters) and only affects tables with hidden partitions.
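For illustration, here is a minimal self-contained sketch of the failing decision. It is not Spark's actual DSv2 code: {{Transform}}, {{Identity}}, {{Days}}, {{buggyDynamic}}, and {{fixedDynamic}} are hypothetical stand-ins used only to show why counting identity partition columns alone misses hidden partitions.

{code:lang=scala}
// Hypothetical model of a table's partition transforms. In Spark these are
// DSv2 Transform expressions; the names here are stand-ins for illustration.
sealed trait Transform
case class Identity(col: String) extends Transform // plain partition column
case class Days(col: String) extends Transform     // hidden partition, e.g. days(ts)

object OverwriteCheckSketch extends App {
  // Buggy check: mirrors partCols.size, which counts only identity columns.
  def buggyDynamic(parts: Seq[Transform], static: Map[String, String]): Boolean =
    parts.count(_.isInstanceOf[Identity]) > static.size

  // Possible fix: count every partition transform, hidden ones included.
  def fixedDynamic(parts: Seq[Transform], static: Map[String, String]): Boolean =
    parts.size > static.size

  // A table partitioned only by the hidden transform days(ts), with no
  // static PARTITION values, written with dynamic mode configured.
  val partitioning: Seq[Transform] = Seq(Days("ts"))
  val static = Map.empty[String, String]

  println(buggyDynamic(partitioning, static)) // false -> static overwrite drops all rows
  println(fixedDynamic(partitioning, static)) // true  -> dynamic overwrite is chosen
}
{code}

Under these assumptions, the buggy check evaluates to false for a table with only hidden partitions, so the plan falls through to a static overwrite whose filter matches the entire table.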