[jira] [Assigned] (SPARK-32168) DSv2 SQL overwrite incorrectly uses static plan with hidden partitions
     [ https://issues.apache.org/jira/browse/SPARK-32168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-32168:
------------------------------------

    Assignee: Ryan Blue

> DSv2 SQL overwrite incorrectly uses static plan with hidden partitions
> ----------------------------------------------------------------------
>
>                 Key: SPARK-32168
>                 URL: https://issues.apache.org/jira/browse/SPARK-32168
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>            Priority: Blocker
>              Labels: correctness
>
> The v2 analyzer rule {{ResolveInsertInto}} tries to detect when a static
> overwrite and a dynamic overwrite would produce the same result, and chooses
> static overwrite in that case. It uses a dynamic overwrite only if there is a
> partition column without a static value and the SQL mode is set to dynamic.
> {code:lang=scala}
> val dynamicPartitionOverwrite = partCols.size > staticPartitions.size &&
>   conf.partitionOverwriteMode == PartitionOverwriteMode.DYNAMIC
> {code}
> The problem is that {{partCols}} contains only the names of partitions that
> are in the column list (identity partitions) and does not include hidden
> partitions, like {{days(ts)}}. As a result, the check does not detect hidden
> partitions and therefore does not choose dynamic overwrite. Static overwrite
> is used instead; when a table has only hidden partitions, the static filter
> drops all table data.
> This is a correctness bug because, in dynamic mode, Spark will overwrite more
> data than just the set of partitions being written to. The impact is limited
> because this rule is only used for SQL queries (not plans from
> DataFrameWriters) and only affects tables with hidden partitions.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
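The check quoted in the issue can be illustrated with a small standalone sketch. Everything here is hypothetical scaffolding rather than Spark code: `PartitionField`, `Identity`, `Hidden`, and both functions are invented names, and `countAllFieldsIsDynamic` is only one conceivable alternative, not necessarily the change that resolved the ticket. The sketch shows why counting only identity partition columns can never select dynamic overwrite for a table partitioned solely by `days(ts)`.

```scala
// Hypothetical model of a table's partition spec; these are NOT Spark's types.
sealed trait PartitionField
final case class Identity(column: String) extends PartitionField
final case class Hidden(transform: String, column: String) extends PartitionField

object OverwriteModeSketch {
  // Mirrors the shape of the check quoted in the issue: partCols holds only
  // identity partition columns, so hidden partitions are never counted.
  def identityOnlyIsDynamic(
      spec: Seq[PartitionField],
      staticPartitions: Map[String, String],
      dynamicModeEnabled: Boolean): Boolean = {
    val partCols = spec.collect { case Identity(name) => name }
    partCols.size > staticPartitions.size && dynamicModeEnabled
  }

  // Assumed alternative for illustration: count every partition field,
  // hidden or not, when deciding whether any partition lacks a static value.
  def countAllFieldsIsDynamic(
      spec: Seq[PartitionField],
      staticPartitions: Map[String, String],
      dynamicModeEnabled: Boolean): Boolean =
    spec.size > staticPartitions.size && dynamicModeEnabled

  def main(args: Array[String]): Unit = {
    // A table partitioned only by days(ts), written with no static values.
    val hiddenOnly = Seq[PartitionField](Hidden("days", "ts"))
    println(identityOnlyIsDynamic(hiddenOnly, Map.empty, dynamicModeEnabled = true))   // prints "false"
    println(countAllFieldsIsDynamic(hiddenOnly, Map.empty, dynamicModeEnabled = true)) // prints "true"
  }
}
```

With the identity-only count, `partCols` is empty for this table, so `0 > 0` is false and the static plan is chosen even though the session is in dynamic mode, which matches the behaviour the issue describes.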
[jira] [Assigned] (SPARK-32168) DSv2 SQL overwrite incorrectly uses static plan with hidden partitions
     [ https://issues.apache.org/jira/browse/SPARK-32168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-32168:
-----------------------------------

    Assignee: Apache Spark

> DSv2 SQL overwrite incorrectly uses static plan with hidden partitions
> ----------------------------------------------------------------------
>
>                 Key: SPARK-32168
>                 URL: https://issues.apache.org/jira/browse/SPARK-32168
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Ryan Blue
>            Assignee: Apache Spark
>            Priority: Blocker
>              Labels: correctness
[jira] [Assigned] (SPARK-32168) DSv2 SQL overwrite incorrectly uses static plan with hidden partitions
     [ https://issues.apache.org/jira/browse/SPARK-32168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-32168:
-----------------------------------

    Assignee:     (was: Apache Spark)

> DSv2 SQL overwrite incorrectly uses static plan with hidden partitions
> ----------------------------------------------------------------------
>
>                 Key: SPARK-32168
>                 URL: https://issues.apache.org/jira/browse/SPARK-32168
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Ryan Blue
>            Priority: Blocker
>              Labels: correctness