In the case of saveAsTable("tablename") you specified the partitioning explicitly with .partitionBy("partitionCol"), so Spark already knows the partition columns for that write.
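
As for your last question: yes, a static partition insert is possible from Spark as well, through Spark SQL rather than the DataFrameWriter API. A minimal sketch, assuming the table from your mail (partitioned by partitionCol); the partition value '2023-01-21', the column names, and the source view name are all hypothetical:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("static-partition-insert") // illustrative app name
      .enableHiveSupport()
      .getOrCreate()

    // Static partition: the target partition value is fixed in the statement
    // itself, which satisfies Hive's strict mode with no extra configuration.
    // Note the SELECT must not include partitionCol, since its value is given.
    spark.sql("""
      INSERT OVERWRITE TABLE tablename PARTITION (partitionCol = '2023-01-21')
      SELECT col1, col2
      FROM source_view
    """)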
On Sat, Jan 21, 2023 at 4:03 AM krexos <kre...@protonmail.com.invalid> wrote:

> My periodically running process writes data to a table over parquet files
> with the configuration "spark.sql.sources.partitionOverwriteMode" =
> "dynamic" with the following code:
>
>   if (!tableExists) {
>     df.write
>       .mode("overwrite")
>       .partitionBy("partitionCol")
>       .format("parquet")
>       .saveAsTable("tablename")
>   } else {
>     df.write
>       .format("parquet")
>       .mode("overwrite")
>       .insertInto("table")
>   }
>
> If the table doesn't exist and is created in the first clause, it works
> fine, and on the next run, when the table does exist and the else clause
> runs, it works as expected.
>
> However, when I create the table over existing parquet files, either
> through a hive session or using spark.sql("CREATE TABLE..."), and then
> run the process, it fails to write with the error:
>
>   "org.apache.spark.SparkException: Dynamic partition strict mode requires
>   at least one static partition column. To turn this off set
>   hive.exec.dynamic.partition.mode=nonstrict"
>
> Adding this configuration to the spark conf solves the issue, but I don't
> understand why it is needed when creating the table through a command but
> isn't needed when creating the table with saveAsTable.
>
> Also, I don't understand how this configuration is relevant for spark.
> From what I've read
> <https://cwiki.apache.org/confluence/display/hive/tutorial#Tutorial-Dynamic-PartitionInsert>,
> static partition here means we directly specify the partition to write
> into instead of specifying the column to partition by. Is it even possible
> to do such an insert in spark (as opposed to HiveQL)?
>
> Spark 2.4, Hadoop 3.1
>
> thanks
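
For reference, a sketch of the workaround you mentioned applied at session construction, assuming the same table; the app name and "source_view" are made up, and hive.exec.dynamic.partition is Hive's master switch for dynamic-partition inserts (usually already on):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("dynamic-partition-overwrite") // illustrative app name
      .enableHiveSupport()
      .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
      .config("hive.exec.dynamic.partition", "true")           // master switch
      .config("hive.exec.dynamic.partition.mode", "nonstrict") // allow fully dynamic inserts
      .getOrCreate()

    // Every partition value comes from the data itself (fully dynamic),
    // which is exactly what strict mode rejects without at least one
    // static partition column in the statement.
    spark.table("source_view")
      .write
      .format("parquet")
      .mode("overwrite")
      .insertInto("tablename")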