In the case of saveAsTable("tablename") you specified the partitioning
yourself with partitionBy("partitionCol").
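
To the question at the bottom of your mail: as far as I understand, a
static-partition insert can be done from Spark as well by going through
spark.sql. A rough sketch, with the partition value, the column list and the
temp view name made up for illustration:

// Static partition: the partition value is fixed in the statement itself, so
// Hive's strict-mode check is satisfied. Only the non-partition columns are
// selected; "2023-01-20", col1/col2 and "source_view" are placeholders.
df.createOrReplaceTempView("source_view")
spark.sql(
  """INSERT OVERWRITE TABLE tablename PARTITION (partitionCol = '2023-01-20')
    |SELECT col1, col2 FROM source_view""".stripMargin)

With partitionBy (or no partition value given) the partition is dynamic: each
row is routed to a partition based on its partitionCol value, which is what
the strict-mode check complains about when no static partition is given.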

On Sat, Jan 21, 2023 at 4:03 AM krexos <kre...@protonmail.com.invalid>
wrote:

> My periodically running process writes data to a table backed by parquet
> files, with the configuration "spark.sql.sources.partitionOverwriteMode" =
> "dynamic", using the following code:
>
> if (!tableExists) {
>   df.write
>     .mode("overwrite")
>     .partitionBy("partitionCol")
>     .format("parquet")
>     .saveAsTable("tablename")
> } else {
>   df.write
>     .format("parquet")
>     .mode("overwrite")
>     .insertInto("table")
> }
>
> If the table doesn't exist and is created by the first branch, it works
> fine, and on the next run, when the table does exist and the else branch
> runs, it also works as expected.
>
> However, when I create the table over existing parquet files, either
> through a Hive session or using spark.sql("CREATE TABLE..."), and then run
> the process, it fails to write with the error:
>
> "org.apache.spark.SparkException: Dynamic partition strict mode requires
> at least one static partition column. To turn this off set
> hive.exec.dynamic.partition.mode=nonstrict"
> Adding this configuration to the Spark conf solves the issue, but I don't
> understand why it is needed when the table is created through a command and
> not when the table is created with saveAsTable.
>
> Also, I don't understand how this configuration is relevant to Spark. From
> what I've read
> <https://cwiki.apache.org/confluence/display/hive/tutorial#Tutorial-Dynamic-PartitionInsert>,
> static partition here means we directly specify the partition to write into
> instead of specifying the column to partition by. Is it even possible to do
> such an insert in Spark (as opposed to HiveQL)?
>
> Spark 2.4, Hadoop 3.1
>
>
> thanks
>
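
On the nonstrict workaround you mention: the property from the error message
can also be set when the session is built. A sketch, with the property names
taken from the error text and everything else illustrative:

import org.apache.spark.sql.SparkSession

// Sketch: pass the Hive dynamic-partition settings on the session builder.
// "partition-writer" is just a placeholder app name.
val spark = SparkSession.builder()
  .appName("partition-writer")
  .enableHiveSupport()
  .config("hive.exec.dynamic.partition", "true")
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .getOrCreate()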
