But in this case too, the single partition is dynamic, so I would expect the error to be thrown here as well.

When I create the table through a query, I do it with PARTITION BY 'partitionCol'.
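Roughly, the create statement I run looks like this (column names and the location path below are just placeholders for illustration):

spark.sql("""
  CREATE TABLE tablename (col1 STRING, col2 INT, partitionCol STRING)
  USING PARQUET
  PARTITIONED BY (partitionCol)
  LOCATION '/path/to/existing/parquet' -- placeholder path
""")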
thanks

------- Original Message -------
On Saturday, January 21st, 2023 at 9:27 PM, Peyman Mohajerian 
<mohaj...@gmail.com> wrote:

> In the case of saveAsTable("tablename") you specified the partition: 
> 'partitionBy("partitionCol")'
>
> On Sat, Jan 21, 2023 at 4:03 AM krexos <kre...@protonmail.com.invalid> wrote:
>
>> My periodically running process writes data to a table over parquet files
>> with the configuration "spark.sql.sources.partitionOverwriteMode" = "dynamic"
>> with the following code:
>>
>> if (!tableExists) {
>>   df.write
>>     .mode("overwrite")
>>     .partitionBy("partitionCol")
>>     .format("parquet")
>>     .saveAsTable("tablename")
>> } else {
>>   df.write
>>     .format("parquet")
>>     .mode("overwrite")
>>     .insertInto("table")
>> }
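>>
>> (For completeness, that config is set on the session before the write, roughly like this:)
>>
>> spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")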
>>
>> If the table doesn't exist and is created in the first branch, it works fine, and on the
>> next run, when the table does exist and the else branch runs, it also works as expected.
>>
>> However, when I create the table over existing parquet files, either through
>> a hive session or using spark.sql("CREATE TABLE..."), and then run the process,
>> it fails to write with the error:
>>
>> "org.apache.spark.SparkException: Dynamic partition strict mode requires at 
>> least one static partition column. To turn this off set 
>> hive.exec.dynamic.partition.mode=nonstrict"
>>
>> Adding this configuration to the spark conf solves the issue, but I don't understand why
>> it is needed when the table is created through a command yet isn't needed when the table
>> is created with saveAsTable.
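>>
>> (To be concrete, by adding it to the spark conf I mean something along these lines when
>> building the session:)
>>
>> import org.apache.spark.sql.SparkSession
>>
>> val spark = SparkSession.builder()
>>   .enableHiveSupport()
>>   .config("hive.exec.dynamic.partition.mode", "nonstrict") // the setting from the error message
>>   .getOrCreate()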
>>
>> Also, I don't understand how this configuration is relevant for spark. [From what I've
>> read](https://cwiki.apache.org/confluence/display/hive/tutorial#Tutorial-Dynamic-PartitionInsert),
>> static partition here means we directly specify the partition to write into instead of
>> specifying the column to partition by. Is it even possible to do such an insert in spark
>> (as opposed to HiveQL)?
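>>
>> (For reference, the HiveQL-style static-partition insert I have in mind looks roughly like
>> this, with placeholder table/column/value names; the dynamic form just names the column
>> without giving it a value:)
>>
>> -- static: the target partition is fixed up front
>> INSERT OVERWRITE TABLE tablename PARTITION (partitionCol = 'someValue')
>> SELECT col1, col2 FROM some_source;
>>
>> -- dynamic: the partition values come from the data itself
>> INSERT OVERWRITE TABLE tablename PARTITION (partitionCol)
>> SELECT col1, col2, partitionCol FROM some_source;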
>>
>> Spark 2.4, Hadoop 3.1
>>
>> thanks
