[ 
https://issues.apache.org/jira/browse/SPARK-27599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845378#comment-16845378
 ] 

Nick Dimiduk commented on SPARK-27599:
--------------------------------------

Sure [~Alexander_Fedosov]. Hive DDL lets you specify a partitioning strategy 
for the physical data layout 
(https://cwiki.apache.org/confluence/display/hive/LanguageManual+DDL#LanguageManualDDL-PartitionedTables).
 This is identical to the physical data layout partitioning set by 
DataFrameWriter.partitionBy 
(https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameWriter.html#partitionBy-java.lang.String...-).
 I observe that when writing to a hive table that has partitioning specified 
from a DataFrameWriter, Spark will throw an error when the hive table metadata 
partition definition does not agree with the partitionBy specified on the 
DataFrameWriter.

My request here is that instead of only erring on disagreement, Spark should 
use the partitioning information from table metadata when no partitionBy call 
has been made. Basically, Spark knows what the destination table needs, so 
don't require that the caller provide it. Furthermore, there are likely other 
aspects of overlapping DDL between the hive table's metadata and methods on 
DataFrameWriter (bucketing comes to mind). When working with a table defined in 
hive metastore, Spark should defer to that metadata rather than require the 
user to repeat it all in code.
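
To make the request concrete, here is a rough PySpark sketch of what a caller 
can do today to get roughly the requested behavior: look up the partition 
columns in the catalog and pass them to partitionBy. This is not how Spark 
implements anything internally. The helper names 
(partition_column_names, append_using_table_partitioning) are hypothetical, 
and the sketch assumes the table lives in the current database and that the 
catalog lists partition columns in partition order.

```python
from typing import Iterable, List


def partition_column_names(columns: Iterable) -> List[str]:
    """Return the names of columns flagged as partition columns.

    `columns` is expected to look like the result of
    spark.catalog.listColumns(table): objects with `name` and `isPartition`.
    Order is preserved as given (assumed here to match partition order).
    """
    return [c.name for c in columns if c.isPartition]


def append_using_table_partitioning(spark, df, table_name: str) -> None:
    """Append `df` to an existing table, reusing the table's partition columns.

    `spark` is a SparkSession and `df` a DataFrame; both are passed in, so
    this module does not import pyspark itself.
    """
    cols = partition_column_names(spark.catalog.listColumns(table_name))
    writer = df.write.mode("append")
    if cols:
        # Repeat the metastore's partitioning so it cannot disagree with the table.
        writer = writer.partitionBy(*cols)
    writer.saveAsTable(table_name)
```

A call would look like append_using_table_partitioning(spark, df, "events"), 
where "events" is a made-up table name. The point of the issue is that Spark 
could do this lookup itself instead of leaving it to every caller.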

> DataFrameWriter.partitionBy should be optional when writing to a hive table
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-27599
>                 URL: https://issues.apache.org/jira/browse/SPARK-27599
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.1
>            Reporter: Nick Dimiduk
>            Priority: Minor
>
> When writing to an existing, partitioned table stored in the Hive metastore, 
> Spark requires the call to {{saveAsTable}} to provide a value for 
> {{partitionBy}}, even though that information is provided by the metastore 
> itself. Indeed, that information is available to Spark, as it will error if 
> the specified {{partitionBy}} does not match that of the table definition in 
> metastore.
> There may be other attributes of the save call that can be retrieved from the 
> metastore...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
