[ https://issues.apache.org/jira/browse/SPARK-27599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845378#comment-16845378 ]
Nick Dimiduk commented on SPARK-27599:
--------------------------------------

Sure [~Alexander_Fedosov]. Hive DDL lets you specify a partitioning strategy for the physical data layout (https://cwiki.apache.org/confluence/display/hive/LanguageManual+DDL#LanguageManualDDL-PartitionedTables). This is the same physical data layout partitioning that DataFrameWriter.partitionBy controls (https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameWriter.html#partitionBy-java.lang.String...-).

I observe that when writing from a DataFrameWriter to a Hive table that has partitioning specified, Spark throws an error if the partition definition in the Hive table metadata does not agree with the partitionBy specified on the DataFrameWriter.

My request is that, instead of only erroring on disagreement, Spark should use the partitioning information from the table metadata when no partitionBy call has been made. Spark already knows what the destination table needs, so it should not require the caller to provide it.

Furthermore, other parts of the Hive table's DDL likely overlap with methods on DataFrameWriter (bucketing comes to mind). When working with a table defined in the Hive metastore, Spark should defer to that metadata rather than require the user to repeat it all in code.

> DataFrameWriter.partitionBy should be optional when writing to a hive table
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-27599
>                 URL: https://issues.apache.org/jira/browse/SPARK-27599
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.1
>            Reporter: Nick Dimiduk
>            Priority: Minor
>
> When writing to an existing, partitioned table stored in the Hive metastore,
> Spark requires the call to {{saveAsTable}} to provide a value for
> {{partitionBy}}, even though that information is provided by the metastore
> itself.
> Indeed, that information is available to Spark, as it will error if
> the specified {{partitionBy}} does not match that of the table definition in
> metastore.
> There may be other attributes of the save call that can be retrieved from the
> metastore...
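To make the request concrete, here is a minimal Python sketch of the proposed resolution rule: a layout setting given explicitly on the writer must still match the metastore (today's behavior), but an omitted one is filled in from the metastore instead of being required. `TableSpec`, `resolve_layout` and `LayoutMismatchError` are hypothetical names for illustration only; they are not Spark APIs, and the sketch does not talk to Spark or a metastore. The `bucket_by` argument loosely mirrors the `(numBuckets, columns)` shape of `DataFrameWriter.bucketBy`, and column names are compared exactly, which is a simplification.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class TableSpec:
    """Hypothetical stand-in for the layout recorded in the Hive metastore."""
    partition_columns: Tuple[str, ...] = ()
    bucket_columns: Tuple[str, ...] = ()
    num_buckets: Optional[int] = None


class LayoutMismatchError(ValueError):
    """Raised when the writer's layout disagrees with the table metadata."""


def _resolve(name, from_table, from_writer):
    # Nothing specified on the writer: defer to the metastore (the requested behavior).
    if from_writer is None:
        return from_table
    # Specified but different from the metastore: keep the existing error.
    if from_writer != from_table:
        raise LayoutMismatchError(
            f"{name} {from_writer!r} does not match table definition {from_table!r}"
        )
    return from_writer


def resolve_layout(table: TableSpec,
                   partition_by: Optional[Sequence[str]] = None,
                   bucket_by: Optional[Tuple[int, Sequence[str]]] = None) -> TableSpec:
    """Hypothetical: merge writer options with metastore metadata."""
    parts = _resolve("partitionBy", table.partition_columns,
                     None if partition_by is None else tuple(partition_by))
    table_buckets = (table.num_buckets, table.bucket_columns)
    writer_buckets = None if bucket_by is None else (bucket_by[0], tuple(bucket_by[1]))
    num, cols = _resolve("bucketBy", table_buckets, writer_buckets)
    return TableSpec(partition_columns=parts, bucket_columns=cols, num_buckets=num)


if __name__ == "__main__":
    events = TableSpec(partition_columns=("dt",))
    # No partitionBy given: the partitioning comes from the table metadata.
    print(resolve_layout(events).partition_columns)  # ('dt',)
```

Keeping the mismatch check whenever both sides are present preserves the existing safety net; only the "nothing specified" case changes, which is exactly the gap the issue describes.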