[ https://issues.apache.org/jira/browse/SPARK-27453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855086#comment-16855086 ]
Apache Spark commented on SPARK-27453:
--------------------------------------

User 'liwensun' has created a pull request for this issue:
https://github.com/apache/spark/pull/24784

> DataFrameWriter.partitionBy is Silently Dropped by DSV1
> -------------------------------------------------------
>
>                 Key: SPARK-27453
>                 URL: https://issues.apache.org/jira/browse/SPARK-27453
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1, 1.5.2, 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.4.1
>            Reporter: Michael Armbrust
>            Assignee: Liwen Sun
>            Priority: Critical
>             Fix For: 2.4.2, 3.0.0
>
> This is a long-standing quirk of the interaction between {{DataFrameWriter}}
> and {{CreatableRelationProvider}} (and the other forms of the DSV1 API).
> Users can specify columns in {{partitionBy}}, and our internal data sources
> will use this information. Unfortunately, for external systems, this data is
> silently dropped with no feedback given to the user.
>
> In the long run, I think that DataSourceV2 is a better answer. However, I
> don't think we should wait for that API to stabilize before offering some
> kind of solution to developers of external data sources. I also do not think
> we should break binary compatibility of this API, but I do think that a
> small, surgical fix could alleviate the issue.
>
> I would propose that we propagate partitioning information (when present)
> along with the other configuration options passed to the data source in the
> {{String, String}} map.
>
> I think it's very unlikely that there are both data sources that validate
> extra options and users who are using (no-op) partitioning with them, but
> out of an abundance of caution we should protect the behavior change behind
> a {{legacy}} flag that can be turned off.
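The proposal above can be sketched outside of Spark: encode the partitionBy columns into the existing String-to-String options map on the writer side, and decode them on the data-source side. This is a minimal illustration of the idea only; the key name "__partition_columns" and both function names are assumptions for this sketch, not Spark's actual internal constants or API.

```python
# Sketch of propagating partitionBy columns through the DSV1 options map.
# The key "__partition_columns" is a hypothetical name, not Spark's internal one.
import json

PARTITION_COLUMNS_KEY = "__partition_columns"

def with_partitioning(options, partition_cols):
    """Writer side: return a copy of the options map with the partition
    columns encoded as JSON under a reserved key (if any were given)."""
    out = dict(options)
    if partition_cols:
        out[PARTITION_COLUMNS_KEY] = json.dumps(list(partition_cols))
    return out

def read_partitioning(options):
    """Data-source side: decode the partition columns; empty if the
    writer did not specify partitionBy (preserving old behavior)."""
    raw = options.get(PARTITION_COLUMNS_KEY)
    return json.loads(raw) if raw else []

# Example: an external source that previously never saw partitionBy
# can now recover the columns from the same options map it already receives.
opts = with_partitioning({"path": "/tmp/events"}, ["year", "month"])
cols = read_partitioning(opts)  # ["year", "month"]
```

Using a JSON-encoded list (rather than, say, a comma-joined string) avoids ambiguity with column names that contain commas; sources that validate unknown options would need to tolerate the reserved key, which is why the issue proposes guarding the change behind a legacy flag.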
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org