[jira] [Assigned] (SPARK-27453) DataFrameWriter.partitionBy is Silently Dropped by DSV1

Michael Armbrust (JIRA) Fri, 12 Apr 2019 15:26:07 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-27453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Michael Armbrust reassigned SPARK-27453:
----------------------------------------

    Assignee: Liwen Sun

> DataFrameWriter.partitionBy is Silently Dropped by DSV1
> -------------------------------------------------------
>
>                 Key: SPARK-27453
>                 URL: https://issues.apache.org/jira/browse/SPARK-27453
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1, 1.5.2, 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.4.1
>            Reporter: Michael Armbrust
>            Assignee: Liwen Sun
>            Priority: Critical
>
> This is a long-standing quirk of the interaction between {{DataFrameWriter}}
> and {{CreatableRelationProvider}} (and the other forms of the DSV1 API).  
> Users can specify columns in {{partitionBy}} and our internal data sources 
> will use this information.  Unfortunately, for external systems, this data is 
> silently dropped with no feedback given to the user.
> In the long run, I think that DataSourceV2 is a better answer. However, I 
> don't think we should wait for that API to stabilize before offering some 
> kind of solution to developers of external data sources. I also do not think 
> we should break binary compatibility of this API, but I do think that a small
> surgical fix could alleviate the issue.
> I would propose that we could propagate partitioning information (when 
> present) along with the other configuration options passed to the data source 
> in the {{String, String}} map.
> I think it's very unlikely that there are both data sources that validate
> extra options and users who are using (no-op) partitioning with them, but out 
> of an abundance of caution we should protect the behavior change behind a 
> {{legacy}} flag that can be turned off.
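
A rough illustration of the proposal above, written as plain Python rather than against Spark itself. Everything named here is an assumption made for the sketch: the option key `__partition_columns`, the JSON encoding of the column list, and the `legacy_drop_partitioning` parameter standing in for the {{legacy}} flag. The issue does not specify any of them.

```python
import json
from typing import Dict, List, Mapping, Sequence

# Hypothetical option key; the issue does not name a key or an encoding.
PARTITION_COLUMNS_KEY = "__partition_columns"


def options_with_partitioning(
    options: Mapping[str, str],
    partition_columns: Sequence[str],
    legacy_drop_partitioning: bool = False,
) -> Dict[str, str]:
    """Build the String -> String map an external DSV1-style source would receive."""
    merged = dict(options)
    # With the (hypothetical) legacy flag on, or with no partitioning given,
    # the options pass through unchanged, which matches the old behavior.
    if legacy_drop_partitioning or not partition_columns:
        return merged
    # Option values must be strings, so the column list is serialized.
    # JSON is an assumption made for this sketch.
    merged[PARTITION_COLUMNS_KEY] = json.dumps(list(partition_columns))
    return merged


def partition_columns_from(options: Mapping[str, str]) -> List[str]:
    """Recover the partition columns on the data source side, if any were passed."""
    raw = options.get(PARTITION_COLUMNS_KEY)
    return json.loads(raw) if raw is not None else []
```

A source that ignores unknown options would be unaffected by the extra entry, while a source that validates its options would see one new key unless the legacy behavior is selected.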



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
