Michael Armbrust created SPARK-27453:
----------------------------------------

             Summary: DataFrameWriter.partitionBy is Silently Dropped by DSV1
                 Key: SPARK-27453
                 URL: https://issues.apache.org/jira/browse/SPARK-27453
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.1, 2.2.3, 2.1.3, 2.0.2, 1.6.3, 1.5.2, 1.4.1
            Reporter: Michael Armbrust


This is a long-standing quirk of the interaction between {{DataFrameWriter}} 
and {{CreatableRelationProvider}} (and the other forms of the DSV1 API).  Users 
can specify columns in {{partitionBy}} and our internal data sources will use 
this information.  Unfortunately, for external systems, this data is silently 
dropped with no feedback given to the user.

In the long run, I think that DataSourceV2 is a better answer. However, I don't 
think we should wait for that API to stabilize before offering some kind of 
solution to developers of external data sources. I also do not think we should 
break binary compatibility of this API, but I do think that a small surgical fix 
could alleviate the issue.

I propose that we propagate partitioning information (when present) 
along with the other configuration options passed to the data source in the 
{{String, String}} map.
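
As a rough sketch of this proposal (not the actual Spark change), the partition 
columns could be serialized under a reserved key and merged into the options 
map. The object name, the key {{__partition_columns}}, and the comma-separated 
encoding below are all assumptions made for illustration; the sketch is plain 
Scala with no Spark dependency and assumes column names contain no commas.

```scala
// Hypothetical sketch: names and encoding are illustrative, not Spark's API.
object PartitionOptions {
  // Assumed reserved key; a real implementation would choose its own.
  val PartitionColumnsKey = "__partition_columns"

  // Writer side: merge partition columns (when present) into the
  // String -> String options that are handed to the data source.
  def withPartitioning(
      options: Map[String, String],
      partitionColumns: Option[Seq[String]]): Map[String, String] =
    partitionColumns match {
      case Some(cols) if cols.nonEmpty =>
        options + (PartitionColumnsKey -> cols.mkString(","))
      case _ =>
        options // nothing to propagate; options are passed through unchanged
    }

  // Source side: an external data source could recover the columns like this.
  def partitionColumns(options: Map[String, String]): List[String] =
    options.get(PartitionColumnsKey)
      .map(_.split(",").toList)
      .getOrElse(Nil)
}
```

A source that ignores the extra key keeps working as before, while one that 
reads it can honor (or explicitly reject) the requested partitioning.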

I think it's very unlikely that there are both data sources that validate extra 
options and users who are using (no-op) partitioning with them, but out of an 
abundance of caution we should protect the behavior change behind a {{legacy}} 
flag that can be turned off.
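
One way such a guard could look is sketched below. The flag name 
{{example.legacy.passPartitionByAsOptions}}, the option key, and the helper 
itself are hypothetical; the sketch only assumes that the flag defaults to on 
and that turning it off restores the old behavior of dropping the columns.

```scala
// Hypothetical sketch: the flag name and helper are not Spark's actual API.
object LegacyGate {
  // Assumed configuration key; enabled unless explicitly set to "false".
  val LegacyFlag = "example.legacy.passPartitionByAsOptions"

  def effectiveOptions(
      conf: Map[String, String],
      options: Map[String, String],
      partitionColumns: Seq[String]): Map[String, String] = {
    // An absent flag means "enabled", so the new behavior is the default.
    val enabled = conf.get(LegacyFlag).forall(_.toBoolean)
    if (enabled && partitionColumns.nonEmpty)
      options + ("__partition_columns" -> partitionColumns.mkString(","))
    else
      options // flag off (or no partitionBy): the options are left untouched
  }
}
```

A user whose data source rejects unknown options could set the flag to 
{{false}} and get exactly the options map they passed in.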


