[GitHub] [spark] HyukjinKwon commented on a change in pull request #27331: [SPARK-29157][SQL][PYSPARK] Add DataFrameWriterV2 to Python API

GitBox Wed, 24 Jun 2020 19:15:10 -0700


HyukjinKwon commented on a change in pull request #27331:
URL: https://github.com/apache/spark/pull/27331#discussion_r445268946




##########
File path: python/pyspark/sql/readwriter.py
##########
@@ -1048,6 +1048,128 @@ def jdbc(self, url, table, mode=None, properties=None):
         self.mode(mode)._jwrite.jdbc(url, table, jprop)
 
 
+class DataFrameWriterV2(object):
+    """
+    Interface used to write a class:`pyspark.sql.dataframe.DataFrame`
+    to external storage using the v2 API.
+
+    .. versionadded:: 3.1.0
+    """
+
+    def __init__(self, df, table):
+        self._df = df
+        self._spark = df.sql_ctx
+        self._jwriter = df._jdf.writeTo(table)
+
+    @since(3.1)
+    def using(self, provider):
+        """
+        Specifies a provider for the underlying output data source.
+        Spark's default catalog supports "parquet", "json", etc.
+        """
+        self._jwriter.using(provider)
+        return self
+
+    @since(3.1)
+    def option(self, key, value):
+        """
+        Add a write option.
+        """
+        self._jwriter.option(key, to_str(value))
+        return self
+
+    @since(3.1)
+    def options(self, **options):
+        """
+        Add write options.
+        """
+        options = {k: to_str(v) for k, v in options.items()}
+        self._jwriter.options(options)
+        return self
+
+    @since(3.1)
+    def partitionedBy(self, col, *cols):

Review comment:
       Maybe it's important to describe what are expected for `col`. Only 
columns and the partition transform functions are allowed, not the regular 
Spark Column.
   
   I still don't like it we made this API looks like it takes regular Spark 
Columns, this was one of the reason why Pandas UDFs were redesigned and 
separate into two separate groups .. let's at least clarify it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #27331: [SPARK-29157][SQL][PYSPARK] Add DataFrameWriterV2 to Python API

Reply via email to