[GitHub] [spark] HyukjinKwon commented on a change in pull request #27331: [SPARK-29157][SQL][PYSPARK] Add DataFrameWriterV2 to Python API

GitBox Wed, 24 Jun 2020 19:19:10 -0700


HyukjinKwon commented on a change in pull request #27331:
URL: https://github.com/apache/spark/pull/27331#discussion_r445270934




##########
File path: python/pyspark/sql/readwriter.py
##########
@@ -1048,6 +1048,128 @@ def jdbc(self, url, table, mode=None, properties=None):
         self.mode(mode)._jwrite.jdbc(url, table, jprop)
 
 
+class DataFrameWriterV2(object):
+    """
+    Interface used to write a :class:`pyspark.sql.dataframe.DataFrame`
+    to external storage using the v2 API.
+
+    .. versionadded:: 3.1.0
+    """
+
+    def __init__(self, df, table):
+        self._df = df
+        self._spark = df.sql_ctx
+        self._jwriter = df._jdf.writeTo(table)
+
+    @since(3.1)
+    def using(self, provider):
+        """
+        Specifies a provider for the underlying output data source.
+        Spark's default catalog supports "parquet", "json", etc.
+        """
+        self._jwriter.using(provider)
+        return self
+
+    @since(3.1)
+    def option(self, key, value):
+        """
+        Add a write option.
+        """
+        self._jwriter.option(key, to_str(value))
+        return self
+
+    @since(3.1)
+    def options(self, **options):
+        """
+        Add write options.
+        """
+        options = {k: to_str(v) for k, v in options.items()}
+        self._jwriter.options(options)
+        return self
+
+    @since(3.1)
+    def partitionedBy(self, col, *cols):

Review comment:
@rdblue, @brkyvz, @cloud-fan, should we at least use a separate class for
these partition column expressions, such as `PartitionedColumn`, as we do
with `TypedColumn`?
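
To make the suggestion concrete, here is a rough sketch of what a separate class could look like. It is not code from this PR and it does not use PySpark: `Column` is a minimal stand-in, and `PartitionedColumn`, `years`, `bucket`, and `WriterSketch` are hypothetical names used only for illustration. The idea is that partition transforms return their own type, so `partitionedBy` can check its arguments and the transforms cannot be mistaken for ordinary column expressions.

```python
class Column(object):
    """Minimal stand-in for pyspark.sql.Column (assumption: only holds a name)."""

    def __init__(self, name):
        self.name = name


class PartitionedColumn(object):
    """Hypothetical type for partition transform expressions.

    Deliberately NOT a subclass of Column, so it cannot be passed where an
    ordinary column expression is expected.
    """

    def __init__(self, transform, args):
        self.transform = transform
        self.args = args

    def __repr__(self):
        return "{}({})".format(self.transform, ", ".join(self.args))


def years(col):
    # Hypothetical partition transform returning the dedicated type.
    return PartitionedColumn("years", [col.name])


def bucket(num_buckets, col):
    # Hypothetical partition transform with an extra integer argument.
    return PartitionedColumn("bucket", [str(num_buckets), col.name])


class WriterSketch(object):
    """Toy writer that only records the requested partitioning."""

    def __init__(self):
        self.partitioning = []

    def partitionedBy(self, col, *cols):
        all_cols = (col,) + cols
        for c in all_cols:
            # Plain columns (identity partitioning) and transforms are accepted;
            # anything else is rejected early on the Python side.
            if not isinstance(c, (Column, PartitionedColumn)):
                raise TypeError(
                    "expected Column or PartitionedColumn, got {}".format(type(c)))
        self.partitioning = list(all_cols)
        return self


writer = WriterSketch().partitionedBy(years(Column("ts")), bucket(4, Column("id")))
print(writer.partitioning)
```

Whether plain columns should also be accepted by `partitionedBy`, as assumed here, is part of what would need to be decided.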



