[GitHub] [spark] Yikun commented on a change in pull request #32431: [SPARK-35173][SQL][PYTHON] Add multiple columns adding support

GitBox Fri, 07 May 2021 01:34:51 -0700


Yikun commented on a change in pull request #32431:
URL: https://github.com/apache/spark/pull/32431#discussion_r628004164




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -2395,6 +2395,36 @@ class Dataset[T] private[sql](
    */
   def withColumn(colName: String, col: Column): DataFrame = withColumns(Seq(colName), Seq(col))
 
+  /**
+   * (Scala-specific) Returns a new Dataset by adding columns or replacing the existing columns
+   * that has the same names.
+   *
+   * `colsMap` is a map of column name and column, the column must only refer to attributes
+   * supplied by this Dataset. It is an error to add columns that refers to some other Dataset.
+   *
+   * @group untypedrel
+   * @since 3.2.0
+   */
+  def withColumns(colsMap: Map[String, Column]): DataFrame = {
+    val colNames = colsMap.flatMap{ case (colName, _) => Seq(colName) }.toSeq

Review comment:
       done, thanks for your suggestion!

##########
File path: python/pyspark/sql/dataframe.pyi
##########
@@ -250,6 +250,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
         self, cols: Union[List[str], Tuple[str]], support: Optional[float] = ...
     ) -> DataFrame: ...
     def withColumn(self, colName: str, col: Column) -> DataFrame: ...
+    def withColumns(self, colsMap: Dict[str, Column] ) -> DataFrame: ...

Review comment:
       done

##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -2423,6 +2423,38 @@ def freqItems(self, cols, support=None):
             support = 0.01
         return DataFrame(self._jdf.stat().freqItems(_to_seq(self._sc, cols), support), self.sql_ctx)
 
+    def withColumns(self, colsMap):
+        """
+        Returns a new :class:`DataFrame` by adding multiple columns or replacing the
+        existing columns that has the same name.
+
+        The colsMap is a map of column name and column, the column must only refer to attribute
+        supplied by this Dataset. It is an error to add columns that refers to some other Dataset.

Review comment:
       done
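
For readers of this thread, here is a minimal sketch of the add-or-replace behavior that the docstring above describes. It is not the PR's implementation and does not use Spark: `with_columns` and the `Frame` alias are hypothetical stand-ins, a frame is modeled as an ordered list of (name, value) pairs, and names are matched exactly. The sketch also assumes that a replaced column keeps its position and that new columns are appended at the end; Spark's own name resolution and ordering may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a DataFrame: an ordered list of (name, value) pairs.
Frame = List[Tuple[str, object]]


def with_columns(frame: Frame, cols_map: Dict[str, object]) -> Frame:
    """Return a new frame: same-named columns are replaced, the rest are appended."""
    existing = {name for name, _ in frame}
    # Keep the original order, swapping in the new value where a name matches.
    replaced = [
        (name, cols_map[name] if name in cols_map else value)
        for name, value in frame
    ]
    # Names not present in the original frame are added at the end.
    appended = [(name, col) for name, col in cols_map.items() if name not in existing]
    return replaced + appended


df = [("id", 1), ("age", 30)]
out = with_columns(df, {"age": 31, "city": "Oslo"})
print(out)  # [('id', 1), ('age', 31), ('city', 'Oslo')]
```

The input frame is left untouched, mirroring the "returns a new DataFrame" wording in the docstring.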

##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -2423,6 +2423,38 @@ def freqItems(self, cols, support=None):
             support = 0.01
         return DataFrame(self._jdf.stat().freqItems(_to_seq(self._sc, cols), support), self.sql_ctx)
 
+    def withColumns(self, colsMap):
+        """
+        Returns a new :class:`DataFrame` by adding multiple columns or replacing the
+        existing columns that has the same name.

Review comment:
       done

##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -2423,6 +2423,38 @@ def freqItems(self, cols, support=None):
             support = 0.01
         return DataFrame(self._jdf.stat().freqItems(_to_seq(self._sc, cols), support), self.sql_ctx)
 
+    def withColumns(self, colsMap):
+        """
+        Returns a new :class:`DataFrame` by adding multiple columns or replacing the
+        existing columns that has the same name.
+
+        The colsMap is a map of column name and column, the column must only refer to attribute

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



