spark git commit: [SPARK-23081][PYTHON] Add colRegex API to PySpark

gurwls223 Thu, 25 Jan 2018 14:51:51 -0800

Repository: spark
Updated Branches:
  refs/heads/branch-2.3 8866f9c24 -> 2f65c20ea



[SPARK-23081][PYTHON] Add colRegex API to PySpark

## What changes were proposed in this pull request?

Add colRegex API to PySpark

## How was this patch tested?

add a test in sql/tests.py

Author: Huaxin Gao <huax...@us.ibm.com>

Closes #20390 from huaxingao/spark-23081.

(cherry picked from commit 8480c0c57698b7dcccec5483d67b17cf2c7527ed)
Signed-off-by: hyukjinkwon <gurwls...@gmail.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2f65c20e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2f65c20e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2f65c20e

Branch: refs/heads/branch-2.3
Commit: 2f65c20ea74a87729eaf3c9b2aebcfb10c0ecf4b
Parents: 8866f9c
Author: Huaxin Gao <huax...@us.ibm.com>
Authored: Fri Jan 26 07:50:48 2018 +0900
Committer: hyukjinkwon <gurwls...@gmail.com>
Committed: Fri Jan 26 07:51:01 2018 +0900

----------------------------------------------------------------------
 python/pyspark/sql/dataframe.py                 | 23 ++++++++++++++++++++
 .../scala/org/apache/spark/sql/Dataset.scala    |  8 +++----
 2 files changed, 27 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/2f65c20e/python/pyspark/sql/dataframe.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 2d5e9b9..ac40308 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -819,6 +819,29 @@ class DataFrame(object):
         """
         return [f.name for f in self.schema.fields]
 
+    @since(2.3)
+    def colRegex(self, colName):
+        """
+        Selects column based on the column name specified as a regex and returns it
+        as :class:`Column`.
+
+        :param colName: string, column name specified as a regex.
+
+        >>> df = spark.createDataFrame([("a", 1), ("b", 2), ("c",  3)], ["Col1", "Col2"])
+        >>> df.select(df.colRegex("`(Col1)?+.+`")).show()
+        +----+
+        |Col2|
+        +----+
+        |   1|
+        |   2|
+        |   3|
+        +----+
+        """
+        if not isinstance(colName, basestring):
+            raise ValueError("colName should be provided as string")
+        jc = self._jdf.colRegex(colName)
+        return Column(jc)
+
     @ignore_unicode_prefix
     @since(1.3)
     def alias(self, alias):
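
To make the matching rule in the docstring example concrete, here is a small pure-Python model of how a backquoted `colRegex` pattern appears to pick columns. It is a sketch under stated assumptions, not Spark's implementation. The helper `select_columns_by_regex` and its `case_sensitive` flag are hypothetical.

The sketch assumes two things. First, the regex must match the *whole* column name, as Java's `String.matches` does. The docstring example implies this: with substring matching, `.+` would also match `Col1`. Second, only backquoted strings are treated as regexes.

Python's `re` module has no possessive quantifiers before Python 3.11, so the example rewrites `(Col1)?+.+` with a negative lookahead. The rewritten pattern, `(?!Col1$).+`, keeps any non-empty name except exactly `Col1`.

```python
import re
from typing import List


def select_columns_by_regex(columns: List[str], col_name: str,
                            case_sensitive: bool = True) -> List[str]:
    """Hypothetical model of colRegex column selection (not Spark's code):
    keep every column whose entire name matches the backquoted regex."""
    # Assumption: only a string wrapped in backticks is treated as a regex.
    # Plain names are resolved as ordinary columns, which is not modeled here.
    if not (len(col_name) >= 2 and col_name.startswith("`")
            and col_name.endswith("`")):
        raise ValueError("expected a backquoted regex such as `(?!Col1$).+`")
    pattern = col_name[1:-1]
    # Assumption: a case-insensitive session is modeled with re.IGNORECASE.
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = re.compile(pattern, flags)
    # fullmatch mirrors Java's String.matches: the whole name must match.
    return [c for c in columns if compiled.fullmatch(c) is not None]


if __name__ == "__main__":
    # Portable rewrite of "(Col1)?+.+": any non-empty name except "Col1".
    print(select_columns_by_regex(["Col1", "Col2"], "`(?!Col1$).+`"))
```

On Python 3.11 or later, the original possessive pattern `(Col1)?+.+` can be passed unchanged. It should select the same columns.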

http://git-wip-us.apache.org/repos/asf/spark/blob/2f65c20e/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index 912f411..edb6644 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1194,7 +1194,7 @@ class Dataset[T] private[sql](
   def orderBy(sortExprs: Column*): Dataset[T] = sort(sortExprs : _*)
 
   /**
-   * Selects column based on the column name and return it as a [[Column]].
+   * Selects column based on the column name and returns it as a [[Column]].
    *
    * @note The column name can also reference to a nested column like `a.b`.
    *
@@ -1220,7 +1220,7 @@ class Dataset[T] private[sql](
   }
 
   /**
-   * Selects column based on the column name and return it as a [[Column]].
+   * Selects column based on the column name and returns it as a [[Column]].
    *
    * @note The column name can also reference to a nested column like `a.b`.
    *
@@ -1240,7 +1240,7 @@ class Dataset[T] private[sql](
   }
 
   /**
-   * Selects column based on the column name specified as a regex and return it as [[Column]].
+   * Selects column based on the column name specified as a regex and returns it as [[Column]].
    * @group untypedrel
    * @since 2.3.0
    */
@@ -2729,7 +2729,7 @@ class Dataset[T] private[sql](
   }
 
   /**
-   * Return an iterator that contains all rows in this Dataset.
+   * Returns an iterator that contains all rows in this Dataset.
    *
    * The iterator will consume as much memory as the largest partition in this Dataset.
    *

