spark git commit: [SPARK-9691] [SQL] PySpark SQL rand function treats seed 0 as no seed

rxin Thu, 06 Aug 2015 17:29:30 -0700

Repository: spark
Updated Branches:
  refs/heads/branch-1.4 116f61187 -> e5a994f21



[SPARK-9691] [SQL] PySpark SQL rand function treats seed 0 as no seed

https://issues.apache.org/jira/browse/SPARK-9691

jkbradley rxin

Author: Yin Huai <yh...@databricks.com>

Closes #7999 from yhuai/pythonRand and squashes the following commits:

4187e0c [Yin Huai] Regression test.
a985ef9 [Yin Huai] Use "if seed is not None" instead of "if seed" because
"if seed" evaluates to false when seed is 0.

(cherry picked from commit baf4587a569b49e39020c04c2785041bdd00789b)
Signed-off-by: Reynold Xin <r...@databricks.com>
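
For illustration, the difference between the two checks can be reproduced without Spark. The sketch below is not part of the commit; the helper names uses_seed_truthy and uses_seed_explicit are hypothetical and only mirror the old and new conditions in rand/randn.

```python
def uses_seed_truthy(seed=None):
    """Mirror of the old check, `if seed:` (hypothetical helper)."""
    # 0 is falsy in Python, so a seed of 0 is treated as "no seed".
    return bool(seed)


def uses_seed_explicit(seed=None):
    """Mirror of the new check, `if seed is not None:` (hypothetical helper)."""
    # Only a missing seed (None) falls through to the unseeded branch.
    return seed is not None


if __name__ == "__main__":
    for seed in (None, 0, 42):
        print(seed, uses_seed_truthy(seed), uses_seed_explicit(seed))
```

Under the old check, a caller passing seed=0 would silently take the unseeded branch, so two calls with the same seed could return different values; the explicit None comparison keeps 0 as a valid seed.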


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e5a994f2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e5a994f2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e5a994f2

Branch: refs/heads/branch-1.4
Commit: e5a994f21f85b1f8522e89629eb1379d22454ba9
Parents: 116f611
Author: Yin Huai <yh...@databricks.com>
Authored: Thu Aug 6 17:03:14 2015 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Thu Aug 6 17:28:37 2015 -0700

----------------------------------------------------------------------
 python/pyspark/sql/functions.py |  4 ++--
 python/pyspark/sql/tests.py     | 10 ++++++++++
 2 files changed, 12 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/e5a994f2/python/pyspark/sql/functions.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index bbf465a..0fc50c8 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -329,7 +329,7 @@ def rand(seed=None):
     """Generates a random column with i.i.d. samples from U[0.0, 1.0].
     """
     sc = SparkContext._active_spark_context
-    if seed:
+    if seed is not None:
         jc = sc._jvm.functions.rand(seed)
     else:
         jc = sc._jvm.functions.rand()
@@ -341,7 +341,7 @@ def randn(seed=None):
     """Generates a column with i.i.d. samples from the standard normal distribution.
     """
     sc = SparkContext._active_spark_context
-    if seed:
+    if seed is not None:
         jc = sc._jvm.functions.randn(seed)
     else:
         jc = sc._jvm.functions.randn()

http://git-wip-us.apache.org/repos/asf/spark/blob/e5a994f2/python/pyspark/sql/tests.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index 75ad223..5818487 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -498,6 +498,16 @@ class SQLTests(ReusedPySparkTestCase):
         for row in rndn:
             assert row[1] >= -4.0 and row[1] <= 4.0, "got: %s" % row[1]
 
+        # If the specified seed is 0, we should use it.
+        # https://issues.apache.org/jira/browse/SPARK-9691
+        rnd1 = df.select('key', functions.rand(0)).collect()
+        rnd2 = df.select('key', functions.rand(0)).collect()
+        self.assertEqual(sorted(rnd1), sorted(rnd2))
+
+        rndn1 = df.select('key', functions.randn(0)).collect()
+        rndn2 = df.select('key', functions.randn(0)).collect()
+        self.assertEqual(sorted(rndn1), sorted(rndn2))
+
     def test_between_function(self):
         df = self.sc.parallelize([
             Row(a=1, b=2, c=3),

