[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15432


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15432#discussion_r86658659
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
 ---
@@ -97,17 +101,15 @@ case class Rand(seed: Long) extends RDG {
-0.3254147983080288
   > SELECT _FUNC_(0);
1.1164209726833079
+  > SELECT _FUNC_(null);
+   1.1164209726833079
   """)
 // scalastyle:on line.size.limit
-case class Randn(seed: Long) extends RDG {
-  override protected def evalInternal(input: InternalRow): Double = 
rng.nextGaussian()
+case class Randn(child: Expression) extends RDG {
 
-  def this() = this(Utils.random.nextLong())
+  def this() = this(Literal(Utils.random.nextLong()))
--- End diff --

Yes, makes sense. I will fix them here first.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15432#discussion_r86658562
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
 ---
@@ -97,17 +101,15 @@ case class Rand(seed: Long) extends RDG {
-0.3254147983080288
   > SELECT _FUNC_(0);
1.1164209726833079
+  > SELECT _FUNC_(null);
+   1.1164209726833079
   """)
 // scalastyle:on line.size.limit
-case class Randn(seed: Long) extends RDG {
-  override protected def evalInternal(input: InternalRow): Double = 
rng.nextGaussian()
+case class Randn(child: Expression) extends RDG {
 
-  def this() = this(Utils.random.nextLong())
+  def this() = this(Literal(Utils.random.nextLong()))
--- End diff --

If you meant something like `Literal(Long: Utils.random.nextLong()))`, I 
guess both are fine assuming from the discussion in the PR 12452.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15432#discussion_r86658461
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
 ---
@@ -97,17 +101,15 @@ case class Rand(seed: Long) extends RDG {
-0.3254147983080288
   > SELECT _FUNC_(0);
1.1164209726833079
+  > SELECT _FUNC_(null);
+   1.1164209726833079
   """)
 // scalastyle:on line.size.limit
-case class Randn(seed: Long) extends RDG {
-  override protected def evalInternal(input: InternalRow): Double = 
rng.nextGaussian()
+case class Randn(child: Expression) extends RDG {
 
-  def this() = this(Utils.random.nextLong())
+  def this() = this(Literal(Utils.random.nextLong()))
--- End diff --

The complier seems complaining if we specify the return type in `def this`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15432#discussion_r86658154
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
 ---
@@ -97,17 +101,15 @@ case class Rand(seed: Long) extends RDG {
-0.3254147983080288
   > SELECT _FUNC_(0);
1.1164209726833079
+  > SELECT _FUNC_(null);
+   1.1164209726833079
   """)
 // scalastyle:on line.size.limit
-case class Randn(seed: Long) extends RDG {
-  override protected def evalInternal(input: InternalRow): Double = 
rng.nextGaussian()
+case class Randn(child: Expression) extends RDG {
 
-  def this() = this(Utils.random.nextLong())
+  def this() = this(Literal(Utils.random.nextLong()))
--- End diff --

Why not specifying the data type here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15432#discussion_r86658156
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
 ---
@@ -87,6 +87,10 @@ case class Rand(seed: Long) extends RDG {
   }
 }
 
+object Rand {
+  def apply(seed: Long): Rand = Rand(Literal(seed))
--- End diff --

The same here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15432#discussion_r86658157
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
 ---
@@ -64,17 +66,15 @@ abstract class RDG extends LeafExpression with 
Nondeterministic {
0.9629742951434543
   > SELECT _FUNC_(0);
0.8446490682263027
+  > SELECT _FUNC_(null);
+   0.8446490682263027
   """)
 // scalastyle:on line.size.limit
-case class Rand(seed: Long) extends RDG {
-  override protected def evalInternal(input: InternalRow): Double = 
rng.nextDouble()
+case class Rand(child: Expression) extends RDG {
 
-  def this() = this(Utils.random.nextLong())
+  def this() = this(Literal(Utils.random.nextLong()))
--- End diff --

The same here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15432#discussion_r86495088
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -1728,4 +1728,29 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
 val df = spark.createDataFrame(spark.sparkContext.makeRDD(rows), 
schema)
 assert(df.filter($"array1" === $"array2").count() == 1)
   }
+
+  test("SPARK-17854: rand/randn allows null and long as input seed") {
--- End diff --

Sure!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15432#discussion_r86492775
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -1728,4 +1728,29 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
 val df = spark.createDataFrame(spark.sparkContext.makeRDD(rows), 
schema)
 assert(df.filter($"array1" === $"array2").count() == 1)
   }
+
+  test("SPARK-17854: rand/randn allows null and long as input seed") {
--- End diff --

move this into sql query test suite


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-10-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15432#discussion_r83817947
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
 ---
@@ -50,22 +54,27 @@ abstract class RDG extends LeafExpression with 
Nondeterministic {
 
   override def dataType: DataType = DoubleType
 
-  // NOTE: Even if the user doesn't provide a seed, Spark SQL adds a 
default seed.
-  override def sql: String = s"$prettyName($seed)"
+  override def inputTypes: Seq[AbstractDataType] = 
Seq(TypeCollection(IntegerType, LongType))
 }
 
 /** Generate a random column with i.i.d. uniformly distributed values in 
[0, 1). */
+// scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(a) - Returns a random column with i.i.d. uniformly 
distributed values in [0, 1).")
-case class Rand(seed: Long) extends RDG {
-  override protected def evalInternal(input: InternalRow): Double = 
rng.nextDouble()
+  usage =
+"""
+  _FUNC_() - Returns a random column with i.i.d. uniformly distributed 
values in [0, 1].
+seed is given randomly.
+
+  _FUNC_(seed) - Returns a random column with i.i.d. uniformly 
distributed values in [0, 1].
+seed should be an integer/long/NULL literal.
+""",
+  extended = "> SELECT _FUNC_();\n 0.9629742951434543\n> SELECT 
_FUNC_(0);\n 0.8446490682263027\n> SELECT _FUNC_(NULL);\n 0.8446490682263027")
--- End diff --

Sure, I will make it consistent.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-10-18 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15432#discussion_r83814907
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
 ---
@@ -77,18 +86,28 @@ case class Rand(seed: Long) extends RDG {
   }
 }
 
+object Rand {
+  def apply(seed: Long): Rand = Rand(Literal(seed))
+}
+
 /** Generate a random column with i.i.d. gaussian random distribution. */
+// scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(a) - Returns a random column with i.i.d. gaussian random 
distribution.")
-case class Randn(seed: Long) extends RDG {
-  override protected def evalInternal(input: InternalRow): Double = 
rng.nextGaussian()
+  usage =
+"""
+  _FUNC_() - Returns a random column with i.i.d. gaussian random 
distribution.
--- End diff --

Nit: gaussian -> Gaussian. Well, really we need to specify that it has mean 
0 and stdev 1. And that's called the standard normal distribution. And I assume 
the 'n' in 'randn' is for normal. So maybe better to say "Returns a random 
column with i.i.d. values drawn from the standard normal distribution" 
everywhere including the comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-10-18 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15432#discussion_r83814390
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
 ---
@@ -50,22 +54,27 @@ abstract class RDG extends LeafExpression with 
Nondeterministic {
 
   override def dataType: DataType = DoubleType
 
-  // NOTE: Even if the user doesn't provide a seed, Spark SQL adds a 
default seed.
-  override def sql: String = s"$prettyName($seed)"
+  override def inputTypes: Seq[AbstractDataType] = 
Seq(TypeCollection(IntegerType, LongType))
 }
 
 /** Generate a random column with i.i.d. uniformly distributed values in 
[0, 1). */
+// scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(a) - Returns a random column with i.i.d. uniformly 
distributed values in [0, 1).")
-case class Rand(seed: Long) extends RDG {
-  override protected def evalInternal(input: InternalRow): Double = 
rng.nextDouble()
+  usage =
+"""
+  _FUNC_() - Returns a random column with i.i.d. uniformly distributed 
values in [0, 1].
+seed is given randomly.
+
+  _FUNC_(seed) - Returns a random column with i.i.d. uniformly 
distributed values in [0, 1].
+seed should be an integer/long/NULL literal.
+""",
+  extended = "> SELECT _FUNC_();\n 0.9629742951434543\n> SELECT 
_FUNC_(0);\n 0.8446490682263027\n> SELECT _FUNC_(NULL);\n 0.8446490682263027")
--- End diff --

Given the discussion in the other PR, should the 'usage' message be shorter 
and push some of this into 'extended'? whatever would be consistent.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-10-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15432#discussion_r83537639
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
 ---
@@ -50,22 +54,17 @@ abstract class RDG extends LeafExpression with 
Nondeterministic {
 
   override def dataType: DataType = DoubleType
 
-  // NOTE: Even if the user doesn't provide a seed, Spark SQL adds a 
default seed.
-  override def sql: String = s"$prettyName($seed)"
+  override def inputTypes: Seq[AbstractDataType] = 
Seq(TypeCollection(IntegerType, LongType))
 }
 
 /** Generate a random column with i.i.d. uniformly distributed values in 
[0, 1). */
 @ExpressionDescription(
   usage = "_FUNC_(a) - Returns a random column with i.i.d. uniformly 
distributed values in [0, 1).")
--- End diff --

IMO, we need to update this. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-10-12 Thread HyukjinKwon
GitHub user HyukjinKwon reopened a pull request:

https://github.com/apache/spark/pull/15432

[SPARK-17854][SQL] rand/randn allows null/long as input seed

## What changes were proposed in this pull request?

This PR proposes `rand`/`randn` accept `null` as input in Scala/SQL and 
`LongType` as input in SQL. In this case, it treats the values as `0`.

So, this PR includes both changes below:

- `null` support

  It seems MySQL also accepts this.


  ```sql
mysql> select rand(0);
+-+
| rand(0) |
+-+
| 0.15522042769493574 |
+-+
1 row in set (0.00 sec)

mysql> select rand(NULL);
+-+
| rand(NULL)  |
+-+
| 0.15522042769493574 |
+-+
1 row in set (0.00 sec)
```

  and also Hive does according to 
[HIVE-14694](https://issues.apache.org/jira/browse/HIVE-14694)

  So the codes below:

  ```scala
  spark.range(1).selectExpr("rand(null)").show()
  ```

  prints..

  **Before**

```
Input argument to rand must be an integer literal.;; line 1 pos 0
org.apache.spark.sql.AnalysisException: Input argument to rand must be an 
integer literal.;; line 1 pos 0
at 
org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:465)
at 
org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:444)
```

  **After**

```
+---+
|rand(CAST(NULL AS INT))|
+---+
|0.13385709732307427|
+---+
```

- `LongType` support in SQL.

  In addition, it make the function allows to take `LongType` consistently 
within Scala/SQL.

  In more details, the codes below:

  ```scala
spark.range(1).select(rand(1), rand(1L)).show()
spark.range(1).selectExpr("rand(1)", "rand(1L)").show()
```

  prints..

  **Before**

```
+--+--+
|   rand(1)|   rand(1)|
+--+--+
|0.2630967864682161|0.2630967864682161|
+--+--+


Input argument to rand must be an integer literal.;; line 1 pos 0
org.apache.spark.sql.AnalysisException: Input argument to rand must be an 
integer literal.;; line 1 pos 0
at 
org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:465)
at
```

  **After**

```
+--+--+
|   rand(1)|   rand(1)|
+--+--+
|0.2630967864682161|0.2630967864682161|
+--+--+

+--+--+
|   rand(1)|   rand(1)|
+--+--+
|0.2630967864682161|0.2630967864682161|
+--+--+
```


## How was this patch tested?

Unit tests in `DataFrameSuite.scala` and `RandomSuite.scala`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-17854

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15432.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15432


commit 7fa7db22dd4f2ba88ab1f09e4b776003b3f62fdb
Author: hyukjinkwon 
Date:   2016-10-11T09:21:18Z

rand/randn allows null as input seed

commit 6f8f3f33f9b67d77285048bfd7d794990e072b8a
Author: hyukjinkwon 
Date:   2016-10-12T12:23:56Z

Use ExpectsInputTypes and allow LongType and IntegerType

commit a99f674ff9b9cebb730a1e290c0fa05af8627f1d
Author: hyukjinkwon 
Date:   2016-10-12T14:31:11Z

Override constructor




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-10-12 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at:

https://github.com/apache/spark/pull/15432


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org