[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function

2018-04-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21025


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function

2018-04-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21025#discussion_r181299913
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +287,67 @@ case class ArrayContains(left: Expression, right: Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+
+/**
+ * Returns the minimum value in the array.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Returns the minimum value in the array.",
--- End diff --

We will ignore NULL, right? Please document it.
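
For readers following along, the NULL-skipping behaviour being asked about can be modelled in a few lines of plain Python. This is only a sketch of the semantics suggested by the examples in this PR (`array(1, 20, null, 3)` gives `1`); the function name `array_min_sketch` is hypothetical, and returning `None` for a missing, empty, or all-NULL array is an assumption, not something this thread confirms.

```python
from typing import Optional, Sequence


def array_min_sketch(values: Optional[Sequence[Optional[int]]]) -> Optional[int]:
    """Hypothetical model of a NULL-skipping array minimum (not Spark code)."""
    # A missing array has no minimum.
    if values is None:
        return None
    # Drop NULL (None) elements before comparing.
    non_null = [v for v in values if v is not None]
    # Assumption: an empty or all-NULL array yields NULL (None).
    return min(non_null) if non_null else None


print(array_min_sketch([1, 20, None, 3]))
print(array_min_sketch([None, 10, -1]))
```

If the function does behave this way, the usage string could state it directly, for example by saying that NULL elements are skipped.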





[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function

2018-04-11 Thread mn-mikke
Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/21025#discussion_r180894697
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +287,70 @@ case class ArrayContains(left: Expression, right: Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+
+/**
+ * Returns the minimum value in the array.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Returns the minimum value in the array.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 20, null, 3));
+   1
+  """, since = "2.4.0")
+case class ArrayMin(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {
+
+  override def nullable: Boolean =
+child.nullable || child.dataType.asInstanceOf[ArrayType].containsNull
+
+  override def foldable: Boolean = child.foldable
--- End diff --

This is already specified in `UnaryExpression`, so the override is not needed.
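
As a rough illustration of the point, here is a Python analog (not Spark code) of a member defined once in a base class and inherited unchanged by a subclass. All class names below are hypothetical stand-ins, and the assumption is that the base class derives `foldable` from its child exactly as the redundant override in the diff does.

```python
class LeafSketch:
    """Hypothetical stand-in for a child expression."""

    def __init__(self, foldable: bool) -> None:
        self.foldable = foldable


class UnaryExpressionSketch:
    """Hypothetical base class that defines `foldable` once."""

    def __init__(self, child: LeafSketch) -> None:
        self.child = child

    @property
    def foldable(self) -> bool:
        # Assumed base definition: foldable when the child is foldable.
        return self.child.foldable


class ArrayMinSketch(UnaryExpressionSketch):
    """Subclass with no override: it inherits `foldable` as defined above."""


print(ArrayMinSketch(LeafSketch(foldable=True)).foldable)
print(ArrayMinSketch(LeafSketch(foldable=False)).foldable)
```

Repeating the same definition in the subclass would not change behaviour, which is why it can simply be dropped.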





[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function

2018-04-10 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21025#discussion_r180431349
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2080,6 +2080,21 @@ def size(col):
 return Column(sc._jvm.functions.size(_to_java_column(col)))
 
 
+@since(2.4)
+def array_min(col):
+"""
+Collection function: returns the minimum value of the array.
+
+:param col: name of column or expression
+
+>>> df = spark.createDataFrame([([2, 1, 3],), ([None, 10, -1],)], ['data'])
+>>> df.select(array_min(df.data).alias('min')).collect()
+[Row(min=1), Row(min=-1)]
+ """
--- End diff --

You are right, good catch! I was using the `sort_array` function below as a
reference, and it has the same issue. I will fix it there too, thanks.





[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function

2018-04-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21025#discussion_r180429701
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2080,6 +2080,21 @@ def size(col):
 return Column(sc._jvm.functions.size(_to_java_column(col)))
 
 
+@since(2.4)
+def array_min(col):
+"""
+Collection function: returns the minimum value of the array.
+
+:param col: name of column or expression
+
+>>> df = spark.createDataFrame([([2, 1, 3],), ([None, 10, -1],)], ['data'])
+>>> df.select(array_min(df.data).alias('min')).collect()
+[Row(min=1), Row(min=-1)]
+ """
--- End diff --

The closing `"""` seems to have one extra leading space.





[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function

2018-04-10 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21025#discussion_r180427313
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2080,6 +2080,21 @@ def size(col):
 return Column(sc._jvm.functions.size(_to_java_column(col)))
 
 
+@since(2.4)
+def array_min(col):
+"""
+Collection function: returns the minimum value of the array.
+
+:param col: name of column or expression
+
+>>> df = spark.createDataFrame([([2, 1, 3],), ([None, 10, -1],)], ['data'])
+>>> df.select(array_min(df.data).alias('min')).collect()
+[Row(min=1), Row(min=-1)]
+ """
--- End diff --

Sorry, I can't see what the problem is here. Could you please clarify? Thanks.





[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function

2018-04-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21025#discussion_r180422191
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +287,70 @@ case class ArrayContains(left: Expression, right: Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+
+/**
+ * Returns the minimum value in the array.
+ */
+@ExpressionDescription(
+usage = "_FUNC_(array) - Returns the minimum value in the array.",
+examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 20, null, 3));
+   1
+  """, since = "2.4.0")
--- End diff --

Nit: the indentation here looks off.





[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function

2018-04-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21025#discussion_r180421841
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2080,6 +2080,21 @@ def size(col):
 return Column(sc._jvm.functions.size(_to_java_column(col)))
 
 
+@since(2.4)
+def array_min(col):
+"""
+Collection function: returns the minimum value of the array.
+
+:param col: name of column or expression
+
+>>> df = spark.createDataFrame([([2, 1, 3],), ([None, 10, -1],)], ['data'])
+>>> df.select(array_min(df.data).alias('min')).collect()
+[Row(min=1), Row(min=-1)]
+ """
--- End diff --

quick nit





[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function

2018-04-10 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/21025

[SPARK-23918][SQL] Add array_min function

## What changes were proposed in this pull request?

The PR adds the SQL function `array_min`. It takes an array as its argument
and returns the minimum value in it.

## How was this patch tested?

Added unit tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-23918

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21025.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21025


commit b176f8d94a175190f3ef478d418341aa66d8a82c
Author: Marco Gaido 
Date:   2018-04-10T11:12:40Z

[SPARK-23918][SQL] Add array_min function



