[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21025 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21025#discussion_r181299913

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +287,67 @@ case class ArrayContains(left: Expression, right: Expression)
       override def prettyName: String = "array_contains"
     }
    +
    +
    +/**
    + * Returns the minimum value in the array.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(array) - Returns the minimum value in the array.",
    --- End diff --

    We will ignore NULL, right? Please document it.
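The NULL handling raised in this comment can be illustrated with a small sketch in plain Python. It is not Spark code: `array_min_model` is a hypothetical helper that models "skip NULL elements, then take the minimum", which matches the two examples quoted in this thread (`array(1, 20, null, 3)` gives 1, and `[None, 10, -1]` gives -1). Returning `None` for a missing array, an empty array, or an array holding only NULLs is an assumption made for the sketch, not something the diff shown here states.

```python
from typing import Optional, Sequence


def array_min_model(values: Optional[Sequence[Optional[int]]]) -> Optional[int]:
    """Hypothetical model of a NULL-skipping minimum (not Spark's implementation)."""
    if values is None:
        # Assumption: a NULL array yields a NULL result.
        return None
    # Drop NULL (None) elements before comparing.
    non_null = [v for v in values if v is not None]
    # Assumption: nothing left to compare means a NULL result.
    return min(non_null) if non_null else None


print(array_min_model([1, 20, None, 3]))  # 1
print(array_min_model([None, 10, -1]))    # -1
```

Documenting the behavior, as requested, would amount to stating the first rule (NULL elements are skipped) in the `usage` string.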
[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function
Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21025#discussion_r180894697

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +287,70 @@ case class ArrayContains(left: Expression, right: Expression)
       override def prettyName: String = "array_contains"
     }
    +
    +
    +/**
    + * Returns the minimum value in the array.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(array) - Returns the minimum value in the array.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(1, 20, null, 3));
    +       1
    +  """, since = "2.4.0")
    +case class ArrayMin(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {
    +
    +  override def nullable: Boolean =
    +    child.nullable || child.dataType.asInstanceOf[ArrayType].containsNull
    +
    +  override def foldable: Boolean = child.foldable
    --- End diff --

    It's already specified in `UnaryExpression`
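The point of this comment, that an override repeating the parent's definition is redundant, can be sketched with toy Python classes. The class names are borrowed from the diff purely for illustration; these are hypothetical stand-ins, not Spark's Scala classes, and `Literal` being foldable is an assumption of the sketch.

```python
class Expression:
    """Toy base class: expressions are not foldable unless a subclass says so."""

    @property
    def foldable(self) -> bool:
        return False


class Literal(Expression):
    """Toy constant expression; assumed foldable for this sketch."""

    def __init__(self, value):
        self.value = value

    @property
    def foldable(self) -> bool:
        return True


class UnaryExpression(Expression):
    """Toy one-child expression: foldable exactly when its child is."""

    def __init__(self, child: Expression):
        self.child = child

    @property
    def foldable(self) -> bool:
        return self.child.foldable


class ArrayMin(UnaryExpression):
    """No foldable override: the inherited definition already delegates to child."""


print(ArrayMin(Literal([1, 20, 3])).foldable)  # True
print(ArrayMin(Expression()).foldable)         # False
```

Because the inherited property already returns `child.foldable`, adding the same line to `ArrayMin` would change nothing, which is why the reviewer suggests dropping it.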
[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21025#discussion_r180431349

    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2080,6 +2080,21 @@ def size(col):
         return Column(sc._jvm.functions.size(_to_java_column(col)))
    +@since(2.4)
    +def array_min(col):
    +    """
    +    Collection function: returns the minimum value of the array.
    +
    +    :param col: name of column or expression
    +
    +    >>> df = spark.createDataFrame([([2, 1, 3],), ([None, 10, -1],)], ['data'])
    +    >>> df.select(array_min(df.data).alias('min')).collect()
    +    [Row(min=1), Row(min=-1)]
    +     """
    --- End diff --

    You are right, good catch! I was using the `sort_array` function below as a reference, and it has the same issue. I will fix it there too, thanks.
[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21025#discussion_r180429701

    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2080,6 +2080,21 @@ def size(col):
         return Column(sc._jvm.functions.size(_to_java_column(col)))
    +@since(2.4)
    +def array_min(col):
    +    """
    +    Collection function: returns the minimum value of the array.
    +
    +    :param col: name of column or expression
    +
    +    >>> df = spark.createDataFrame([([2, 1, 3],), ([None, 10, -1],)], ['data'])
    +    >>> df.select(array_min(df.data).alias('min')).collect()
    +    [Row(min=1), Row(min=-1)]
    +     """
    --- End diff --

    The closing `"""` seems to have one extra leading space.
[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21025#discussion_r180427313

    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2080,6 +2080,21 @@ def size(col):
         return Column(sc._jvm.functions.size(_to_java_column(col)))
    +@since(2.4)
    +def array_min(col):
    +    """
    +    Collection function: returns the minimum value of the array.
    +
    +    :param col: name of column or expression
    +
    +    >>> df = spark.createDataFrame([([2, 1, 3],), ([None, 10, -1],)], ['data'])
    +    >>> df.select(array_min(df.data).alias('min')).collect()
    +    [Row(min=1), Row(min=-1)]
    +     """
    --- End diff --

    Sorry, I can't see what the problem is here. Could you please clarify? Thanks.
[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21025#discussion_r180422191

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +287,70 @@ case class ArrayContains(left: Expression, right: Expression)
       override def prettyName: String = "array_contains"
     }
    +
    +
    +/**
    + * Returns the minimum value in the array.
    + */
    +@ExpressionDescription(
    +    usage = "_FUNC_(array) - Returns the minimum value in the array.",
    +    examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(1, 20, null, 3));
    +       1
    +  """, since = "2.4.0")
    --- End diff --

    indentation ..
[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21025#discussion_r180421841

    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2080,6 +2080,21 @@ def size(col):
         return Column(sc._jvm.functions.size(_to_java_column(col)))
    +@since(2.4)
    +def array_min(col):
    +    """
    +    Collection function: returns the minimum value of the array.
    +
    +    :param col: name of column or expression
    +
    +    >>> df = spark.createDataFrame([([2, 1, 3],), ([None, 10, -1],)], ['data'])
    +    >>> df.select(array_min(df.data).alias('min')).collect()
    +    [Row(min=1), Row(min=-1)]
    +     """
    --- End diff --

    quick nit
[GitHub] spark pull request #21025: [SPARK-23918][SQL] Add array_min function
GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/21025

    [SPARK-23918][SQL] Add array_min function

    ## What changes were proposed in this pull request?

    The PR adds the SQL function `array_min`. It takes an array as its argument and returns the minimum value in it.

    ## How was this patch tested?

    Added unit tests (UTs).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-23918

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21025.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21025

commit b176f8d94a175190f3ef478d418341aa66d8a82c
Author: Marco Gaido
Date:   2018-04-10T11:12:40Z

    [SPARK-23918][SQL] Add array_min function