[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...
Github user dilipbiswal closed the pull request at: https://github.com/apache/spark/pull/22047 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22047#discussion_r227367500

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AnyAgg.scala ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns true if at least one value of `expr` is true.")

--- End diff --

BTW, don't forget to add `since`.
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22047#discussion_r227054412

--- Diff: python/pyspark/sql/functions.py ---
@@ -403,6 +403,28 @@ def countDistinct(col, *cols):
     return Column(jc)

+def every(col):

--- End diff --

@cloud-fan Thank you very much for your response. I will create a new PR based on option 1 today and close this one.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22047#discussion_r226985475

--- Diff: python/pyspark/sql/functions.py ---
@@ -403,6 +403,28 @@ def countDistinct(col, *cols):
     return Column(jc)

+def every(col):

--- End diff --

+1 for option 1
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22047#discussion_r225700828

--- Diff: python/pyspark/sql/functions.py ---
@@ -403,6 +403,28 @@ def countDistinct(col, *cols):
     return Column(jc)

+def every(col):

--- End diff --

@gatorsmile Hi Sean, I have prepared two branches: one in which the new aggregate functions extend the base Max and Min classes, basically reusing code, and another in which we replace these aggregate expressions in the optimizer. Below are the links.

1. [branch-extend](https://github.com/dilipbiswal/spark/tree/SPARK-19851-extend)
2. [branch-rewrite](https://github.com/dilipbiswal/spark/tree/SPARK-19851-rewrite)

I would prefer option 1 for the following reasons:

1. The code changes are simpler.
2. It supports these aggregates as window expressions naturally; in the other option I have to block that.
3. For such a simple mapping we probably don't need a rewrite framework. We could add one in the future if we need a more complex transformation.

Please let me know how we want to move forward with this. Thanks!
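[Editor's note] The intuition behind option 1 (extending the base Max and Min classes) can be sketched outside Spark in plain Python. This is an illustrative sketch only, not Spark code: it relies on the fact that booleans order False < True, so EVERY over a group reduces to MIN and ANY/SOME reduces to MAX.

```python
# Illustrative sketch (not Spark code): under the ordering False < True,
# EVERY(expr) behaves like MIN(expr) and ANY(expr)/SOME(expr) like MAX(expr).

def every(values):
    """True iff all values are True -- equivalent to min() over booleans."""
    return min(values)

def any_agg(values):
    """True iff at least one value is True -- equivalent to max() over booleans."""
    return max(values)

# Equivalence check against Python's built-in all()/any():
for vs in ([True, True], [True, False], [False, False]):
    assert every(vs) == all(vs)
    assert any_agg(vs) == any(vs)
```

This equivalence is why reusing Max/Min also gives the window-function support mentioned above for free.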
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22047#discussion_r223171963

--- Diff: python/pyspark/sql/functions.py ---
@@ -403,6 +403,28 @@ def countDistinct(col, *cols):
     return Column(jc)

+def every(col):

--- End diff --

@gatorsmile OK.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22047#discussion_r223164281

--- Diff: python/pyspark/sql/functions.py ---
@@ -403,6 +403,28 @@ def countDistinct(col, *cols):
     return Column(jc)

+def every(col):

--- End diff --

Please keep the SQL functions and remove the function APIs. Thanks!
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22047#discussion_r209157977

--- Diff: python/pyspark/sql/functions.py ---
@@ -202,6 +202,12 @@ def _():
     """,
}

+_functions_2_2 = {

--- End diff --

@HyukjinKwon Will look into this.
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22047#discussion_r209157855

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -88,7 +88,7 @@ class RelationalGroupedDataset protected[sql](
   }

   private[this] def aggregateNumericColumns(colNames: String*)(f: Expression => AggregateFunction)
-    : DataFrame = {

--- End diff --

@HyukjinKwon Thanks, will fix.
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22047#discussion_r209157799

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -297,9 +318,44 @@ class RelationalGroupedDataset protected[sql](
   }

   /**
-   * Pivots a column of the current `DataFrame` and performs the specified aggregation.
+   * Compute the logical and of all boolean columns for each group.
+   * The resulting `DataFrame` will also contain the grouping columns.
+   * When specified columns are given, only compute the sum for them.
+   *
+   * @since 2.2.0

--- End diff --

Thanks, will fix.
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22047#discussion_r209157669

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -1555,9 +1555,11 @@ case class Left(str: Expression, len: Expression, child: Expression) extends Run
  * number of bytes of the given binary expression.
  */
@ExpressionDescription(
-  usage = "_FUNC_(expr) - Returns the character length of string data or number of bytes of " +
-    "binary data. The length of string data includes the trailing spaces. The length of binary " +
-    "data includes binary zeros.",
+  usage = """

--- End diff --

@HyukjinKwon Not sure why, but when I ran build/sbt doc I got an error here. That's the reason I had to fix it.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22047#discussion_r209130247

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -297,9 +318,44 @@ class RelationalGroupedDataset protected[sql](
   }

   /**
-   * Pivots a column of the current `DataFrame` and performs the specified aggregation.
+   * Compute the logical and of all boolean columns for each group.
+   * The resulting `DataFrame` will also contain the grouping columns.
+   * When specified columns are given, only compute the sum for them.
+   *
+   * @since 2.2.0

--- End diff --

nit: since version
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22047#discussion_r209130219

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -88,7 +88,7 @@ class RelationalGroupedDataset protected[sql](
   }

   private[this] def aggregateNumericColumns(colNames: String*)(f: Expression => AggregateFunction)
-    : DataFrame = {

--- End diff --

nit: previous indentation was correct.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22047#discussion_r209130145

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -1555,9 +1555,11 @@ case class Left(str: Expression, len: Expression, child: Expression) extends Run
  * number of bytes of the given binary expression.
 */
@ExpressionDescription(
-  usage = "_FUNC_(expr) - Returns the character length of string data or number of bytes of " +
-    "binary data. The length of string data includes the trailing spaces. The length of binary " +
-    "data includes binary zeros.",
+  usage = """

--- End diff --

Looks unrelated
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22047#discussion_r209130078

--- Diff: python/pyspark/sql/functions.py ---
@@ -202,6 +202,12 @@ def _():
     """,
}

+_functions_2_2 = {

--- End diff --

hm, looks unrelated.
GitHub user dilipbiswal opened a pull request: https://github.com/apache/spark/pull/22047

[SPARK-19851] Add support for EVERY and ANY (SOME) aggregates

## What changes were proposed in this pull request?

This PR is a rebased version of the original work ([link](https://github.com/apache/spark/pull/17648)) by @ptkool. Please give credit to @ptkool for this work.

Description from the original PR: This pull request implements the EVERY and ANY aggregates.

## How was this patch tested?

Testing was performed using unit tests, integration tests, and manual tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dilipbiswal/spark SPARK-19851

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22047.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22047

commit 8cf4a3963961083e1feffd45c649ed31785a54ac
Author: ptkool
Date: 2017-03-07T19:09:32Z
Add new aggregates EVERY and ANY (SOME).

commit f30b4cde8e398f2d1b8c6cb9842b7a87640f09ee
Author: ptkool
Date: 2017-03-07T22:59:48Z
Fix Scala style check errors.

commit 5adeb17eac47aef3d17ea7cbb015cba313a6f4e5
Author: ptkool
Date: 2017-03-13T13:37:02Z
Resolved issue with Any aggregate and added window function test.

commit c192de003a2011440886fcc34562ba1818ba3c3b
Author: ptkool
Date: 2017-03-13T17:49:21Z
Added additional pyspark.sql tests.

commit 14bf7a10a0cf1b27e94a7a8087c0a536252cc95f
Author: ptkool
Date: 2017-03-14T01:34:40Z
Fix pyspark window function tests.

commit 525392ce0b673416832f3cbea20cc1a4e6885ae2
Author: ptkool
Date: 2017-03-15T13:21:47Z
Resolve several issues with Pyspark tests.

commit ee78f22d236047820c16077f0130ea75bc705aba
Author: ptkool
Date: 2017-03-15T13:27:53Z
Resolve Scala style issues.

commit 9503d9e408d1347fc29aaa2e147b7ef17b269c0c
Author: ptkool
Date: 2017-03-15T13:42:11Z
Fix Python style errors.
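[Editor's note] The grouped semantics the PR targets, roughly `SELECT k, EVERY(b), ANY(b) FROM t GROUP BY k`, can be emulated in plain Python. The table, rows, and names below are illustrative only and do not come from the PR:

```python
# Illustrative emulation of grouped EVERY/ANY semantics (not Spark code).
# EVERY(b) is true for a group iff every b in the group is true;
# ANY(b)/SOME(b) is true iff at least one b in the group is true.
from collections import defaultdict

# Hypothetical rows of (grouping key k, boolean column b):
rows = [("a", True), ("a", True), ("b", True), ("b", False)]

groups = defaultdict(list)
for k, b in rows:
    groups[k].append(b)

# For each group, compute (EVERY(b), ANY(b)) via Python's all()/any().
result = {k: (all(bs), any(bs)) for k, bs in groups.items()}
# result["a"] == (True, True); result["b"] == (False, True)
```

Group "a" is all-true, so both aggregates return true; group "b" has a false value, so EVERY is false while ANY remains true.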