[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-10-27 Thread dilipbiswal
Github user dilipbiswal closed the pull request at:

https://github.com/apache/spark/pull/22047


---




[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22047#discussion_r227367500
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AnyAgg.scala ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns true if at least one value of `expr` is 
true.")
--- End diff --

BTW, don't forget to add `since`.
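For reference, a hedged sketch of the annotation with a `since` entry added; the version string is a placeholder and the usage text simply mirrors the diff above, so this is not the PR's final code (the annotation attaches to the AnyAgg case class shown in the diff):

```scala
// Illustrative fragment only: version string is a placeholder, not a confirmed target release.
@ExpressionDescription(
  usage = "_FUNC_(expr) - Returns true if at least one value of `expr` is true.",
  since = "3.0.0")
```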


---




[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-10-22 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22047#discussion_r227054412
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -403,6 +403,28 @@ def countDistinct(col, *cols):
 return Column(jc)
 
 
+def every(col):
--- End diff --

@cloud-fan Thank you very much for your response. I will create a new PR based on option-1 today and close this one.


---




[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22047#discussion_r226985475
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -403,6 +403,28 @@ def countDistinct(col, *cols):
 return Column(jc)
 
 
+def every(col):
--- End diff --

+1 for option 1


---




[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-10-16 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22047#discussion_r225700828
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -403,6 +403,28 @@ def countDistinct(col, *cols):
 return Column(jc)
 
 
+def every(col):
--- End diff --

@gatorsmile Hi Sean, I have prepared two branches. In one, the new aggregate functions extend the base Max and Min classes, basically reusing code. In the other, we rewrite these aggregate expressions in the optimizer. Below are the links.

1. [branch-extend](https://github.com/dilipbiswal/spark/tree/SPARK-19851-extend)

2. [branch-rewrite](https://github.com/dilipbiswal/spark/tree/SPARK-19851-rewrite)

I would prefer option 1 for the following reasons:
1. The code changes are simpler.
2. It supports these aggregates as window expressions naturally; in the other option I have to block that.
3. It seems to me that for such a simple mapping we probably don't need a rewrite framework. We could add one in the future if we need a more complex transformation.

Please let me know how we want to move forward with this. Thanks!!
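For context, both branches lean on the same observation: over a BooleanType column (where `false < true`), EVERY reduces to MIN and ANY/SOME reduces to MAX. A minimal, self-contained sketch of that equivalence, illustrative only and not code from either branch:

```scala
import org.apache.spark.sql.SparkSession

object EveryAnyAsMinMaxSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("every-any-sketch").getOrCreate()
    // Over a boolean column (false < true), min(v) behaves like every(v)
    // and max(v) behaves like any(v)/some(v).
    spark.sql(
      """SELECT k, min(v) AS every_v, max(v) AS any_v
        |FROM VALUES (1, true), (1, false), (2, true) AS t(k, v)
        |GROUP BY k""".stripMargin).show()
    spark.stop()
  }
}
```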


---




[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-10-05 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22047#discussion_r223171963
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -403,6 +403,28 @@ def countDistinct(col, *cols):
 return Column(jc)
 
 
+def every(col):
--- End diff --

@gatorsmile OK.


---




[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-10-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22047#discussion_r223164281
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -403,6 +403,28 @@ def countDistinct(col, *cols):
 return Column(jc)
 
 
+def every(col):
--- End diff --

Please keep the SQL functions and remove the function APIs. Thanks!
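If only the SQL functions are kept, DataFrame users can still reach the aggregates through `expr`. A small sketch under the assumption that `every`/`any` end up registered as SQL functions; the DataFrame columns `k` and `v` and the helper name are hypothetical:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.expr

object EveryAnyViaExprSketch {
  // Sketch only: assumes every/any are registered SQL functions, and that
  // `df` has a grouping column k and a BooleanType column v (hypothetical names).
  def everyAnyPerGroup(df: DataFrame): DataFrame =
    df.groupBy("k").agg(
      expr("every(v)").as("all_true"),  // true iff every non-null v in the group is true
      expr("any(v)").as("some_true"))   // true iff at least one v in the group is true
}
```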


---




[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-08-09 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22047#discussion_r209157977
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -202,6 +202,12 @@ def _():
""",
 }
 
+_functions_2_2 = {
--- End diff --

@HyukjinKwon Will look into this.


---




[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-08-09 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22047#discussion_r209157855
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -88,7 +88,7 @@ class RelationalGroupedDataset protected[sql](
   }
 
   private[this] def aggregateNumericColumns(colNames: String*)(f: Expression => AggregateFunction)
-: DataFrame = {
--- End diff --

@HyukjinKwon Thanks ... will fix.


---




[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-08-09 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22047#discussion_r209157799
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -297,9 +318,44 @@ class RelationalGroupedDataset protected[sql](
   }
 
   /**
-   * Pivots a column of the current `DataFrame` and performs the specified aggregation.
+   * Compute the logical and of all boolean columns for each group.
+   * The resulting `DataFrame` will also contain the grouping columns.
+   * When specified columns are given, only compute the sum for them.
+   *
+   * @since 2.2.0
--- End diff --

Thanks, will fix.


---




[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-08-09 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22047#discussion_r209157669
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -1555,9 +1555,11 @@ case class Left(str: Expression, len: Expression, child: Expression) extends Run
  * number of bytes of the given binary expression.
  */
 @ExpressionDescription(
-  usage = "_FUNC_(expr) - Returns the character length of string data or 
number of bytes of " +
-"binary data. The length of string data includes the trailing spaces. 
The length of binary " +
-"data includes binary zeros.",
+  usage = """
--- End diff --

@HyukjinKwon Not sure why, but when I ran `build/sbt doc` I got an error here. That's the reason I had to fix it.


---




[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22047#discussion_r209130247
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -297,9 +318,44 @@ class RelationalGroupedDataset protected[sql](
   }
 
   /**
-   * Pivots a column of the current `DataFrame` and performs the specified aggregation.
+   * Compute the logical and of all boolean columns for each group.
+   * The resulting `DataFrame` will also contain the grouping columns.
+   * When specified columns are given, only compute the sum for them.
+   *
+   * @since 2.2.0
--- End diff --

nit: since version


---




[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22047#discussion_r209130219
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -88,7 +88,7 @@ class RelationalGroupedDataset protected[sql](
   }
 
   private[this] def aggregateNumericColumns(colNames: String*)(f: Expression => AggregateFunction)
-: DataFrame = {
--- End diff --

nit: previous indentation was correct.


---




[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22047#discussion_r209130145
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -1555,9 +1555,11 @@ case class Left(str: Expression, len: Expression, child: Expression) extends Run
  * number of bytes of the given binary expression.
  */
 @ExpressionDescription(
-  usage = "_FUNC_(expr) - Returns the character length of string data or 
number of bytes of " +
-"binary data. The length of string data includes the trailing spaces. 
The length of binary " +
-"data includes binary zeros.",
+  usage = """
--- End diff --

Looks unrelated


---




[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22047#discussion_r209130078
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -202,6 +202,12 @@ def _():
""",
 }
 
+_functions_2_2 = {
--- End diff --

hm, looks unrelated.


---




[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-08-08 Thread dilipbiswal
GitHub user dilipbiswal opened a pull request:

https://github.com/apache/spark/pull/22047

[SPARK-19851] Add support for EVERY and ANY (SOME) aggregates

## What changes were proposed in this pull request?
This PR is a rebased version of the original work [link](https://github.com/apache/spark/pull/17648) by @ptkool.

Please give credit to @ptkool for this work.

Description from original PR:
This pull request implements the EVERY and ANY aggregates.

## How was this patch tested?
Testing was performed using unit tests, integration tests, and manual tests.
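As a rough illustration of the feature this PR targets (hedged: exact function names and behavior are whatever the final patch defines), the new aggregates are meant to be usable from SQL roughly like this:

```scala
// Illustrative only: run in a spark-shell built with this patch, where `spark`
// is the predefined SparkSession. Column and alias names are made up.
// every(passed) -> true iff every non-null value in the group is true
// any(passed)   -> true iff at least one value in the group is true
spark.sql(
  """SELECT dept, every(passed) AS all_passed, any(passed) AS some_passed
    |FROM VALUES ('a', true), ('a', false), ('b', true) AS t(dept, passed)
    |GROUP BY dept""".stripMargin).show()
```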

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dilipbiswal/spark SPARK-19851

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22047.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22047


commit 8cf4a3963961083e1feffd45c649ed31785a54ac
Author: ptkool 
Date:   2017-03-07T19:09:32Z

Add new aggregates EVERY and ANY (SOME).

commit f30b4cde8e398f2d1b8c6cb9842b7a87640f09ee
Author: ptkool 
Date:   2017-03-07T22:59:48Z

Fix Scala style check errors.

commit 5adeb17eac47aef3d17ea7cbb015cba313a6f4e5
Author: ptkool 
Date:   2017-03-13T13:37:02Z

Resolved issue with Any aggregate and added window function test.

commit c192de003a2011440886fcc34562ba1818ba3c3b
Author: ptkool 
Date:   2017-03-13T17:49:21Z

Added additional pyspark.sql tests.

commit 14bf7a10a0cf1b27e94a7a8087c0a536252cc95f
Author: ptkool 
Date:   2017-03-14T01:34:40Z

Fix pyspark window function tests.

commit 525392ce0b673416832f3cbea20cc1a4e6885ae2
Author: ptkool 
Date:   2017-03-15T13:21:47Z

Resolve several issues with Pyspark tests.

commit ee78f22d236047820c16077f0130ea75bc705aba
Author: ptkool 
Date:   2017-03-15T13:27:53Z

Resolve Scala style issues.

commit 9503d9e408d1347fc29aaa2e147b7ef17b269c0c
Author: ptkool 
Date:   2017-03-15T13:42:11Z

Fix Python style errors.




---
