[GitHub] spark pull request: [SPARK-15648][SQL] add TeradataDialect.scala

2016-05-30 Thread lihongliustc
Github user lihongliustc commented on the pull request:

https://github.com/apache/spark/pull/13359#issuecomment-222607185
  
@srowen Hi srowen, has this PR been delayed? What should I do with it?





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13392#issuecomment-222604482
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13392#issuecomment-222604484
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59636/
Test PASSed.





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13392#issuecomment-222604316
  
**[Test build #59636 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59636/consoleFull)**
 for PR 13392 at commit 
[`b2849e8`](https://github.com/apache/spark/commit/b2849e8f514c1265f7c6199aba980e95b72aa7c2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13270#issuecomment-222603352
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13270#issuecomment-222603353
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59635/
Test PASSed.





[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13270#issuecomment-222603171
  
**[Test build #59635 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59635/consoleFull)**
 for PR 13270 at commit 
[`336fb55`](https://github.com/apache/spark/commit/336fb55406ad19eb7cc7276cd771ebd92ed8dec1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15644] [MLlib] [SQL] Replace SQLContext...

2016-05-30 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13380#issuecomment-222602457
  
cc @jkbradley / @mengxr, are we ok with changing the API?






[GitHub] spark pull request: [SPARK-15658][SQL] UDT serializer should decla...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13402#issuecomment-222601872
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15658][SQL] UDT serializer should decla...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13402#issuecomment-222601873
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59633/
Test PASSed.





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13392#issuecomment-222601745
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59634/
Test PASSed.





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13392#issuecomment-222601744
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15658][SQL] UDT serializer should decla...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13402#issuecomment-222601649
  
**[Test build #59633 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59633/consoleFull)**
 for PR 13402 at commit 
[`6d614dd`](https://github.com/apache/spark/commit/6d614dd3ae4d5ee97083ae99ea527a7c5eaa9f0a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13392#issuecomment-222601564
  
**[Test build #59634 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59634/consoleFull)**
 for PR 13392 at commit 
[`4306c4f`](https://github.com/apache/spark/commit/4306c4fe0b741689bb0ff5349506707e8a7ec520).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15076][SQL] Improve ConstantFolding opt...

2016-05-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/12850#discussion_r65127553
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -751,6 +751,16 @@ object ConstantFolding extends Rule[LogicalPlan] {
 
   // Fold expressions that are foldable.
   case e if e.foldable => Literal.create(e.eval(EmptyRow), e.dataType)
+
+  // Use associative property for integral type
+  case e if e.isInstanceOf[BinaryArithmetic] && e.dataType.isInstanceOf[IntegralType] =>
+    e match {
+      case Add(Add(a, b), c) if b.foldable && c.foldable => Add(a, Add(b, c))
--- End diff --

Thank you for the review, @cloud-fan!
I see. That sounds great.
Let me think about how to eliminate all constants then.





[GitHub] spark pull request: [SPARK-15659][SQL] Ensure FileSystem is gotten...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13405#issuecomment-222600020
  
**[Test build #59640 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59640/consoleFull)**
 for PR 13405 at commit 
[`7e01426`](https://github.com/apache/spark/commit/7e01426f98dd8f5f68a1cb6d3fa8a5d47686ac0b).





[GitHub] spark pull request: [SPARK-15659][SQL] Ensure FileSystem is gotten...

2016-05-30 Thread jerryshao
GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/13405

[SPARK-15659][SQL] Ensure FileSystem is gotten from path

## What changes were proposed in this pull request?

Currently `spark.sql.warehouse.dir` points to a local directory by default, which throws an exception when HADOOP_CONF_DIR is configured and the default FS is HDFS.

```
java.lang.IllegalArgumentException: Wrong FS: 
file:/Users/sshao/projects/apache-spark/spark-warehouse, expected: 
hdfs://localhost:8020
```

So we should always get the `FileSystem` from the `Path` to avoid the wrong-FS problem.
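
A minimal sketch of the distinction (hypothetical local path; `FileSystem.get` resolves against the configured default FS, while `Path.getFileSystem` resolves against the path's own scheme):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hadoop conf picked up from HADOOP_CONF_DIR, e.g. fs.defaultFS = hdfs://localhost:8020.
val hadoopConf = new Configuration()
val warehousePath = new Path("file:/tmp/spark-warehouse") // hypothetical path

// Problematic: returns the default FS (HDFS here), so handing it the
// file:/ path later fails with "Wrong FS".
val defaultFs: FileSystem = FileSystem.get(hadoopConf)

// Fix: derive the FileSystem from the path's own scheme.
val fs: FileSystem = warehousePath.getFileSystem(hadoopConf)
```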

## How was this patch tested?

Local test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-15659

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13405.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13405


commit 7e01426f98dd8f5f68a1cb6d3fa8a5d47686ac0b
Author: jerryshao 
Date:   2016-05-31T05:58:29Z

Ensure FileSystem is gotten from path to avoid default FileSystem conflicts







[GitHub] spark pull request: [SPARK-15076][SQL] Improve ConstantFolding opt...

2016-05-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/12850#discussion_r65126741
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -751,6 +751,16 @@ object ConstantFolding extends Rule[LogicalPlan] {
 
   // Fold expressions that are foldable.
   case e if e.foldable => Literal.create(e.eval(EmptyRow), e.dataType)
+
+  // Use associative property for integral type
+  case e if e.isInstanceOf[BinaryArithmetic] && e.dataType.isInstanceOf[IntegralType] =>
+    e match {
+      case Add(Add(a, b), c) if b.foldable && c.foldable => Add(a, Add(b, c))
--- End diff --

what about `a + 1 + b + 2`? I think we need a more general approach, like 
reordering the `Add` nodes to put all literals together.
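
A hedged sketch of that more general approach (illustrative only, not necessarily the rule that was eventually merged): flatten the `Add` chain, partition out the foldable children, and regroup them so the existing folding case can collapse the literals into one constant.

```scala
import org.apache.spark.sql.catalyst.expressions.{Add, Expression}

// Collect the leaves of an arbitrarily nested chain of Adds.
def flattenAdd(e: Expression): Seq[Expression] = e match {
  case Add(l, r) => flattenAdd(l) ++ flattenAdd(r)
  case other => Seq(other)
}

// Reorder so that e.g. a + 1 + b + 2 becomes (a + b) + (1 + 2); the grouped
// literal subtree is foldable and collapses to a single constant.
def reorderAdd(e: Add): Expression = {
  val (foldables, others) = flattenAdd(e).partition(_.foldable)
  if (foldables.size > 1 && others.nonEmpty) {
    Add(others.reduce(Add(_, _)), foldables.reduce(Add(_, _)))
  } else {
    e
  }
}
```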





[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

2016-05-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13370





[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterF...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13404#issuecomment-222597618
  
**[Test build #59639 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59639/consoleFull)**
 for PR 13404 at commit 
[`d94b2f6`](https://github.com/apache/spark/commit/d94b2f66c2da1d2cd7f7638b6cde0a2b7b354149).





[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

2016-05-30 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13370#issuecomment-222597634
  
Thanks - merging in master/2.0.






[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

2016-05-30 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13370#discussion_r65126050
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala ---
@@ -204,8 +205,8 @@ class KeyValueGroupedDataset[K, V] private[sql](
* Internal helper function for building typed aggregations that return 
tuples.  For simplicity
* and code reuse, we do this without the help of the type system and 
then use helper functions
* that cast appropriately for the user facing interface.
-   * TODO: does not handle aggregations that return nonflat results,
*/
+  // TODO: does not handle aggregations that return nonflat results.
--- End diff --

cool i will remove it






[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterF...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13404#issuecomment-222597423
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterF...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13404#issuecomment-222597425
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59638/
Test FAILed.





[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterF...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13404#issuecomment-222597411
  
**[Test build #59638 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59638/consoleFull)**
 for PR 13404 at commit 
[`94f666a`](https://github.com/apache/spark/commit/94f666a54c4865ec2d915ae1a7250506aa836faf).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterF...

2016-05-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13404#discussion_r65125923
  
--- Diff: 
core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,5 @@ package org.apache.spark.api.java
  * these interfaces to pass functions to various Java API methods for 
Spark. Please visit Spark's
  * Java programming guide for more details.
  */
-package object function 
--- End diff --

This just removes one trailing space and adds one blank line.





[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterF...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13404#issuecomment-222596540
  
**[Test build #59638 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59638/consoleFull)**
 for PR 13404 at commit 
[`94f666a`](https://github.com/apache/spark/commit/94f666a54c4865ec2d915ae1a7250506aa836faf).





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-222596477
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-222596478
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59631/
Test PASSed.





[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterFun...

2016-05-30 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/13404

[MINOR][CORE][DOCS] Fix description of FilterFunction

## What changes were proposed in this pull request?

This PR fixes the wrong description of `FilterFunction`.
```
- * If the function returns true, the element is discarded in the returned 
Dataset.
+ * If the function returns true, the element is included in the returned 
Dataset.
```
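
A small usage sketch of the corrected semantics (hypothetical data; `FilterFunction` is the Java-friendly interface accepted by `Dataset.filter`):

```scala
import org.apache.spark.api.java.function.FilterFunction
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").appName("filter-demo").getOrCreate()
import spark.implicits._

val ds = Seq("apple", "banana", "avocado").toDS()

// Returning true keeps the element in the returned Dataset.
val startsWithA = new FilterFunction[String] {
  override def call(s: String): Boolean = s.startsWith("a")
}

ds.filter(startsWithA).show() // apple, avocado
```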

## How was this patch tested?




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark minor_fix_java_api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13404.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13404


commit 94f666a54c4865ec2d915ae1a7250506aa836faf
Author: Dongjoon Hyun 
Date:   2016-05-31T05:31:39Z

[MINOR][CORE] Fix description of FilterFunction







[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-222596375
  
**[Test build #59631 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59631/consoleFull)**
 for PR 12836 at commit 
[`7b5767a`](https://github.com/apache/spark/commit/7b5767ad25aaa1f091c4b2d22d7a99cf3d8ec00b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15557][SQL] expression ((cast(99 as de...

2016-05-30 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/13368#issuecomment-222596139
  
cc @yhuai @davies, do you still remember why we promote string to
decimal(38, 18) instead of double?





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-222595900
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59630/
Test PASSed.





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-222595899
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-222595798
  
**[Test build #59630 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59630/consoleFull)**
 for PR 12836 at commit 
[`a0425c1`](https://github.com/apache/spark/commit/a0425c17906fcd2ea1d8dd6fb33c0fd8a860d4a7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

2016-05-30 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/13370#issuecomment-222595668
  
LGTM





[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

2016-05-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13370#discussion_r65125196
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala ---
@@ -204,8 +205,8 @@ class KeyValueGroupedDataset[K, V] private[sql](
* Internal helper function for building typed aggregations that return 
tuples.  For simplicity
* and code reuse, we do this without the help of the type system and 
then use helper functions
* that cast appropriately for the user facing interface.
-   * TODO: does not handle aggregations that return nonflat results,
*/
+  // TODO: does not handle aggregations that return nonflat results.
--- End diff --

I'm pretty sure this TODO is already done.





[GitHub] spark pull request: [SPARK-15660][CORE] RDD and Dataset should sho...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13403#issuecomment-222595435
  
**[Test build #59637 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59637/consoleFull)**
 for PR 13403 at commit 
[`3fe0cb6`](https://github.com/apache/spark/commit/3fe0cb6024ba44b1645bc74f1fbe29267571caa0).





[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

2016-05-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13370#discussion_r65125156
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -132,6 +130,15 @@ class Column(protected[sql] val expr: Expression) 
extends Logging {
 case _ => UnresolvedAttribute.quotedString(name)
   })
 
+  override def toString: String = usePrettyExpression(expr).sql
+
+  override def equals(that: Any): Boolean = that match {
+case that: Column => that.expr.equals(this.expr)
--- End diff --

oh sorry, you just moved them up; it's fine to keep them the same as before.





[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...

2016-05-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13370#discussion_r65125096
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -132,6 +130,15 @@ class Column(protected[sql] val expr: Expression) 
extends Logging {
 case _ => UnresolvedAttribute.quotedString(name)
   })
 
+  override def toString: String = usePrettyExpression(expr).sql
+
+  override def equals(that: Any): Boolean = that match {
+case that: Column => that.expr.equals(this.expr)
--- End diff --

how about `that.expr.semanticEquals(this.expr)`? One column equals another if they always produce the same result.
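
A sketch of what that suggestion could look like in `Column` (illustrative only, assuming Catalyst's `Expression.semanticEquals`/`semanticHash`; not the code that was merged):

```scala
override def equals(that: Any): Boolean = that match {
  // Semantic equality: true when the two expression trees are guaranteed to
  // produce the same result, not merely when they are structurally identical.
  case that: Column => that.expr.semanticEquals(this.expr)
  case _ => false
}

// hashCode must agree with equals, so pair it with the semantic hash.
override def hashCode: Int = this.expr.semanticHash()
```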





[GitHub] spark pull request: [SPARK-15660][CORE] RDD and Dataset should sho...

2016-05-30 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/13403

[SPARK-15660][CORE] RDD and Dataset should show the consistent values for 
variance/stdev.

## What changes were proposed in this pull request?

In SPARK-11490, `variance`/`stdev` were redefined as the **sample** `variance`/`stdev` instead of the population ones. This PR addresses the only remaining legacy in RDD. This may cause breaking changes, but we had better be consistent in Spark 2.0 if possible. This PR also adds `popVariance` and `popStdev` functions.

```scala
scala> val rdd = sc.parallelize(Seq(1.0, 2.0, 3.0))
rdd: org.apache.spark.rdd.RDD[Double] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> rdd.stdev
res0: Double = 0.816496580927726

scala> rdd.toDS().describe().show()
16/05/30 22:20:12 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/05/30 22:20:12 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
+-------+-----+
|summary|value|
+-------+-----+
|  count|    3|
|   mean|  2.0|
| stddev|  1.0|
|    min|  1.0|
|    max|  3.0|
+-------+-----+
```
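
For reference, the two definitions differ only in the denominator; a minimal plain-Scala sketch (not Spark API) for the values above:

```scala
val xs = Seq(1.0, 2.0, 3.0)
val mean = xs.sum / xs.size                        // 2.0
val ss = xs.map(x => (x - mean) * (x - mean)).sum  // 2.0

// Population stdev divides by n: sqrt(2/3) ≈ 0.8165, the old rdd.stdev value.
val popStdev = math.sqrt(ss / xs.size)

// Sample stdev divides by n - 1: sqrt(2/2) = 1.0, what describe() reports.
val sampleStdev = math.sqrt(ss / (xs.size - 1))
```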

## How was this patch tested?

Pass the updated Jenkins tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-15660

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13403.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13403


commit 3fe0cb6024ba44b1645bc74f1fbe29267571caa0
Author: Dongjoon Hyun 
Date:   2016-05-31T05:22:16Z

[SPARK-15660][CORE] RDD and Dataset should show the consistent value for 
variance/stdev.







[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13392#issuecomment-22259
  
**[Test build #59636 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59636/consoleFull)**
 for PR 13392 at commit 
[`b2849e8`](https://github.com/apache/spark/commit/b2849e8f514c1265f7c6199aba980e95b72aa7c2).





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13392#discussion_r65124178
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala ---
@@ -219,4 +220,41 @@ class SQLConfSuite extends QueryTest with 
SharedSQLContext {
 }
   }
 
+  test("MAX_CASES_BRANCHES") {
+import testImplicits._
+
+val original = spark.conf.get(SQLConf.MAX_CASES_BRANCHES)
+try {
+  withTable("tab1") {
+spark
+  .range(10)
+  .select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
--- End diff --

Sure, will do it. Thanks!





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13392#discussion_r65124192
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala ---
@@ -219,4 +220,41 @@ class SQLConfSuite extends QueryTest with 
SharedSQLContext {
 }
   }
 
+  test("MAX_CASES_BRANCHES") {
+import testImplicits._
+
+val original = spark.conf.get(SQLConf.MAX_CASES_BRANCHES)
+try {
+  withTable("tab1") {
+spark
+  .range(10)
+  .select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+  .write
+  .saveAsTable("tab1")
+
+val sql_one_branch_caseWhen = "SELECT CASE WHEN a = 1 THEN 1 END FROM tab1"
+val sql_two_branch_caseWhen = "SELECT CASE WHEN a = 1 THEN 1 ELSE 0 END FROM tab1"
+
+spark.conf.set(SQLConf.MAX_CASES_BRANCHES.key, "0")
--- End diff --

Yeah, will do it. 





[GitHub] spark pull request: [SPARK-15657][SQL] RowEncoder should validate ...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13401#issuecomment-222593249
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15657][SQL] RowEncoder should validate ...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13401#issuecomment-222593250
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59629/
Test PASSed.





[GitHub] spark pull request: [SPARK-15657][SQL] RowEncoder should validate ...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13401#issuecomment-222593159
  
**[Test build #59629 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59629/consoleFull)**
 for PR 13401 at commit 
[`b6c1a5f`](https://github.com/apache/spark/commit/b6c1a5fc6013b643ae39aad32224d08d71b63e00).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class ValidateExternalType(child: Expression, expected: DataType)`





[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13270#issuecomment-222592978
  
**[Test build #59635 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59635/consoleFull)**
 for PR 13270 at commit 
[`336fb55`](https://github.com/apache/spark/commit/336fb55406ad19eb7cc7276cd771ebd92ed8dec1).





[GitHub] spark pull request: [SPARK-15601][CORE] CircularBuffer's toString(...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13351#issuecomment-222592731
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59628/
Test PASSed.





[GitHub] spark pull request: [SPARK-15601][CORE] CircularBuffer's toString(...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13351#issuecomment-222592730
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15601][CORE] CircularBuffer's toString(...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13351#issuecomment-222592645
  
**[Test build #59628 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59628/consoleFull)**
 for PR 13351 at commit 
[`a0ae62e`](https://github.com/apache/spark/commit/a0ae62eaf7ecc19565695da68d3b42cc4aac8f09).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13392#discussion_r65123693
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala ---
@@ -219,4 +220,41 @@ class SQLConfSuite extends QueryTest with 
SharedSQLContext {
 }
   }
 
+  test("MAX_CASES_BRANCHES") {
+import testImplicits._
+
+val original = spark.conf.get(SQLConf.MAX_CASES_BRANCHES)
+try {
+  withTable("tab1") {
+spark
+  .range(10)
+  .select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+  .write
+  .saveAsTable("tab1")
+
+val sql_one_branch_caseWhen = "SELECT CASE WHEN a = 1 THEN 1 END FROM tab1"
+val sql_two_branch_caseWhen = "SELECT CASE WHEN a = 1 THEN 1 ELSE 0 END FROM tab1"
+
+spark.conf.set(SQLConf.MAX_CASES_BRANCHES.key, "0")
--- End diff --

how about:
```
withTable("tab1") {
  spark.range(10).write.saveAsTable("tab1")
  val oneBranchCaseWhen = "SELECT CASE WHEN id = 1 THEN 1 END FROM tab1"
  val twoBranchCaseWhen = "SELECT CASE WHEN id = 1 THEN 1 ELSE 0 END FROM tab1"
  withSQLConf(SQLConf.MAX_CASES_BRANCHES.key -> "0") {
    assert(...)
  }
  withSQLConf(SQLConf.MAX_CASES_BRANCHES.key -> "1") {
    assert(...)
  }
  withSQLConf(SQLConf.MAX_CASES_BRANCHES.key -> "2") {
    assert(...)
  }
}
```





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/13392#issuecomment-222592448
  
LGTM except some style comments





[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13270#issuecomment-222592244
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59632/
Test FAILed.





[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13270#issuecomment-222592243
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13392#discussion_r65123624
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala ---
@@ -219,4 +220,41 @@ class SQLConfSuite extends QueryTest with 
SharedSQLContext {
 }
   }
 
+  test("MAX_CASES_BRANCHES") {
+import testImplicits._
+
+val original = spark.conf.get(SQLConf.MAX_CASES_BRANCHES)
+try {
+  withTable("tab1") {
+spark
+  .range(10)
+  .select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
--- End diff --

I think we only need `a`? Or just `spark.range(10).write.saveAsTable`; then we can use `id` in the CASE WHEN.





[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13270#issuecomment-222592234
  
**[Test build #59632 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59632/consoleFull)**
 for PR 13270 at commit 
[`3830dbb`](https://github.com/apache/spark/commit/3830dbb646b0b076eb994ebaec1a14d8a8d502dd).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15658][SQL] UDT serializer should decla...

2016-05-30 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/13402#issuecomment-222592066
  
cc @yhuai @zsxwing 





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13392#issuecomment-222592083
  
**[Test build #59634 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59634/consoleFull)**
 for PR 13392 at commit 
[`4306c4f`](https://github.com/apache/spark/commit/4306c4fe0b741689bb0ff5349506707e8a7ec520).





[GitHub] spark pull request: [SPARK-15658][SQL] UDT serializer should decla...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13402#issuecomment-222592077
  
**[Test build #59633 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59633/consoleFull)**
 for PR 13402 at commit 
[`6d614dd`](https://github.com/apache/spark/commit/6d614dd3ae4d5ee97083ae99ea527a7c5eaa9f0a).





[GitHub] spark pull request: [SPARK-15658][SQL] UDT serializer should decla...

2016-05-30 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/13402

[SPARK-15658][SQL] UDT serializer should declare its data type as udt 
instead of udt.sqlType

## What changes were proposed in this pull request?

When we build the serializer for a UDT object, we should declare its data type
as the UDT instead of `udt.sqlType`; otherwise, if we deserialize it again, we
lose the information that it is a UDT object and throw an analysis exception.
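
For context, here is a minimal, hypothetical UDT (not part of this patch) that
shows the two types involved; the serializer must report the UDT itself, not
the underlying `sqlType` it is stored as:

```
import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
import org.apache.spark.sql.types._

case class Point(x: Double, y: Double)

class PointUDT extends UserDefinedType[Point] {
  // The physical storage type. Declaring *this* as the serializer's data
  // type is the bug: a round trip then sees a plain array, not a Point.
  override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)

  override def serialize(p: Point): Any = new GenericArrayData(Array[Any](p.x, p.y))

  override def deserialize(datum: Any): Point = datum match {
    case a: ArrayData => Point(a.getDouble(0), a.getDouble(1))
  }

  override def userClass: Class[Point] = classOf[Point]
}
```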


## How was this patch tested?

new test in `UserDefinedTypeSuite`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark udt

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13402.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13402


commit 6d614dd3ae4d5ee97083ae99ea527a7c5eaa9f0a
Author: Wenchen Fan 
Date:   2016-05-31T04:48:39Z

UDT serializer should declare its data type as udt instead of udt.sqlType







[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13270#issuecomment-222591642
  
**[Test build #59632 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59632/consoleFull)**
 for PR 13270 at commit 
[`3830dbb`](https://github.com/apache/spark/commit/3830dbb646b0b076eb994ebaec1a14d8a8d502dd).





[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

2016-05-30 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13270#discussion_r65123201
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -212,11 +212,46 @@ class SessionCatalog(
* If no such database is specified, create it in the current database.
*/
   def createTable(tableDefinition: CatalogTable, ignoreIfExists: Boolean): 
Unit = {
-val db = 
formatDatabaseName(tableDefinition.identifier.database.getOrElse(getCurrentDatabase))
-val table = formatTableName(tableDefinition.identifier.table)
+val tableId = tableDefinition.identifier
+val db = 
formatDatabaseName(tableId.database.getOrElse(getCurrentDatabase))
+val table = formatTableName(tableId.table)
 val newTableDefinition = tableDefinition.copy(identifier = 
TableIdentifier(table, Some(db)))
 requireDbExists(db)
-externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
+
+if (
+  // If this is an external data source table...
+  tableDefinition.properties.contains("spark.sql.sources.provider") &&
+  newTableDefinition.tableType == CatalogTableType.EXTERNAL &&
+  // ... that is not persisted as Hive compatible format (external 
tables in Hive compatible
+  // format always set `locationUri` to the actual data location and 
should NOT be hacked as
+  // following.)
+  tableDefinition.storage.locationUri.isEmpty
+) {
+  // !! HACK ALERT !!
+  //
+  // Due to a restriction of Hive metastore, here we have to set 
`locationUri` to a temporary
+  // directory that doesn't exist yet but can definitely be 
successfully created, and then
+  // delete it right after creating the external data source table. 
This location will be
+  // persisted to Hive metastore as standard Hive table location URI, 
but Spark SQL doesn't
+  // really use it. Also, since we only do this workaround for 
external tables, deleting the
+  // directory after the fact doesn't do any harm.
+  //
+  // Please refer to https://issues.apache.org/jira/browse/SPARK-15269 
for more details.
+
+  val tempPath =
+new Path(defaultTablePath(tableId).stripSuffix(Path.SEPARATOR) + 
"-__PLACEHOLDER__")
+
+  try {
+externalCatalog.createTable(
+  db,
+  newTableDefinition.withNewStorage(locationUri = 
Some(tempPath.toString)),
+  ignoreIfExists)
+  } finally {
+FileSystem.get(tempPath.toUri, hadoopConf).delete(tempPath, true)
+  }
+} else {
+  externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
+}
--- End diff --

I originally added these changes here mostly because `HiveExternalCatalog`
doesn't have access to the Hadoop configuration, which is needed to instantiate
the `FileSystem` instance. I've now added an extra constructor argument to
`HiveExternalCatalog` and moved this change there.
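
A self-contained sketch of that cleanup step (the path below is made up) shows
why the Hadoop configuration is required:

```
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val hadoopConf = new Configuration()
val tempPath = new Path("/tmp/warehouse/tab1-__PLACEHOLDER__")
// FileSystem.get needs the Hadoop configuration, so the deletion has to
// live where that configuration is available.
FileSystem.get(tempPath.toUri, hadoopConf).delete(tempPath, true)
```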





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13392#discussion_r65122900
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala ---
@@ -17,13 +17,36 @@
 
 package org.apache.spark.sql.internal
 
+import org.scalatest.BeforeAndAfterAll
+
 import org.apache.spark.sql.{QueryTest, Row, SparkSession, SQLContext}
+import org.apache.spark.sql.execution.WholeStageCodegenExec
 import org.apache.spark.sql.test.{SharedSQLContext, TestSQLContext}
 
-class SQLConfSuite extends QueryTest with SharedSQLContext {
+class SQLConfSuite extends QueryTest with SharedSQLContext with 
BeforeAndAfterAll {
+  import testImplicits._
+
   private val testKey = "test.key.0"
   private val testVal = "test.val.0"
 
+  override def beforeAll() {
+super.beforeAll()
+sql("DROP TABLE IF EXISTS testData")
+spark
+  .range(10)
+  .select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+  .write
+  .saveAsTable("testData")
--- End diff --

Sure, will do. Thanks!





[GitHub] spark pull request: [SPARK-14649][CORE] DagScheduler should not ru...

2016-05-30 Thread sitalkedia
Github user sitalkedia commented on the pull request:

https://github.com/apache/spark/pull/12436#issuecomment-222590636
  
@kayousterhout - Sure, I will resolve the conflicts. Can you take a cursory 
look at the diff and let me know if the approach is reasonable? 





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13392#discussion_r65122821
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala ---
@@ -17,13 +17,36 @@
 
 package org.apache.spark.sql.internal
 
+import org.scalatest.BeforeAndAfterAll
+
 import org.apache.spark.sql.{QueryTest, Row, SparkSession, SQLContext}
+import org.apache.spark.sql.execution.WholeStageCodegenExec
 import org.apache.spark.sql.test.{SharedSQLContext, TestSQLContext}
 
-class SQLConfSuite extends QueryTest with SharedSQLContext {
+class SQLConfSuite extends QueryTest with SharedSQLContext with 
BeforeAndAfterAll {
+  import testImplicits._
+
   private val testKey = "test.key.0"
   private val testVal = "test.val.0"
 
+  override def beforeAll() {
+super.beforeAll()
+sql("DROP TABLE IF EXISTS testData")
+spark
+  .range(10)
+  .select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+  .write
+  .saveAsTable("testData")
--- End diff --

instead of creating this table in `beforeAll`, can we create it just in the 
test case?





[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

2016-05-30 Thread devaraj-kavali
Github user devaraj-kavali commented on the pull request:

https://github.com/apache/spark/pull/11996#issuecomment-222589679
  
Thanks @kayousterhout.





[GitHub] spark pull request: [SPARK-15655] [SQL] Fix Wrong Partition Column...

2016-05-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13400#discussion_r65122095
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala 
---
@@ -1537,6 +1537,35 @@ class SQLQuerySuite extends QueryTest with 
SQLTestUtils with TestHiveSingleton {
 assert(fs.exists(path), "This is an external table, so the data should 
not have been dropped")
   }
 
+  test("select partitioned table") {
+sql(
+  s"""
+ |CREATE TABLE table_with_partition(c1 string)
+ |PARTITIONED BY (p1 string,p2 string,p3 string,p4 string,p5 
string)
--- End diff --

There are multiple related test cases in `InsertIntoHiveTableSuite`. This 
statement has more than one bug. For example, below is a common mistake users 
might make:
```
hive> CREATE TABLE partitioned (id bigint, data string) PARTITIONED BY 
(data string, part string);
FAILED: SemanticException [Error 10035]: Column repeated in partitioning 
columns
```
Currently, we return a confusing error message:
```
org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:For 
direct MetaStore DB connections, we don't support retries at the client level.);
```

I will try to submit another PR to detect these user errors and output an 
understandable error message.
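
A minimal sketch of the kind of up-front check that follow-up PR could add
(column names and message text here are made up):

```
val dataCols = Seq("id", "data")
val partCols = Seq("data", "part")
val repeated = dataCols.intersect(partCols)
require(repeated.isEmpty,
  s"Column(s) ${repeated.mkString(", ")} repeated in partitioning columns")
```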





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13392#discussion_r65121914
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -937,9 +937,14 @@ object SimplifyConditionals extends Rule[LogicalPlan] 
with PredicateHelper {
  */
 case class OptimizeCodegen(conf: CatalystConf) extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions 
{
-case e @ CaseWhen(branches, _) if branches.size < 
conf.maxCaseBranchesForCodegen =>
+case e: CaseWhen if canCodeGen(e) =>
   e.toCodegen()
--- End diff --

Sure, will do. Thanks!





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13392#discussion_r65121902
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala ---
@@ -17,13 +17,36 @@
 
 package org.apache.spark.sql.internal
 
+import org.scalatest.BeforeAndAfterAll
+
 import org.apache.spark.sql.{QueryTest, Row, SparkSession, SQLContext}
+import org.apache.spark.sql.execution.WholeStageCodegenExec
 import org.apache.spark.sql.test.{SharedSQLContext, TestSQLContext}
 
-class SQLConfSuite extends QueryTest with SharedSQLContext {
+class SQLConfSuite extends QueryTest with SharedSQLContext with 
BeforeAndAfterAll {
--- End diff --

Initially, I tried it. The behavior is controlled by the configuration 
`SQLConf.MAX_CASES_BRANCHES`. However, I am not sure how to change the default 
conf value of `SQLConf.MAX_CASES_BRANCHES` in `OptimizeCodegenSuite`. 

At the same time, we do not have a test case to verify the configuration 
`MAX_CASES_BRANCHES`.

That is why I added the test case here. Let me know if you have any ideas. 
Thanks!
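
One possible way to override the conf locally, assuming the suite mixes in
`SQLTestUtils` for its `withSQLConf` helper (the threshold and query here are
arbitrary examples):

```
withSQLConf(SQLConf.MAX_CASES_BRANCHES.key -> "1") {
  // With the limit at 1, this multi-branch CASE WHEN should not be
  // rewritten by OptimizeCodegen.
  sql("SELECT CASE WHEN id = 0 THEN 'a' WHEN id = 1 THEN 'b' ELSE 'c' END " +
    "FROM range(10)").collect()
}
```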





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-222586936
  
**[Test build #59631 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59631/consoleFull)**
 for PR 12836 at commit 
[`7b5767a`](https://github.com/apache/spark/commit/7b5767ad25aaa1f091c4b2d22d7a99cf3d8ec00b).





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/12836#discussion_r65120888
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1268,6 +1268,82 @@ setMethod("dapplyCollect",
 ldf
   })
 
+#' gapply
+#'
+#' Apply a R function to each group of a DataFrame. The group is defined 
by an input
--- End diff --

done!





[GitHub] spark pull request: [SPARK-15655] [SQL] Fix Wrong Partition Column...

2016-05-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13400#discussion_r65120902
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala 
---
@@ -1537,6 +1537,35 @@ class SQLQuerySuite extends QueryTest with 
SQLTestUtils with TestHiveSingleton {
 assert(fs.exists(path), "This is an external table, so the data should 
not have been dropped")
   }
 
+  test("select partitioned table") {
+sql(
+  s"""
+ |CREATE TABLE table_with_partition(c1 string)
+ |PARTITIONED BY (p1 string,p2 string,p3 string,p4 string,p5 
string)
--- End diff --

I'm surprised we support this Hive-style syntax, cc @andrewor14 





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/12836#discussion_r65120891
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1268,6 +1268,82 @@ setMethod("dapplyCollect",
 ldf
   })
 
+#' gapply
+#'
+#' Apply a R function to each group of a DataFrame. The group is defined 
by an input
+#' grouping column.
+#'
+#' @param x A SparkDataFrame
+#' @param func A function to be applied to each group partition specified 
by grouping
+#' column of the SparkDataFrame.
+#' The output of func is a local R data.frame.
+#' @param schema The schema of the resulting SparkDataFrame after the 
function is applied.
+#'   It must match the output of func.
+#' @family SparkDataFrame functions
+#' @rdname gapply
+#' @name gapply
+#' @export
+#' @examples
+#' 
+#' \dontrun{
+#' Computes the arithmetic mean of the second column by grouping
+#' on the first and third columns. Output the grouping values and the 
average.
+#'
+#' df <- createDataFrame (
+#' sqlContext,
--- End diff --

done!





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13392#discussion_r65120800
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala ---
@@ -17,13 +17,36 @@
 
 package org.apache.spark.sql.internal
 
+import org.scalatest.BeforeAndAfterAll
+
 import org.apache.spark.sql.{QueryTest, Row, SparkSession, SQLContext}
+import org.apache.spark.sql.execution.WholeStageCodegenExec
 import org.apache.spark.sql.test.{SharedSQLContext, TestSQLContext}
 
-class SQLConfSuite extends QueryTest with SharedSQLContext {
+class SQLConfSuite extends QueryTest with SharedSQLContext with 
BeforeAndAfterAll {
--- End diff --

I'm not sure this is the proper suite to test this; how about 
`OptimizeCodegenSuite`?





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/12836#discussion_r65120675
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1268,6 +1268,82 @@ setMethod("dapplyCollect",
 ldf
   })
 
+#' gapply
+#'
+#' Apply a R function to each group of a DataFrame. The group is defined 
by an input
+#' grouping column.
+#'
+#' @param x A SparkDataFrame
+#' @param func A function to be applied to each group partition specified 
by grouping
+#' column of the SparkDataFrame.
+#' The output of func is a local R data.frame.
+#' @param schema The schema of the resulting SparkDataFrame after the 
function is applied.
+#'   It must match the output of func.
+#' @family SparkDataFrame functions
+#' @rdname gapply
+#' @name gapply
+#' @export
+#' @examples
+#' 
+#' \dontrun{
+#' Computes the arithmetic mean of the second column by grouping
+#' on the first and third columns. Output the grouping values and the 
average.
+#'
+#' df <- createDataFrame (
+#' sqlContext,
+#' list(list(1L, 1, "1", 0.1), list(1L, 2, "1", 0.2), list(3L, 3, "3", 
0.3)),
+#'   c("a", "b", "c", "d"))
+#'
+#' schema <-  structType(structField("a", "integer"), structField("c", 
"string"),
+#'   structField("avg", "double"))
+#' df1 <- gapply(
+#'   df,
+#'   list("a", "c"),
+#'   function(x) {
+#' y <- data.frame(x$a[1], x$c[1], mean(x$b), stringsAsFactors = FALSE)
+#'   },
+#' schema)
+#' collect(df1)
+#'
+#' Result
+#' --
+#' a c avg
+#' 3 3 3.0
+#' 1 1 1.5
+#'
+#' Fits linear models on iris dataset by grouping on the 'Species' column 
and
+#' using 'Sepal_Length' as a target variable, 'Sepal_Width', 'Petal_Length'
+#' and 'Petal_Width' as training features.
+#' 
+#' df <- createDataFrame (sqlContext, iris)
--- End diff --

done!





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/12836#discussion_r65120665
  
--- Diff: R/pkg/R/deserialize.R ---
@@ -197,6 +197,31 @@ readMultipleObjects <- function(inputCon) {
   data # this is a list of named lists now
 }
 
+readMultipleObjectsWithKeys <- function(inputCon) {
+  # readMultipleObjectsWithKeys will read multiple continuous objects from
+  # a DataOutputStream. There is no preceding field telling the count
+  # of the objects, so the number of objects varies, we try to read
+  # all objects in a loop until the end of the stream. The rows in
--- End diff --

done!





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/12836#discussion_r65120671
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1268,6 +1268,82 @@ setMethod("dapplyCollect",
 ldf
   })
 
+#' gapply
+#'
+#' Apply a R function to each group of a DataFrame. The group is defined 
by an input
+#' grouping column.
+#'
+#' @param x A SparkDataFrame
+#' @param func A function to be applied to each group partition specified 
by grouping
+#' column of the SparkDataFrame.
+#' The output of func is a local R data.frame.
+#' @param schema The schema of the resulting SparkDataFrame after the 
function is applied.
+#'   It must match the output of func.
+#' @family SparkDataFrame functions
+#' @rdname gapply
+#' @name gapply
+#' @export
+#' @examples
+#' 
+#' \dontrun{
+#' Computes the arithmetic mean of the second column by grouping
+#' on the first and third columns. Output the grouping values and the 
average.
+#'
+#' df <- createDataFrame (
+#' sqlContext,
+#' list(list(1L, 1, "1", 0.1), list(1L, 2, "1", 0.2), list(3L, 3, "3", 
0.3)),
+#'   c("a", "b", "c", "d"))
+#'
+#' schema <-  structType(structField("a", "integer"), structField("c", 
"string"),
+#'   structField("avg", "double"))
+#' df1 <- gapply(
+#'   df,
+#'   list("a", "c"),
+#'   function(x) {
+#' y <- data.frame(x$a[1], x$c[1], mean(x$b), stringsAsFactors = FALSE)
+#'   },
+#' schema)
+#' collect(df1)
+#'
+#' Result
+#' --
+#' a c avg
+#' 3 3 3.0
+#' 1 1 1.5
+#'
+#' Fits linear models on iris dataset by grouping on the 'Species' column 
and
+#' using 'Sepal_Length' as a target variable, 'Sepal_Width', 'Petal_Length'
+#' and 'Petal_Width' as training features.
+#' 
+#' df <- createDataFrame (sqlContext, iris)
+#' schema <- structType(structField("(Intercept)", "double"),
+#'   structField("Sepal_Width", "double"),structField("Petal_Length", 
"double"),
+#'   structField("Petal_Width", "double"))
+#' df1 <- gapply(
+#'   df,
+#'   list(df$"Species"),
+#'   function(x) {
+#' m <- suppressWarnings(lm(Sepal_Length ~
+#' Sepal_Width + Petal_Length + Petal_Width, x))
+#' data.frame(t(coef(m)))
+#'   }, schema)
+#' collect(df1)
+#'
+#'Result
+#'-
+#' Model  (Intercept)  Sepal_Width  Petal_Length  Petal_Width
+#' 10.6998830.33033700.9455356-0.1697527
+#' 21.8955400.38685760.9083370-0.6792238
+#' 32.3518900.65483500.2375602 0.2521257
+#'
+#'}
+setMethod("gapply",
+  signature(x = "SparkDataFrame"),
+  function(x, col, func, schema) {
--- End diff --

done!





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/12836#discussion_r65120653
  
--- Diff: R/pkg/R/group.R ---
@@ -142,3 +142,54 @@ createMethods <- function() {
 }
 
 createMethods()
+
+#' gapply
+#'
+#' Applies a R function to each group in the input GroupedData
+#'
+#' @param x a GroupedData
+#' @return a SparkDataFrame
+#' @rdname gapply
+#' @name gapply
+#' @family agg_funcs
--- End diff --

removed "agg func"





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/12836#discussion_r65120620
  
--- Diff: R/pkg/R/group.R ---
@@ -142,3 +142,54 @@ createMethods <- function() {
 }
 
 createMethods()
+
+#' gapply
+#'
+#' Applies a R function to each group in the input GroupedData
+#'
+#' @param x a GroupedData
+#' @return a SparkDataFrame
+#' @rdname gapply
+#' @name gapply
+#' @family agg_funcs
+#' @examples
+#' \dontrun{
+#' Computes the arithmetic mean of the second column by grouping
+#' on the first and third columns. Output the grouping values and the 
average.
+#'
+#' df <- createDataFrame (
+#' sqlContext,
--- End diff --

done





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/12836#discussion_r65120599
  
--- Diff: R/pkg/inst/worker/worker.R ---
@@ -84,67 +84,78 @@ broadcastElap <- elapsedSecs()
 # as number of partitions to create.
 numPartitions <- SparkR:::readInt(inputCon)
 
-isDataFrame <- as.logical(SparkR:::readInt(inputCon))
+# 0 - RDD mode, 1 - dapply mode, 2 - gapply mode
+mode <- SparkR:::readInt(inputCon)
 
-# If isDataFrame, then read column names
-if (isDataFrame) {
+# If DataFrame - mode = 1 and mode = 2, then read column names
+if (mode > 0) {
--- End diff --

I ended up leaving `mode` as is. I also think that one variable is less 
confusing.





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/12836#discussion_r65120565
  
--- Diff: R/pkg/inst/worker/worker.R ---
@@ -84,67 +84,78 @@ broadcastElap <- elapsedSecs()
 # as number of partitions to create.
 numPartitions <- SparkR:::readInt(inputCon)
 
-isDataFrame <- as.logical(SparkR:::readInt(inputCon))
+# 0 - RDD mode, 1 - dapply mode, 2 - gapply mode
+mode <- SparkR:::readInt(inputCon)
 
-# If isDataFrame, then read column names
-if (isDataFrame) {
+# If DataFrame - mode = 1 and mode = 2, then read column names
+if (mode > 0) {
   colNames <- SparkR:::readObject(inputCon)
+  if (mode == 2) {
+key <- SparkR:::readObject(inputCon)
+  }
 }
 
 isEmpty <- SparkR:::readInt(inputCon)
 
 if (isEmpty != 0) {
-
   if (numPartitions == -1) {
 if (deserializer == "byte") {
   # Now read as many characters as described in funcLen
-  data <- SparkR:::readDeserialize(inputCon)
+  dataList <- list(SparkR:::readDeserialize(inputCon))
 } else if (deserializer == "string") {
-  data <- as.list(readLines(inputCon))
-} else if (deserializer == "row") {
-  data <- SparkR:::readMultipleObjects(inputCon)
+  dataList <- list(as.list(readLines(inputCon)))
+} else if (deserializer == "row" && mode == 2) {
+  dataList <- SparkR:::readMultipleObjectsWithKeys(inputCon)
+} else if (deserializer == "row"){
+  dataList <- list(SparkR:::readMultipleObjects(inputCon))
 }
 # Timing reading input data for execution
 inputElap <- elapsedSecs()
-
-if (isDataFrame) {
-  if (deserializer == "row") {
-# Transform the list of rows into a data.frame
-# Note that the optional argument stringsAsFactors for rbind is
-# available since R 3.2.4. So we set the global option here.
-oldOpt <- getOption("stringsAsFactors")
-options(stringsAsFactors = FALSE)
-data <- do.call(rbind.data.frame, data)
-options(stringsAsFactors = oldOpt)
-
-names(data) <- colNames
+for (i in 1:length(dataList)) {
--- End diff --

Done! I called it `computeHelper`; I thought `compute` might be too generic for 
this specific use case.
I can still rename it to `compute` if you think that's a better name.





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/12836#discussion_r65120457
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -379,6 +383,50 @@ class RelationalGroupedDataset protected[sql](
   def pivot(pivotColumn: String, values: java.util.List[Any]): 
RelationalGroupedDataset = {
 pivot(pivotColumn, values.asScala)
   }
+
+  /**
+   * Applies the given serialized R function `func` to each group of data. 
For each unique group,
+   * the function will be passed the group key and an iterator that 
contains all of the elements in
+   * the group. The function can return an iterator containing elements of 
an arbitrary type which
+   * will be returned as a new [[DataFrame]].
+   *
+   * This function does not support partial aggregation, and as a result 
requires shuffling all
+   * the data in the [[Dataset]]. If an application intends to perform an 
aggregation over each
+   * key, it is best to use the reduce function or an
+   * [[org.apache.spark.sql.expressions#Aggregator Aggregator]].
+   *
+   * Internally, the implementation will spill to disk if any given group 
is too large to fit into
+   * memory.  However, users must take care to avoid materializing the 
whole iterator for a group
+   * (for example, by calling `toList`) unless they are sure that this is 
possible given the memory
+   * constraints of their cluster.
+   *
+   * @since 2.0.0
+   */
+  def flatMapGroupsInR(
--- End diff --

done!





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/12836#discussion_r65120461
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2011,6 +2011,25 @@ class Dataset[T] private[sql](
   }
 
   /**
+   * Returns a new [[DataFrame]] which contains the aggregated result of 
applying
--- End diff --

done!





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13392#discussion_r65120442
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -937,9 +937,14 @@ object SimplifyConditionals extends Rule[LogicalPlan] 
with PredicateHelper {
  */
 case class OptimizeCodegen(conf: CatalystConf) extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions 
{
-case e @ CaseWhen(branches, _) if branches.size < 
conf.maxCaseBranchesForCodegen =>
+case e: CaseWhen if canCodeGen(e) =>
   e.toCodegen()
--- End diff --

nit: this can fit in the previous line?





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/12836#discussion_r65120450
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
@@ -325,6 +330,80 @@ case class MapGroupsExec(
 }
 
 /**
+ * Groups the input rows together and calls the R function with each group 
and an iterator
+ * containing all elements in the group.
+ * The result of this function is flattened before being output.
+ */
+case class FlatMapGroupsInRExec(
+func: Array[Byte],
+packageNames: Array[Byte],
+broadcastVars: Array[Broadcast[Object]],
+inputSchema: StructType,
+outputSchema: StructType,
+keyDeserializer: Expression,
+valueDeserializer: Expression,
+groupingAttributes: Seq[Attribute],
+dataAttributes: Seq[Attribute],
+outputObjAttr: Attribute,
+child: SparkPlan) extends UnaryExecNode with ObjectProducerExec {
+
+  override def output: Seq[Attribute] = outputObjAttr :: Nil
+  override def producedAttributes: AttributeSet = 
AttributeSet(outputObjAttr)
+
+  override def requiredChildDistribution: Seq[Distribution] =
+ClusteredDistribution(groupingAttributes) :: Nil
+
+  override def requiredChildOrdering: Seq[Seq[SortOrder]] =
+Seq(groupingAttributes.map(SortOrder(_, Ascending)))
+
+  override protected def doExecute(): RDD[InternalRow] = {
+val isDeserializedRData =
+  if (outputSchema == SERIALIZED_R_DATA_SCHEMA) true else false
+val serializerForR = if (!isDeserializedRData) {
+  SerializationFormats.ROW
+} else {
+  SerializationFormats.BYTE
+}
+val (deserializerForR, colNames) =
+  (SerializationFormats.ROW, inputSchema.fieldNames)
+
+child.execute().mapPartitionsInternal { iter =>
+  val grouped = GroupedIterator(iter, groupingAttributes, child.output)
+  val getKey = ObjectOperator.deserializeRowToObject(keyDeserializer, 
groupingAttributes)
+  val getValue = 
ObjectOperator.deserializeRowToObject(valueDeserializer, dataAttributes)
+  val outputObject = 
ObjectOperator.wrapObjectToRow(outputObjAttr.dataType)
+  val groupNames = groupingAttributes.map(_.name).toArray
+
+  val runner = new RRunner[Array[Byte]](
+func, deserializerForR, serializerForR, packageNames, 
broadcastVars,
+isDataFrame = true, colNames = colNames, key = groupNames)
+
+  val hasGroups = grouped.hasNext
--- End diff --

Did some refactoring!





[GitHub] spark pull request: [SPARK-15657][SQL] RowEncoder should validate ...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13401#issuecomment-222586062
  
**[Test build #59629 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59629/consoleFull)**
 for PR 13401 at commit 
[`b6c1a5f`](https://github.com/apache/spark/commit/b6c1a5fc6013b643ae39aad32224d08d71b63e00).





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-222586065
  
**[Test build #59630 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59630/consoleFull)**
 for PR 12836 at commit 
[`a0425c1`](https://github.com/apache/spark/commit/a0425c17906fcd2ea1d8dd6fb33c0fd8a860d4a7).





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/12836#discussion_r65120403
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1268,6 +1268,82 @@ setMethod("dapplyCollect",
 ldf
   })
 
+#' gapply
+#'
+#' Apply a R function to each group of a DataFrame. The group is defined 
by an input
+#' grouping column.
+#'
+#' @param x A SparkDataFrame
+#' @param func A function to be applied to each group partition specified 
by grouping
--- End diff --

done!





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-30 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/12836#discussion_r65120399
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1268,6 +1268,82 @@ setMethod("dapplyCollect",
 ldf
   })
 
+#' gapply
+#'
+#' Apply a R function to each group of a DataFrame. The group is defined 
by an input
+#' grouping column.
+#'
+#' @param x A SparkDataFrame
+#' @param func A function to be applied to each group partition specified 
by grouping
+#' column of the SparkDataFrame.
+#' The output of func is a local R data.frame.
+#' @param schema The schema of the resulting SparkDataFrame after the 
function is applied.
+#'   It must match the output of func.
+#' @family SparkDataFrame functions
+#' @rdname gapply
+#' @name gapply
+#' @export
+#' @examples
+#' 
+#' \dontrun{
+#' Computes the arithmetic mean of the second column by grouping
+#' on the first and third columns. Output the grouping values and the 
average.
+#'
+#' df <- createDataFrame (
+#' sqlContext,
+#' list(list(1L, 1, "1", 0.1), list(1L, 2, "1", 0.2), list(3L, 3, "3", 
0.3)),
+#'   c("a", "b", "c", "d"))
+#'
+#' schema <-  structType(structField("a", "integer"), structField("c", 
"string"),
+#'   structField("avg", "double"))
+#' df1 <- gapply(
+#'   df,
+#'   list("a", "c"),
+#'   function(x) {
--- End diff --

done!





[GitHub] spark pull request: [SPARK-15657][SQL] RowEncoder should validate ...

2016-05-30 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/13401#issuecomment-222585795
  
cc @marmbrus  @yhuai @viirya  





[GitHub] spark pull request: [SPARK-15657][SQL] RowEncoder should validate ...

2016-05-30 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/13401

[SPARK-15657][SQL] RowEncoder should validate the data type of input object

## What changes were proposed in this pull request?

This PR improves the error handling of `RowEncoder`. When we create a 
`RowEncoder` with a given schema, we should validate the data type of the input 
object, e.g. we should throw an exception when a field is a boolean but is 
declared as a string column.

This PR also removes the support for using `Product` as a valid external type 
for struct type. This support was added in 
https://github.com/apache/spark/pull/9712, but it is incomplete; e.g. nested 
products and products in arrays both do not work. However, we never officially 
supported this feature and I think it's OK to ban it.
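
To make the failure mode concrete, a hypothetical round trip where a boolean
field is declared as a string column (assuming a local `spark` session):

```
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val schema = StructType(StructField("s", StringType) :: Nil)
// The Row carries a Boolean where the schema promises a String.
val rdd = spark.sparkContext.parallelize(Seq(Row(true)))
val df = spark.createDataFrame(rdd, schema)
// With validation, the encoder fails fast with a clear type mismatch
// instead of a confusing error deep inside execution.
df.collect()
```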

## How was this patch tested?

new tests in `RowEncoderSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark bug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13401.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13401


commit 7b30e7030d14c99a42f7b1e23c9953c9bfbdb536
Author: Wenchen Fan 
Date:   2016-05-31T03:21:12Z

validates input data type in RowEncoder







[GitHub] spark pull request: [SPARK-6320][SQL] Move planLater method into G...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13147#issuecomment-222585528
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59627/
Test PASSed.





[GitHub] spark pull request: [SPARK-6320][SQL] Move planLater method into G...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13147#issuecomment-222585527
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-6320][SQL] Move planLater method into G...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13147#issuecomment-222585447
  
**[Test build #59627 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59627/consoleFull)**
 for PR 13147 at commit 
[`254381d`](https://github.com/apache/spark/commit/254381d245cabf3cbad57f7ab06eec155ae79d96).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15601][CORE] CircularBuffer's toString(...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13351#issuecomment-222583398
  
**[Test build #59628 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59628/consoleFull)**
 for PR 13351 at commit 
[`a0ae62e`](https://github.com/apache/spark/commit/a0ae62eaf7ecc19565695da68d3b42cc4aac8f09).





[GitHub] spark pull request: [SPARK-15601][CORE] CircularBuffer's toString(...

2016-05-30 Thread tejasapatil
Github user tejasapatil commented on the pull request:

https://github.com/apache/spark/pull/13351#issuecomment-222583109
  
Jenkins, retest this please





[GitHub] spark pull request: [SPARK-6320][SQL] Move planLater method into G...

2016-05-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13147#issuecomment-222577825
  
**[Test build #59627 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59627/consoleFull)**
 for PR 13147 at commit 
[`254381d`](https://github.com/apache/spark/commit/254381d245cabf3cbad57f7ab06eec155ae79d96).





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13392#issuecomment-222574008
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59626/
Test PASSed.





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13392#issuecomment-222574005
  
Merged build finished. Test PASSed.




