date:20160610

[GitHub] spark issue #13593: [SPARK-15864] [SQL] Fix Inconsistent Behaviors when Unca...

2016-06-10 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13593
  
@rxin @liancheng I see. The existing Dataset API 
`sparkSession.catalog.uncacheTable("non-cachedTable")` raises an error when 
asked to uncache a table that is not cached. So, to make SQL statements and the 
Dataset API behave the same way, we still need to change one of them, right?

I will follow what @rxin said: make it a no-op if the table is already uncached.
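
The difference between the two behaviors discussed above can be sketched as follows. This is a minimal illustration and not Spark's implementation: `ToyTableCache` and all of its methods are hypothetical names, and the cache is modeled as a plain set of table names.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a table cache; this is NOT Spark's CacheManager.
final class ToyTableCache {
  private val cached = mutable.Set.empty[String]

  def cacheTable(name: String): Unit = {
    cached.add(name)
    ()
  }

  def isCached(name: String): Boolean = cached.contains(name)

  // Strict variant: throws when the table is not currently cached.
  def uncacheTableStrict(name: String): Unit = {
    if (!cached.remove(name)) {
      throw new IllegalArgumentException(s"Table $name is not cached")
    }
  }

  // No-op variant: does nothing when the table is not currently cached.
  def uncacheTable(name: String): Unit = {
    cached.remove(name)
    ()
  }
}

object ToyTableCacheDemo {
  def main(args: Array[String]): Unit = {
    val cache = new ToyTableCache
    cache.cacheTable("t")
    cache.uncacheTable("t")
    cache.uncacheTable("t") // second call is a no-op, no exception
    assert(!cache.isCached("t"))
  }
}
```

With the no-op variant, callers can uncache unconditionally without first checking whether the table is cached.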


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13137: [SPARK-15247][SQL] Set the default number of partitions ...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13137
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13137: [SPARK-15247][SQL] Set the default number of partitions ...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13137
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60336/
Test PASSed.



[GitHub] spark issue #13137: [SPARK-15247][SQL] Set the default number of partitions ...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13137
  
**[Test build #60336 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60336/consoleFull)**
 for PR 13137 at commit 
[`4f3ee3c`](https://github.com/apache/spark/commit/4f3ee3cccba78911530767feef99a07794428b73).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13444: [SPARK-15530][SQL] Set #parallelism for file listing in ...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13444
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13444: [SPARK-15530][SQL] Set #parallelism for file listing in ...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13444
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60335/
Test PASSed.



[GitHub] spark issue #13444: [SPARK-15530][SQL] Set #parallelism for file listing in ...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13444
  
**[Test build #60335 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60335/consoleFull)**
 for PR 13444 at commit 
[`f392f91`](https://github.com/apache/spark/commit/f392f915ade9fb2863e421891981cc278a887bdb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #13572: [SPARK-15862] [SQL] Better Error Message When Hav...

2016-06-10 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13572#discussion_r66701399
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/cache.scala ---
@@ -17,30 +17,30 @@
 
 package org.apache.spark.sql.execution.command
 
-import org.apache.spark.sql.{Dataset, Row, SparkSession}
+import org.apache.spark.sql.{AnalysisException, Dataset, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.catalyst.expressions.Attribute
 import org.apache.spark.sql.catalyst.plans.QueryPlan
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 
 case class CacheTableCommand(
-  tableName: String,
-  plan: Option[LogicalPlan],
-  isLazy: Boolean)
-  extends RunnableCommand {
+tableIdent: TableIdentifier,
+plan: Option[LogicalPlan],
+isLazy: Boolean) extends RunnableCommand {
--- End diff --

Just added. :) Please check whether the check is sufficient: 
https://github.com/apache/spark/pull/13572/files#diff-bc55b5f76add105ec32ae4107075b278R30

A qualified name such as `default`.`tab` is still not allowed when creating 
temp tables, so I did not change that part. Let me know if there is anything 
else I need to change. Thanks!
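
The restriction mentioned above, that a temp table name may not carry a database qualifier, can be sketched as below. This is an assumption-laden illustration rather than the code in the PR: `ToyTableIdent`, `TempTableNames`, and `validateTempTableName` are hypothetical names, and the error text is invented for the example.

```scala
// Hypothetical model of a table name with an optional database qualifier.
final case class ToyTableIdent(table: String, database: Option[String] = None) {
  // Renders the name with backticks, e.g. `default`.`tab` or `tab`.
  def quoted: String =
    (database.toSeq :+ table).map(part => s"`$part`").mkString(".")
}

object TempTableNames {
  // Returns Right(name) for an unqualified name and Left(error) for a
  // database-qualified one. The message wording is illustrative only.
  def validateTempTableName(ident: ToyTableIdent): Either[String, String] =
    ident.database match {
      case Some(_) =>
        Left(s"Temporary table name ${ident.quoted} must not specify a database")
      case None =>
        Right(ident.table)
    }
}
```

Returning an `Either` keeps the sketch free of exceptions; a real command would more likely raise an analysis error at this point.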



[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13613
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13613
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60333/
Test PASSed.



[GitHub] spark issue #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose potential t...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13571
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60330/
Test PASSed.



[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13613
  
**[Test build #60333 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60333/consoleFull)**
 for PR 13613 at commit 
[`5f74d95`](https://github.com/apache/spark/commit/5f74d9529c59e28341906429ba27450f91ffbcc4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose potential t...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13571
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose potential t...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13571
  
**[Test build #60330 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60330/consoleFull)**
 for PR 13571 at commit 
[`84e1bf1`](https://github.com/apache/spark/commit/84e1bf14e51f98c13b2177d6c04c0a02e54982f7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class CollectionAccumulator[T] extends AccumulatorV2[T, java.util.List[T]]`
  * `class LibSVMFileFormat extends TextBasedFileFormat with DataSourceRegister`
  * `abstract class ForeachWriter[T] extends Serializable`
  * `case class Person(name: String, age: Long)`
  * `abstract class SparkStrategy extends GenericStrategy[SparkPlan]`
  * `class CSVFileFormat extends TextBasedFileFormat with DataSourceRegister`
  * `abstract class TextBasedFileFormat extends FileFormat`
  * `class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister`
  * `class TextFileFormat extends TextBasedFileFormat with DataSourceRegister`
  * `class ForeachSink[T : Encoder](writer: ForeachWriter[T]) extends Sink with Serializable`



[GitHub] spark issue #13616: [SPARK-15585][SQL] Add doc for turning off quotations

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13616
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60334/
Test PASSed.



[GitHub] spark issue #13616: [SPARK-15585][SQL] Add doc for turning off quotations

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13616
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13616: [SPARK-15585][SQL] Add doc for turning off quotations

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13616
  
**[Test build #60334 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60334/consoleFull)**
 for PR 13616 at commit 
[`edb0395`](https://github.com/apache/spark/commit/edb03956308bb78e09330587c6fbf6ee1ab53a71).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13596: [SPARK-15870][SQL] DataFrame can't execute after uncache...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13596
  
**[Test build #60337 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60337/consoleFull)**
 for PR 13596 at commit 
[`cf4b6d8`](https://github.com/apache/spark/commit/cf4b6d89657434dc7cc0cda6f84fedeeb2578a7b).



[GitHub] spark issue #13572: [SPARK-15862] [SQL] Better Error Message When Having Dat...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13572
  
**[Test build #60338 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60338/consoleFull)**
 for PR 13572 at commit 
[`b22c44a`](https://github.com/apache/spark/commit/b22c44ab232fe712547cd6dd6c3180fa9c84d2cf).



[GitHub] spark issue #13596: [SPARK-15870][SQL] DataFrame can't execute after uncache...

2016-06-10 Thread ueshin

Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/13596
  
@cloud-fan I modified the test.
Please take a look at it again.



[GitHub] spark pull request #13596: [SPARK-15870][SQL] DataFrame can't execute after ...

2016-06-10 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13596#discussion_r66700916
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -321,7 +321,8 @@ class CachedTableSuite extends QueryTest with 
SQLTestUtils with SharedSQLContext
 assert(spark.sharedState.cacheManager.isEmpty)
   }
 
-  test("Clear accumulators when uncacheTable to prevent memory leaking") {
+  // This test would be flaky.
+  ignore("Ensure accumulators to be cleared after GC when uncacheTable") {
--- End diff --

Thank you for the pointer.
Let me check it and I'll update the test.



[GitHub] spark issue #13595: [MINOR][SQL] Standardize 'continuous queries' to 'stream...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13595
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60329/
Test PASSed.



[GitHub] spark issue #13595: [MINOR][SQL] Standardize 'continuous queries' to 'stream...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13595
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13595: [MINOR][SQL] Standardize 'continuous queries' to 'stream...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13595
  
**[Test build #60329 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60329/consoleFull)**
 for PR 13595 at commit 
[`097f2ca`](https://github.com/apache/spark/commit/097f2ca06614dbf1c8299cbd788829fbb32063f1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class CollectionAccumulator[T] extends AccumulatorV2[T, java.util.List[T]]`
  * `class LibSVMFileFormat extends TextBasedFileFormat with DataSourceRegister`
  * `abstract class SparkStrategy extends GenericStrategy[SparkPlan]`
  * `class CSVFileFormat extends TextBasedFileFormat with DataSourceRegister`
  * `abstract class TextBasedFileFormat extends FileFormat`
  * `class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister`
  * `class TextFileFormat extends TextBasedFileFormat with DataSourceRegister`



[GitHub] spark issue #13419: [SPARK-15678][SQL] Not use cache on appends and overwrit...

2016-06-10 Thread sameeragarwal

Github user sameeragarwal commented on the issue:

https://github.com/apache/spark/pull/13419
  
I ended up creating a small design doc describing the problem and 
presenting 2 possible solutions at 
https://docs.google.com/document/d/1h5SzfC5UsvIrRpeLNDKSMKrKJvohkkccFlXo-GBAwQQ/edit?ts=574f717f#.
 Based on this, we decided in favor of option 2 
(https://github.com/apache/spark/pull/13566) as it is a less intrusive change 
to the default behavior. I'm going to close this PR for now, but we may revisit 
this approach (i.e., option 1) for 2.1.



[GitHub] spark pull request #13419: [SPARK-15678][SQL] Not use cache on appends and o...

2016-06-10 Thread sameeragarwal

Github user sameeragarwal closed the pull request at:

https://github.com/apache/spark/pull/13419



[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12938
  
Merged build finished. Test FAILed.



[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...

2016-06-10 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13381#discussion_r66700697
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/ml/IsotonicRegressionExample.scala
 ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.ml
+
+// $example on$
+import org.apache.spark.ml.regression.IsotonicRegression
+// $example off$
+import org.apache.spark.sql.SparkSession
+
+/**
+ * An example demonstrating Isotonic Regression.
+ * Run with
+ * {{{
+ * bin/run-example ml.IsotonicRegressionExample
+ * }}}
+ */
+object IsotonicRegressionExample {
+
+  def main(args: Array[String]): Unit = {
+
+// Creates a SparkSession.
+val spark = SparkSession
+  .builder
+  .appName(s"${this.getClass.getSimpleName}")
+  .getOrCreate()
+
+// $example on$
+// Loads data.
+val dataset = spark.read.format("libsvm")
+  .load("data/mllib/sample_isotonic_regression_libsvm_data.txt")
+
+// Trains an isotonic regression model.
+val ir = new IsotonicRegression()
+val model = ir.fit(dataset)
+
+println(s"Boundaries in increasing order: ${model.boundaries}")
+println(s"Predictions associated with the boundaries: ${model.predictions}")
+
+// Makes predictions.
+model.transform(dataset).show
--- End diff --

@jkbradley Done.



[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12938
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60332/
Test FAILed.



[GitHub] spark issue #13415: [SPARK-15676] [SQL] Disallow Column Names as Partition C...

2016-06-10 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13415
  
Thank you! @andrewor14 



[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12938
  
**[Test build #60332 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60332/consoleFull)**
 for PR 12938 at commit 
[`873f6c8`](https://github.com/apache/spark/commit/873f6c8656c9f07543e5907d6bde7bf0c582673d).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13381: [SPARK-15608][ml][examples][doc] add examples and docume...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13381
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13381: [SPARK-15608][ml][examples][doc] add examples and docume...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13381
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60331/
Test PASSed.



[GitHub] spark issue #13381: [SPARK-15608][ml][examples][doc] add examples and docume...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13381
  
**[Test build #60331 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60331/consoleFull)**
 for PR 13381 at commit 
[`2e46416`](https://github.com/apache/spark/commit/2e46416317356f8f8fa53457ba2318449a795218).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13588: SPARK-15858: Fix calculating error by tree stack over fl...

2016-06-10 Thread mhmoudr

Github user mhmoudr commented on the issue:

https://github.com/apache/spark/pull/13588
  
This PR contains exactly the same fix, but targets version 1.6 in case there 
is a plan to release 1.6.2 in the future. If that is not the case, let me know 
and I will close it.




[GitHub] spark pull request #13501: [SPARK-15759] [SQL] Fallback to non-codegen when ...

2016-06-10 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13501



[GitHub] spark issue #13501: [SPARK-15759] [SQL] Fallback to non-codegen when fail to...

2016-06-10 Thread davies

Github user davies commented on the issue:

https://github.com/apache/spark/pull/13501
  
Merging this into master and 2.0, thanks!



[GitHub] spark issue #13137: [SPARK-15247][SQL] Set the default number of partitions ...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13137
  
**[Test build #60336 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60336/consoleFull)**
 for PR 13137 at commit 
[`4f3ee3c`](https://github.com/apache/spark/commit/4f3ee3cccba78911530767feef99a07794428b73).



[GitHub] spark issue #13444: [SPARK-15530][SQL] Set #parallelism for file listing in ...

2016-06-10 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/13444
  
@yhuai okay, fixed. I also fixed #13137 in the same way.



[GitHub] spark issue #13444: [SPARK-15530][SQL] Set #parallelism for file listing in ...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13444
  
**[Test build #60335 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60335/consoleFull)**
 for PR 13444 at commit 
[`f392f91`](https://github.com/apache/spark/commit/f392f915ade9fb2863e421891981cc278a887bdb).



[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-10 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
@liancheng Got it.



[GitHub] spark pull request #13596: [SPARK-15870][SQL] DataFrame can't execute after ...

2016-06-10 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13596#discussion_r66700295
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -321,7 +321,8 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
 assert(spark.sharedState.cacheManager.isEmpty)
   }
 
-  test("Clear accumulators when uncacheTable to prevent memory leaking") {
+  // This test would be flaky.
+  ignore("Ensure accumulators to be cleared after GC when uncacheTable") {
--- End diff --

How about we attach a listener to `ContextCleaner` and watch for the 
`accumCleaned` event? An example is: 
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ContextCleanerSuite.scala#L406-L417



[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-10 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r66700281
  
--- Diff: docs/sql-programming-guide.md ---
@@ -517,24 +517,26 @@ types such as Sequences or Arrays. This RDD can be implicitly converted to a Dat
 registered as a table. Tables can be used in subsequent SQL statements.
 
 {% highlight scala %}
-// sc is an existing SparkContext.
-val sqlContext = new org.apache.spark.sql.SQLContext(sc)
+val spark: SparkSession // An existing SparkSession
 // this is used to implicitly convert an RDD to a DataFrame.
-import sqlContext.implicits._
+import spark.implicits._
 
 // Define the schema using a case class.
 // Note: Case classes in Scala 2.10 can support only up to 22 fields. To work around this limit,
 // you can use custom classes that implement the Product interface.
 case class Person(name: String, age: Int)
 
-// Create an RDD of Person objects and register it as a table.
-val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()
+// Create an RDD of Person objects and register it as a temporary view.
+val people = sc
+  .textFile("examples/src/main/resources/people.txt")
+  .map(_.split(","))
+  .map(p => Person(p(0), p(1).trim.toInt))
+  .toDF()
 people.createOrReplaceTempView("people")
--- End diff --

It seems better here to convert the input data file to JSON format and then 
use `SparkSession.read.json('path/to/data.json')`. That way we don't need 
SparkContext and get a `DataFrame` directly, which simplifies the example code.
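
A minimal sketch of the suggestion above, assuming a Spark 2.0 build on the classpath and that `people.json` (the sample file that sits next to `people.txt` in the Spark source tree) is reachable from the working directory. The object name, the app name, the `local[*]` master, and the query are illustrative assumptions, not part of the PR.

```scala
import org.apache.spark.sql.SparkSession

object JsonPeopleSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration; a real application
    // would normally receive an existing SparkSession.
    val spark = SparkSession.builder()
      .appName("JsonPeopleSketch")
      .master("local[*]")
      .getOrCreate()

    // read.json returns a DataFrame directly, so no SparkContext,
    // case class, or toDF() conversion is needed.
    val people = spark.read.json("examples/src/main/resources/people.json")
    people.createOrReplaceTempView("people")

    // The temporary view can then be queried with SQL.
    val teenagers = spark.sql("SELECT name, age FROM people WHERE age BETWEEN 13 AND 19")
    teenagers.show()

    spark.stop()
  }
}
```

The schema is inferred from the JSON records, which is why the `Person` case class from the text-file version is not needed here.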



[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-10 Thread liancheng

Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r66700277
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1607,13 +1600,13 @@ a regular multi-line JSON file will most often fail.
 
 {% highlight r %}
 # sc is an existing SparkContext.
-sqlContext <- sparkRSQL.init(sc)
+spark <- sparkRSQL.init(sc)
--- End diff --

The R API is still experimental, and we haven't introduced 
`SparkSession` to SparkR yet.



[GitHub] spark issue #13616: [SPARK-15585][SQL] Add doc for turning off quotations

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13616
  
**[Test build #60334 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60334/consoleFull)**
 for PR 13616 at commit 
[`edb0395`](https://github.com/apache/spark/commit/edb03956308bb78e09330587c6fbf6ee1ab53a71).



[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-10 Thread liancheng

Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13371
  
Reverted from master and branch-2.0.

@viirya For the benchmark, there are two things:

1. The benchmark also counts Parquet file writing into it, so the real 
number should be much better than the posted one.
2. We should also benchmark for cases where no filters are pushed down to 
verify that this patch doesn't affect normal code path.
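
A rough sketch of how the two points above could be reflected in a benchmark, assuming a Spark 2.0 build on the classpath. The object name, the `timeMs` helper, the row count, and the filter are all hypothetical; a real benchmark would repeat each case several times and compare the numbers between the patched build and master.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

object PushDownBenchmarkSketch {
  // Runs `body` once and returns the elapsed wall-clock time in milliseconds.
  def timeMs(body: => Unit): Double = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1e6
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PushDownBenchmarkSketch")
      .master("local[*]")
      .getOrCreate()

    // Point 1: write the Parquet data once, outside any timed region.
    val path = Files.createTempDirectory("pushdown-bench").resolve("data").toString
    spark.range(0, 10000000L).write.parquet(path)

    // A selective filter, the case the patch is meant to speed up.
    val withFilter = timeMs {
      spark.read.parquet(path).filter("id = 1").count()
    }

    // Point 2: no filter at all, to check the normal code path is not slowed down.
    val withoutFilter = timeMs {
      spark.read.parquet(path).count()
    }

    println(f"with filter: $withFilter%.1f ms, without filter: $withoutFilter%.1f ms")
    spark.stop()
  }
}
```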



[GitHub] spark pull request #13566: [SPARK-15678] Add support to REFRESH data source ...

2016-06-10 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13566



[GitHub] spark pull request #13596: [SPARK-15870][SQL] DataFrame can't execute after ...

2016-06-10 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13596#discussion_r66700211
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -321,7 +321,8 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
 assert(spark.sharedState.cacheManager.isEmpty)
   }
 
-  test("Clear accumulators when uncacheTable to prevent memory leaking") {
+  // This test would be flaky.
+  ignore("Ensure accumulators to be cleared after GC when uncacheTable") {
--- End diff --

This is the only risky part of this PR. I'll think about how to 
test it deterministically.



[GitHub] spark pull request #13616: [SPARK-15585][SQL] Add doc for turning off quotat...

2016-06-10 Thread maropu

GitHub user maropu opened a pull request:

https://github.com/apache/spark/pull/13616

[SPARK-15585][SQL] Add doc for turning off quotations

## What changes were proposed in this pull request?
This PR adds documentation for turning off quotations, because this behavior 
differs from `com.databricks.spark.csv`.

## How was this patch tested?
Checked the behavior when an empty string is set in the CSV options.
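
A minimal sketch of the documented behavior, assuming Spark 2.0's built-in CSV data source and its `quote` reader option. The option name comes from the Spark CSV reader rather than from this message, and the input path and object name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object CsvNoQuoteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CsvNoQuoteSketch")
      .master("local[*]")
      .getOrCreate()

    // Passing an empty string (not null) as the quote option turns
    // quotation handling off in the built-in CSV source.
    val df = spark.read
      .option("quote", "")
      .csv("path/to/input.csv") // hypothetical input file

    df.show()
    spark.stop()
  }
}
```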




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/spark SPARK-15585-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13616.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13616


commit edb03956308bb78e09330587c6fbf6ee1ab53a71
Author: Takeshi YAMAMURO 
Date:   2016-06-07T08:16:16Z

Add doc for turning off quotations





[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13613
  
**[Test build #60333 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60333/consoleFull)**
 for PR 13613 at commit 
[`5f74d95`](https://github.com/apache/spark/commit/5f74d9529c59e28341906429ba27450f91ffbcc4).



[GitHub] spark pull request #13596: [SPARK-15870][SQL] DataFrame can't execute after ...

2016-06-10 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13596#discussion_r66700185
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -105,7 +105,7 @@ private[sql] class CacheManager extends Logging {
 val planToCache = query.queryExecution.analyzed
 val dataIndex = cachedData.indexWhere(cd => planToCache.sameResult(cd.plan))
 require(dataIndex >= 0, s"Table $query is not cached.")
-cachedData(dataIndex).cachedRepresentation.uncache(blocking)
+cachedData(dataIndex).cachedRepresentation.cachedColumnBuffers.unpersist(blocking)
--- End diff --

yea, the null setting looks useless, this change LGTM



[GitHub] spark issue #13566: [SPARK-15678] Add support to REFRESH data source paths

2016-06-10 Thread davies

Github user davies commented on the issue:

https://github.com/apache/spark/pull/13566
  
LGTM, 
Merging this into master and 2.0, thanks!



[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12938
  
**[Test build #60332 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60332/consoleFull)**
 for PR 12938 at commit 
[`873f6c8`](https://github.com/apache/spark/commit/873f6c8656c9f07543e5907d6bde7bf0c582673d).



[GitHub] spark issue #13381: [SPARK-15608][ml][examples][doc] add examples and docume...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13381
  
**[Test build #60331 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60331/consoleFull)**
 for PR 13381 at commit 
[`2e46416`](https://github.com/apache/spark/commit/2e46416317356f8f8fa53457ba2318449a795218).



[GitHub] spark issue #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose potential t...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13571
  
**[Test build #60330 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60330/consoleFull)**
 for PR 13571 at commit 
[`84e1bf1`](https://github.com/apache/spark/commit/84e1bf14e51f98c13b2177d6c04c0a02e54982f7).



[GitHub] spark issue #13570: [SPARK-15832][SQL] Embedded IN/EXISTS predicate subquery...

2016-06-10 Thread ioana-delaney

Github user ioana-delaney commented on the issue:

https://github.com/apache/spark/pull/13570
  
@hvanhovell The EXISTS/NOT EXISTS predicates will have an empty condition. 
e.g.

select c1 from t1 where EXISTS (select c2 from t2)

== Optimized Logical Plan ==
Project [_1#224 AS c1#227]
+- Join LeftSemi
   :- LocalRelation [_1#224, _2#225]
   +- LocalRelation [c2#239]

But the other subquery predicates are guaranteed to have at least one 
condition. 

Regarding the rewriteExistentialExpr interface, I think that I need to pass 
an expression instead of a sequence of conditions since the last case in the 
main rewrite rule does not have conditions. It's just an expression. e.g. where 
(case when c2 IN (select 1 as one) then 1 else 2) = c1

Please let me know. Thanks.
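
For anyone who wants to reproduce the plan shown above, here is a small sketch, assuming a Spark 2.0 build on the classpath. The table contents and the object name are made up for illustration; only the query text comes from the comment.

```scala
import org.apache.spark.sql.SparkSession

object ExistsPlanSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ExistsPlanSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: t1 has columns c1 and c2, t2 has a single column c2.
    Seq((1, 2), (3, 4)).toDF("c1", "c2").createOrReplaceTempView("t1")
    Seq(10, 20).toDF("c2").createOrReplaceTempView("t2")

    // Uncorrelated EXISTS: the subquery does not reference t1,
    // so the rewritten semi join carries no join condition.
    val query = spark.sql("select c1 from t1 where EXISTS (select c2 from t2)")

    // Prints the parsed, analyzed, optimized, and physical plans.
    query.explain(true)
    query.show()

    spark.stop()
  }
}
```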



[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-10 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
@rxin One thing that needs explaining: because we have only one 
configuration to control filter push-down, it affects both the row-based filter 
push-down and this row-group filter push-down.

The benchmark I posted above was run against this patch and the master 
branch individually. It does include the time to write the Parquet data; 
I will change that. Is this kind of benchmark enough?



[GitHub] spark issue #13557: [SPARK-15819][PYSPARK][ML] Add KMeanSummary in KMeans of...

2016-06-10 Thread zjffdu

Github user zjffdu commented on the issue:

https://github.com/apache/spark/pull/13557
  
@jkbradley Could you help review it? Thanks.



[GitHub] spark issue #12313: [SPARK-14543] [SQL] Improve InsertIntoTable column resol...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12313
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60327/
Test PASSed.



[GitHub] spark issue #12313: [SPARK-14543] [SQL] Improve InsertIntoTable column resol...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12313
  
Merged build finished. Test PASSed.



[GitHub] spark issue #12313: [SPARK-14543] [SQL] Improve InsertIntoTable column resol...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12313
  
**[Test build #60327 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60327/consoleFull)**
 for PR 12313 at commit 
[`906e68d`](https://github.com/apache/spark/commit/906e68d071daf4e2f15b0f2017b248b872bb6285).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13595: [MINOR][SQL] Standardize 'continuous queries' to 'stream...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13595
  
**[Test build #60329 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60329/consoleFull)**
 for PR 13595 at commit 
[`097f2ca`](https://github.com/apache/spark/commit/097f2ca06614dbf1c8299cbd788829fbb32063f1).



[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-10 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r66699400
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1607,13 +1600,13 @@ a regular multi-line JSON file will most often fail.
 
 {% highlight r %}
 # sc is an existing SparkContext.
-sqlContext <- sparkRSQL.init(sc)
+spark <- sparkRSQL.init(sc)
--- End diff --

Currently, `sparkRSQL.init` calls 
`org.apache.spark.sql.api.r.SQLUtils.createSQLContext`, which returns a 
`SQLContext` object, not a `SparkSession` object. So does the R API need to be 
updated here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-10 Thread WeichenXu123

Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/13544
  
@liancheng OK, no problem !


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13613
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13613
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60328/
Test FAILed.



[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13613
  
**[Test build #60328 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60328/consoleFull)**
 for PR 13613 at commit 
[`459bbb1`](https://github.com/apache/spark/commit/459bbb17603f132eb737f1272f05e29b60d04842).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13606: [SPARK-15086] [CORE] [STREAMING] Deprecate old Java accu...

2016-06-10 Thread lw-lin

Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/13606
  
@srowen , the [[streaming programming guide] - 
accumulators-and-broadcast-variables](https://github.com/apache/spark/blob/1e2c9311871968426e019164b129652fd6d0037f/docs/streaming-programming-guide.md#accumulators-and-broadcast-variables)
  section might also need an update to reflect the code change here, thanks!



[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13613
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60326/
Test FAILed.



[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13613
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13613
  
**[Test build #60326 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60326/consoleFull)**
 for PR 13613 at commit 
[`d1fbb9d`](https://github.com/apache/spark/commit/d1fbb9d7eccb20a8e5ad7a521393bb979866c243).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13147: [SPARK-6320][SQL] Move planLater method into GenericStra...

2016-06-10 Thread ueshin

Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/13147
  
@marmbrus Thank you for merging this!



[GitHub] spark issue #13612: [SPARK-15851] [Build] Fix the call of the bash script to...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13612
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13612: [SPARK-15851] [Build] Fix the call of the bash script to...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13612
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60325/
Test PASSed.



[GitHub] spark issue #13612: [SPARK-15851] [Build] Fix the call of the bash script to...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13612
  
**[Test build #60325 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60325/consoleFull)**
 for PR 13612 at commit 
[`0379025`](https://github.com/apache/spark/commit/03790251ccef03687535ea9b2968101d1206ae22).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-10 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13371
  
And once we have more data, it might make sense to merge this in 2.0!




[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-10 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13371
  
To be clearer: please write a proper benchmark that reads data in a case where 
filter push-down is not useful, so we can see whether this regresses performance 
for the non-push-down case. Also make sure the benchmark does not include the 
time it takes to write the parquet data.
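
A minimal sketch of the measurement shape requested above, in plain Scala with no Spark dependency. `ScanTiming`, `bestOfMillis`, and the in-memory `data` array are hypothetical names used only for illustration; in a real benchmark the setup step would write the Parquet files and the timed body would be the scan. The only point shown is that data generation happens once, outside the timed region.

```scala
object ScanTiming {
  /** Runs `body` `runs` times and returns the fastest wall-clock time in milliseconds. */
  def bestOfMillis[A](runs: Int)(body: => A): Double = {
    require(runs > 0, "runs must be positive")
    val nanos = (1 to runs).map { _ =>
      val start = System.nanoTime()
      body // only this expression is timed
      System.nanoTime() - start
    }
    nanos.min / 1e6
  }

  def main(args: Array[String]): Unit = {
    // Setup (stand-in for writing the Parquet data): done once, never timed.
    val data: Array[Long] = Array.tabulate(1000000)(_.toLong)

    // A predicate that keeps every row, so skipping data cannot help.
    val keepAll: Long => Boolean = _ >= 0L

    val withFilter = bestOfMillis(5)(data.count(keepAll))
    val noFilter = bestOfMillis(5)(data.length)
    println(f"scan with non-selective filter: $withFilter%.3f ms")
    println(f"scan without filter:            $noFilter%.3f ms")
  }
}
```

Taking the minimum over several runs is one common way to reduce warm-up noise; it is an assumption of this sketch, not something the comment above prescribes.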




[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-10 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13371
  
I just talked to @liancheng offline. I don't think we should've merged this 
until we have verified there is no performance regression, and we definitely 
shouldn't have merged this in 2.0.

@liancheng can you revert this from both master and branch-2.0?

@viirya can you run some parquet scan benchmark and make sure this does not 
result in perf regression?



[GitHub] spark issue #13596: [SPARK-15870][SQL] DataFrame can't execute after uncache...

2016-06-10 Thread ueshin

Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/13596
  
@cloud-fan Thank you for your review.
That's right, so we can't unregister the `batchStats` accumulator here yet.



[GitHub] spark issue #13381: [SPARK-15608][ml][examples][doc] add examples and docume...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13381
  
**[Test build #3079 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3079/consoleFull)**
 for PR 13381 at commit 
[`13027b7`](https://github.com/apache/spark/commit/13027b79bbe8e77119207cc8810a775bca022c32).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #13596: [SPARK-15870][SQL] DataFrame can't execute after ...

2016-06-10 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13596#discussion_r66698444
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -105,7 +105,7 @@ private[sql] class CacheManager extends Logging {
 val planToCache = query.queryExecution.analyzed
 val dataIndex = cachedData.indexWhere(cd => 
planToCache.sameResult(cd.plan))
 require(dataIndex >= 0, s"Table $query is not cached.")
-cachedData(dataIndex).cachedRepresentation.uncache(blocking)
+
cachedData(dataIndex).cachedRepresentation.cachedColumnBuffers.unpersist(blocking)
--- End diff --

Yes, that's right.
But I noticed that the original `InMemoryRelation` instance whose 
`_cachedColumnBuffers` would be set to `null` is not the same instance that the 
`DataFrame` executes, because it was copied by `withOutput` when `CacheManager` 
replaced the logical plan for the `DataFrame`.
So we don't need to set it to null; the original one will be collected by GC 
soon.
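
A toy model of the situation described above, assuming nothing about the real `InMemoryRelation` beyond what the comment says. `Relation`, `clear`, and `isCleared` are hypothetical names; the sketch only shows that nulling a field on the original instance does not affect a copy that was made earlier.

```scala
object CopiedRelationSketch {
  // Hypothetical stand-in for a relation holding a reference to cached buffers.
  final class Relation(private var buffers: Array[Int]) {
    // Returns a new instance that shares the same buffers reference.
    def withOutput(): Relation = new Relation(buffers)
    def clear(): Unit = { buffers = null }
    def isCleared: Boolean = buffers == null
  }

  def main(args: Array[String]): Unit = {
    val original = new Relation(Array(1, 2, 3))
    val executed = original.withOutput() // the copy the plan would actually run
    original.clear()
    println(s"original cleared: ${original.isCleared}") // true
    println(s"copy cleared:     ${executed.isCleared}") // false
  }
}
```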



[GitHub] spark pull request #13393: [SPARK-14615][ML][FOLLOWUP] Fix Python examples t...

2016-06-10 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13393



[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-10 Thread liancheng

Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13544
  
Ah, too bad... I wasn't aware of this PR when I was doing #13592. Will 
review this one to see whether I missed something in #13592. Thanks for working 
on this!



[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...

2016-06-10 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13381#discussion_r66698378
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaIsotonicRegressionExample.java
 ---
@@ -35,14 +37,15 @@ public static void main(String[] args) {
 SparkConf sparkConf = new 
SparkConf().setAppName("JavaIsotonicRegressionExample");
 JavaSparkContext jsc = new JavaSparkContext(sparkConf);
 // $example on$
-JavaRDD data = 
jsc.textFile("data/mllib/sample_isotonic_regression_data.txt");
+JavaRDD data = MLUtils.loadLibSVMFile(
+jsc.sc(), 
"data/mllib/sample_isotonic_regression_libsvm_data.txt").toJavaRDD();
--- End diff --

Fix indentation: indent by 2 spaces here and elsewhere



[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...

2016-06-10 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13381#discussion_r66698376
  
--- Diff: docs/ml-classification-regression.md ---
@@ -685,6 +685,76 @@ The implementation matches the result from R's 
survival function
 
 
 
+## Isotonic regression
+[Isotonic regression](http://en.wikipedia.org/wiki/Isotonic_regression)
+belongs to the family of regression algorithms. Formally isotonic 
regression is a problem where
+given a finite set of real numbers `$Y = {y_1, y_2, ..., y_n}$` 
representing observed responses
+and `$X = {x_1, x_2, ..., x_n}$` the unknown response values to be fitted
+finding a function that minimises
+
+`\begin{equation}
+  f(x) = \sum_{i=1}^n w_i (y_i - x_i)^2
+\end{equation}`
+
+with respect to complete order subject to
+`$x_1\le x_2\le ...\le x_n$` where `$w_i$` are positive weights.
+The resulting function is called isotonic regression and it is unique.
+It can be viewed as least squares problem under order restriction.
+Essentially isotonic regression is a
+[monotonic function](http://en.wikipedia.org/wiki/Monotonic_function)
+best fitting the original data points.
+
+MLlib supports a
+[pool adjacent violators algorithm](http://doi.org/10.1198/TECH.2010.10111)
+which uses an approach to
+[parallelizing isotonic 
regression](http://doi.org/10.1007/978-3-642-99789-1_10).
+The training input is a RDD of tuples of three double values that represent
--- End diff --

not an RDD



[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...

2016-06-10 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13381#discussion_r66698380
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/ml/IsotonicRegressionExample.scala
 ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.ml
+
+// $example on$
+import org.apache.spark.ml.regression.IsotonicRegression
+// $example off$
+import org.apache.spark.sql.SparkSession
+
+/**
+ * An example demonstrating Isotonic Regression.
+ * Run with
+ * {{{
+ * bin/run-example ml.IsotonicRegressionExample
+ * }}}
+ */
+object IsotonicRegressionExample {
+
+  def main(args: Array[String]): Unit = {
+
+// Creates a SparkSession.
+val spark = SparkSession
+  .builder
+  .appName(s"${this.getClass.getSimpleName}")
+  .getOrCreate()
+
+// $example on$
+// Loads data.
+val dataset = spark.read.format("libsvm")
+  .load("data/mllib/sample_isotonic_regression_libsvm_data.txt")
+
+// Trains an isotonic regression model.
+val ir = new IsotonicRegression()
+val model = ir.fit(dataset)
+
+println(s"Boundaries in increasing order: ${model.boundaries}")
+println(s"Predictions associated with the boundaries: 
${model.predictions}")
+
+// Makes predictions.
+model.transform(dataset).show
--- End diff --

"show" --> "show()"



[GitHub] spark pull request #13371: [SPARK-15639][SQL] Try to push down filter at Row...

2016-06-10 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13371



[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-10 Thread liancheng

Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13371
  
@yhuai We used to support row group level filter push-down before 
refactoring `HadoopFsRelation` into `FileFormat`, but lost it (by accident I 
guess) after the refactoring. So now we only have row group level filtering 
when the vectorized reader is not used, [see here][1].

And yes, both `ParquetInputFormat` and `ParquetRecordReader` do row group 
level filtering.

This LGTM. Thanks for fixing it! Merging to master and 2.0.

[1]: 
https://github.com/apache/spark/blob/54f758b5fc60ecb0da6b191939a72ef5829be38c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L371-L378
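
As a rough illustration of what row group level filtering means, the sketch below prunes row groups using per-group min/max statistics. It is plain Scala and does not use the Parquet or Spark APIs; `RowGroup` and `pruneGreaterThan` are hypothetical names, and the statistics are made up for the example.

```scala
object RowGroupPruningSketch {
  // Hypothetical per-row-group statistics for one integer column.
  final case class RowGroup(id: Int, min: Long, max: Long)

  /** Keeps only the row groups that may contain a value strictly greater than `bound`. */
  def pruneGreaterThan(groups: Seq[RowGroup], bound: Long): Seq[RowGroup] =
    groups.filter(_.max > bound)

  def main(args: Array[String]): Unit = {
    val groups = Seq(RowGroup(0, 0L, 99L), RowGroup(1, 100L, 199L), RowGroup(2, 200L, 299L))
    // For the predicate `col > 150`, group 0 can be skipped without reading it.
    println(pruneGreaterThan(groups, 150L).map(_.id)) // List(1, 2)
  }
}
```

Rows inside a surviving group still have to be checked individually; the statistics only rule out whole groups.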



[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13613
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13613
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60324/
Test FAILed.



[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13613
  
**[Test build #60324 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60324/consoleFull)**
 for PR 13613 at commit 
[`d686df4`](https://github.com/apache/spark/commit/d686df49399d5387721e2ad761a23eb1a63a0890).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13612: [SPARK-15851] [Build] Fix the call of the bash script to...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13612
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60323/
Test PASSed.



[GitHub] spark issue #13595: [MINOR][SQL] Standardize 'continuous queries' to 'stream...

2016-06-10 Thread lw-lin

Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/13595
  
@zsxwing @tdas, sure, this can wait. Thanks!



[GitHub] spark issue #13612: [SPARK-15851] [Build] Fix the call of the bash script to...

2016-06-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13612
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13612: [SPARK-15851] [Build] Fix the call of the bash script to...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13612
  
**[Test build #60323 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60323/consoleFull)**
 for PR 13612 at commit 
[`aa1927f`](https://github.com/apache/spark/commit/aa1927f19ed24af60001fd822898cae51043f8e4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13570: [SPARK-15832][SQL] Embedded IN/EXISTS predicate subquery...

2016-06-10 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/13570
  
@ioana-delaney no worries. I think the approach you have taken is the 
correct one. I have left one smallish comment.



[GitHub] spark pull request #13570: [SPARK-15832][SQL] Embedded IN/EXISTS predicate s...

2016-06-10 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13570#discussion_r66697830
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1715,31 +1715,52 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
       // Filter the plan by applying left semi and left anti joins.
       withSubquery.foldLeft(newFilter) {
         case (p, PredicateSubquery(sub, conditions, _, _)) =>
-          Join(p, sub, LeftSemi, conditions.reduceOption(And))
+          val (joinCond, outerPlan) = rewriteExistentialExpr(conditions.reduceOption(And), p)
+          Join(outerPlan, sub, LeftSemi, joinCond)
         case (p, Not(PredicateSubquery(sub, conditions, false, _))) =>
-          Join(p, sub, LeftAnti, conditions.reduceOption(And))
+          val (joinCond, outerPlan) = rewriteExistentialExpr(conditions.reduceOption(And), p)
+          Join(outerPlan, sub, LeftAnti, joinCond)
         case (p, Not(PredicateSubquery(sub, conditions, true, _))) =>
-          // This is a NULL-aware (left) anti join (NAAJ).
+          // This is a NULL-aware (left) anti join (NAAJ) e.g. col NOT IN expr
           // Construct the condition. A NULL in one of the conditions is regarded as a positive
           // result; such a row will be filtered out by the Anti-Join operator.
-          val anyNull = conditions.map(IsNull).reduceLeft(Or)
-          val condition = conditions.reduceLeft(And)
 
-          // Note that will almost certainly be planned as a Broadcast Nested Loop join. Use EXISTS
-          // if performance matters to you.
-          Join(p, sub, LeftAnti, Option(Or(anyNull, condition)))
+          // Note that will almost certainly be planned as a Broadcast Nested Loop join.
+          // Use EXISTS if performance matters to you.
+          val (joinCond, outerPlan) = rewriteExistentialExpr(conditions.reduceLeftOption(And), p)
+          val anyNull = splitConjunctivePredicates(joinCond.get).map(IsNull).reduceLeft(Or)
+          Join(outerPlan, sub, LeftAnti, Option(Or(anyNull, joinCond.get)))
         case (p, predicate) =>
-          var joined = p
-          val replaced = predicate transformUp {
-            case PredicateSubquery(sub, conditions, nullAware, _) =>
-              // TODO: support null-aware join
-              val exists = AttributeReference("exists", BooleanType, nullable = false)()
-              joined = Join(joined, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And))
-              exists
-          }
-          Project(p.output, Filter(replaced, joined))
+          val (newCond, inputPlan) = rewriteExistentialExpr(Option(predicate), p)
+          Project(p.output, Filter(newCond.get, inputPlan))
       }
   }
+
+  /**
+   * Given a predicate expression and an input plan, it rewrites
+   * any embedded existential sub-query into an existential join.
+   * It returns the rewritten expression together with the updated plan.
+   * Currently, it does not support null-aware joins. Embedded NOT IN predicates
+   * are blocked in the Analyzer.
+   */
+  private def rewriteExistentialExpr(
+      expr: Option[Expression],
+      plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
+    var newPlan = plan
--- End diff --

Move this down to the Some(case). A bit of mutability is not a problem 
though.
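
One way to read this suggestion, as a minimal sketch rather than the actual Spark code: declare the mutable plan inside the `Some` branch so it is not visible to the `None` branch. The names below (`ScopedVarSketch`, `rewrite`, the `sub:` token convention) are hypothetical stand-ins; strings and lists replace `Expression` and `LogicalPlan` so the example runs on its own.

```scala
// Hypothetical stand-ins: a String plays the role of an expression and a
// List[String] plays the role of a plan. None of this is Spark API.
object ScopedVarSketch {
  // Replaces every "sub:<name>" token with the flag "exists" and appends
  // <name> to the plan, mimicking "rewrite the expression and extend the plan".
  def rewrite(expr: Option[String], plan: List[String]): (Option[String], List[String]) =
    expr match {
      case Some(e) =>
        // The mutable accumulator lives only in the branch that needs it.
        var newPlan = plan
        val rewritten = e.split(" ").map { token =>
          if (token.startsWith("sub:")) {
            newPlan = newPlan :+ token.stripPrefix("sub:")
            "exists"
          } else {
            token
          }
        }.mkString(" ")
        (Some(rewritten), newPlan)
      case None =>
        // Nothing to rewrite, so the input plan is returned untouched.
        (None, plan)
    }
}
```

The `var` is still mutated from inside the closure, which matches the remark that a bit of mutability is acceptable; only its scope shrinks.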



[GitHub] spark pull request #13570: [SPARK-15832][SQL] Embedded IN/EXISTS predicate s...

2016-06-10 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13570#discussion_r66697790
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1715,31 +1715,52 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
       // Filter the plan by applying left semi and left anti joins.
       withSubquery.foldLeft(newFilter) {
         case (p, PredicateSubquery(sub, conditions, _, _)) =>
-          Join(p, sub, LeftSemi, conditions.reduceOption(And))
+          val (joinCond, outerPlan) = rewriteExistentialExpr(conditions.reduceOption(And), p)
+          Join(outerPlan, sub, LeftSemi, joinCond)
         case (p, Not(PredicateSubquery(sub, conditions, false, _))) =>
-          Join(p, sub, LeftAnti, conditions.reduceOption(And))
+          val (joinCond, outerPlan) = rewriteExistentialExpr(conditions.reduceOption(And), p)
+          Join(outerPlan, sub, LeftAnti, joinCond)
         case (p, Not(PredicateSubquery(sub, conditions, true, _))) =>
-          // This is a NULL-aware (left) anti join (NAAJ).
+          // This is a NULL-aware (left) anti join (NAAJ) e.g. col NOT IN expr
           // Construct the condition. A NULL in one of the conditions is regarded as a positive
           // result; such a row will be filtered out by the Anti-Join operator.
-          val anyNull = conditions.map(IsNull).reduceLeft(Or)
-          val condition = conditions.reduceLeft(And)
 
-          // Note that will almost certainly be planned as a Broadcast Nested Loop join. Use EXISTS
-          // if performance matters to you.
-          Join(p, sub, LeftAnti, Option(Or(anyNull, condition)))
+          // Note that will almost certainly be planned as a Broadcast Nested Loop join.
+          // Use EXISTS if performance matters to you.
+          val (joinCond, outerPlan) = rewriteExistentialExpr(conditions.reduceLeftOption(And), p)
+          val anyNull = splitConjunctivePredicates(joinCond.get).map(IsNull).reduceLeft(Or)
+          Join(outerPlan, sub, LeftAnti, Option(Or(anyNull, joinCond.get)))
         case (p, predicate) =>
-          var joined = p
-          val replaced = predicate transformUp {
-            case PredicateSubquery(sub, conditions, nullAware, _) =>
-              // TODO: support null-aware join
-              val exists = AttributeReference("exists", BooleanType, nullable = false)()
-              joined = Join(joined, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And))
-              exists
-          }
-          Project(p.output, Filter(replaced, joined))
+          val (newCond, inputPlan) = rewriteExistentialExpr(Option(predicate), p)
+          Project(p.output, Filter(newCond.get, inputPlan))
       }
   }
+
+  /**
+   * Given a predicate expression and an input plan, it rewrites
+   * any embedded existential sub-query into an existential join.
+   * It returns the rewritten expression together with the updated plan.
+   * Currently, it does not support null-aware joins. Embedded NOT IN predicates
+   * are blocked in the Analyzer.
+   */
+  private def rewriteExistentialExpr(
+      expr: Option[Expression],
--- End diff --

Let's just pass a sequence of expressions. Predicate subqueries are 
guaranteed to have one or more conditions.
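
A rough illustration of why a sequence can be simpler than an `Option` here, assuming the conditions really are never empty: the callee can fold them with `reduceLeft` and return a plain expression, so callers no longer need `.get`. The types below (`Expr`, `Pred`, `And`) and the object `ConditionSeqSketch` are hypothetical stand-ins, not Catalyst classes.

```scala
object ConditionSeqSketch {
  // Toy expression tree standing in for Catalyst expressions (assumption).
  sealed trait Expr
  final case class Pred(name: String) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr

  // Option-based shape: every caller reduces first and unwraps later.
  def combineOptional(conditions: Seq[Expr]): Option[Expr] =
    conditions.reduceOption[Expr]((l, r) => And(l, r))

  // Seq-based shape: the non-empty guarantee is stated once, up front,
  // and the result is a plain Expr.
  def combine(conditions: Seq[Expr]): Expr = {
    require(conditions.nonEmpty, "a predicate subquery needs at least one condition")
    conditions.reduceLeft[Expr]((l, r) => And(l, r))
  }
}
```

For example, `ConditionSeqSketch.combine(Seq(Pred("a"), Pred("b")))` yields `And(Pred("a"), Pred("b"))`, while an empty sequence fails fast in `require` instead of surfacing later as a `None.get`.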



[GitHub] spark issue #13596: [SPARK-15870][SQL] DataFrame can't execute after uncache...

2016-06-10 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13596
  
My suggestion is: in `InMemoryRelation.uncache`, we set `batchStats` to 
null at the end. When this `InMemoryRelation` gets executed again, it will 
regenerate the accumulator and register it.
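
The pattern being proposed could look roughly like the sketch below. It is not the `InMemoryRelation` implementation: `CachedRelationSketch`, `statsOrCreate`, and `registrationCount` are hypothetical names, and an `AtomicLong` stands in for the accumulator so the example runs without Spark.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for a cached relation; not Spark's InMemoryRelation.
final class CachedRelationSketch {
  // null means "no accumulator is currently registered".
  private var batchStats: AtomicLong = null
  // Counts how often an accumulator was (re)created, to make the effect visible.
  private var registrations = 0

  // Lazily recreate and "register" the accumulator when it is missing.
  private def statsOrCreate(): AtomicLong = {
    if (batchStats == null) {
      batchStats = new AtomicLong(0L)
      registrations += 1
    }
    batchStats
  }

  // Executing the relation records rows in the current accumulator.
  def execute(rows: Long): Long = statsOrCreate().addAndGet(rows)

  // Uncaching drops the accumulator as its last step.
  def uncache(): Unit = {
    batchStats = null
  }

  def registrationCount: Int = registrations
}
```

After `uncache()`, the next `execute` call starts from a fresh accumulator instead of touching a stale one. The sketch is single-threaded; a real implementation would need to consider concurrent access.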



[GitHub] spark issue #13381: [SPARK-15608][ml][examples][doc] add examples and docume...

2016-06-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13381
  
**[Test build #3079 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3079/consoleFull)** for PR 13381 at commit [`13027b7`](https://github.com/apache/spark/commit/13027b79bbe8e77119207cc8810a775bca022c32).


