[GitHub] spark pull request #15421: [SPARK-17811] SparkR cannot parallelize data.fram...

2016-10-11 Thread falaki
Github user falaki commented on a diff in the pull request:

https://github.com/apache/spark/pull/15421#discussion_r82940884
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
@@ -125,15 +125,34 @@ private[spark] object SerDe {
   }
 
   def readDate(in: DataInputStream): Date = {
-    Date.valueOf(readString(in))
+    try {
+      val inStr = readString(in)
+      if (inStr == "NA") {
+        null
+      } else {
+        Date.valueOf(inStr)
+      }
+    } catch {
+      // On windows we get NegativeArraySizeException for NAs in R
--- End diff --

No. I will revert this change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15447: [SPARK-14804][Graphx] Graph vertexRDD/EdgeRDD checkpoint...

2016-10-11 Thread apivovarov
Github user apivovarov commented on the issue:

https://github.com/apache/spark/pull/15447
  
Related PRs
https://github.com/apache/spark/pull/15396
https://github.com/apache/spark/pull/12576





[GitHub] spark issue #15447: [SPARK-14804][Graphx] Graph vertexRDD/EdgeRDD checkpoint...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15447
  
Can one of the admins verify this patch?





[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15375
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15375
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66792/
Test PASSed.





[GitHub] spark pull request #15447: [SPARK-14804][Graphx] Graph vertexRDD/EdgeRDD che...

2016-10-11 Thread apivovarov
GitHub user apivovarov opened a pull request:

https://github.com/apache/spark/pull/15447

[SPARK-14804][Graphx] Graph vertexRDD/EdgeRDD checkpoint results Clas…

EdgeRDD/VertexRDD wrap `partitionsRDD`,
e.g. `EdgeRDDImpl.checkpoint()` calls `partitionsRDD.checkpoint()`.
The EdgeRDD/VertexRDD `isCheckpointed()` method should be implemented the same
way: it should call `partitionsRDD.isCheckpointed`.
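
The delegation described here can be sketched without Spark. This is a minimal illustration only: `SimpleRDD`, `WrapperRDD`, and `InconsistentWrapperRDD` are hypothetical stand-ins, not the GraphX classes, and the sketch only shows why a wrapper that forwards `checkpoint()` but not `isCheckpointed` reports the wrong state.

```scala
// Hypothetical stand-ins for illustration; these are NOT the Spark/GraphX classes.
object CheckpointDelegation {
  /** Minimal model of an RDD that remembers whether checkpoint() was called. */
  class SimpleRDD {
    private var checkpointed = false
    def checkpoint(): Unit = { checkpointed = true }
    def isCheckpointed: Boolean = checkpointed
  }

  /** Wrapper that forwards both calls to the wrapped RDD, as the PR proposes. */
  class WrapperRDD(val partitionsRDD: SimpleRDD) extends SimpleRDD {
    override def checkpoint(): Unit = partitionsRDD.checkpoint()
    override def isCheckpointed: Boolean = partitionsRDD.isCheckpointed
  }

  /** Wrapper that forwards checkpoint() only, so its own flag is never set. */
  class InconsistentWrapperRDD(val partitionsRDD: SimpleRDD) extends SimpleRDD {
    override def checkpoint(): Unit = partitionsRDD.checkpoint()
  }
}
```

With the consistent wrapper, `isCheckpointed` becomes true after `checkpoint()`; with the inconsistent one it stays false even though the wrapped RDD has been checkpointed.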

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apivovarov/spark 14804

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15447.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15447


commit b123b68589d59d65db6210f1792a48d7f94e09bb
Author: Alexander Pivovarov 
Date:   2016-10-12T05:48:37Z

[SPARK-14804][Graphx] Graph vertexRDD/EdgeRDD checkpoint results 
ClassCastException







[GitHub] spark issue #15445: [SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repartition...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15445
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66789/
Test PASSed.





[GitHub] spark issue #15445: [SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repartition...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15445
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15375
  
**[Test build #66792 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66792/consoleFull)** for PR 15375 at commit [`836e874`](https://github.com/apache/spark/commit/836e8745c346c59f78958e10aec1c6f9537242b9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15445: [SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repartition...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15445
  
**[Test build #66789 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66789/consoleFull)** for PR 15445 at commit [`be6d153`](https://github.com/apache/spark/commit/be6d1537e9bbd2cc2484e4d8da9d901b16725c97).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/9766
  
**[Test build #66794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66794/consoleFull)** for PR 9766 at commit [`45a9b7a`](https://github.com/apache/spark/commit/45a9b7af6afbb2ab1287cc41fafbaa1de823eafa).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/9766
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66794/
Test FAILed.





[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/9766
  
Merged build finished. Test FAILed.





[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/9766
  
**[Test build #66794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66794/consoleFull)** for PR 9766 at commit [`45a9b7a`](https://github.com/apache/spark/commit/45a9b7af6afbb2ab1287cc41fafbaa1de823eafa).





[GitHub] spark pull request #15230: [SPARK-17657] [SQL] Disallow Users to Change Tabl...

2016-10-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15230#discussion_r82940270
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -225,6 +225,11 @@ case class AlterTableSetPropertiesCommand(
     val catalog = sparkSession.sessionState.catalog
     val table = catalog.getTableMetadata(tableName)
     DDLUtils.verifyAlterTableType(catalog, table, isView)
+    // Not allowed to switch the table type.
+    if (properties.contains("EXTERNAL")) {
--- End diff --

This is officially documented in the Hive document, as shown in the 
[link](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL):
`TBLPROPERTIES ("EXTERNAL"="TRUE") in release 0.6.0+ (HIVE-1329) – Change 
a managed table to an external table and vice versa for "FALSE".`

Among the Hive-specific properties, this is the only one users are not
allowed to change. The others can still be changed, because Hive also
allows it.

Our Spark-reserved properties cannot be changed by users either. See
the function call `verifyTableProperties` in
[`alterTable`](https://github.com/apache/spark/blob/b9a147181d5e38d9abed0c7215f4c5cb695f579c/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L393).
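
A minimal sketch of the check under discussion, assuming only what the diff shows (a `properties.contains("EXTERNAL")` test). The object name, the helper `verifyNoTableTypeSwitch`, and the error message are hypothetical and not part of Spark; the sketch returns an `Either` rather than throwing so it stays self-contained.

```scala
// Hypothetical helper for illustration; not the code in ddl.scala.
object TablePropertyCheck {
  /**
   * Rejects a property map that contains the "EXTERNAL" key, since setting it
   * would switch a table between managed and external. Other keys pass through.
   */
  def verifyNoTableTypeSwitch(
      properties: Map[String, String]): Either[String, Map[String, String]] =
    if (properties.contains("EXTERNAL")) {
      // Assumed wording; the real error message may differ.
      Left("Cannot set or change the table property 'EXTERNAL'")
    } else {
      Right(properties)
    }
}
```

Like the diff, the lookup is an exact, case-sensitive match on the key; whether other spellings should also be rejected is not settled by this comment.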
 





[GitHub] spark issue #15173: [SPARK-15698][SQL][Streaming][Follw-up]Fix FileStream so...

2016-10-11 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/15173
  
@zsxwing Why was this not merged to 2.0?





[GitHub] spark pull request #15439: [SPARK-17880][DOC] The url linking to `Accumulato...

2016-10-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15439





[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15427
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15427
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66790/
Test PASSed.





[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15427
  
**[Test build #66790 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66790/consoleFull)** for PR 15427 at commit [`81339dc`](https://github.com/apache/spark/commit/81339dc429104633ee28cf078f643b5050564557).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15439: [SPARK-17880][DOC] The url linking to `AccumulatorV2` in...

2016-10-11 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15439
  
Thanks - merging in master/2.0.






[GitHub] spark pull request #15440: Fix hadoop.version in building-spark.md

2016-10-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15440


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15440: Fix hadoop.version in building-spark.md

2016-10-11 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15440
  
Thanks - merging in master/branch-2.0.






[GitHub] spark pull request #15434: [SPARK-17873][SQL] ALTER TABLE RENAME TO should a...

2016-10-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15434#discussion_r82938529
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -459,11 +459,20 @@ class SessionCatalog(
    * If a database is specified in `oldName`, this will rename the table in that database.
    * If no database is specified, this will first attempt to rename a temporary table with
    * the same name, then, if that does not exist, rename the table in the current database.
+   *
+   * This assumes the database specified in `newName` matches the one in `oldName`.
    */
-  def renameTable(oldName: TableIdentifier, newName: String): Unit = synchronized {
+  def renameTable(oldName: TableIdentifier, newName: TableIdentifier): Unit = synchronized {
     val db = formatDatabaseName(oldName.database.getOrElse(currentDb))
+    newName.database.map(formatDatabaseName).foreach { newDb =>
--- End diff --

See the PR description: we should use the database of the source table, so that
users can just write `db.tbl1 RENAME TO tbl2`. This is different from Hive, as
we don't support moving a table from one database to another.
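
The resolution rule described in this comment can be sketched as follows. This is an illustration under stated assumptions: `TableId` is a simplified stand-in for Spark's `TableIdentifier`, `resolveRenameTarget` is a hypothetical helper, and rejecting a mismatched target database with `require` is an assumption about how the "must match" rule in the doc comment might be enforced.

```scala
// Simplified, hypothetical model; not the SessionCatalog implementation.
object RenameResolution {
  /** Stand-in for TableIdentifier: a table name plus an optional database. */
  final case class TableId(table: String, database: Option[String] = None)

  /**
   * Resolve the fully qualified rename target. The database always comes from
   * the source table (or the current database if the source names none).
   */
  def resolveRenameTarget(oldName: TableId, newName: TableId, currentDb: String): TableId = {
    val db = oldName.database.getOrElse(currentDb)
    // Assumed behavior: a target database, if given, must equal the source one,
    // because moving a table across databases is not supported.
    newName.database.foreach { newDb =>
      require(newDb == db, s"Cannot rename from database '$db' to '$newDb'")
    }
    TableId(newName.table, Some(db))
  }
}
```

Under this rule `db.tbl1 RENAME TO tbl2` resolves to `db.tbl2`, while a Hive-style cross-database rename such as `db1.tbl RENAME TO db2.tbl2` is rejected.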





[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...

2016-10-11 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/15423#discussion_r82938410
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
@@ -207,6 +208,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
     // Returns true if the plan is supposed to be sorted.
     def isSorted(plan: LogicalPlan): Boolean = plan match {
       case _: Join | _: Aggregate | _: Generate | _: Sample | _: Distinct => false
+      case _: ShowColumnsCommand => true
--- End diff --

@cloud-fan @viirya Thanks :-) I will change it.





[GitHub] spark issue #15434: [SPARK-17873][SQL] ALTER TABLE RENAME TO should allow us...

2016-10-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15434
  
Just FYI, Hive allows the following change:
```SQL
ALTER TABLE db1.tbl RENAME TO db2.tbl2
```





[GitHub] spark issue #15406: [Spark-17745][ml][PySpark] update NB python api - add we...

2016-10-11 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/15406
  
We should add weights to the doctests to demonstrate them and make sure 
they're working.





[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...

2016-10-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15423#discussion_r82937473
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
@@ -207,6 +208,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
     // Returns true if the plan is supposed to be sorted.
     def isSorted(plan: LogicalPlan): Boolean = plan match {
       case _: Join | _: Aggregate | _: Generate | _: Sample | _: Distinct => false
+      case _: ShowColumnsCommand => true
--- End diff --

+1 as mentioned in previous comment.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-11 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/15307
  
@marmbrus Could you take a look?





[GitHub] spark issue #11610: [SPARK-13777] [ML] Remove constant features from trainin...

2016-10-11 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/11610
  
This problem should be handled by 
https://github.com/apache/spark/pull/15394 if it is merged. It seems this is no 
longer active, and we are pursuing alternative solutions. Shall we close this?





[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...

2016-10-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15423#discussion_r82937255
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
@@ -207,6 +208,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
     // Returns true if the plan is supposed to be sorted.
     def isSorted(plan: LogicalPlan): Boolean = plan match {
       case _: Join | _: Aggregate | _: Generate | _: Sample | _: Distinct => false
+      case _: ShowColumnsCommand => true
--- End diff --

Marking `ShowColumnsCommand` as sorted is even weirder; I'd like to leave the
result sorted.





[GitHub] spark issue #9008: [SPARK-9478] [ml] Add class weights to Random Forest

2016-10-11 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/9008
  
@rotationsymmetry Could you please close this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...

2016-10-11 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/15375
  
@falaki @felixcheung  The DirectKafkaStreamSuite is a known flaky test. 
Nothing in this patch should affect Kafka. 





[GitHub] spark pull request #15414: [SPARK-17848][ML] Move LabelCol datatype cast int...

2016-10-11 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/15414#discussion_r82931901
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/PredictorSuite.scala ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.linalg._
+import org.apache.spark.ml.param.ParamMap
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.sql.{DataFrame, Dataset}
+import org.apache.spark.sql.types._
+
+class PredictorSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest {
+
+  import testImplicits._
+
+  class MockPredictor(override val uid: String)
--- End diff --

Move this into the companion object.





[GitHub] spark pull request #15414: [SPARK-17848][ML] Move LabelCol datatype cast int...

2016-10-11 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/15414#discussion_r82932068
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
@@ -121,10 +122,18 @@ abstract class Predictor[
* and put it in an RDD with strong types.
*/
   protected def extractLabeledPoints(dataset: Dataset[_]): RDD[LabeledPoint] = {
-dataset.select(col($(labelCol)).cast(DoubleType), col($(featuresCol))).rdd.map {
+dataset.select(col($(labelCol)), col($(featuresCol))).rdd.map {
   case Row(label: Double, features: Vector) => LabeledPoint(label, features)
 }
   }
+
+  /**
+   * Return the given DataFrame, with [[labelCol]] casted to DoubleType.
+   */
+protected def castDataSet(dataset: Dataset[_]): DataFrame = {
--- End diff --

Let's just put this logic directly in `fit`.





[GitHub] spark pull request #15414: [SPARK-17848][ML] Move LabelCol datatype cast int...

2016-10-11 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/15414#discussion_r82935295
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/PredictorSuite.scala ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.linalg._
+import org.apache.spark.ml.param.ParamMap
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.sql.{DataFrame, Dataset}
+import org.apache.spark.sql.types._
+
+class PredictorSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest {
+
+  import testImplicits._
+
+  class MockPredictor(override val uid: String)
+extends Predictor[Vector, MockPredictor, MockPredictionModel] {
+
+override def train(dataset: Dataset[_]): MockPredictionModel = {
+  require(dataset.schema("label").dataType == DoubleType)
+  new MockPredictionModel(uid)
+}
+
+override def copy(extra: ParamMap): MockPredictor = defaultCopy(extra)
+  }
+
+  class MockPredictionModel(override val uid: String)
+extends PredictionModel[Vector, MockPredictionModel] {
+
+override def predict(features: Vector): Double = 1.0
--- End diff --

`override def predict(features: Vector): Double = throw new NotImplementedError()`
We can do this for everything except `train`.
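
A minimal sketch of that stubbing style, using a hypothetical `SimpleModel` trait in place of Spark's `PredictionModel` (the trait and helper names here are assumptions for illustration only). Members the test never calls throw instead of returning a made-up value, so an accidental call fails loudly.

```scala
// Hypothetical stand-in for a model interface; not the Spark API.
object MockStubSketch {
  trait SimpleModel {
    def predict(features: Seq[Double]): Double
    def uid: String
  }

  class MockModel(override val uid: String) extends SimpleModel {
    // Never exercised by the test, so it is deliberately left unimplemented.
    override def predict(features: Seq[Double]): Double = throw new NotImplementedError()
  }

  // Returns true when calling predict fails with NotImplementedError.
  def predictIsStubbed(model: SimpleModel): Boolean =
    try {
      model.predict(Seq(1.0))
      false
    } catch {
      case _: NotImplementedError => true
    }
}
```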





[GitHub] spark pull request #15414: [SPARK-17848][ML] Move LabelCol datatype cast int...

2016-10-11 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/15414#discussion_r82932894
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/PredictorSuite.scala ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.linalg._
+import org.apache.spark.ml.param.ParamMap
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.sql.{DataFrame, Dataset}
+import org.apache.spark.sql.types._
+
+class PredictorSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest {
+
+  import testImplicits._
+
+  class MockPredictor(override val uid: String)
+extends Predictor[Vector, MockPredictor, MockPredictionModel] {
+
+override def train(dataset: Dataset[_]): MockPredictionModel = {
+  require(dataset.schema("label").dataType == DoubleType)
+  new MockPredictionModel(uid)
+}
+
+override def copy(extra: ParamMap): MockPredictor = defaultCopy(extra)
+  }
+
+  class MockPredictionModel(override val uid: String)
+extends PredictionModel[Vector, MockPredictionModel] {
+
+override def predict(features: Vector): Double = 1.0
+
+override def copy(extra: ParamMap): MockPredictionModel = defaultCopy(extra)
+  }
+
+  test("should support all NumericType labels and not support other types") {
+val predictor = new MockPredictor("mock")
+MLTestingUtils.checkNumericTypes[MockPredictionModel, MockPredictor](
--- End diff --

Why don't we just cycle through the types here and call `fit`? I think it's
a bit confusing the way it is now.
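
Roughly what I have in mind, as a Spark-free sketch. Everything here (`LabelType`, `fit`, `train`) is a hypothetical stand-in, not the real `Predictor` API: `fit` does the cast, the mock `train` requires a double label, and the test simply loops over the types.

```scala
// Hypothetical stand-ins for label data types and a predictor.
object FitCastSketch {
  sealed trait LabelType
  case object IntLabel extends LabelType
  case object LongLabel extends LabelType
  case object FloatLabel extends LabelType
  case object DoubleLabel extends LabelType
  case object StringLabel extends LabelType

  val numericTypes: Seq[LabelType] = Seq(IntLabel, LongLabel, FloatLabel, DoubleLabel)

  // Stand-in for fit: cast any numeric label to double, reject the rest,
  // then delegate to train.
  def fit(labelType: LabelType): String = {
    val casted = labelType match {
      case StringLabel => throw new IllegalArgumentException("label must be numeric")
      case _ => DoubleLabel
    }
    train(casted)
  }

  // Stand-in for the mock's train: it only checks that the cast happened.
  def train(labelType: LabelType): String = {
    require(labelType == DoubleLabel)
    "model"
  }

  // The test body: cycle through the types and call fit on each.
  def allNumericTypesFit: Boolean = numericTypes.forall(t => fit(t) == "model")
  def stringRejected: Boolean = scala.util.Try(fit(StringLabel)).isFailure
}
```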





[GitHub] spark pull request #15414: [SPARK-17848][ML] Move LabelCol datatype cast int...

2016-10-11 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/15414#discussion_r82932799
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
@@ -117,7 +117,7 @@ object MLTestingUtils extends SparkFunSuite {
   Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DoubleType, DecimalType(10, 0))
 types.map { t =>
 val castDF = df.select(col(labelColName).cast(t), col(featuresColName))
-t -> TreeTests.setMetadata(castDF, 2, labelColName, featuresColName)
+t -> TreeTests.setMetadata(castDF, 0, labelColName, featuresColName)
--- End diff --

What is this for? If the intent is to force `getNumClasses` to infer the
number of classes, then you're no longer testing the non-inferred case.
Further, the point of this PR is to eliminate the need to do that, since it is
not a robust solution, IMO.

Also, I'd like to remove the dependence on `TreeTests` here (and 
`genRegressionDF`) and just explicitly set the attributes in the functions.





[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15172
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15172
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66786/
Test PASSed.





[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15172
  
**[Test build #66786 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66786/consoleFull)** for PR 15172 at commit [`46b52e6`](https://github.com/apache/spark/commit/46b52e63918376dcf5dde0359fdfe1efa2456dfd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15172
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66784/
Test PASSed.





[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15172
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66785/
Test PASSed.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66784 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66784/consoleFull)** for PR 15307 at commit [`35bf508`](https://github.com/apache/spark/commit/35bf5089f0d79ba0ba007ca9983a75616f1a553d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15172
  
**[Test build #66785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66785/consoleFull)** for PR 15172 at commit [`0bf663f`](https://github.com/apache/spark/commit/0bf663f0d8a71b2944d4030dc0ef95e36ee35471).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15446: [SPARK-17882][SparkR] Fix swallowed exception in RBacken...

2016-10-11 Thread falaki
Github user falaki commented on the issue:

https://github.com/apache/spark/pull/15446
  
@shivaram Yes, I just noticed it during my debugging and fixed it.





[GitHub] spark pull request #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure r...

2016-10-11 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/15335#discussion_r82933318
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1255,27 +1255,46 @@ class DAGScheduler(
   s"longer running")
   }
 
-  if (disallowStageRetryForTest) {
-abortStage(failedStage, "Fetch failure will not retry stage due to testing config",
-  None)
-  } else if (failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId)) {
-abortStage(failedStage, s"$failedStage (${failedStage.name}) " +
-  s"has failed the maximum allowable number of " +
-  s"times: ${Stage.MAX_CONSECUTIVE_FETCH_FAILURES}. " +
-  s"Most recent failure reason: ${failureMessage}", None)
-  } else {
-if (failedStages.isEmpty) {
-  // Don't schedule an event to resubmit failed stages if failed isn't empty, because
-  // in that case the event will already have been scheduled.
-  // TODO: Cancel running tasks in the stage
-  logInfo(s"Resubmitting $mapStage (${mapStage.name}) and " +
-s"$failedStage (${failedStage.name}) due to fetch failure")
-  messageScheduler.schedule(new Runnable {
-override def run(): Unit = eventProcessLoop.post(ResubmitFailedStages)
-  }, DAGScheduler.RESUBMIT_TIMEOUT, TimeUnit.MILLISECONDS)
+  val shouldAbortStage =
+failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId) ||
+disallowStageRetryForTest
+
+  if (shouldAbortStage) {
+val abortMessage = if (disallowStageRetryForTest) {
+  "Fetch failure will not retry stage due to testing config"
+} else {
+  s"""$failedStage (${failedStage.name})
+ |has failed the maximum allowable number of
+ |times: ${Stage.MAX_CONSECUTIVE_FETCH_FAILURES}.
+ |Most recent failure reason: $failureMessage""".stripMargin.replaceAll("\n", " ")
 }
+abortStage(failedStage, abortMessage, None)
+  } else { // update failedStages and make sure a ResubmitFailedStages event is enqueued
+// TODO: Cancel running tasks in the failed stage -- cf. SPARK-17064
+val noResubmitEnqueued = !failedStages.contains(failedStage)
--- End diff --

I think I was worried about the opposite problem -- perhaps we add 
`mapStage` to `failedStages`, but fail to fire a `Resubmit` event.  Maybe too 
many negatives to think through this clearly -- my intention was *more* logging 
& resubmission, not less.  I suppose I was thinking of it as:

```scala
val addedToFailedStages = failedStages.add(failedStage) | failedStages.add(mapStage)
if (addedToFailedStages) {
  logStuff()
  resubmit()
}
```

the point being, to avoid another case of the bug which started this all -- 
you add to `failedStages`, but fail to ever `Resubmit`.
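
To spell out why that snippet uses `|` rather than `||`, here is a self-contained sketch with stage names as plain strings (`markFailed` is a hypothetical helper, not scheduler code). `add` on a mutable set returns `true` only when the element was not already present, and the non-short-circuiting `|` makes sure both stages are recorded even when the first `add` already returned `true`.

```scala
import scala.collection.mutable

object FailedStagesSketch {
  // Returns (whether anything was newly added, the resulting set).
  def markFailed(
      existing: Set[String],
      failedStage: String,
      mapStage: String): (Boolean, Set[String]) = {
    val failedStages = mutable.HashSet[String]()
    failedStages ++= existing
    // `|` evaluates both operands; `||` would skip the second add
    // whenever the first one returned true.
    val addedToFailedStages = failedStages.add(failedStage) | failedStages.add(mapStage)
    (addedToFailedStages, failedStages.toSet)
  }
}
```

With this shape, a resubmit would be triggered whenever either stage is new to the set, which errs on the side of resubmitting too often rather than too rarely.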

I was thinking of something more like this (though as you'll see, this case
is fine).  Say you have two jobs submitted concurrently, which share the first
few stages: A -> B -> C and A -> B -> D.  There is an executor failure while
they are both running their independent parts, C & D, concurrently.  The
failure is detected in C first, so it marks B & C as failed.  Later on, the
failure is detected in D, and it marks B & D as failed.  If the first resubmit
was already processed, it's fine: B is already running, and we mark D as
waiting on B.  Similarly, it's fine if the resubmit wasn't processed yet when
the failure is detected in D -- then when the resubmit is processed, we
resubmit all 3 stages.

I think it also works out even if stage A needs to get resubmitted as well
-- it's handled in the same call that does the resubmit for B, when it checks
for missing parents.  (In fact, thinking through these cases makes me think we
don't even need to resubmit the `mapStage` at all -- the `failedStage` will
submit itself on its resubmit, since it will notice its parents aren't ready.
Which is why there isn't a case where this check would really matter.)

Anyway, the point is not that I can show you a case where we *do* need
to make sure there is a resubmit.  The point is that I'm *not* sure that we do
*not* need it, which is why I thought it was better to err on the side of
over-logging / resubmitting.



[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...

2016-10-11 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/15422#discussion_r82932947
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -588,6 +588,12 @@ object SQLConf {
   .doubleConf
   .createWithDefault(0.05)
 
+  val IGNORE_CORRUPT_FILES = SQLConfigBuilder("spark.sql.files.ignoreCorruptFiles")
+.doc("Whether to ignore corrupt files. If true, the Spark jobs will continue to run when " +
+  "encountering corrupt files and contents that have been read will still be returned.")
+.booleanConf
+.createWithDefault(false)
+
--- End diff --

Curious why we are duplicating the parameter in the sql namespace. Won't
spark.files.ignoreCorruptFiles do?





[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...

2016-10-11 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/15422#discussion_r82933077
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -170,4 +170,9 @@ package object config {
 .doc("Port to use for the block managed on the driver.")
 .fallbackConf(BLOCK_MANAGER_PORT)
 
+  private[spark] val IGNORE_CORRUPT_FILES = ConfigBuilder("spark.files.ignoreCorruptFiles")
+.doc("Whether to ignore corrupt files. If true, the Spark jobs will continue to run when " +
+  "encountering corrupt files and contents that have been read will still be returned.")
+.booleanConf
+.createWithDefault(false)
--- End diff --

So either way we will have a behavioral change, in either NewHadoopRDD or
HadoopRDD.
IMO that is fine, given that we are standardizing on the behavior and this
was a corner case anyway.

Setting default to false makes sense.





[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...

2016-10-11 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/15422#discussion_r82932992
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala ---
@@ -179,7 +183,16 @@ class NewHadoopRDD[K, V](
 
   override def hasNext: Boolean = {
 if (!finished && !havePair) {
-  finished = !reader.nextKeyValue
+  try {
+finished = !reader.nextKeyValue
+  } catch {
+case e: IOException =>
+  if (ignoreCorruptFiles) {
+finished = true
+  } else {
+throw e
+  }
+  }
--- End diff --

Thanks for changing this too!





[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...

2016-10-11 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/15422#discussion_r82932645
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -253,8 +256,12 @@ class HadoopRDD[K, V](
 try {
   finished = !reader.next(key, value)
 } catch {
-  case eof: EOFException =>
-finished = true
+  case e: IOException =>
+if (ignoreCorruptFiles) {
+  finished = true
+} else {
+  throw e
+}
--- End diff --

nit: `case e: IOException if ignoreCorruptFiles =>`
would have been more concise.
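
For reference, a self-contained sketch of the guarded form. `finishedAfter` and its `read` parameter are hypothetical stand-ins for the iterator logic around `reader.next(key, value)`. When the guard is false no case matches, so the exception propagates without an explicit `throw e`.

```scala
import java.io.{EOFException, IOException}

object PatternGuardSketch {
  // `read` stands in for reader.next(key, value); the result is the new
  // value of `finished`.
  def finishedAfter(read: () => Boolean, ignoreCorruptFiles: Boolean): Boolean =
    try {
      !read()
    } catch {
      // Swallow the error only when the flag is set; otherwise the
      // exception is rethrown automatically because nothing matches.
      case _: IOException if ignoreCorruptFiles => true
    }
}
```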






[GitHub] spark issue #15444: [SPARK-17870][MLLIB][ML]Change statistic to pValue for S...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15444
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66787/
Test PASSed.





[GitHub] spark issue #15444: [SPARK-17870][MLLIB][ML]Change statistic to pValue for S...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15444
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15444: [SPARK-17870][MLLIB][ML]Change statistic to pValue for S...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15444
  
**[Test build #66787 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66787/consoleFull)** for PR 15444 at commit [`b98ccdf`](https://github.com/apache/spark/commit/b98ccdfd696cb89cb4793a140c87c498ce5c3086).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/9766
  
Merged build finished. Test FAILed.





[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/9766
  
**[Test build #66793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66793/consoleFull)** for PR 9766 at commit [`dc6d5f9`](https://github.com/apache/spark/commit/dc6d5f927d93566ee1c3b935db864f2e517bc7e0).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/9766
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66793/
Test FAILed.





[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/9766
  
**[Test build #66793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66793/consoleFull)** for PR 9766 at commit [`dc6d5f9`](https://github.com/apache/spark/commit/dc6d5f927d93566ee1c3b935db864f2e517bc7e0).





[GitHub] spark issue #15443: [SPARK-17881] [SQL] Aggregation function for generating ...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15443
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66782/
Test PASSed.





[GitHub] spark issue #15443: [SPARK-17881] [SQL] Aggregation function for generating ...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15443
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15443: [SPARK-17881] [SQL] Aggregation function for generating ...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15443
  
**[Test build #66782 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66782/consoleFull)** for PR 15443 at commit [`a843920`](https://github.com/apache/spark/commit/a843920983914de7efd21608b8f0e39c70b210d7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class StringHistogram(`
  * `  case class StringHistogramInfo(`
  * `  class StringHistogramInfoSerializer `





[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15375
  
**[Test build #66792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66792/consoleFull)** for PR 15375 at commit [`836e874`](https://github.com/apache/spark/commit/836e8745c346c59f78958e10aec1c6f9537242b9).





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r82931395
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala ---
@@ -25,26 +25,25 @@ object StringUtils {
 
   // replace the _ with .{1} exactly match 1 time of any character
   // replace the % with .*, match 0 or more times with any character
-  def escapeLikeRegex(v: String): String = {
-if (!v.isEmpty) {
-  "(?s)" + (' ' +: v.init).zip(v).flatMap {
-case (prev, '\\') => ""
-case ('\\', c) =>
-  c match {
-case '_' => "_"
-case '%' => "%"
-case _ => Pattern.quote("\\" + c)
-  }
-case (prev, c) =>
-  c match {
-case '_' => "."
-case '%' => ".*"
-case _ => Pattern.quote(Character.toString(c))
-  }
-  }.mkString
-} else {
-  v
+  def escapeLikeRegex(str: String): String = {
+val builder = new StringBuilder()
+var escaping = false
+for (next <- str) {
+  if (escaping) {
+builder ++= Pattern.quote(Character.toString(next))
--- End diff --

`\Q\\E\Qa\E` is correct. But doesn't it become `\Qa\E` with this change?

For `\\a`, the leading `\\` goes to the next branch and enables 
`escaping`. Then the next char `a` is quoted here, so the result is `\Qa\E`. 
BTW, before this change, the result was `\Q\a\E`. 
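
To make the three candidate outputs concrete, here is a minimal Python sketch that traces both versions of `escapeLikeRegex` on a backslash followed by `a`. It is not the Spark code: `quote` is a hypothetical stand-in for `java.util.regex.Pattern.quote` (valid only for inputs without `\E`), the `(?s)` prefix is left out to focus on the quoting, and the branches of the new version after the `if (escaping)` line are assumptions, since the quoted diff stops there.

```python
def quote(s):
    """Stand-in for Pattern.quote: wrap s in \\Q...\\E (assumes s has no \\E)."""
    return "\\Q" + s + "\\E"


def escape_like_old(v):
    """Python rendering of the removed Scala version (without the (?s) prefix)."""
    out = []
    prev = " "
    for c in v:
        if c == "\\":
            pass  # a backslash emits nothing on its own
        elif prev == "\\":
            # escaped wildcard stays literal; anything else is quoted WITH the backslash
            out.append(c if c in "_%" else quote("\\" + c))
        elif c == "_":
            out.append(".")
        elif c == "%":
            out.append(".*")
        else:
            out.append(quote(c))
        prev = c
    return "".join(out)


def escape_like_new(s):
    """Sketch of the added version; every branch after `if escaping` is assumed."""
    out = []
    escaping = False
    for c in s:
        if escaping:
            out.append(quote(c))  # the line under review: the backslash is dropped
            escaping = False
        elif c == "\\":
            escaping = True
        elif c == "_":
            out.append(".")
        elif c == "%":
            out.append(".*")
        else:
            out.append(quote(c))
    return "".join(out)


pattern = "\\a"  # two characters: a backslash followed by 'a'
print(escape_like_old(pattern))  # \Q\a\E
print(escape_like_new(pattern))  # \Qa\E
print(quote("\\") + quote("a"))  # \Q\\E\Qa\E, the form described as correct
```

Under these assumptions the old version keeps the backslash inside a single quoted group, while the new one drops it, which is the discrepancy the comment points at.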






[GitHub] spark issue #15446: [SPARK-17882][SparkR] Fix swallowed exception in RBacken...

2016-10-11 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/15446
  
cc @falaki: is this also part of #15375?





[GitHub] spark issue #15446: [SPARK-17882][SparkR] Fix swallowed exception in RBacken...

2016-10-11 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/15446
  
Thanks @jrshust for the PR.

Jenkins, ok to test





[GitHub] spark pull request #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure r...

2016-10-11 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/15335#discussion_r82931294
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1255,27 +1255,46 @@ class DAGScheduler(
   s"longer running")
   }
 
-  if (disallowStageRetryForTest) {
-abortStage(failedStage, "Fetch failure will not retry stage due to testing config",
-  None)
-  } else if (failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId)) {
-abortStage(failedStage, s"$failedStage (${failedStage.name}) " +
-  s"has failed the maximum allowable number of " +
-  s"times: ${Stage.MAX_CONSECUTIVE_FETCH_FAILURES}. " +
-  s"Most recent failure reason: ${failureMessage}", None)
-  } else {
-if (failedStages.isEmpty) {
-  // Don't schedule an event to resubmit failed stages if failed isn't empty, because
-  // in that case the event will already have been scheduled.
-  // TODO: Cancel running tasks in the stage
-  logInfo(s"Resubmitting $mapStage (${mapStage.name}) and " +
-s"$failedStage (${failedStage.name}) due to fetch failure")
-  messageScheduler.schedule(new Runnable {
-override def run(): Unit = eventProcessLoop.post(ResubmitFailedStages)
-  }, DAGScheduler.RESUBMIT_TIMEOUT, TimeUnit.MILLISECONDS)
+  val shouldAbortStage =
+failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId) ||
+disallowStageRetryForTest
+
+  if (shouldAbortStage) {
+val abortMessage = if (disallowStageRetryForTest) {
+  "Fetch failure will not retry stage due to testing config"
+} else {
+  s"""$failedStage (${failedStage.name})
+ |has failed the maximum allowable number of
+ |times: ${Stage.MAX_CONSECUTIVE_FETCH_FAILURES}.
+ |Most recent failure reason: $failureMessage""".stripMargin.replaceAll("\n", " ")
 }
+abortStage(failedStage, abortMessage, None)
+  } else { // update failedStages and make sure a ResubmitFailedStages event is enqueued
+// TODO: Cancel running tasks in the failed stage -- cf. SPARK-17064
+val noResubmitEnqueued = !failedStages.contains(failedStage)
 failedStages += failedStage
 failedStages += mapStage
+if (noResubmitEnqueued) {
+  // We expect one executor failure to trigger many FetchFailures in rapid succession,
+  // but all of those task failures can typically be handled by a single resubmission of
+  // the failed stage.  We avoid flooding the scheduler's event queue with resubmit
+  // messages by checking whether a resubmit is already in the event queue for the
+  // failed stage.  If there is already a resubmit enqueued for a different failed
+  // stage, that event would also be sufficient to handle the current failed stage, but
+  // producing a resubmit for each failed stage makes debugging and logging a little
+  // simpler while not producing an overwhelming number of scheduler events.
+  logInfo(
+s"Resubmitting $mapStage (${mapStage.name}) and " +
+s"$failedStage (${failedStage.name}) due to fetch failure"
+  )
+  messageScheduler.schedule(
--- End diff --

Yeah, probably a separate PR. Sorry, this was just an opportunity for me to 
rant :)

And sorry if I worded it poorly, but I was not suggesting the one w/ 
"Periodically" as a better comment -- in fact I think it's a *bad* comment. I 
just wanted to mention that it was another description that used to be there 
long ago.

This was my suggestion:

```
If we get one fetch-failure, we often get more fetch failures across 
multiple executors. We will get better parallelism when we resubmit the 
mapStage if we can resubmit when we know about as many of those failures as 
possible. So this is a heuristic to add a small delay to see if we gather a few 
more failures before we resubmit.
```
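
The batching idea in this suggestion, combined with the `noResubmitEnqueued` check from the diff, can be sketched in a few lines of Python. This is an illustration only, not the DAGScheduler: `ResubmitBatcher`, `pending_events`, and `fire_delayed_events` are hypothetical names, and the delayed message scheduler is replaced by a plain list that the caller drains by hand.

```python
class ResubmitBatcher:
    """Collects failed stages and enqueues at most one resubmit per failed stage."""

    def __init__(self):
        self.failed_stages = set()
        self.pending_events = []  # stands in for the delayed message scheduler
        self.resubmissions = []   # record of what each resubmit handled

    def on_fetch_failure(self, failed_stage, map_stage):
        # Only enqueue a resubmit if none is pending for this failed stage.
        no_resubmit_enqueued = failed_stage not in self.failed_stages
        self.failed_stages.add(failed_stage)
        self.failed_stages.add(map_stage)
        if no_resubmit_enqueued:
            self.pending_events.append("ResubmitFailedStages")

    def fire_delayed_events(self):
        # Simulates the delay elapsing: one event handles every stage seen so far.
        while self.pending_events:
            self.pending_events.pop(0)
            if self.failed_stages:
                self.resubmissions.append(sorted(self.failed_stages))
                self.failed_stages.clear()


batcher = ResubmitBatcher()
for _ in range(3):  # three fetch failures arrive in rapid succession
    batcher.on_fetch_failure("reduce-1", "map-0")
batcher.fire_delayed_events()
print(batcher.resubmissions)  # [['map-0', 'reduce-1']]
```

Three failures produce a single resubmission, because the failures that arrive during the delay are folded into the event that is already pending.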



[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/9766
  
Build finished. Test FAILed.





[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/9766
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66791/
Test FAILed.





[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...

2016-10-11 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/15375
  
Jenkins, retest this please





[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/9766
  
**[Test build #66791 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66791/consoleFull)** for PR 9766 at commit [`9de8c0e`](https://github.com/apache/spark/commit/9de8c0e7c0a2108b519c8adce7af5162578b04c9).
 * This patch **fails RAT tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15427
  
**[Test build #66790 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66790/consoleFull)** for PR 15427 at commit [`81339dc`](https://github.com/apache/spark/commit/81339dc429104633ee28cf078f643b5050564557).





[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/9766
  
**[Test build #66791 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66791/consoleFull)** for PR 9766 at commit [`9de8c0e`](https://github.com/apache/spark/commit/9de8c0e7c0a2108b519c8adce7af5162578b04c9).





[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce static SQL conf

2016-10-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15295
  
Merging to master! Thanks!





[GitHub] spark pull request #15295: [SPARK-17720][SQL] introduce static SQL conf

2016-10-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15295





[GitHub] spark issue #15445: [SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repartition...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15445
  
**[Test build #66789 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66789/consoleFull)** for PR 15445 at commit [`be6d153`](https://github.com/apache/spark/commit/be6d1537e9bbd2cc2484e4d8da9d901b16725c97).





[GitHub] spark issue #15446: [SPARK-17882][SPARKR] Fix swallowed exception in RBacken...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15446
  
Can one of the admins verify this patch?





[GitHub] spark pull request #15446: [SPARK-17882][SPARKR] Fix swallowed exception in ...

2016-10-11 Thread jrshust
GitHub user jrshust opened a pull request:

https://github.com/apache/spark/pull/15446

[SPARK-17882][SPARKR] Fix swallowed exception in RBackendHandler

## What changes were proposed in this pull request?

Log the exception that is swallowed in `handleMethodCall`. This makes failures 
in invoked Java methods easier to debug when using SparkR.
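
As an illustration of the general pattern (not the actual `RBackendHandler` code, which is Scala), here is a minimal Python sketch of a handler that logs the full exception before returning an error reply. The function name `handle_method_call`, the logger name, and the `("ok", ...)` / `("error", ...)` reply shape are assumptions made for this example.

```python
import logging

logger = logging.getLogger("backend_handler_sketch")


def handle_method_call(method, *args):
    """Invoke method; on failure, log the traceback and return an error reply."""
    try:
        return ("ok", method(*args))
    except Exception as exc:
        # logger.exception records the message together with the stack trace,
        # so the failure is visible in the logs instead of being swallowed.
        logger.exception("%s failed", getattr(method, "__name__", method))
        return ("error", str(exc))


print(handle_method_call(int, "7"))
print(handle_method_call(int, "not a number")[0])
```

The caller still receives an error reply in both designs; the difference is that the stack trace now reaches the log.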


## How was this patch tested?

Manual tests to verify the logged exception shows up.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jrshust/spark rbackend-logging

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15446.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15446


commit 083f57a16c7153364f8686a28f24afa917e33219
Author: James Shuster 
Date:   2016-10-12T03:19:11Z

log exception object







[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce static SQL conf

2016-10-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15295
  
LGTM






[GitHub] spark pull request #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning...

2016-10-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15389#discussion_r82930615
  
--- Diff: python/pyspark/rdd.py ---
@@ -2029,7 +2028,15 @@ def coalesce(self, numPartitions, shuffle=False):
 >>> sc.parallelize([1, 2, 3, 4, 5], 3).coalesce(1).glom().collect()
 [[1, 2, 3, 4, 5]]
 """
-jrdd = self._jrdd.coalesce(numPartitions, shuffle)
+if shuffle:
+# In Scala's repartition code, we will distribute elements evenly across output
+# partitions. However, the RDD from Python is serialized as a single binary data,
+# so the distribution fails and produces highly skewed partitions. We need to
+# convert it to a RDD of java object before repartitioning.
+data_java_rdd = self._to_java_object_rdd().coalesce(numPartitions, shuffle)
--- End diff --

@davies The followup is at #15445. Can you take a look? Thanks!





[GitHub] spark pull request #15445: [SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repa...

2016-10-11 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/15445

[SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repartitioning Results in Highly Skewed Partition Sizes

## What changes were proposed in this pull request?

This change is a followup to #15389, which calls `_to_java_object_rdd()` to 
solve this issue. Because that call can be expensive, we can instead decrease 
the batch size to solve the same issue.
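
Why a smaller batch size helps can be seen in a toy model that needs no Spark at all. The sketch below assumes that the JVM side sees each serialized batch as one opaque element and hands elements to output partitions round-robin; the starting position is fixed at 0 for determinism, which is a simplification and not how Spark chooses it. The helper names `batches` and `repartition_sizes` are hypothetical.

```python
def batches(elements, batch_size):
    """Split elements into consecutive batches, as a batched serializer would."""
    return [elements[i:i + batch_size] for i in range(0, len(elements), batch_size)]


def repartition_sizes(input_partitions, num_out, batch_size):
    """Count Python elements per output partition when whole batches are moved."""
    sizes = [0] * num_out
    for part in input_partitions:
        position = 0  # simplification: always start at output partition 0
        for blob in batches(part, batch_size):
            sizes[position % num_out] += len(blob)
            position += 1
    return sizes


data = list(range(1000))
input_partitions = [data[:500], data[500:]]

# One large batch per input partition: everything lands in one output partition.
print(repartition_sizes(input_partitions, 4, 500))  # [1000, 0, 0, 0]

# Small batches give the round-robin step enough items to spread out.
print(repartition_sizes(input_partitions, 4, 10))   # [260, 260, 240, 240]
```

In this model the skew comes purely from the granularity of what is being distributed, which is the property both proposed fixes change.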

## How was this patch tested?

Jenkins tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 repartition-batch-size

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15445.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15445


commit 60e2abd9616016dce8e5dc2faf5c75be8e07335f
Author: Liang-Chi Hsieh 
Date:   2016-10-07T04:59:37Z

Decrease the batch size for repartition.

commit be6d1537e9bbd2cc2484e4d8da9d901b16725c97
Author: Liang-Chi Hsieh 
Date:   2016-10-12T03:08:38Z

Merge remote-tracking branch 'upstream/master' into repartition-batch-size







[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12064
  
Merged build finished. Test FAILed.





[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12064
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66788/
Test FAILed.





[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12064
  
**[Test build #66788 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66788/consoleFull)** for PR 12064 at commit [`cdd829a`](https://github.com/apache/spark/commit/cdd829aa56663c8bdb36c85c8599a99fb2fbf643).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12064
  
**[Test build #66788 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66788/consoleFull)** for PR 12064 at commit [`cdd829a`](https://github.com/apache/spark/commit/cdd829aa56663c8bdb36c85c8599a99fb2fbf643).





[GitHub] spark issue #15444: [SPARK-17870][MLLIB][ML]Change statistic to pValue for S...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15444
  
**[Test build #66787 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66787/consoleFull)** for PR 15444 at commit [`b98ccdf`](https://github.com/apache/spark/commit/b98ccdfd696cb89cb4793a140c87c498ce5c3086).





[GitHub] spark pull request #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning...

2016-10-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15389#discussion_r82929167
  
--- Diff: python/pyspark/rdd.py ---
@@ -2029,7 +2028,15 @@ def coalesce(self, numPartitions, shuffle=False):
 >>> sc.parallelize([1, 2, 3, 4, 5], 3).coalesce(1).glom().collect()
 [[1, 2, 3, 4, 5]]
 """
-jrdd = self._jrdd.coalesce(numPartitions, shuffle)
+if shuffle:
+# In Scala's repartition code, we will distribute elements evenly across output
+# partitions. However, the RDD from Python is serialized as a single binary data,
+# so the distribution fails and produces highly skewed partitions. We need to
+# convert it to a RDD of java object before repartitioning.
+data_java_rdd = self._to_java_object_rdd().coalesce(numPartitions, shuffle)
--- End diff --

@davies Thank you! I ran a simple benchmark like the one above with a decreased 
batch size, and I don't see an improvement in running time. I.e.,

    import time
    num_partitions = 2
    a = sc.parallelize(range(int(1e6)), 2)
    start = time.time()
    l = a.repartition(num_partitions).glom().map(len).collect()
    end = time.time()
    print(end - start)

Before: 419.447577953
_to_java_object_rdd(): 421.916361094
decreasing the batch size: 423.712255955

Maybe it depends on how expensive the conversion to Java objects actually is, 
case by case. Is decreasing the batch size generally faster than 
_to_java_object_rdd()? I will open a followup for this change.
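
The three timings above differ by roughly 1%, and each comes from a single run, so part of the gap could be run-to-run variation. A minimal sketch of a repeated-run harness follows; `time_runs` and `workload` are hypothetical names, and `workload` is a stand-in because the Spark job itself needs a live `sc`.

```python
import statistics
import time


def time_runs(fn, repeats=5):
    """Return the wall-clock duration of each call to fn, in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def workload():
    # Stand-in for a.repartition(num_partitions).glom().map(len).collect()
    return sum(len(chunk) for chunk in [list(range(1000))] * 100)


durations = time_runs(workload)
print("median:", statistics.median(durations))
print("spread:", max(durations) - min(durations))
```

Comparing medians, and checking whether the spread within one variant is larger than the gap between variants, gives a firmer basis for choosing between the two approaches.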






[GitHub] spark issue #15431: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes ...

2016-10-11 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/15431
  
@jkbradley I agree it's not necessary to get this into branch-2.0, since it 
requires a new public API. Thanks.





[GitHub] spark issue #15434: [SPARK-17873][SQL] ALTER TABLE RENAME TO should allow us...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15434
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15434: [SPARK-17873][SQL] ALTER TABLE RENAME TO should allow us...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15434
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66778/
Test PASSed.





[GitHub] spark issue #15434: [SPARK-17873][SQL] ALTER TABLE RENAME TO should allow us...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15434
  
**[Test build #66778 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66778/consoleFull)** for PR 15434 at commit [`65c1885`](https://github.com/apache/spark/commit/65c1885818e4b712c2132e7e97e0b96ceb3f6dd7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14847: [SPARK-17254][SQL] Filter can stop when the condi...

2016-10-11 Thread viirya
Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/14847





[GitHub] spark issue #14847: [SPARK-17254][SQL] Filter can stop when the condition is...

2016-10-11 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14847
  
@rxin Thanks for the recommendation! Let me close this for now and work on it.





[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce static SQL conf

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15295
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66777/
Test PASSed.





[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce static SQL conf

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15295
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce static SQL conf

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15295
  
**[Test build #66777 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66777/consoleFull)** for PR 15295 at commit [`595b220`](https://github.com/apache/spark/commit/595b22097dba8716545cd405fa36448065ce779d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15172
  
**[Test build #66786 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66786/consoleFull)** for PR 15172 at commit [`46b52e6`](https://github.com/apache/spark/commit/46b52e63918376dcf5dde0359fdfe1efa2456dfd).





[GitHub] spark issue #15444: [SPARK-17870][MLLIB][ML]Change statistic to pValue for S...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15444
  
**[Test build #66783 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66783/consoleFull)** for PR 15444 at commit [`59ee17d`](https://github.com/apache/spark/commit/59ee17df3b46996bcf62f427c21d0f89b6ced204).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15444: [SPARK-17870][MLLIB][ML]Change statistic to pValue for S...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15444
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66783/
Test FAILed.




