[GitHub] spark pull request: [SPARK-10430][Spark Core]Added hashCode method...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8581#issuecomment-137354051
  
Merged build started.





[GitHub] spark pull request: [SPARK-10430][Spark Core]Added hashCode method...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8581#issuecomment-137353992
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

2015-09-02 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/8553#discussion_r38617544
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala ---
@@ -32,31 +32,56 @@ import org.apache.spark.Logging
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.types._
 
+/**
+ * A Parquet [[ReadSupport]] implementation for reading Parquet records as Catalyst
+ * [[InternalRow]]s.
+ *
+ * The API interface of [[ReadSupport]] is a little bit over complicated because of historical
+ * reasons.  In older versions of parquet-mr (say 1.6.0rc3 and prior), [[ReadSupport]] need to be
+ * instantiated and initialized twice on both driver side and executor side.  The [[init()]] method
+ * is for driver side initialization, while [[prepareForRead()]] is for executor side.  However,
+ * starting from parquet-mr 1.6.0, it's no longer the case, and [[ReadSupport]] is only instantiated
+ * and initialized on executor side.  So, theoretically, now it's totally fine to combine these two
+ * methods into a single initialization method.  The only reason (I could think of) to still have
+ * them here is for parquet-mr API backwards-compatibility.
+ *
+ * Due to this reason, we no longer rely on [[ReadContext]] to pass requested schema from [[init()]]
+ * to [[prepareForRead()]], but use a private `var` for simplicity.
+ */
 private[parquet] class CatalystReadSupport extends ReadSupport[InternalRow] with Logging {
-  // Called after `init()` when initializing Parquet record reader.
+  private var catalystRequestedSchema: StructType = _
+
+  /**
+   * Called on executor side before [[prepareForRead()]] and instantiating actual Parquet record
+   * readers.  Responsible for figuring out Parquet requested schema used for column pruning.
+   */
+  override def init(context: InitContext): ReadContext = {
+    catalystRequestedSchema = {
+      val conf = context.getConfiguration
+      val schemaString = conf.get(CatalystReadSupport.SPARK_ROW_REQUESTED_SCHEMA)
+      assert(schemaString != null, "Parquet requested schema not set.")
+      StructType.fromString(schemaString)
+    }
+
+    val parquetRequestedSchema =
+      CatalystReadSupport.clipParquetSchema(context.getFileSchema, catalystRequestedSchema)
+
+    new ReadContext(parquetRequestedSchema, Map.empty[String, String].asJava)
+  }
+
+  /**
+   * Called on executor side after [[init()]], before instantiating actual Parquet record readers.
+   * Responsible for instantiating [[RecordMaterializer]], which is used for converting Parquet
+   * records to Catalyst [[InternalRow]]s.
+   */
   override def prepareForRead(
       conf: Configuration,
       keyValueMetaData: JMap[String, String],
       fileSchema: MessageType,
       readContext: ReadContext): RecordMaterializer[InternalRow] = {
     log.debug(s"Preparing for read Parquet file with message type: $fileSchema")
-
-    val toCatalyst = new CatalystSchemaConverter(conf)
     val parquetRequestedSchema = readContext.getRequestedSchema
 
-    val catalystRequestedSchema =
-      Option(readContext.getReadSupportMetadata).map(_.asScala).flatMap { metadata =>
-        metadata
-          // First tries to read requested schema, which may result from projections
-          .get(CatalystReadSupport.SPARK_ROW_REQUESTED_SCHEMA)
-          // If not available, tries to read Catalyst schema from file metadata.  It's only
-          // available if the target file is written by Spark SQL.
-          .orElse(metadata.get(CatalystReadSupport.SPARK_METADATA_KEY))
-      }.map(StructType.fromString).getOrElse {
-        logInfo("Catalyst schema not available, falling back to Parquet schema")
-        toCatalyst.convert(parquetRequestedSchema)
-      }
--- End diff --

This "fallback" logic is removed because now we always set requested schema 
properly along the read path. This piece of code was inherited from the old 
Parquet support, which has already been removed.
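
As background for "we now always set the requested schema along the read path", here is a hypothetical sketch of the write side of that contract: the schema is serialized into the Hadoop Configuration under the key that `init()` reads back. The key string and helper name are assumptions for illustration, not the exact Spark code:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.types.{DataType, IntegerType, StringType, StructField, StructType}

object RequestedSchemaSketch {
  // Assumed key, mirroring CatalystReadSupport.SPARK_ROW_REQUESTED_SCHEMA in the diff above.
  val SPARK_ROW_REQUESTED_SCHEMA = "org.apache.spark.sql.parquet.row.requested_schema"

  // Write side of the contract: the schema travels through the Hadoop
  // Configuration as JSON, which is why init() can assert it is non-null.
  def setRequestedSchema(conf: Configuration, schema: StructType): Unit =
    conf.set(SPARK_ROW_REQUESTED_SCHEMA, schema.json)

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    val schema = StructType(Seq(
      StructField("id", IntegerType),
      StructField("name", StringType)))
    setRequestedSchema(conf, schema)
    // Read side, analogous to what init() does with StructType.fromString:
    val roundTripped = DataType.fromJson(conf.get(SPARK_ROW_REQUESTED_SCHEMA))
    println(roundTripped == schema) // true
  }
}
```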





[GitHub] spark pull request: [SPARK-10430][Spark Core]Added hashCode method...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/8581#issuecomment-137352784
  
Looks good, thanks for fixing this.





[GitHub] spark pull request: [SPARK-10430][Spark Core]Added hashCode method...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8581#discussion_r38617290
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/AccumulableInfo.scala ---
@@ -34,9 +34,15 @@ class AccumulableInfo private[spark] (
   override def equals(other: Any): Boolean = other match {
     case acc: AccumulableInfo =>
       this.id == acc.id && this.name == acc.name &&
-        this.update == acc.update && this.value == acc.value
+        this.update == acc.update && this.value == acc.value  &&
--- End diff --

no space before &&
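
For reference, a minimal sketch of the equals/hashCode contract this PR adds, with the style nit fixed; the fields are taken from the diff, but the class body here is illustrative rather than the exact patch:

```scala
class AccumulableInfoSketch(
    val id: Long,
    val name: String,
    val update: Option[String],
    val value: String) {

  override def equals(other: Any): Boolean = other match {
    case acc: AccumulableInfoSketch =>
      this.id == acc.id && this.name == acc.name &&
        this.update == acc.update && this.value == acc.value
    case _ => false
  }

  // Whenever equals is overridden, hashCode must hash exactly the fields that
  // equals compares, so that equal objects land in the same hash bucket.
  override def hashCode(): Int = {
    val state = Seq(id, name, update, value)
    state.map(_.hashCode()).foldLeft(0)((a, b) => 31 * a + b)
  }
}
```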





[GitHub] spark pull request: [SPARK-10430][Spark Core]Added hashCode method...

2015-09-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8581#issuecomment-137352727
  
[Test build #41961 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41961/consoleFull) for PR 8581 at commit [`25c0f7b`](https://github.com/apache/spark/commit/25c0f7b471356b829d3611972741b4a40982cc2f).





[GitHub] spark pull request: [SPARK-10430][Spark Core]Added hashCode method...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8581#discussion_r38617239
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDOperationScopeSuite.scala ---
@@ -38,6 +38,13 @@ class RDDOperationScopeSuite extends SparkFunSuite with BeforeAndAfter {
     sc.stop()
   }
 
+  test("equals and hashCode") {
+    val opScope1 = new RDDOperationScope("scope1", id = "1")
+    val opScope2 = new RDDOperationScope("scope1", id = "1")
+    assert(opScope1 == opScope2)
--- End diff --

`===`
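
For context, ScalaTest's `===` reports both operands when an assertion fails, unlike a bare `==` inside `assert`. A sketch of the test with the review applied (assuming the suite's package and imports, since `RDDOperationScope` is `private[spark]`):

```scala
import org.apache.spark.SparkFunSuite
import org.apache.spark.rdd.RDDOperationScope

class RDDOperationScopeEqualitySketch extends SparkFunSuite {
  test("equals and hashCode") {
    val opScope1 = new RDDOperationScope("scope1", id = "1")
    val opScope2 = new RDDOperationScope("scope1", id = "1")
    assert(opScope1 === opScope2)  // === prints both sides when the check fails
    // Equal objects must also agree on hashCode, which is the point of this PR:
    assert(opScope1.hashCode === opScope2.hashCode)
  }
}
```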





[GitHub] spark pull request: [SPARK-10430][Spark Core]Added hashCode method...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8581#issuecomment-137352426
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10430][Spark Core]Added hashCode method...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8581#issuecomment-137352433
  
Merged build started.





[GitHub] spark pull request: [SPARK-10430][Spark Core]Added hashCode method...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/8581#issuecomment-137352088
  
ok to test





[GitHub] spark pull request: [SPARK-9730][SQL] Add Full Outer Join support ...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8383#issuecomment-137350715
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41958/





[GitHub] spark pull request: [SPARK-9730][SQL] Add Full Outer Join support ...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8383#issuecomment-137350711
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9730][SQL] Add Full Outer Join support ...

2015-09-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8383#issuecomment-137350369
  
[Test build #41958 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41958/console) for PR 8383 at commit [`62cc7e3`](https://github.com/apache/spark/commit/62cc7e3229bbdbe0abd7e262ac11f05aca736000).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-10314 [CORE]RDD persist to OFF_HEAP tach...

2015-09-02 Thread romansew
Github user romansew closed the pull request at:

https://github.com/apache/spark/pull/8482





[GitHub] spark pull request: SPARK-10314 [CORE]RDD persist to OFF_HEAP tach...

2015-09-02 Thread romansew
Github user romansew commented on the pull request:

https://github.com/apache/spark/pull/8482#issuecomment-137344930
  
I opened the other one: https://github.com/apache/spark/pull/8562





[GitHub] spark pull request: [SPARK-10430][SPARK-10430]Added hashCode metho...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8581#issuecomment-137335470
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-10430][SPARK-10430]Added hashCode metho...

2015-09-02 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/8581

[SPARK-10430][SPARK-10430]Added hashCode methods in AccumulableInfo and RDDOperationScope



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark fix_RDDOperationScope_Hashcode

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8581.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8581


commit 25c0f7b471356b829d3611972741b4a40982cc2f
Author: Vinod K C 
Date:   2015-09-03T07:03:40Z

Added hashCode methods







[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-137335223
  
OK, makes sense. Can you close this PR for now then? If there's interest we 
can always reopen it against the latest master.





[GitHub] spark pull request: [SPARK-10411][SQL]Move visualization above exp...

2015-09-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8570





[GitHub] spark pull request: [SPARK-10411][SQL]Move visualization above exp...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/8570#issuecomment-137334438
  
LGTM, merging into master and 1.5, thanks!





[GitHub] spark pull request: [SPARK-10379] preserve first page in UnsafeShu...

2015-09-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8543





[GitHub] spark pull request: [SPARK-10247] [core] improve readability of a ...

2015-09-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8434





[GitHub] spark pull request: Removed code duplication in ShuffleBlockFetche...

2015-09-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8514





[GitHub] spark pull request: [SPARK-10379] preserve first page in UnsafeShu...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/8543#issuecomment-137334210
  
Merged into master, thanks @davies 





[GitHub] spark pull request: [SPARK-10247] [core] improve readability of a ...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/8434#issuecomment-137333753
  
Merged into master, thanks.





[GitHub] spark pull request: [SPARK-9591][CORE]Job may fail for exception d...

2015-09-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7927#issuecomment-137333723
  
[Test build #41960 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41960/consoleFull) for PR 7927 at commit [`6c6d53d`](https://github.com/apache/spark/commit/6c6d53d61d293e2c14ed3f776b8fb20397a6b332).





[GitHub] spark pull request: Removed code duplication in ShuffleBlockFetche...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8514#discussion_r38613656
  
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -315,6 +309,14 @@ final class ShuffleBlockFetcherIterator(
     }
   }
 
+  private def fetchUpToMaxBytes() {
--- End diff --

I added a `Unit` return type when I merged this.
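
In other words, the merged version replaces Scala's procedure syntax with an explicit return type; a one-line before/after sketch:

```scala
// Before merge (procedure syntax, discouraged by the Scala style guide):
private def fetchUpToMaxBytes() { /* send fetch requests */ }

// After merge (explicit Unit return type):
private def fetchUpToMaxBytes(): Unit = { /* send fetch requests */ }
```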





[GitHub] spark pull request: [SPARK-8707] RDD#toDebugString fails if any ca...

2015-09-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7127





[GitHub] spark pull request: Removed code duplication in ShuffleBlockFetche...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/8514#issuecomment-137333652
  
Merged into master, thanks.





[GitHub] spark pull request: [SPARK-8707] RDD#toDebugString fails if any ca...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/7127#issuecomment-13790
  
I've merged this into master after applying the change myself





[GitHub] spark pull request: [SPARK-5945] Spark should not retry a stage in...

2015-09-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5636





[GitHub] spark pull request: [SPARK-9591][CORE]Job may fail for exception d...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7927#issuecomment-137333165
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9591][CORE]Job may fail for exception d...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7927#issuecomment-137333173
  
Merged build started.





[GitHub] spark pull request: [SPARK-5945] Spark should not retry a stage in...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/5636#issuecomment-137333035
  
Alright, merging into master. I fixed the test name on merge. Thanks 
everyone.





[GitHub] spark pull request: [SPARK-9591][CORE]Job may fail for exception d...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/7927#issuecomment-137332857
  
retest this please





[GitHub] spark pull request: [SPARK-9672][MESOS] Don’t include SPARK_ENV_...

2015-09-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7979#issuecomment-137332182
  
[Test build #41959 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41959/consoleFull) for PR 7979 at commit [`28a34b8`](https://github.com/apache/spark/commit/28a34b87e79b5529a89f254a1affbffb0bfe29cc).





[GitHub] spark pull request: SPARK-10314 [CORE]RDD persist to OFF_HEAP tach...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/8482#issuecomment-137332208
  
Can you close this now that you've opened another one?





[GitHub] spark pull request: [SPARK-9672][MESOS] Don’t include SPARK_ENV_...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7979#issuecomment-137331644
  
Merged build started.





[GitHub] spark pull request: [SPARK-9672][MESOS] Don’t include SPARK_ENV_...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7979#issuecomment-137331611
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9672][MESOS] Don’t include SPARK_ENV_...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/7979#discussion_r38612921
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionClient.scala ---
@@ -413,6 +412,15 @@ private[spark] object RestSubmissionClient {
     val mainClass = args(1)
     val appArgs = args.slice(2, args.size)
     val conf = new SparkConf
-    run(appResource, mainClass, appArgs, conf)
+    val env = filterSystemEnvironment(sys.env)
+    run(appResource, mainClass, appArgs, conf, env)
+  }
+
+  /**
+   * Filter non-spark environment variables from any environment.
+   */
+  def filterSystemEnvironment(env: Map[String, String]): Map[String, String] = {
+    val sparkVars = env.filter { case (k, _) => k.startsWith("SPARK_") || k.startsWith("MESOS_") }
+    sparkVars - "SPARK_ENV_LOADED"
--- End diff --

this makes a copy of the map; can you just push this into the filter?
```
env.filter { case (k, _) =>
  (k.startsWith("SPARK_") && k != "SPARK_ENV_LOADED") || k.startsWith("MESOS_")
}
```
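
To make the trade-off concrete, a small self-contained sketch of both variants; the sample environment map is invented for illustration:

```scala
object FilterEnvSketch {
  // Original two-pass version: `filter` materializes one map, then `-` copies it again.
  def twoPass(env: Map[String, String]): Map[String, String] = {
    val sparkVars = env.filter { case (k, _) =>
      k.startsWith("SPARK_") || k.startsWith("MESOS_")
    }
    sparkVars - "SPARK_ENV_LOADED"
  }

  // Reviewer's single-pass version: one predicate, one resulting map.
  def onePass(env: Map[String, String]): Map[String, String] =
    env.filter { case (k, _) =>
      (k.startsWith("SPARK_") && k != "SPARK_ENV_LOADED") || k.startsWith("MESOS_")
    }

  def main(args: Array[String]): Unit = {
    val env = Map(
      "SPARK_HOME" -> "/opt/spark",
      "SPARK_ENV_LOADED" -> "1",
      "MESOS_DIRECTORY" -> "/tmp/mesos",
      "PATH" -> "/usr/bin")
    assert(twoPass(env) == onePass(env))  // both drop SPARK_ENV_LOADED and PATH
    println(onePass(env))
  }
}
```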





[GitHub] spark pull request: [SPARK-9672][MESOS] Don’t include SPARK_ENV_...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/7979#discussion_r38612889
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionClient.scala ---
@@ -413,6 +412,15 @@ private[spark] object RestSubmissionClient {
     val mainClass = args(1)
     val appArgs = args.slice(2, args.size)
     val conf = new SparkConf
-    run(appResource, mainClass, appArgs, conf)
+    val env = filterSystemEnvironment(sys.env)
+    run(appResource, mainClass, appArgs, conf, env)
+  }
+
+  /**
+   * Filter non-spark environment variables from any environment.
+   */
+  def filterSystemEnvironment(env: Map[String, String]): Map[String, String] = {
--- End diff --

please make this private





[GitHub] spark pull request: [SPARK-9672][MESOS] Don’t include SPARK_ENV_...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/7979#issuecomment-137331334
  
ok to test





[GitHub] spark pull request: [SPARK-5945] Spark should not retry a stage in...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/5636#discussion_r38612752
  
--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -473,6 +473,282 @@ class DAGSchedulerSuite
     assertDataStructuresEmpty()
   }
 
+
+  // Helper function to validate state when creating tests for task failures
+  private def checkStageId(stageId: Int, attempt: Int, stageAttempt: TaskSet) {
+    assert(stageAttempt.stageId === stageId)
+    assert(stageAttempt.stageAttemptId == attempt)
+  }
+
+
+  // Helper functions to extract commonly used code in Fetch Failure test cases
+  private def setupStageAbortTest(sc: SparkContext) {
+    sc.listenerBus.addListener(new EndListener())
+    ended = false
+    jobResult = null
+  }
+
+  // Create a new Listener to confirm that the listenerBus sees the JobEnd message
+  // when we abort the stage. This message will also be consumed by the EventLoggingListener
+  // so this will propagate up to the user.
+  var ended = false
+  var jobResult : JobResult = null
+
+  class EndListener extends SparkListener {
+    override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
+      jobResult = jobEnd.jobResult
+      ended = true
+    }
+  }
+
+  /**
+   * Common code to get the next stage attempt, confirm it's the one we expect, and complete it
+   * successfully.
+   *
+   * @param stageId - The current stageId
+   * @param attemptIdx - The current attempt count
+   * @param numShufflePartitions - The number of partitions in the next stage
+   */
+  private def completeShuffleMapStageSuccessfully(
+      stageId: Int,
+      attemptIdx: Int,
+      numShufflePartitions: Int): Unit = {
+    val stageAttempt = taskSets.last
+    checkStageId(stageId, attemptIdx, stageAttempt)
+    complete(stageAttempt, stageAttempt.tasks.zipWithIndex.map {
+      case (task, idx) =>
+        (Success, makeMapStatus("host" + ('A' + idx).toChar, numShufflePartitions))
+    }.toSeq)
+  }
+
+  /**
+   * Common code to get the next stage attempt, confirm it's the one we expect, and complete it
+   * with all FetchFailure.
+   *
+   * @param stageId - The current stageId
+   * @param attemptIdx - The current attempt count
+   * @param shuffleDep - The shuffle dependency of the stage with a fetch failure
+   */
+  private def completeNextStageWithFetchFailure(
+      stageId: Int,
+      attemptIdx: Int,
+      shuffleDep: ShuffleDependency[_, _, _]): Unit = {
+    val stageAttempt = taskSets.last
+    checkStageId(stageId, attemptIdx, stageAttempt)
+    complete(stageAttempt, stageAttempt.tasks.zipWithIndex.map { case (task, idx) =>
+      (FetchFailed(makeBlockManagerId("hostA"), shuffleDep.shuffleId, 0, idx, "ignored"), null)
+    }.toSeq)
+  }
+
+  /**
+   * Common code to get the next result stage attempt, confirm it's the one we expect, and
+   * complete it with a success where we return 42.
+   *
+   * @param stageId - The current stageId
+   * @param attemptIdx - The current attempt count
+   */
+  private def completeNextResultStageWithSuccess(stageId: Int, attemptIdx: Int): Unit = {
+    val stageAttempt = taskSets.last
+    checkStageId(stageId, attemptIdx, stageAttempt)
+    assert(scheduler.stageIdToStage(stageId).isInstanceOf[ResultStage])
+    complete(stageAttempt, stageAttempt.tasks.zipWithIndex.map(_ => (Success, 42)).toSeq)
+  }
+
+  /**
+   * In this test, we simulate a job where many tasks in the same stage fail. We want to show
+   * that many fetch failures inside a single stage attempt do not trigger an abort
+   * on their own, but only when there are enough failing stage attempts.
+   */
+  test("Single fetch failure should not abort the stage.") {
+    setupStageAbortTest(sc)
+
+    val parts = 8
+    val shuffleMapRdd = new MyRDD(sc, parts, Nil)
+    val shuffleDep = new ShuffleDependency(shuffleMapRdd, null)
+    val shuffleId = shuffleDep.shuffleId
+    val reduceRdd = new MyRDD(sc, parts, List(shuffleDep))
+    submit(reduceRdd, (0 until parts).toArray)
+
+    completeShuffleMapStageSuccessfully(0, 0, numShufflePartitions = parts)
+
+    completeNextStageWithFetchFailure(1, 0, shuffleDep)
--- End diff --

Was your concern that "Single fetch failure" could refer to a task? If so 
we can call this "Single stage fetch failure"



[GitHub] spark pull request: [SPARK-5945] Spark should not retry a stage in...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/5636#discussion_r38612691
  
--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -473,6 +473,282 @@ class DAGSchedulerSuite
     assertDataStructuresEmpty()
   }
 
+
+  // Helper function to validate state when creating tests for task failures
+  private def checkStageId(stageId: Int, attempt: Int, stageAttempt: TaskSet) {
+    assert(stageAttempt.stageId === stageId)
+    assert(stageAttempt.stageAttemptId == attempt)
+  }
+
+
+  // Helper functions to extract commonly used code in Fetch Failure test cases
+  private def setupStageAbortTest(sc: SparkContext) {
+    sc.listenerBus.addListener(new EndListener())
+    ended = false
+    jobResult = null
+  }
+
+  // Create a new Listener to confirm that the listenerBus sees the JobEnd message
+  // when we abort the stage. This message will also be consumed by the EventLoggingListener
+  // so this will propagate up to the user.
+  var ended = false
+  var jobResult : JobResult = null
+
+  class EndListener extends SparkListener {
+    override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
+      jobResult = jobEnd.jobResult
+      ended = true
+    }
+  }
+
+  /**
+   * Common code to get the next stage attempt, confirm it's the one we expect, and complete it
+   * successfully.
+   *
+   * @param stageId - The current stageId
+   * @param attemptIdx - The current attempt count
+   * @param numShufflePartitions - The number of partitions in the next stage
+   */
+  private def completeShuffleMapStageSuccessfully(
+      stageId: Int,
+      attemptIdx: Int,
+      numShufflePartitions: Int): Unit = {
+    val stageAttempt = taskSets.last
+    checkStageId(stageId, attemptIdx, stageAttempt)
+    complete(stageAttempt, stageAttempt.tasks.zipWithIndex.map {
+      case (task, idx) =>
+        (Success, makeMapStatus("host" + ('A' + idx).toChar, numShufflePartitions))
+    }.toSeq)
+  }
+
+  /**
+   * Common code to get the next stage attempt, confirm it's the one we expect, and complete it
+   * with all FetchFailure.
+   *
+   * @param stageId - The current stageId
+   * @param attemptIdx - The current attempt count
+   * @param shuffleDep - The shuffle dependency of the stage with a fetch failure
+   */
+  private def completeNextStageWithFetchFailure(
+      stageId: Int,
+      attemptIdx: Int,
+      shuffleDep: ShuffleDependency[_, _, _]): Unit = {
+    val stageAttempt = taskSets.last
+    checkStageId(stageId, attemptIdx, stageAttempt)
+    complete(stageAttempt, stageAttempt.tasks.zipWithIndex.map { case (task, idx) =>
+      (FetchFailed(makeBlockManagerId("hostA"), shuffleDep.shuffleId, 0, idx, "ignored"), null)
+    }.toSeq)
+  }
+
+  /**
+   * Common code to get the next result stage attempt, confirm it's the one we expect, and
+   * complete it with a success where we return 42.
+   *
+   * @param stageId - The current stageId
+   * @param attemptIdx - The current attempt count
+   */
+  private def completeNextResultStageWithSuccess(stageId: Int, attemptIdx: Int): Unit = {
+    val stageAttempt = taskSets.last
+    checkStageId(stageId, attemptIdx, stageAttempt)
+    assert(scheduler.stageIdToStage(stageId).isInstanceOf[ResultStage])
+    complete(stageAttempt, stageAttempt.tasks.zipWithIndex.map(_ => (Success, 42)).toSeq)
+  }
+
+  /**
+   * In this test, we simulate a job where many tasks in the same stage fail. We want to show
+   * that many fetch failures inside a single stage attempt do not trigger an abort
+   * on their own, but only when there are enough failing stage attempts.
+   */
+  test("Single fetch failure should not abort the stage.") {
+    setupStageAbortTest(sc)
+
+    val parts = 8
+    val shuffleMapRdd = new MyRDD(sc, parts, Nil)
+    val shuffleDep = new ShuffleDependency(shuffleMapRdd, null)
+    val shuffleId = shuffleDep.shuffleId
+    val reduceRdd = new MyRDD(sc, parts, List(shuffleDep))
+    submit(reduceRdd, (0 until parts).toArray)
+
+    completeShuffleMapStageSuccessfully(0, 0, numShufflePartitions = parts)
+
+    completeNextStageWithFetchFailure(1, 0, shuffleDep)
--- End diff --

What? Why does it matter whether one or multiple tasks failed with the fetch failure?



[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38612598
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
@@ -101,6 +134,32 @@ private[spark] abstract class YarnSchedulerBackend(
       ThreadUtils.newDaemonCachedThreadPool("yarn-scheduler-ask-am-thread-pool")
     implicit val askAmExecutor = ExecutionContext.fromExecutor(askAmThreadPool)
 
+    def handleExecutorDisconnectedFromDriver(
--- End diff --

yeah so this comment was assuming that the two endpoints are merged, so 
don't worry about it





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38612546
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
@@ -91,6 +94,36 @@ private[spark] abstract class YarnSchedulerBackend(
   }
 
   /**
+   * Override the DriverEndpoint to add extra logic for the case when an executor is disconnected.
+   * We should check the cluster manager and find if the loss of the executor was caused by YARN
+   * force killing it due to preemption.
+   */
+  private class YarnDriverEndpoint(rpcEnv: RpcEnv, sparkProperties: Seq[(String, String)])
+      extends DriverEndpoint(rpcEnv, sparkProperties) {
--- End diff --

I don't see the benefit in keeping them separate. Now the developer has to 
worry about which one to send a message to. If s/he accidentally picks the 
wrong one then we'll just time out on a future, which is hard to debug. Also 
they're not actually totally separate, since one calls into another.

Anyway I don't feel strongly about this because keeping them separate is 
actually just preserving existing behavior. I just find the introduction of 
another endpoint in the same file a little unnecessary.





[GitHub] spark pull request: [SPARK-9730][SQL] Add Full Outer Join support ...

2015-09-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8383#issuecomment-137328959
  
[Test build #41958 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41958/consoleFull) for PR 8383 at commit [`62cc7e3`](https://github.com/apache/spark/commit/62cc7e3229bbdbe0abd7e262ac11f05aca736000).





[GitHub] spark pull request: [SPARK-9730][SQL] Add Full Outer Join support ...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8383#issuecomment-137328168
  
Merged build started.





[GitHub] spark pull request: [SPARK-9730][SQL] Add Full Outer Join support ...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8383#issuecomment-137328114
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9730][SQL] Add Full Outer Join support ...

2015-09-02 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/8383#issuecomment-137327156
  
@davies Thanks for the comments and for continuing this patch. I still added new commits to fix the problems you mentioned. But feel free to merge #8579 when it is ready; I will close this one after that.





[GitHub] spark pull request: [SPARK-9723][ML] params getordefault should th...

2015-09-02 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8567#issuecomment-137327087
  
LGTM. Merged into master. Thanks!





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/8007#issuecomment-137326904
  
@mccheah Overall this looks pretty good. I think throughout the patch there 
are still quite a few places where the change is not intended / no longer 
necessary. Other than those the biggest open question might be the potential 
race condition where more tasks are scheduled on the dying executor in the 
short time window of asking for the executor loss reason. Was there a 
resolution to that discussion?





[GitHub] spark pull request: [SPARK-9723][ML] params getordefault should th...

2015-09-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8567





[GitHub] spark pull request: [SPARK-10422] [SQL] String column in InMemoryC...

2015-09-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8578





[GitHub] spark pull request: [SPARK-10422] [SQL] String column in InMemoryC...

2015-09-02 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/8578#issuecomment-137323772
  
LGTM, merging this into master and 1.5.





[GitHub] spark pull request: [SPARK-10419] [SQL] Adding SQLServer support f...

2015-09-02 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/8575#discussion_r38611491
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -240,3 +242,21 @@ case object DB2Dialect extends JdbcDialect {
 case _ => None
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * Default SQL Server dialect, mapping the datetimeoffset types to a String on read.
+ */
+@DeveloperApi
+case object SqlServerDialect extends JdbcDialect {
+
+  override def canHandle(url: String): Boolean = url.startsWith("jdbc:sqlserver")
+
+  override def getCatalystType(
+    sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
+    if (typeName.contains("datetimeoffset")) {
+      // String is used by SQL Server for datetimeoffset types in legacy clients
--- End diff --

Sorry if this is an obvious question, but the comment says "legacy clients" 
- what versions are considered legacy clients (and do the new clients use a 
different type)?
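
(For readers following the dialect API: a minimal sketch, assuming only the 
public `JdbcDialect`/`JdbcDialects` extension point, of how a dialect like the 
one above would be registered; the object name here is hypothetical.)

```
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, MetadataBuilder, StringType}

// Hypothetical dialect mirroring the diff above: map SQL Server's
// datetimeoffset to Catalyst StringType, defer everything else to defaults.
case object DatetimeOffsetDialect extends JdbcDialect {

  override def canHandle(url: String): Boolean = url.startsWith("jdbc:sqlserver")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    // Returning None falls back to Spark's built-in type mapping.
    if (typeName.contains("datetimeoffset")) Some(StringType) else None
  }
}

// Registration makes the dialect apply to any JDBC read whose URL it canHandle.
JdbcDialects.registerDialect(DatetimeOffsetDialect)
```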





[GitHub] spark pull request: [SPARK-10422] [SQL] String column in InMemoryC...

2015-09-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8578#issuecomment-137323307
  
  [Test build #1716 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1716/console) for PR 8578 at commit [`2905fd5`](https://github.com/apache/spark/commit/2905fd543b25251f9c50e8059668112c27a251d4).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10332][CORE] Fix yarn spark executor va...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8580#issuecomment-137323237
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10332][CORE] Fix yarn spark executor va...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8580#issuecomment-137323239
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41953/





[GitHub] spark pull request: [SPARK-10332][CORE] Fix yarn spark executor va...

2015-09-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8580#issuecomment-137323181
  
  [Test build #41953 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41953/console) for PR 8580 at commit [`f7e5b00`](https://github.com/apache/spark/commit/f7e5b00097ac76b9282f5c6d204d00b7a848a572).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9925] [SQL] [TESTS] Set SQLConf.SHUFFLE...

2015-09-02 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/8155#issuecomment-137322854
  
Will do it tomorrow.





[GitHub] spark pull request: [SPARK-10411][SQL]Move visualization above exp...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8570#issuecomment-137322164
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41957/





[GitHub] spark pull request: [SPARK-10411][SQL]Move visualization above exp...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8570#issuecomment-137322163
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10411][SQL]Move visualization above exp...

2015-09-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8570#issuecomment-137322099
  
  [Test build #41957 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41957/console) for PR 8570 at commit [`9978206`](https://github.com/apache/spark/commit/997820603d56559c478ac45bb4241454b103bdb7).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10199][SPARK-10200][SPARK-10201][SPARK-...

2015-09-02 Thread vinodkc
Github user vinodkc commented on the pull request:

https://github.com/apache/spark/pull/8507#issuecomment-137312813
  
@feynmanliang, I have fixed your review comments. Could you please verify?





[GitHub] spark pull request: [SPARK-10422] [SQL] String column in InMemoryC...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8578#issuecomment-137311709
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10422] [SQL] String column in InMemoryC...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8578#issuecomment-137311710
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41952/





[GitHub] spark pull request: [SPARK-10422] [SQL] String column in InMemoryC...

2015-09-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8578#issuecomment-137311641
  
  [Test build #41952 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41952/console) for PR 8578 at commit [`2905fd5`](https://github.com/apache/spark/commit/2905fd543b25251f9c50e8059668112c27a251d4).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-09-02 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/7417#issuecomment-137310991
  
@Sephiroth-Lin there are two changes in your patch: using 
`BroadcastNestedLoopJoin` for the small table, and putting the small table on 
the left side of `RDD.cartesian`. I guess you only tested the first change. 
Could you run some performance tests for the second change as well?

According to my understanding of `RDD.cartesian`, putting the small table 
on the left side of `RDD.cartesian` won't reduce IO. The number of records 
that need to be scanned is always `#left_partitions * #right_partitions * 
#records_in_one_left_partition * #records_in_one_right_partition` = 
`#left_records * #right_records`. One benefit I can imagine is reducing the 
number of file opens and closes.

However, putting the small table on the right side of `RDD.cartesian` could 
take advantage of the OS cache, since it may be small enough to stay in the 
OS cache entirely.

Therefore I'm really curious about the performance improvement of the 
`RDD.cartesian` change.
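
(To make the symmetry argument concrete, a small standalone sketch with 
hypothetical sizes in local mode: the pair count of `RDD.cartesian` is 
identical regardless of which side the small table is on; only effects like 
open/close overhead and OS caching can differ.)

```
import org.apache.spark.{SparkConf, SparkContext}

object CartesianOrderSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("cartesian-order-sketch").setMaster("local[2]"))

    val small = sc.parallelize(1 to 100, numSlices = 2)     // hypothetical small table
    val big   = sc.parallelize(1 to 100000, numSlices = 20) // hypothetical big table

    // Both orders scan #small_records * #big_records record pairs in total;
    // swapping sides changes which partition is re-read per output partition,
    // not the overall amount of data touched.
    val smallLeft  = small.cartesian(big).count()
    val smallRight = big.cartesian(small).count()
    assert(smallLeft == smallRight) // 100 * 100000 either way

    sc.stop()
  }
}
```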








[GitHub] spark pull request: [SPARK-10414][MLlib]Fix hashcode issues in MLL...

2015-09-02 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/8565#discussion_r38609166
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -278,7 +278,8 @@ class DenseMatrix @Since("1.3.0") (
   }
 
   override def hashCode: Int = {
-com.google.common.base.Objects.hashCode(numRows : Integer, numCols: 
Integer, toArray)
+val state = Seq(numRows, numCols, Arrays.hashCode(values), 
isTransposed.hashCode)
--- End diff --

Got confirmation from the Breeze mailing list that this is an issue, and I was 
advised to raise it in Breeze => 
https://github.com/scalanlp/breeze/issues/440
Can you please suggest whether we should wait for the Breeze fix or instead 
change the equals methods of CSCMatrix & DenseMatrix to match my hashCode 
implementation?
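
(For context, a simplified standalone sketch, not the actual MLlib classes, 
of the contract at stake: `equals` and `hashCode` must be computed from the 
same state, or equal matrices can hash differently.)

```
import java.util.Arrays

// Toy stand-in for a dense matrix; the real question in this thread is which
// of these fields equals and hashCode should both consider.
class ToyDenseMatrix(
    val numRows: Int,
    val numCols: Int,
    val values: Array[Double],
    val isTransposed: Boolean) {

  override def equals(other: Any): Boolean = other match {
    case m: ToyDenseMatrix =>
      numRows == m.numRows && numCols == m.numCols &&
        Arrays.equals(values, m.values) && isTransposed == m.isTransposed
    case _ => false
  }

  // Derived from exactly the fields compared in equals, so a == b implies
  // a.hashCode == b.hashCode.
  override def hashCode: Int = {
    val state = Seq(numRows, numCols, Arrays.hashCode(values), isTransposed.hashCode)
    state.foldLeft(17)((acc, h) => 31 * acc + h)
  }
}
```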







[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-09-02 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/7417#discussion_r38608917
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/CartesianProduct.scala
 ---
@@ -27,16 +27,27 @@ import org.apache.spark.sql.execution.{BinaryNode, 
SparkPlan}
  * :: DeveloperApi ::
  */
 @DeveloperApi
-case class CartesianProduct(left: SparkPlan, right: SparkPlan) extends 
BinaryNode {
+case class CartesianProduct(
+left: SparkPlan,
+right: SparkPlan,
+buildSide: BuildSide) extends BinaryNode {
   override def output: Seq[Attribute] = left.output ++ right.output
 
+  private val (small, big) = buildSide match {
+case BuildRight => (left, right)
--- End diff --

In other places, such as `org.apache.spark.sql.execution.joins.HashJoin`, 
`BuildRight` means the right side is the small table, and we usually broadcast 
it. Could you follow these semantics?
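
(As a sketch of the convention being referenced, with simplified stand-ins 
for the join types:)

```
sealed trait BuildSide
case object BuildLeft extends BuildSide
case object BuildRight extends BuildSide

// BuildRight means "the right side is the small (build) side", matching the
// convention in HashJoin; the patch's (left, right) mapping inverts this.
def smallAndBig[A](left: A, right: A, buildSide: BuildSide): (A, A) =
  buildSide match {
    case BuildRight => (right, left) // (small, big)
    case BuildLeft  => (left, right)
  }
```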





[GitHub] spark pull request: [SPARK-5945] Spark should not retry a stage in...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5636#issuecomment-137307153
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41951/





[GitHub] spark pull request: [SPARK-5945] Spark should not retry a stage in...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5636#issuecomment-137307149
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-5945] Spark should not retry a stage in...

2015-09-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5636#issuecomment-137306559
  
  [Test build #41951 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41951/console) for PR 5636 at commit [`5bb1ae6`](https://github.com/apache/spark/commit/5bb1ae6fe5b7b44a432e35ea588a7d88fe2168eb).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38608268
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -474,18 +509,26 @@ private[yarn] class YarnAllocator(
 
   containerIdToExecutorId.remove(containerId).foreach { eid =>
 executorIdToContainer.remove(eid)
-
+pendingLossReasonRequests.remove(eid).foreach { pendingRequests =>
+  pendingRequests.foreach(_.reply(exitReason))
+}
 if (!alreadyReleased) {
   // The executor could have gone away (like no route to host, 
node failure, etc)
   // Notify backend about the failure of the executor
   numUnexpectedContainerRelease += 1
-  driverRef.send(RemoveExecutor(eid,
-s"Yarn deallocated the executor $eid (container 
$containerId)"))
+  driverRef.send(RemoveExecutor(eid, exitReason))
 }
   }
 }
   }
 
+  def enqueueGetLossReasonRequest(eid: String, context: RpcCallContext): 
Unit = synchronized {
+if (executorIdToContainer.contains(eid)) {
+  pendingLossReasonRequests
+.getOrElseUpdate(eid, new ArrayBuffer[RpcCallContext]) += context
--- End diff --

I guess the alternative of limiting the number of requests sent to the AM 
is tricky because we would need to keep a set of disconnected executor IDs 
and decide when to remove entries from that set. Responding for the same 
executor multiple times might be fine here if it's the simpler approach, but 
maybe we should add a comment up there to explain this.





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38608264
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -474,18 +509,26 @@ private[yarn] class YarnAllocator(
 
   containerIdToExecutorId.remove(containerId).foreach { eid =>
 executorIdToContainer.remove(eid)
-
+pendingLossReasonRequests.remove(eid).foreach { pendingRequests =>
+  pendingRequests.foreach(_.reply(exitReason))
+}
 if (!alreadyReleased) {
   // The executor could have gone away (like no route to host, 
node failure, etc)
   // Notify backend about the failure of the executor
   numUnexpectedContainerRelease += 1
-  driverRef.send(RemoveExecutor(eid,
-s"Yarn deallocated the executor $eid (container 
$containerId)"))
+  driverRef.send(RemoveExecutor(eid, exitReason))
 }
   }
 }
   }
 
+  def enqueueGetLossReasonRequest(eid: String, context: RpcCallContext): 
Unit = synchronized {
+if (executorIdToContainer.contains(eid)) {
+  pendingLossReasonRequests
+.getOrElseUpdate(eid, new ArrayBuffer[RpcCallContext]) += context
--- End diff --

onDisconnected can be called multiple times. There is earlier discussion on 
why this design was chosen.





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38608220
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -236,10 +246,14 @@ private[yarn] class YarnAllocator(
 if (completedContainers.size > 0) {
   logDebug("Completed %d containers".format(completedContainers.size))
 
-  processCompletedContainers(completedContainers.asScala)
+  val discoveredLossReasons = 
processCompletedContainers(completedContainers.asScala)
--- End diff --

(pretty sure `processCompletedContainers` returns Unit!)





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38608209
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -211,8 +218,11 @@ private[yarn] class YarnAllocator(
* Deal with any containers YARN has granted to us by possibly launching 
executors in them.
*
* This must be synchronized because variables read in this method are 
mutated by other methods.
+   *
+   * Returns a list of executor loss reasons discovered by the allocator, 
which can then be
+   * forwarded to the driver by the calling ApplicationMaster.
*/
-  def allocateResources(): Unit = synchronized {
+  def allocateResources(): Map[String, ExecutorLossReason] = synchronized {
--- End diff --

this doesn't need to change, does it? Can you revert it?





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38608149
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -474,18 +509,26 @@ private[yarn] class YarnAllocator(
 
   containerIdToExecutorId.remove(containerId).foreach { eid =>
 executorIdToContainer.remove(eid)
-
+pendingLossReasonRequests.remove(eid).foreach { pendingRequests =>
+  pendingRequests.foreach(_.reply(exitReason))
+}
 if (!alreadyReleased) {
   // The executor could have gone away (like no route to host, 
node failure, etc)
   // Notify backend about the failure of the executor
   numUnexpectedContainerRelease += 1
-  driverRef.send(RemoveExecutor(eid,
-s"Yarn deallocated the executor $eid (container 
$containerId)"))
+  driverRef.send(RemoveExecutor(eid, exitReason))
 }
   }
 }
   }
 
+  def enqueueGetLossReasonRequest(eid: String, context: RpcCallContext): 
Unit = synchronized {
+if (executorIdToContainer.contains(eid)) {
+  pendingLossReasonRequests
+.getOrElseUpdate(eid, new ArrayBuffer[RpcCallContext]) += context
--- End diff --

when will this have multiple contexts? The driver should only ask once 
shouldn't it? If the concern is because `onDisconnected` could happen multiple 
times, then maybe we should just check whether we already have a request for a 
particular executor ID instead of responding multiple times.
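
(A rough sketch of that dedupe alternative, using hypothetical stand-ins for 
the allocator's maps and for `RpcCallContext`; not the patch's actual code.)

```
import scala.collection.mutable.{ArrayBuffer, HashMap}

// Ctx stands in for RpcCallContext; the maps stand in for the allocator's
// real bookkeeping state.
class LossReasonRequests[Ctx] {
  private val executorIdToContainer = new HashMap[String, String]
  private val pendingLossReasonRequests = new HashMap[String, ArrayBuffer[Ctx]]

  def enqueueGetLossReasonRequest(eid: String, context: Ctx): Unit = synchronized {
    // Record only the first inquiry per executor, so repeated onDisconnected
    // events don't cause the same loss reason to be replied to multiple times.
    if (executorIdToContainer.contains(eid) && !pendingLossReasonRequests.contains(eid)) {
      pendingLossReasonRequests.getOrElseUpdate(eid, new ArrayBuffer[Ctx]) += context
    }
  }
}
```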





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38608061
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -474,18 +509,26 @@ private[yarn] class YarnAllocator(
 
   containerIdToExecutorId.remove(containerId).foreach { eid =>
 executorIdToContainer.remove(eid)
-
+pendingLossReasonRequests.remove(eid).foreach { pendingRequests =>
+  pendingRequests.foreach(_.reply(exitReason))
+}
 if (!alreadyReleased) {
   // The executor could have gone away (like no route to host, 
node failure, etc)
   // Notify backend about the failure of the executor
   numUnexpectedContainerRelease += 1
-  driverRef.send(RemoveExecutor(eid,
-s"Yarn deallocated the executor $eid (container 
$containerId)"))
+  driverRef.send(RemoveExecutor(eid, exitReason))
 }
   }
 }
   }
 
+  def enqueueGetLossReasonRequest(eid: String, context: RpcCallContext): 
Unit = synchronized {
--- End diff --

also can you make this `private[yarn]`





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38608057
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -474,18 +509,26 @@ private[yarn] class YarnAllocator(
 
   containerIdToExecutorId.remove(containerId).foreach { eid =>
 executorIdToContainer.remove(eid)
-
+pendingLossReasonRequests.remove(eid).foreach { pendingRequests =>
+  pendingRequests.foreach(_.reply(exitReason))
+}
 if (!alreadyReleased) {
   // The executor could have gone away (like no route to host, 
node failure, etc)
   // Notify backend about the failure of the executor
   numUnexpectedContainerRelease += 1
-  driverRef.send(RemoveExecutor(eid,
-s"Yarn deallocated the executor $eid (container 
$containerId)"))
+  driverRef.send(RemoveExecutor(eid, exitReason))
 }
   }
 }
   }
 
+  def enqueueGetLossReasonRequest(eid: String, context: RpcCallContext): 
Unit = synchronized {
--- End diff --

can you add a Javadoc here to explain what this does and why it's needed, 
i.e. we need to wait for the next time we allocate resources to find the exit 
reasons of the failed containers.





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38607895
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -440,22 +454,43 @@ private[yarn] class YarnAllocator(
 // Hadoop 2.2.X added a ContainerExitStatus we should switch to use
 // there are some exit status' we shouldn't necessarily count 
against us, but for
 // now I think its ok as none of the containers are expected to 
exit
-if (completedContainer.getExitStatus == 
ContainerExitStatus.PREEMPTED) {
-  logInfo("Container preempted: " + containerId)
-} else if (completedContainer.getExitStatus == -103) { // vmem 
limit exceeded
-  logWarning(memLimitExceededLogMessage(
+var isNormalExit = false
+var containerExitReason = "Container exited for an unknown reason."
+val exitStatus = completedContainer.getExitStatus
+if (exitStatus == ContainerExitStatus.PREEMPTED) {
+  isNormalExit = true
+  containerExitReason = s"Container $containerId was preempted."
+  logInfo(containerExitReason)
+} else if (exitStatus == -103) { // vmem limit exceeded
+  // Should probably still count these towards task failures
+  containerExitReason = memLimitExceededLogMessage(
 completedContainer.getDiagnostics,
-VMEM_EXCEEDED_PATTERN))
-} else if (completedContainer.getExitStatus == -104) { // pmem 
limit exceeded
-  logWarning(memLimitExceededLogMessage(
+VMEM_EXCEEDED_PATTERN)
+  logWarning(containerExitReason)
+} else if (exitStatus == -104) { // pmem limit exceeded
+  // Should probably still count these towards task failures
+  containerExitReason = memLimitExceededLogMessage(
 completedContainer.getDiagnostics,
-PMEM_EXCEEDED_PATTERN))
-} else if (completedContainer.getExitStatus != 0) {
+PMEM_EXCEEDED_PATTERN)
+  logWarning(containerExitReason)
+} else if (exitStatus != 0) {
   logInfo("Container marked as failed: " + containerId +
 ". Exit status: " + completedContainer.getExitStatus +
 ". Diagnostics: " + completedContainer.getDiagnostics)
   numExecutorsFailed += 1
+  containerExitReason = s"Container $containerId exited abnormally 
with exit" +
+s" status $exitStatus, and was marked as failed."
+}
+
+if (exitStatus == 0) {
+  ExecutorExited(0, isNormalExit = true,
+s"Executor for container $containerId exited normally.")
+} else {
+  ExecutorExited(completedContainer.getExitStatus, isNormalExit, 
containerExitReason)
 }
+  } else {
+ExecutorExited(completedContainer.getExitStatus, isNormalExit = 
true,
+  s"Container $containerId exited from explicit termination 
request.")
--- End diff --

can you also add a comment here:
```
// If we have already released this container, then it must mean
// that the driver has explicitly requested it to be killed
```





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38607757
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala
 ---
@@ -91,6 +94,36 @@ private[spark] abstract class YarnSchedulerBackend(
   }
 
   /**
+   * Override the DriverEndpoint to add extra logic for the case when an 
executor is disconnected.
+   * We should check the cluster manager and find if the loss of the 
executor was caused by YARN
+   * force killing it due to preemption.
+   */
+  private class YarnDriverEndpoint(rpcEnv: RpcEnv, sparkProperties: 
Seq[(String, String)])
+  extends DriverEndpoint(rpcEnv, sparkProperties) {
--- End diff --

The superclass, CoarseGrainedSchedulerBackend, will use the driver endpoint 
for messaging as well, right? So then the CoarseGrainedSchedulerBackend's 
driver endpoint will now handle both calls for RequestExecutors, KillExecutors, 
etc. - while it also has to handle 
CoarseGrainedSchedulerBackend.DriverEndpoint's messages as well. I think the 
separation of concerns of what endpoints process what messages makes it 
sensible to have the two endpoints.





[GitHub] spark pull request: [SPARK-10332][CORE] Fix yarn spark executor va...

2015-09-02 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/8580#issuecomment-137302069
  
I'll merge tomorrow morning unless someone chimes in.





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38607674
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -440,22 +454,43 @@ private[yarn] class YarnAllocator(
 // Hadoop 2.2.X added a ContainerExitStatus we should switch to use
 // there are some exit status' we shouldn't necessarily count 
against us, but for
 // now I think its ok as none of the containers are expected to 
exit
-if (completedContainer.getExitStatus == 
ContainerExitStatus.PREEMPTED) {
-  logInfo("Container preempted: " + containerId)
-} else if (completedContainer.getExitStatus == -103) { // vmem 
limit exceeded
-  logWarning(memLimitExceededLogMessage(
+var isNormalExit = false
+var containerExitReason = "Container exited for an unknown reason."
+val exitStatus = completedContainer.getExitStatus
+if (exitStatus == ContainerExitStatus.PREEMPTED) {
+  isNormalExit = true
+  containerExitReason = s"Container $containerId was preempted."
+  logInfo(containerExitReason)
+} else if (exitStatus == -103) { // vmem limit exceeded
+  // Should probably still count these towards task failures
+  containerExitReason = memLimitExceededLogMessage(
 completedContainer.getDiagnostics,
-VMEM_EXCEEDED_PATTERN))
-} else if (completedContainer.getExitStatus == -104) { // pmem 
limit exceeded
-  logWarning(memLimitExceededLogMessage(
+VMEM_EXCEEDED_PATTERN)
+  logWarning(containerExitReason)
+} else if (exitStatus == -104) { // pmem limit exceeded
+  // Should probably still count these towards task failures
+  containerExitReason = memLimitExceededLogMessage(
 completedContainer.getDiagnostics,
-PMEM_EXCEEDED_PATTERN))
-} else if (completedContainer.getExitStatus != 0) {
+PMEM_EXCEEDED_PATTERN)
+  logWarning(containerExitReason)
+} else if (exitStatus != 0) {
   logInfo("Container marked as failed: " + containerId +
 ". Exit status: " + completedContainer.getExitStatus +
 ". Diagnostics: " + completedContainer.getDiagnostics)
   numExecutorsFailed += 1
+  containerExitReason = s"Container $containerId exited abnormally 
with exit" +
+s" status $exitStatus, and was marked as failed."
+}
--- End diff --

would you mind rewriting this for readability:
```
val (isNormalExit, exitReason) = exitStatus match {
  case ContainerExitStatus.NORMAL =>
(true, "...")
  case ContainerExitStatus.PREEMPTED =>
(true, "...")
  case ContainerExitStatus.VMEM_LIMIT_EXCEEDED =>
(false, "...")
  case ContainerExitStatus.PMEM_LIMIT_EXCEEDED =>
(false, "...")
  case other =>
// TODO: increment this for other failures as well?
numExecutorsFailed += 1
(false, "... unknown reason")
}

if (isNormalExit) {
  logInfo(exitReason)
} else {
  logWarning(exitReason)
}

ExecutorExited(completedContainer.getExitStatus, isNormalExit, exitReason)
```





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38607588
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -440,22 +454,43 @@ private[yarn] class YarnAllocator(
 // Hadoop 2.2.X added a ContainerExitStatus we should switch to use
 // there are some exit status' we shouldn't necessarily count 
against us, but for
 // now I think its ok as none of the containers are expected to 
exit
-if (completedContainer.getExitStatus == 
ContainerExitStatus.PREEMPTED) {
-  logInfo("Container preempted: " + containerId)
-} else if (completedContainer.getExitStatus == -103) { // vmem 
limit exceeded
-  logWarning(memLimitExceededLogMessage(
+var isNormalExit = false
+var containerExitReason = "Container exited for an unknown reason."
--- End diff --

actually this would cause exit code 0 to have this message, right? That 
doesn't seem correct.





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38607562
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala
 ---
@@ -101,6 +134,32 @@ private[spark] abstract class YarnSchedulerBackend(
   
ThreadUtils.newDaemonCachedThreadPool("yarn-scheduler-ask-am-thread-pool")
 implicit val askAmExecutor = 
ExecutionContext.fromExecutor(askAmThreadPool)
 
+def handleExecutorDisconnectedFromDriver(
--- End diff --

Can't be private because YarnDriverEndpoint accesses it.





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38607365
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -440,22 +454,43 @@ private[yarn] class YarnAllocator(
 // Hadoop 2.2.X added a ContainerExitStatus we should switch to use
 // there are some exit status' we shouldn't necessarily count 
against us, but for
 // now I think its ok as none of the containers are expected to 
exit
-if (completedContainer.getExitStatus == 
ContainerExitStatus.PREEMPTED) {
-  logInfo("Container preempted: " + containerId)
-} else if (completedContainer.getExitStatus == -103) { // vmem 
limit exceeded
-  logWarning(memLimitExceededLogMessage(
+var isNormalExit = false
+var containerExitReason = "Container exited for an unknown reason."
--- End diff --

this is a little strange for a default value. I would just assign this in 
the else case to make it clearer.





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38607396
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -440,22 +454,43 @@ private[yarn] class YarnAllocator(
 // Hadoop 2.2.X added a ContainerExitStatus we should switch to use
 // there are some exit status' we shouldn't necessarily count 
against us, but for
 // now I think its ok as none of the containers are expected to 
exit
-if (completedContainer.getExitStatus == 
ContainerExitStatus.PREEMPTED) {
-  logInfo("Container preempted: " + containerId)
-} else if (completedContainer.getExitStatus == -103) { // vmem 
limit exceeded
-  logWarning(memLimitExceededLogMessage(
+var isNormalExit = false
+var containerExitReason = "Container exited for an unknown reason."
+val exitStatus = completedContainer.getExitStatus
+if (exitStatus == ContainerExitStatus.PREEMPTED) {
+  isNormalExit = true
--- End diff --

can you add a comment to explain why this is a normal exit, and link 
`SPARK-8167` in that comment?





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38607225
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -17,6 +17,7 @@
 
 package org.apache.spark.deploy.yarn
 
+import scala.collection.mutable.{ArrayBuffer, Buffer, HashMap}
--- End diff --

also not used, I think





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38607194
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -67,6 +68,11 @@ private[spark] class ApplicationMaster(
 sparkConf.getInt("spark.yarn.max.worker.failures",
   math.max(sparkConf.getInt("spark.executor.instances", 0) *  2, 3)))
 
+  // Executor loss reason requests that are pending - maps from executor 
ID for inquiry to a
+  // list of requesters that should be responded to once we find out why 
the given executor
+  // was lost.
+  private val pendingLossReasonRequests = new HashMap[String, 
Buffer[RpcCallContext]]
--- End diff --

this isn't used, is it?





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38607202
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -64,7 +66,7 @@ private[yarn] class YarnAllocator(
 securityMgr: SecurityManager)
   extends Logging {
 
-  import YarnAllocator._
+  import org.apache.spark.deploy.yarn.YarnAllocator._
--- End diff --

is this change necessary?





[GitHub] spark pull request: [SPARK-10332][CORE] Fix yarn spark executor va...

2015-09-02 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/8580#issuecomment-137300659
  
Do you want to merge it then, since I don't have commit bits? (Or wait for 
srowen to take a look too?)





[GitHub] spark pull request: [SPARK-10422] [SQL] String column in InMemoryC...

2015-09-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8578#issuecomment-137300609
  
  [Test build #1716 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1716/consoleFull) for PR 8578 at commit [`2905fd5`](https://github.com/apache/spark/commit/2905fd543b25251f9c50e8059668112c27a251d4).





[GitHub] spark pull request: [SPARK-10332][CORE] Fix yarn spark executor va...

2015-09-02 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/8580#issuecomment-137300560
  
Makes sense





[GitHub] spark pull request: [SPARK-10332][CORE] Fix yarn spark executor va...

2015-09-02 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/8580#issuecomment-137299432
  
I wouldn't hold the release for it, but we can definitely push it to 
branch-1.5. Worst case, it makes it to 1.5.1.





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38606027
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala
 ---
@@ -101,6 +134,32 @@ private[spark] abstract class YarnSchedulerBackend(
   
ThreadUtils.newDaemonCachedThreadPool("yarn-scheduler-ask-am-thread-pool")
 implicit val askAmExecutor = 
ExecutionContext.fromExecutor(askAmThreadPool)
 
+def handleExecutorDisconnectedFromDriver(
--- End diff --

this can just be `handleDisconnectedExecutor`. Also please keep it private





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8007#discussion_r38605954
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala
 ---
@@ -91,6 +94,36 @@ private[spark] abstract class YarnSchedulerBackend(
   }
 
   /**
+   * Override the DriverEndpoint to add extra logic for the case when an 
executor is disconnected.
+   * We should check the cluster manager and find if the loss of the 
executor was caused by YARN
+   * force killing it due to preemption.
+   */
+  private class YarnDriverEndpoint(rpcEnv: RpcEnv, sparkProperties: 
Seq[(String, String)])
+  extends DriverEndpoint(rpcEnv, sparkProperties) {
--- End diff --

I still think you can merge the two endpoints in this file. In particular, 
you can make the other `YarnSchedulerEndpoint` extend `DriverEndpoint` and 
override `onDisconnected` just like how you did it here. Would that work?
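
(A toy sketch of that shape, using plain Scala stand-ins rather than the real 
endpoint classes: subclass the driver endpoint and override the disconnect 
hook instead of keeping two endpoints.)

```
// Toy stand-ins illustrating the proposed structure only.
class DriverEndpointStub {
  def onDisconnected(address: String): Unit =
    println(s"default handling for disconnect from $address")
}

class YarnSchedulerEndpointStub extends DriverEndpointStub {
  override def onDisconnected(address: String): Unit = {
    // YARN-specific step: ask the AM for the executor loss reason first,
    // then fall back to the default driver-side handling.
    println(s"querying AM for loss reason of executor at $address")
    super.onDisconnected(address)
  }
}
```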





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8007#issuecomment-137297486
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41956/





[GitHub] spark pull request: [SPARK-8167] Make tasks that fail from YARN pr...

2015-09-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8007#issuecomment-137297480
  
  [Test build #41956 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41956/console) for PR 8007 at commit [`e967fa4`](https://github.com/apache/spark/commit/e967fa480cc7f276363cbd56222751173b7b891a).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class ExecutorLostFailure(execId: String, isNormalExit: Boolean = false)`
  * `class ExecutorLossReason(val message: String) extends Serializable `
  * `case class ExecutorExited(exitCode: Int, isNormalExit: Boolean, reason: String)`
  * `  case class RemoveExecutor(executorId: String, reason: ExecutorLossReason)`
  * `  case class GetExecutorLossReason(executorId: String) extends CoarseGrainedClusterMessage`





