[GitHub] spark pull request: [SPARK-5266][Yarn]AM's numExecutorsFailed shou...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4061#issuecomment-70098530 [Test build #25601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25601/consoleFull) for PR 4061 at commit [`c0a3ec7`](https://github.com/apache/spark/commit/c0a3ec7937074d8a0b35cd3a7621d764b3d67431). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ExperimentalMethods protected[sql](sqlContext: SQLContext) ` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5266][Yarn]AM's numExecutorsFailed shou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4061#issuecomment-70098547 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25601/
[GitHub] spark pull request: [SPARK-5268] don't stop ExecutorBackend for ir...
GitHub user CodingCat opened a pull request: https://github.com/apache/spark/pull/4063 [SPARK-5268] don't stop ExecutorBackend for irrelevant DisassociatedEvent In CoarseGrainedExecutorBackend, we subscribe to DisassociatedEvent in the executor backend actor and exit the program upon receiving such an event. Consider the following case: the user may develop an Akka-based program which starts an actor with Spark's actor system and communicates with an external actor system (e.g. an Akka-based receiver in Spark Streaming which communicates with an external system). If the external actor system fails or deliberately disassociates from the actor within Spark's system, we may receive a DisassociatedEvent and the executor is restarted. This is not the expected behavior. This is a simple fix to check the event before making the quit decision. You can merge this pull request into a Git repository by running: $ git pull https://github.com/CodingCat/spark SPARK-5268 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4063.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4063 commit 4a65793563d14a85b37b5f90fea52b377aec2d5c Author: CodingCat zhunans...@gmail.com Date: 2015-01-15T15:17:33Z check whether DisassociatedEvent is relevant before quit
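The guard this PR describes can be sketched roughly as follows (a minimal sketch, not the actual patch; the actor class name and the `driverAddress` field are illustrative assumptions):

```scala
import akka.actor.{Actor, Address}
import akka.remote.DisassociatedEvent

// Hypothetical sketch: exit only when the disassociated peer is the driver,
// so disassociations from user-created external actor systems are ignored.
class ExecutorBackendActor(driverAddress: Address) extends Actor {
  def receive = {
    case e: DisassociatedEvent =>
      if (e.remoteAddress == driverAddress) {
        // We really lost the driver: shut down this executor backend.
        context.system.shutdown()
      } // otherwise: irrelevant DisassociatedEvent, keep running
  }
}
```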
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3571#issuecomment-70131914 @jacek-lewandowski I can only review the code, you need a committer to be able to move forward. e.g. @andrewor14 @JoshRosen
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3997#discussion_r23027089

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---

```
@@ -449,6 +449,31 @@ class SparseVector(
   override def toString: String =
     "(%s,%s,%s)".format(size, indices.mkString("[", ",", "]"), values.mkString("[", ",", "]"))

+  override def equals(other: Any): Boolean = {
+    other match {
+      case v: SparseVector => {
+        if (this.size != v.size) { return false }
+        var k1 = 0
+        var k2 = 0
+        while (true) {
+          while (k1 < this.values.size && this.values(k1) == 0) k1 += 1
+          while (k2 < v.values.size && v.values(k2) == 0) k2 += 1
+
+          if (k1 == this.values.size || k2 == v.values.size) {
+            return (k1 == this.values.size && k2 == v.values.size) // check end alignment
+          }
+          if (this.indices(k1) != v.indices(k2) || this.values(k1) != v.values(k2)) {
+            return false
+          }
+          k1 += 1
+          k2 += 1
+        }
+        throw new Exception("unreachable")
```

End diff --

I wondered to myself whether this could be simplified to not have `while (true)`, the dummy `Exception`, etc. The best I could do was with a helper function:

```
...
var k1 = nextNonzero(this.values, 0)
var k2 = nextNonzero(v.values, 0)
while (k1 < this.values.size && k2 < v.values.size) {
  if (this.indices(k1) != v.indices(k2) || this.values(k1) != v.values(k2)) {
    return false
  }
  k1 = nextNonzero(this.values, k1 + 1)
  k2 = nextNonzero(v.values, k2 + 1)
}
return (k1 == this.values.size && k2 == v.values.size)
...

def nextNonzero(values: Array[Double], from: Int): Int = {
  var index = from
  while (index < values.size && values(index) == 0.0) index += 1
  index
}
```

I'm not sure it's better, just food for thought. So the idea would be to specialize `hashCode` as well, and also handle `DenseVector`, right? And even remove the implementations in the parent?
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70127522 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25608/
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70127518 [Test build #25608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25608/consoleFull) for PR 3997 at commit [`a6952c3`](https://github.com/apache/spark/commit/a6952c39532594e1f4eb1c2f764d528420320ea8). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5012][MLLib][PySpark]Python API for Gau...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4059#issuecomment-70141761 add to whitelist
[GitHub] spark pull request: [SPARK-4286] Integrate external shuffle servic...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/3861#issuecomment-70131124 Adding you folks for review: @dragos @deanw @huitseeker @skyluc
[GitHub] spark pull request: [SPARK-5095] Support capping cores and launch ...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/4027#issuecomment-70131146 Adding you folks for review: @dragos @deanw @huitseeker @skyluc
[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4021#issuecomment-70117592 [Test build #25606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25606/consoleFull) for PR 4021 at commit [`18d62ec`](https://github.com/apache/spark/commit/18d62ec8906c4ea3fc8d753e889f36f87b539ef5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/3233#discussion_r23012898

--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---

```
@@ -762,46 +764,37 @@ object Client extends Logging {
       extraClassPath: Option[String] = None): Unit = {
     extraClassPath.foreach(addClasspathEntry(_, env))
     addClasspathEntry(Environment.PWD.$(), env)
-
-    // Normally the users app.jar is last in case conflicts with spark jars
     if (sparkConf.getBoolean("spark.yarn.user.classpath.first", false)) {
-      addUserClasspath(args, sparkConf, env)
-      addFileToClasspath(sparkJar(sparkConf), SPARK_JAR, env)
-      populateHadoopClasspath(conf, env)
-    } else {
-      addFileToClasspath(sparkJar(sparkConf), SPARK_JAR, env)
-      populateHadoopClasspath(conf, env)
-      addUserClasspath(args, sparkConf, env)
+      getUserClasspath(args, sparkConf).foreach { x =>
+        addFileToClasspath(x, null, env)
+      }
     }
-
-    // Append all jar files under the working directory to the classpath.
-    addClasspathEntry(Environment.PWD.$() + Path.SEPARATOR + "*", env)
```

End diff --

I agree, it would be good to keep consistent. I just wanted to make sure we didn't break anything by the removal. It sounds like you tested all the things I can think of.
[GitHub] spark pull request: [SPARK-4943][SQL] Allow table name having dot ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4062#issuecomment-70102560 [Test build #25602 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25602/consoleFull) for PR 4062 at commit [`e0d1960`](https://github.com/apache/spark/commit/e0d19600204c3a54ca9a6a959ccaaa1c0d7bcdca). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/4021#issuecomment-70105302 I've updated the code to throw an exception in the error case you mentioned and I've reverted the file permission change. Thanks!
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70107537 [Test build #25607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25607/consoleFull) for PR 3997 at commit [`50abef3`](https://github.com/apache/spark/commit/50abef35ef4ccb4f4f037bb7d29c5200cc7ab7cb). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user hhbyyh commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70109130 Just sent an update. I didn't use `sqdist` due to a performance concern: the original equals is effectively a fail-fast comparison, whereas `sqdist` inevitably computes through the entire vectors even if the first element differs. That performance would be hard to accept for scenarios like doc2Vec over a large vocabulary. The current implementation is still based on comparing indices and values, just with handling for explicit 0s. I gave the implementation some testing and added a few unit tests. Any comments are welcome!
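The fail-fast property described above can be illustrated with a minimal sketch (illustrative code, not the PR's implementation; the function name is hypothetical):

```scala
// Fail-fast comparison: returns as soon as a mismatch is found, unlike a
// squared-distance (sqdist) check, which must scan both arrays to the end.
def failFastEquals(a: Array[Double], b: Array[Double]): Boolean = {
  if (a.length != b.length) return false
  var i = 0
  while (i < a.length) {
    if (a(i) != b(i)) return false // bail out at the first difference
    i += 1
  }
  true
}
```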
[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4021#issuecomment-70117605 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25606/
[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3233#issuecomment-70094284 Nope, your first comment answers it. Sorry, I had read that a while ago but forgot about it. Thanks.
[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...
Github user ilganeli commented on a diff in the pull request: https://github.com/apache/spark/pull/4021#discussion_r23018036

--- Diff: core/src/main/scala/org/apache/spark/Accumulators.scala ---

```
@@ -280,10 +281,12 @@ object AccumulatorParam {
 // TODO: The multi-thread support in accumulators is kind of lame; check
 // if there's a more intuitive way of doing it right
 private[spark] object Accumulators {
-  // TODO: Use soft references? => need to make readObject work properly then
-  val originals = Map[Long, Accumulable[_, _]]()
-  val localAccums = new ThreadLocal[Map[Long, Accumulable[_, _]]]() {
-    override protected def initialValue() = Map[Long, Accumulable[_, _]]()
+  // Store a WeakReference instead of a StrongReference because this way accumulators can be
+  // appropriately garbage collected during long-running jobs and release memory
+  type WeakAcc = WeakReference[Accumulable[_, _]]
+  val originals = Map[Long, WeakAcc]()
+  val localAccums = new ThreadLocal[Map[Long, WeakAcc]]() {
```

End diff --

Hi Josh - are you suggesting to replace this snippet with a MapMaker just to simplify the initialization code? I believe the usage of either object would be the same - do you see a specific advantage to trying to use the MapMaker?
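For context, the `MapMaker` alternative being asked about might look roughly like this (a hedged sketch assuming Guava is on the classpath; the value type is simplified to `AnyRef` for illustration):

```scala
import java.util.concurrent.ConcurrentMap
import com.google.common.collect.MapMaker

// Guava's MapMaker builds a ConcurrentMap whose values are held by weak
// references, so entries are dropped automatically once the accumulator
// is no longer strongly reachable - no hand-rolled WeakReference wrapper.
val originals: ConcurrentMap[java.lang.Long, AnyRef] =
  new MapMaker().weakValues().makeMap[java.lang.Long, AnyRef]()
```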
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70114222 [Test build #25608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25608/consoleFull) for PR 3997 at commit [`a6952c3`](https://github.com/apache/spark/commit/a6952c39532594e1f4eb1c2f764d528420320ea8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70116742 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25605/
[GitHub] spark pull request: [SPARK-4943][SQL] Allow table name having dot ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4062#issuecomment-70102567 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25602/
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70103904 [Test build #25605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25605/consoleFull) for PR 4063 at commit [`a7654d0`](https://github.com/apache/spark/commit/a7654d08b97fb14a3a75622e179885ae26908ed9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70113747 [Test build #25604 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25604/consoleFull) for PR 4063 at commit [`4a65793`](https://github.com/apache/spark/commit/4a65793563d14a85b37b5f90fea52b377aec2d5c). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70113758 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25604/
[GitHub] spark pull request: [SPARK-5263][SQL] `create table` DDL need to c...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4058#issuecomment-70115220 The semantics of temporary tables are that they can shadow existing persistent tables. This is by design.
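That shadowing behavior can be illustrated with a small sketch (hypothetical table name; assumes a Spark 1.x `sqlContext` and a DataFrame/SchemaRDD `df`):

```scala
// Registering a temporary table under an existing persistent table's name
// shadows the persistent table for queries in this SQLContext.
df.registerTempTable("users")           // temp "users" now shadows persistent "users"
sqlContext.sql("SELECT * FROM users")   // resolves to the temporary table
```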
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70116730 [Test build #25605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25605/consoleFull) for PR 4063 at commit [`a7654d0`](https://github.com/apache/spark/commit/a7654d08b97fb14a3a75622e179885ae26908ed9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3571#issuecomment-70100458 [Test build #25603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25603/consoleFull) for PR 3571 at commit [`a703c9b`](https://github.com/apache/spark/commit/a703c9b58d23894ad92619c05ac4968445208373). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70107705 [Test build #25607 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25607/consoleFull) for PR 3997 at commit [`50abef3`](https://github.com/apache/spark/commit/50abef35ef4ccb4f4f037bb7d29c5200cc7ab7cb). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70101325 [Test build #25604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25604/consoleFull) for PR 4063 at commit [`4a65793`](https://github.com/apache/spark/commit/4a65793563d14a85b37b5f90fea52b377aec2d5c). * This patch merges cleanly.
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3571#issuecomment-70112958 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25603/
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3571#issuecomment-70112945 [Test build #25603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25603/consoleFull) for PR 3571 at commit [`a703c9b`](https://github.com/apache/spark/commit/a703c9b58d23894ad92619c05ac4968445208373). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5264][SQL] support `drop table` DDL com...
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/4060#issuecomment-70114354 Hi, @scwf @chenghao-intel Could you please review this? I modified it so there is a single entry point for the logical plan. But I have some questions:
1. The dialect cannot be obtained via `sqlContext.getConf("spark.sql.dialect")` in the spark shell or in a test suite.
2. The sql package cannot access the hive package, so I expose the dialect as a function in each Context.
Any suggestions or a better way to implement this?
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user jacek-lewandowski commented on the pull request: https://github.com/apache/spark/pull/3571#issuecomment-70101711 @vanzin can we move forward with this PR?
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
GitHub user hhbyyh reopened a pull request: https://github.com/apache/spark/pull/3997 [SPARK-5186] [MLLIB] Vector.equals and Vector.hashCode are very inefficient JIRA Issue: https://issues.apache.org/jira/browse/SPARK-5186
Currently SparseVector uses the equals inherited from Vector, which creates a full-size array even for a sparse vector. This pull request contains a specialized equals optimization that improves on both time and space.
1. The implementation is consistent with the original; in particular, it keeps equality comparison between SparseVector and DenseVector working.
2. For the hash code, overriding it may introduce a breaking change, so we should do that in another PR.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hhbyyh/spark master
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3997.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3997
commit 574114433222f4adb95e1e98f5d96e72e733eb4d Author: Yuhao Yang yu...@yuhaodevbox.sh.intel.com Date: 2015-01-13T04:31:13Z Specialized equals for SparseVector
commit f41b135ab0394e881bd03c87bb02aa77be61fb64 Author: Yuhao Yang hhb...@gmail.com Date: 2015-01-16T15:13:12Z iterative equals for sparse vector
commit 50abef35ef4ccb4f4f037bb7d29c5200cc7ab7cb Author: Yuhao Yang hhb...@gmail.com Date: 2015-01-16T15:47:19Z fix ut for sparse vector with explicit 0
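The specialized comparison described above can be sketched as follows. This is an illustrative reconstruction, not the actual MLlib patch: it walks the sorted (index, value) arrays of two sparse vectors in lockstep, skipping explicit zero entries, so no full-size dense array is ever materialized.

```scala
object SparseVectorEq {
  // Compare two sparse vectors given as sorted index/value arrays plus a
  // logical size. Equality holds iff the nonzero entries match
  // position-for-position; explicitly stored zeros are ignored.
  def sparseEquals(size1: Int, idx1: Array[Int], vals1: Array[Double],
                   size2: Int, idx2: Array[Int], vals2: Array[Double]): Boolean = {
    if (size1 != size2) return false
    var i = 0
    var j = 0
    while (i < idx1.length || j < idx2.length) {
      if (i < idx1.length && vals1(i) == 0.0) {
        i += 1                       // explicit zero on the left: skip
      } else if (j < idx2.length && vals2(j) == 0.0) {
        j += 1                       // explicit zero on the right: skip
      } else if (i < idx1.length && j < idx2.length) {
        if (idx1(i) != idx2(j) || vals1(i) != vals2(j)) return false
        i += 1; j += 1
      } else {
        return false                 // one side has a leftover nonzero entry
      }
    }
    true
  }
}
```

With this, a vector that stores an explicit 0 (the case addressed by the third commit) still compares equal to one that omits it, e.g. indices `[0, 2]` / values `[1.0, 0.0]` versus indices `[0]` / values `[1.0]`.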
[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4021#issuecomment-70104769 [Test build #25606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25606/consoleFull) for PR 4021 at commit [`18d62ec`](https://github.com/apache/spark/commit/18d62ec8906c4ea3fc8d753e889f36f87b539ef5). * This patch merges cleanly.
[GitHub] spark pull request: [Spark-5111][SQL]HiveContext and Thriftserver ...
GitHub user zhzhan opened a pull request: https://github.com/apache/spark/pull/4064 [Spark-5111][SQL]HiveContext and Thriftserver cannot work in secure cluster beyond hadoop2.5 Hive 0.13 cannot work with a secure cluster on hadoop-2.5 and beyond, due to a java.lang.NoSuchFieldError: SASL_PROPS error. We need to backport some Hive 0.14 fixes into Spark, since there is no current effort to upgrade Spark's Hive support to 0.14.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zhzhan/spark spark5111
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4064.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4064
commit 3bf966c2f1bb913149a34176598a69041487cb88 Author: Zhan Zhang zhaz...@gmail.com Date: 2014-08-08T17:47:18Z test
commit fc56b25ff62964f59b96d2db13b5c357ae1c2f2b Author: Zhan Zhang zhaz...@gmail.com Date: 2015-01-07T21:01:45Z squash all commits
commit c6b57402d19557105bc2bb95978b5815d7e95907 Author: Zhan Zhang zhaz...@gmail.com Date: 2015-01-09T17:48:45Z hive secure cluster fix
commit 456232c1ce29a7bff7f7d606764d5da00a478695 Author: Zhan Zhang zhaz...@gmail.com Date: 2015-01-09T21:57:54Z hive on secure cluster fix
commit 6532a342ba85be0300c169ce81f671da7ea5dcb1 Author: Zhan Zhang zhaz...@gmail.com Date: 2015-01-15T19:53:36Z rebase
[GitHub] spark pull request: [SPARK-5012][MLLib][PySpark]Python API for Gau...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4059#issuecomment-70142339 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25609/ Test FAILed.
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3571#discussion_r23035825
--- Diff: core/src/main/scala/org/apache/spark/SSLOptions.scala ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.io.File
+
+import scala.util.Try
+
+import com.typesafe.config.{Config, ConfigFactory, ConfigValueFactory}
+import org.eclipse.jetty.util.ssl.SslContextFactory
+
+private[spark] case class SSLOptions(
+    enabled: Boolean = false,
+    keyStore: Option[File] = None,
+    keyStorePassword: Option[String] = None,
+    keyPassword: Option[String] = None,
+    trustStore: Option[File] = None,
+    trustStorePassword: Option[String] = None,
+    protocol: Option[String] = None,
+    enabledAlgorithms: Set[String] = Set.empty) {
+
+  /**
+   * Creates a Jetty SSL context factory according to the SSL settings represented by this object.
+   */
+  def createJettySslContextFactory(): Option[SslContextFactory] = {
+    if (enabled) {
+      val sslContextFactory = new SslContextFactory()
+
+      keyStore.foreach(file => sslContextFactory.setKeyStorePath(file.getAbsolutePath))
+      trustStore.foreach(file => sslContextFactory.setTrustStore(file.getAbsolutePath))
+      keyStorePassword.foreach(sslContextFactory.setKeyStorePassword)
+      trustStorePassword.foreach(sslContextFactory.setTrustStorePassword)
+      keyPassword.foreach(sslContextFactory.setKeyManagerPassword)
+      protocol.foreach(sslContextFactory.setProtocol)
+      sslContextFactory.setIncludeCipherSuites(enabledAlgorithms.toSeq: _*)
+
+      Some(sslContextFactory)
+    } else {
+      None
+    }
+  }
+
+  /**
+   * Creates an Akka configuration object which contains all the SSL settings represented by this
+   * object. It can be used then to compose the ultimate Akka configuration.
+   */
+  def createAkkaConfig: Option[Config] = {
+    import scala.collection.JavaConversions._
+    if (enabled) {
+      Some(ConfigFactory.empty()
+        .withValue("akka.remote.netty.tcp.security.key-store",
+          ConfigValueFactory.fromAnyRef(keyStore.map(_.getAbsolutePath).getOrElse("")))
+        .withValue("akka.remote.netty.tcp.security.key-store-password",
+          ConfigValueFactory.fromAnyRef(keyStorePassword.getOrElse("")))
+        .withValue("akka.remote.netty.tcp.security.trust-store",
+          ConfigValueFactory.fromAnyRef(trustStore.map(_.getAbsolutePath).getOrElse("")))
+        .withValue("akka.remote.netty.tcp.security.trust-store-password",
+          ConfigValueFactory.fromAnyRef(trustStorePassword.getOrElse("")))
+        .withValue("akka.remote.netty.tcp.security.key-password",
+          ConfigValueFactory.fromAnyRef(keyPassword.getOrElse("")))
+        .withValue("akka.remote.netty.tcp.security.random-number-generator",
+          ConfigValueFactory.fromAnyRef(""))
+        .withValue("akka.remote.netty.tcp.security.protocol",
+          ConfigValueFactory.fromAnyRef(protocol.getOrElse("")))
+        .withValue("akka.remote.netty.tcp.security.enabled-algorithms",
+          ConfigValueFactory.fromIterable(enabledAlgorithms.toSeq))
+        .withValue("akka.remote.netty.tcp.enable-ssl",
+          ConfigValueFactory.fromAnyRef(true)))
+    } else {
+      None
+    }
+  }
+
+  override def toString: String = s"SSLOptions{enabled=$enabled, " +
+      s"keyStore=$keyStore, keyStorePassword=${keyStorePassword.map(_ => "xxx")}, " +
+      s"trustStore=$trustStore, trustStorePassword=${trustStorePassword.map(_ => "xxx")}, " +
+      s"protocol=$protocol, enabledAlgorithms=$enabledAlgorithms}"
+
+}
+
+object SSLOptions extends Logging {
+
+  /**
+   * Resolves SSLOptions settings from a given Spark configuration object at a given namespace.
+   * The parent directory of that location is used as a base directory to
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3571#discussion_r23035899
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -18,7 +18,11 @@
 package org.apache.spark

 import java.net.{Authenticator, PasswordAuthentication}
+import java.security.KeyStore
+import java.security.cert.X509Certificate
+import javax.net.ssl._
--- End diff --
nit: `java.net` before `java.security`
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3571#discussion_r23036580
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala ---
@@ -21,6 +21,8 @@
 import java.io.File
 import java.util.{List => JList}
 import java.util.Collections

+import org.apache.spark.util.AkkaUtils
--- End diff --
nit: spark imports come last
[GitHub] spark pull request: [SPARK-5224] [PySpark] improve performance of ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4024
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3571#discussion_r23037152
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
@@ -523,10 +525,31 @@ private[spark] object Worker extends Logging {
     val securityMgr = new SecurityManager(conf)
     val (actorSystem, boundPort) = AkkaUtils.createActorSystem(systemName, host, port,
       conf = conf, securityManager = securityMgr)
-    val masterAkkaUrls = masterUrls.map(Master.toAkkaUrl)
+    val masterAkkaUrls = masterUrls.map(Master.toAkkaUrl(_, conf))
     actorSystem.actorOf(Props(classOf[Worker], host, boundPort, webUiPort, cores, memory,
       masterAkkaUrls, systemName, actorName, workDir, conf, securityMgr), name = actorName)
     (actorSystem, boundPort)
   }
+
+  private[spark] def isUseLocalNodeSSLConfig(cmd: Command): Boolean = {
+    val pattern = """\-Dspark\.ssl\.useNodeLocalConf\=(.+)""".r
+    val result = cmd.javaOpts.collectFirst {
+      case pattern(_result) => _result.toBoolean
+    }
+    result.getOrElse(false)
+  }
+
+  private[spark] def maybeUpdateSSLSettings(cmd: Command, conf: SparkConf): Command = {
+    val prefix = "spark.ssl."
+    val useLNCPrefix = "spark.ssl.useNodeLocalConf"
+    if (isUseLocalNodeSSLConfig(cmd)) {
+      val newJavaOpts = cmd.javaOpts
+          .filterNot(opt => opt.startsWith(s"-D$prefix") && !opt.startsWith(s"-D$useLNCPrefix=")) ++
--- End diff --
nit: could you use `filter`? My brain gets into a knot trying to negate the condition here.
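The `filterNot`-versus-`filter` readability point can be seen in a small standalone sketch (the option strings here are made up for illustration); by De Morgan's law the two forms keep exactly the same elements:

```scala
object FilterDemo extends App {
  val opts = Seq("-Dspark.ssl.keyStore=/ks", "-Dspark.ssl.useNodeLocalConf=true", "-Xmx1g")

  // filterNot drops elements matching a predicate whose negation the reader
  // has to work out...
  val a = opts.filterNot(o =>
    o.startsWith("-Dspark.ssl.") && !o.startsWith("-Dspark.ssl.useNodeLocalConf="))

  // ...while the equivalent filter form states directly what is kept.
  val b = opts.filter(o =>
    !o.startsWith("-Dspark.ssl.") || o.startsWith("-Dspark.ssl.useNodeLocalConf="))

  assert(a == b)
  assert(a == Seq("-Dspark.ssl.useNodeLocalConf=true", "-Xmx1g"))
}
```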
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3571#discussion_r23037199
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
@@ -523,10 +525,31 @@ private[spark] object Worker extends Logging {
     val securityMgr = new SecurityManager(conf)
     val (actorSystem, boundPort) = AkkaUtils.createActorSystem(systemName, host, port,
       conf = conf, securityManager = securityMgr)
-    val masterAkkaUrls = masterUrls.map(Master.toAkkaUrl)
+    val masterAkkaUrls = masterUrls.map(Master.toAkkaUrl(_, conf))
     actorSystem.actorOf(Props(classOf[Worker], host, boundPort, webUiPort, cores, memory,
       masterAkkaUrls, systemName, actorName, workDir, conf, securityMgr), name = actorName)
     (actorSystem, boundPort)
   }
+
+  private[spark] def isUseLocalNodeSSLConfig(cmd: Command): Boolean = {
+    val pattern = """\-Dspark\.ssl\.useNodeLocalConf\=(.+)""".r
+    val result = cmd.javaOpts.collectFirst {
+      case pattern(_result) => _result.toBoolean
+    }
+    result.getOrElse(false)
+  }
+
+  private[spark] def maybeUpdateSSLSettings(cmd: Command, conf: SparkConf): Command = {
+    val prefix = "spark.ssl."
+    val useLNCPrefix = "spark.ssl.useNodeLocalConf"
--- End diff --
wait: is this a prefix at all? or is it a single config?
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3571#discussion_r23036539
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SimrSchedulerBackend.scala ---
@@ -18,6 +18,7 @@
 package org.apache.spark.scheduler.cluster

 import org.apache.hadoop.fs.{Path, FileSystem}
+import org.apache.spark.util.AkkaUtils
--- End diff --
nit: group with other spark imports.
[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4047#issuecomment-70146798 @EntilZha Here's a sketch of my plan.

Datasets:
* UCI ML Repository data (also used by Asuncion et al., 2009):
  * KOS
  * NIPS
  * NYTimes
  * PubMed (full)
* Wikipedia?

Data preparation:
* Converting to bags of words:
  * UCI datasets are given as word counts already.
  * Wikipedia dump is text. I use the SimpleTokenizer in the LDAExample, which sets term = word and only accepts alphabetic characters.
* Use stopwords from @dlwh located at [https://github.com/dlwh/spark/feature/lda]
* No stemming
* Choosing vocab: For various vocabSize settings, I took the most common vocabSize terms.

Scaling tests: *(doing these first)*
* corpus size
* vocabSize
* k
* numIterations

Accuracy tests: *(doing these second)*
* train on full datasets
* Tune hyperparameters via grid search, following Asuncion et al. (2009) section 4.1.
* Can hopefully compare with their results in Fig. 5.

These tests will run on a 16-node EC2 cluster of r3.2xlarge instances.
[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4047#issuecomment-70146828 @witgo I agree that there are 2 different use regimes for LDA: interpretable topics and featurization. The current implementation follows pretty much every other graph-based implementation I've seen:
* 1 vertex per document + 1 vertex per term
* Each vertex stores a vector of length # topics.
* On each iteration, each doc vertex must communicate its vector to any connected term vertices (and likewise for term vertices), via map-reduce stages over triplets.

I have not heard of methods which can avoid this amount of communication for LDA. I'm sure the implementation can be optimized, so please make comments here or JIRAs afterwards about that. For modified models, it might be possible to communicate less: sparsity-inducing priors, hierarchical models, etc.
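The communication pattern described above can be illustrated with a toy, non-distributed sketch: a bipartite doc-term graph where every vertex holds a length-k topic vector and one iteration sends that vector across every edge (the map-reduce-over-triplets step). The names and structure here are illustrative, not MLlib's actual internals.

```scala
object LdaMessagePassingSketch extends App {
  val k = 3
  case class Vertex(topicCounts: Array[Double])

  // One doc vertex and two term vertices, each with a length-k vector.
  val docs  = Map(0 -> Vertex(Array.fill(k)(1.0)))
  val edges = Seq((0, 0), (0, 1)) // (docId, termId): doc 0 contains terms 0 and 1

  // One iteration: every term vertex receives the element-wise sum of its
  // neighboring doc vertices' vectors. Note each edge carries a full
  // length-k vector -- this is the communication cost discussed above.
  val termMsgs: Map[Int, Array[Double]] =
    edges.groupBy(_._2).map { case (term, es) =>
      term -> es.map { case (doc, _) => docs(doc).topicCounts }
                .reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
    }

  assert(termMsgs(0).sameElements(Array(1.0, 1.0, 1.0)))
}
```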
[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3779#issuecomment-70149578 @kayousterhout Could you take a look at this? This is a priority for 1.3 :)
[GitHub] spark pull request: [SPARK-5012][MLLib][PySpark]Python API for Gau...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4059#issuecomment-70142336 [Test build #25609 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25609/consoleFull) for PR 4059 at commit [`5c83825`](https://github.com/apache/spark/commit/5c83825c570b4ee1357021ec25a1a35a09a633e7). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class GaussianMixtureModel(object):` * `class GaussianMixtureEM(object):`
[GitHub] spark pull request: [SPARK-5224] [PySpark] improve performance of ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4024#issuecomment-70147787 LGTM, so I'm going to merge this into `master` (1.3.0) and `branch-1.2` (1.2.1). Thanks!
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user jacek-lewandowski commented on a diff in the pull request: https://github.com/apache/spark/pull/3571#discussion_r23038107
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
@@ -523,10 +525,31 @@ private[spark] object Worker extends Logging {
     val securityMgr = new SecurityManager(conf)
     val (actorSystem, boundPort) = AkkaUtils.createActorSystem(systemName, host, port,
       conf = conf, securityManager = securityMgr)
-    val masterAkkaUrls = masterUrls.map(Master.toAkkaUrl)
+    val masterAkkaUrls = masterUrls.map(Master.toAkkaUrl(_, conf))
     actorSystem.actorOf(Props(classOf[Worker], host, boundPort, webUiPort, cores, memory,
       masterAkkaUrls, systemName, actorName, workDir, conf, securityMgr), name = actorName)
     (actorSystem, boundPort)
   }
+
+  private[spark] def isUseLocalNodeSSLConfig(cmd: Command): Boolean = {
+    val pattern = """\-Dspark\.ssl\.useNodeLocalConf\=(.+)""".r
+    val result = cmd.javaOpts.collectFirst {
+      case pattern(_result) => _result.toBoolean
+    }
+    result.getOrElse(false)
+  }
+
+  private[spark] def maybeUpdateSSLSettings(cmd: Command, conf: SparkConf): Command = {
+    val prefix = "spark.ssl."
+    val useLNCPrefix = "spark.ssl.useNodeLocalConf"
--- End diff --
Actually it is not a prefix - renamed the constant
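For reference, the flag-detection logic discussed in this hunk boils down to scanning `javaOpts` with a regex. A minimal standalone sketch, reconstructed from the diff above rather than taken from the merged code:

```scala
object SslFlagDemo extends App {
  // Matches -Dspark.ssl.useNodeLocalConf=<value> among a process's JVM options.
  val pattern = """-Dspark\.ssl\.useNodeLocalConf=(.+)""".r

  // collectFirst stops at the first matching option; absence defaults to false.
  def useNodeLocalConf(javaOpts: Seq[String]): Boolean =
    javaOpts.collectFirst { case pattern(v) => v.toBoolean }.getOrElse(false)

  assert(useNodeLocalConf(Seq("-Xmx1g", "-Dspark.ssl.useNodeLocalConf=true")))
  assert(!useNodeLocalConf(Seq("-Xmx1g"))) // flag absent: defaults to false
}
```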
[GitHub] spark pull request: [SPARK-5012][MLLib][PySpark]Python API for Gau...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4059#issuecomment-70142185 [Test build #25609 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25609/consoleFull) for PR 4059 at commit [`5c83825`](https://github.com/apache/spark/commit/5c83825c570b4ee1357021ec25a1a35a09a633e7). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4504][Examples] fix run-example failure...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3377#issuecomment-70147153 Jenkins, this is ok to test.
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3571#discussion_r23036715
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
@@ -347,10 +347,10 @@ private[spark] class Worker(
       }.toSeq
     }
     appDirectories(appId) = appLocalDirs
-
-    val manager = new ExecutorRunner(appId, execId, appDesc, cores_, memory_,
-      self, workerId, host, sparkHome, executorDir, akkaUrl, conf, appLocalDirs,
-      ExecutorState.LOADING)
+    val manager = new ExecutorRunner(appId, execId,
+      appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
--- End diff --
Hmmm... why do you need the copy? A quick overlook of `ExecutorRunner` doesn't seem to indicate it modifies this object in any way...
[GitHub] spark pull request: [SPARK-4920][UI]: back port the PR-3763 to bra...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3768#issuecomment-70147186 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23050901
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -827,9 +868,21 @@ class DAGScheduler(
     // might modify state of objects referenced in their closures. This is necessary in Hadoop
     // where the JobConf/Configuration object is not thread-safe.
     var taskBinary: Broadcast[Array[Byte]] = null
+
+    // Check if RDD serialization debugging is enabled
+    val debugSerialization: Boolean = sc.getConf.getBoolean("spark.serializer.debug", false)
--- End diff --
Ah I see - this does that already. Yeah so I'd just remove the config option and just always print debugging output if there is a failure. We usually try not to add config options unless there is a really compelling reason to not have the feature enabled.
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70181033 Rather than doing this one by one, can't we change the common class ActorLogReceive?
[GitHub] spark pull request: [SPARK-4923][REPL] Add Developer API to REPL t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4034#issuecomment-70181623 [Test build #25617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25617/consoleFull) for PR 4034 at commit [`6dc1ee2`](https://github.com/apache/spark/commit/6dc1ee2b9ec589ceb2ade3454c3dbaf0697a09b4). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class SparkILoop(` * ` * @param id The id (variable name, method name, class name, etc) whose` * ` * Retrieves the class representing the id (variable name, method name,` * ` * @param id The id (variable name, method name, class name, etc) whose` * ` * @return Some containing term name (id) class if exists, else None` * ` * @param id The id (variable name, method name, class name, etc) whose` * ` * @param id The id (variable name, method name, class name, etc) whose` * ` * Retrieves the runtime class and type representing the id (variable name,` * ` * @param id The id (variable name, method name, class name, etc) whose` * ` * @param id The id (variable name, method name, class name, etc) whose`
[GitHub] spark pull request: [SPARK-5274][SQL] Reconcile Java and Scala UDF...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4056
[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4020#issuecomment-70189111 [Test build #25621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25621/consoleFull) for PR 4020 at commit [`e446287`](https://github.com/apache/spark/commit/e446287b866eedeb74e68c2f800acf29250d2a76). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4020#issuecomment-70189173 Had a look over, and this mostly looks good, but it looks like there are many places where the patch replaces assigning with incrementing. It would be good to take a close look and pull all these out.
[GitHub] spark pull request: [SPARK-5274][SQL] Reconcile Java and Scala UDF...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4056#issuecomment-70176785 [Test build #25618 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25618/consoleFull) for PR 4056 at commit [`ae9c556`](https://github.com/apache/spark/commit/ae9c556d91a58f41098b40b3e10842570e4b3278). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23051146 --- Diff: core/src/main/scala/org/apache/spark/util/ObjectWalker.scala --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.util + +import java.lang.reflect.{Modifier, Field} + +import com.google.common.collect.Queues + +import scala.collection.mutable + + +/** + * This class permits traversing a generic Object's reference graph. This is useful for debugging + * serialization errors. See SPARK-3694. + * + * This code is based on code written by Josh Rosen found here: + * https://gist.github.com/JoshRosen/d6a8972c2e97d040 + */ +object ObjectWalker { + def isTransient(field: Field): Boolean = Modifier.isTransient(field.getModifiers) + def isStatic(field: Field): Boolean = Modifier.isStatic(field.getModifiers) + def isPrimitive(field: Field): Boolean = field.getType.isPrimitive + + /** + * Traverse the graph representing all references between the provided root object, its + * members, and their references in turn. + * + * What we want to be able to do is readily identify un-serializable components AND the path + * to those components. To do this, store the traversal of the graph as a 2-tuple - the actual + * reference visited and its parent. Then, to get the path to the un-serializable reference + * we can simply follow the parent links. + * + * @param rootObj - The root object for which to generate the reference graph + * @return a new Set containing the 2-tuple of references from the traversal of the + * reference graph along with their parent references. (self, parent) + */ + def buildRefGraph(rootObj: AnyRef): mutable.LinkedList[AnyRef] = { +val visitedRefs = mutable.Set[AnyRef]() +val toVisit = Queues.newArrayDeque[AnyRef]() +var results = mutable.LinkedList[AnyRef]() + +toVisit.add(rootObj) + +while (!toVisit.isEmpty) { + val obj: AnyRef = toVisit.pollFirst() + // Store the last parent reference to enable quick retrieval of the path to a broken node + + if (!visitedRefs.contains(obj)) { +results = mutable.LinkedList(obj).append(results) +visitedRefs.add(obj) + +// Extract all the fields from the object that would be serialized. Transient and +// static references are not serialized, and primitive variables will always be serializable +// and will not contain further references. + +for (field <- getAllFields(obj.getClass)) --- End diff -- could you pull this expression out into its own variable, `val fieldsToTest = getAllFields(...)`? We try not to nest expressions like this, to make the code more readable.
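The reference-graph walk described in the diff above can be sketched with the standard library alone (the review below asks whether Guava's `Queues` is even needed). This is an illustrative sketch, not the patch's actual `ObjectWalker`; the helper names (`serializedFields`, `RefWalkSketch`) are invented here, and a production version would use an identity-based visited set rather than `equals`-based membership.

```scala
import java.lang.reflect.{Field, Modifier}
import scala.collection.mutable

object RefWalkSketch {
  // Fields that would actually be serialized: walk the class hierarchy and
  // drop transient, static, and primitive fields, as the diff's comments describe.
  private def serializedFields(cls: Class[_]): Seq[Field] =
    Iterator.iterate[Class[_]](cls)(_.getSuperclass)
      .takeWhile(_ != null)
      .flatMap(_.getDeclaredFields)
      .filterNot(f => Modifier.isTransient(f.getModifiers) ||
                      Modifier.isStatic(f.getModifiers) ||
                      f.getType.isPrimitive)
      .toSeq

  /** Breadth-first traversal returning (reference, parent) pairs, so the path
    * to an un-serializable reference can be recovered via the parent links. */
  def buildRefGraph(root: AnyRef): Seq[(AnyRef, AnyRef)] = {
    val visited = mutable.Set[AnyRef]()          // note: equals-based, for brevity
    val toVisit = mutable.Queue[(AnyRef, AnyRef)]((root, null))
    val results = mutable.ArrayBuffer[(AnyRef, AnyRef)]()
    while (toVisit.nonEmpty) {
      val (obj, parent) = toVisit.dequeue()
      if (obj != null && visited.add(obj)) {
        results += ((obj, parent))
        // Pulled out into its own variable, per the review comment above.
        val fieldsToTest = serializedFields(obj.getClass)
        for (field <- fieldsToTest) {
          field.setAccessible(true)
          toVisit.enqueue((field.get(obj), obj))
        }
      }
    }
    results.toSeq
  }
}
```

A `mutable.Queue` gives the same FIFO behavior as Guava's `newArrayDeque`, which is the point of the follow-up review comment below.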
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70180151 Hi Patrick - thanks for the feedback. I would love to print out the names of the fields, but I wasn't able to figure out a way to do that - could you suggest how? I wasn't sure if printing the hash code was useful or not; Josh included it in his original example of a traversal, so I figured I'd leave it in. I didn't know if there would be a way to look it up post-facto.
[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4020#issuecomment-70180180 [Test build #25619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25619/consoleFull) for PR 4020 at commit [`6444391`](https://github.com/apache/spark/commit/644439144dba2f1a2c0cac29da16a0fc7a52b109). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5249] Added type specific set functions...
Github user AdamGS commented on the pull request: https://github.com/apache/spark/pull/4042#issuecomment-70183233 @pwendell, will just adding the new set (and setIfMissing) methods work?
[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...
Github user ilganeli commented on a diff in the pull request: https://github.com/apache/spark/pull/4020#discussion_r23056056 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -257,8 +257,8 @@ private[spark] class Executor( val serviceTime = System.currentTimeMillis() - taskStart val metrics = attemptedTask.flatMap(t => t.metrics) for (m <- metrics) { -m.executorRunTime = serviceTime -m.jvmGCTime = gcTime - startGCTime +m.incExecutorRunTime(serviceTime) --- End diff -- I'm not sure whether the original behavior is necessarily correct. If the goal is to track total run time for the task, why does it make sense to do an assignment anywhere instead of an accumulation?
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70189607 good point, how about the current one?
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23050776 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -789,6 +792,44 @@ class DAGScheduler( } } + /** + * Helper function to check whether an RDD and its dependencies are serializable. + * + * This hook is exposed here primarily for testing purposes. + * + * Note: This function is defined separately from the SerializationHelper.isSerializable() + * since DAGScheduler.isSerializable() is passed as a parameter to the RDDWalker class's graph + * traversal, which would otherwise require knowledge of the closureSerializer + * (which was undesirable). + * + * @param rdd - Rdd to attempt to serialize + * @return Array[SerializedRdd] - + * Return an array of Either objects indicating if serialization is successful. + * Each object represents the RDD or a dependency of the RDD + * Success: ByteBuffer - The serialized RDD + * Failure: String - The reason for the failure. + * + */ + def tryToSerializeRddDeps(rdd: RDD[_]): Array[RDDTrace] = { --- End diff -- I think initially it might be good to keep this private and just expose it as an internal utility that is triggered when we actually see serialization issues. Once we get some more experience with it in practice we can open up a debugging API.
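The "Success: ByteBuffer / Failure: String" convention in the doc comment above maps naturally onto Scala's `Either`. The sketch below illustrates that convention only; it is not the patch's `SerializationHelper`, and plain JDK serialization stands in for Spark's closure serializer.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import java.nio.ByteBuffer
import scala.util.control.NonFatal

object SerializeSketch {
  /** Attempt to serialize an object, returning Right(bytes) on success and
    * Left(reason) on failure, mirroring the Either-based result convention. */
  def tryToSerialize(obj: AnyRef): Either[String, ByteBuffer] =
    try {
      val bytes = new ByteArrayOutputStream()
      val out = new ObjectOutputStream(bytes)
      out.writeObject(obj)
      out.close()
      Right(ByteBuffer.wrap(bytes.toByteArray))
    } catch {
      case e: NotSerializableException =>
        Left(s"Failed to serialize ${obj.getClass.getName}: ${e.getMessage}")
      case NonFatal(e) =>
        // Catch unexpected errors from the utility itself, per the review
        // comment further down, instead of letting them escape.
        Left(s"Could not produce debugging output: $e")
    }
}
```

A caller can then branch with `tryToSerialize(obj).fold(reason => logError(reason), buf => buf)`, which is the shape of the `.fold(...)` call visible in the TaskSetManager diff below.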
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70179454 Hey, just took a quick pass with some code style suggestions (more coming) and usability suggestions. One thing: would it be possible to track the name of the fields you are traversing? This would make the debugging output more useful. Also, is there a good reason to print the hash code? How would users use that?
[GitHub] spark pull request: SPARK-4746 make it easy to skip IntegrationTes...
Github user squito commented on the pull request: https://github.com/apache/spark/pull/4048#issuecomment-70181484 so, this doesn't actually work quite the way I wanted it to. It turns out it's skipping all the JUnit tests as well. The JUnit tests are run if you run with `test-only * -- -l`, but as soon as you add a tag like `test-only * -- -l foo`, all the JUnit tests are skipped. From the [junit-interface docs](https://github.com/sbt/junit-interface): "Any parameter not starting with - or + is treated as a glob pattern for matching tests." I will look into a solution for this, but I have a feeling this might mean we can't mix JUnit with the tagging approach, and we have to go to a more standard directory / file-naming approach to separating out integration tests.
[GitHub] spark pull request: [SPARK-4923][REPL] Add Developer API to REPL t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4034#issuecomment-70181635 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25617/ Test PASSed.
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70183535 hmmm... I'm not sure if we really can do that, as Scala doesn't support super.method naturally. I checked the actors in other components (master, worker and CoarseGrainedSchedulerBackend); they are just fine...
[GitHub] spark pull request: [SPARK-5274][SQL] Reconcile Java and Scala UDF...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4056#issuecomment-70184276 [Test build #25616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25616/consoleFull) for PR 4056 at commit [`675a3c9`](https://github.com/apache/spark/commit/675a3c985b9b65f1a818ec6756a215d9ef7b2246). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class UDFRegistration (sqlContext: SQLContext) extends org.apache.spark.Logging `
[GitHub] spark pull request: [SPARK-5274][SQL] Reconcile Java and Scala UDF...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4056#issuecomment-70184289 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25616/ Test PASSed.
[GitHub] spark pull request: [SPARK-5274][SQL] Reconcile Java and Scala UDF...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4056#issuecomment-70187936 Merging in master.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23051221 --- Diff: core/src/main/scala/org/apache/spark/util/ObjectWalker.scala --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.util + +import java.lang.reflect.{Modifier, Field} + +import com.google.common.collect.Queues --- End diff -- Does scala have a queue you can use here instead of using the google libraries?
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23051282 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -459,7 +459,23 @@ private[spark] class TaskSetManager( } // Serialize and return the task val startTime = clock.getTime() + val serializedTask: ByteBuffer = try { +// We rely on the DAGScheduler to catch non-serializable closures and RDDs, so in here +// we assume the task can be serialized without exceptions. + +// Check if serialization debugging is enabled +val debugSerialization: Boolean = sched.sc.getConf. + getBoolean("spark.serializer.debug", false) + +if (debugSerialization) { + SerializationHelper.tryToSerialize(ser, task).fold ( --- End diff -- We should make sure this catches any exceptions thrown by the serialization utility itself, and in that case just say that we couldn't produce debugging output.
[GitHub] spark pull request: [SPARK-4923][REPL] Add Developer API to REPL t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4034#issuecomment-70172402 [Test build #25617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25617/consoleFull) for PR 4034 at commit [`6dc1ee2`](https://github.com/apache/spark/commit/6dc1ee2b9ec589ceb2ade3454c3dbaf0697a09b4). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23051021 --- Diff: core/src/main/scala/org/apache/spark/util/SerializationHelper.scala --- @@ -0,0 +1,308 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util + +import java.io.NotSerializableException +import java.nio.ByteBuffer + +import scala.collection.mutable +import scala.collection.mutable.HashMap +import scala.util.control.NonFatal + +import org.apache.spark.rdd.RDD +import org.apache.spark.scheduler.Task +import org.apache.spark.serializer.SerializerInstance + +/** + * This enumeration defines variables use to standardize debugging output + */ +object SerializationState extends Enumeration { --- End diff -- Could you make this and all classes you expose in this pr `private[spark]`?
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user ilganeli commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23052373 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -789,6 +792,44 @@ class DAGScheduler( } } + /** + * Helper function to check whether an RDD and its dependencies are serializable. + * + * This hook is exposed here primarily for testing purposes. + * + * Note: This function is defined separately from the SerializationHelper.isSerializable() + * since DAGScheduler.isSerializable() is passed as a parameter to the RDDWalker class's graph + * traversal, which would otherwise require knowledge of the closureSerializer + * (which was undesirable). + * + * @param rdd - Rdd to attempt to serialize + * @return Array[SerializedRdd] - + * Return an array of Either objects indicating if serialization is successful. + * Each object represents the RDD or a dependency of the RDD + * Success: ByteBuffer - The serialized RDD + * Failure: String - The reason for the failure. + * + */ + def tryToSerializeRddDeps(rdd: RDD[_]): Array[RDDTrace] = { --- End diff -- I can make this private[spark], but when I say testing purposes, I mean that it's used within the DAGSchedulerSuite, so it needs to be public (at least within Spark).
[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4020#issuecomment-70186438 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25619/ Test FAILed.
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70186533 Can't we just intercept the message and only call receiveWithLogging on it if it is the proper one?
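The fix under discussion boils down to checking which remote peer a `DisassociatedEvent` refers to before exiting. The sketch below illustrates that guard only; it is not the actual CoarseGrainedExecutorBackend code, and the `Address`/`DisassociatedEvent` case classes are local stand-ins for the Akka types (Akka's `akka.remote.DisassociatedEvent` carries a `remoteAddress`), so the snippet is self-contained.

```scala
// Local stand-ins for akka.actor.Address and akka.remote.DisassociatedEvent,
// so this sketch compiles without an Akka dependency.
final case class Address(system: String, host: String, port: Int)
final case class DisassociatedEvent(remoteAddress: Address)

class ExecutorBackendSketch(driverAddress: Address) {
  @volatile var stopped = false

  // Only react to the DisassociatedEvent that concerns the driver;
  // events from unrelated remote actor systems (e.g. a user's external
  // Akka-based receiver) must not kill the executor.
  val receive: PartialFunction[Any, Unit] = {
    case DisassociatedEvent(remote) if remote == driverAddress =>
      stopped = true // in the real backend: log the error and exit the process
    case DisassociatedEvent(_) =>
      () // irrelevant peer: keep running
  }
}
```

The alternative raised in the thread, filtering in the shared `ActorLogReceive` trait before delegating to `receiveWithLogging`, would centralize the same guard instead of repeating it per actor.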
[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4020#issuecomment-70186432 [Test build #25619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25619/consoleFull) for PR 4020 at commit [`6444391`](https://github.com/apache/spark/commit/644439144dba2f1a2c0cac29da16a0fc7a52b109). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5274][SQL] Reconcile Java and Scala UDF...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4056#issuecomment-70188254 [Test build #25618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25618/consoleFull) for PR 4056 at commit [`ae9c556`](https://github.com/apache/spark/commit/ae9c556d91a58f41098b40b3e10842570e4b3278).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class UDFRegistration (sqlContext: SQLContext) extends org.apache.spark.Logging `
[GitHub] spark pull request: [SPARK-5274][SQL] Reconcile Java and Scala UDF...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4056#issuecomment-70188261 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25618/
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70188638 [Test build #25620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25620/consoleFull) for PR 3518 at commit [`1d2d563`](https://github.com/apache/spark/commit/1d2d563c04a7cfb302ccacf42fcfdc8b488a3a61).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70188734 [Test build #25620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25620/consoleFull) for PR 3518 at commit [`1d2d563`](https://github.com/apache/spark/commit/1d2d563c04a7cfb302ccacf42fcfdc8b488a3a61).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70188737 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25620/
[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4020#discussion_r23055589

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---

    @@ -257,8 +257,8 @@ private[spark] class Executor(
         val serviceTime = System.currentTimeMillis() - taskStart
         val metrics = attemptedTask.flatMap(t => t.metrics)
         for (m <- metrics) {
    -      m.executorRunTime = serviceTime
    -      m.jvmGCTime = gcTime - startGCTime
    +      m.incExecutorRunTime(serviceTime)

--- End diff --

will this replace `=` with `+=`? This applies in a couple places above as well.
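The behavioral difference sryza is asking about is assignment versus accumulation. A hedged sketch with hypothetical names (not the actual TaskMetrics API) makes the distinction concrete:

```scala
// Hypothetical metrics holder illustrating the = vs += distinction.
class TaskMetricsSketch {
  private var _executorRunTime: Long = 0L
  def executorRunTime: Long = _executorRunTime

  // An "inc" accessor conventionally accumulates (+=) ...
  def incExecutorRunTime(value: Long): Unit = { _executorRunTime += value }

  // ... while the old code assigned (=); calling inc twice is not
  // equivalent to assigning the same value twice.
  def setExecutorRunTime(value: Long): Unit = { _executorRunTime = value }
}
```

If the replaced assignments ran more than once per task, swapping `=` for an accumulating `inc` would silently change the reported metric.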
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70189998 [Test build #25622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25622/consoleFull) for PR 4063 at commit [`4ed522c`](https://github.com/apache/spark/commit/4ed522c9c101573ee8eac7b8ab3206504cc8aabf).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4859][Core][Streaming] Refactor LiveLis...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4006#issuecomment-70206158 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25631/
[GitHub] spark pull request: [SPARK-4859][Core][Streaming] Refactor LiveLis...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4006#issuecomment-70206157 [Test build #25631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25631/consoleFull) for PR 4006 at commit [`0710364`](https://github.com/apache/spark/commit/0710364818d9c1338188d89fa522316d84482ec4).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-70206625 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25629/
[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r23062798

--- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala ---

    @@ -105,10 +107,20 @@ class SparkHadoopWriter(@transient jobConf: JobConf)
       def commit() {
         val taCtxt = getTaskContext()
         val cmtr = getOutputCommitter()
    +    val dagSchedulerActor =
    +      AkkaUtils.makeDriverRef("DAGScheduler", SparkEnv.get.conf, SparkEnv.get.actorSystem)
    +    val askTimeout = AkkaUtils.askTimeout(SparkEnv.get.conf)
         if (cmtr.needsTaskCommit(taCtxt)) {
           try {
    -        cmtr.commitTask(taCtxt)
    -        logInfo(taID + ": Committed")
    +        val canCommit: Boolean = AkkaUtils.askWithReply(
    +          AskPermissionToCommitOutput(jobID, splitID, attemptID), dagSchedulerActor, askTimeout)
    +        if (canCommit) {
    +          cmtr.commitTask(taCtxt)
    +          logInfo(s"$taID: Committed")
    +        } else {
    +          logInfo(s"$taID: Not committed because DAGScheduler did not authorize commit")
    +        }
           } catch {
             case e: IOException => {
               logError("Error committing the output of task: " + taID.value, e)

--- End diff --

I guess we need to catch TimeoutException here, too.
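JoshRosen's point is that the ask to the driver introduces a failure mode the old code never had: the reply can time out, and a bare `case e: IOException` will not catch a `TimeoutException`. A simplified, hypothetical sketch of the widened catch (not the PR's actual code):

```scala
import java.io.IOException
import java.util.concurrent.TimeoutException

// Run a commit action, translating both I/O failures and ask timeouts
// into an error result instead of an escaping exception.
def commitSafely(doCommit: () => Unit): Either[String, Unit] =
  try Right(doCommit())
  catch {
    case e: IOException      => Left("Error committing: " + e.getMessage)
    case e: TimeoutException => Left("Driver did not reply: " + e.getMessage)
  }
```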
[GitHub] spark pull request: [SPARK-5193][SQL] Remove Spark SQL Java-specif...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4065#issuecomment-70208071 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25632/
[GitHub] spark pull request: [SPARK-5193][SQL] Remove Spark SQL Java-specif...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4065#issuecomment-70208067 [Test build #25632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25632/consoleFull) for PR 4065 at commit [`500d2c4`](https://github.com/apache/spark/commit/500d2c4ee388dfc508d0c810d0402e1791441cb0).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4777][CORE] Some block memory after unr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3629#issuecomment-70208700 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25633/
[GitHub] spark pull request: [SPARK-4777][CORE] Some block memory after unr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3629#issuecomment-70208695 [Test build #25633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25633/consoleFull) for PR 3629 at commit [`f0e80f2`](https://github.com/apache/spark/commit/f0e80f29713615c60674998bba6cfbc39f120891).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3732#issuecomment-70208781 Adrian - as we spoke offline, it would be simpler (for future datetime related features) to just represent the Date type as a primitive int internally, and convert to java.sql.Date when we give it back to the user. You can create a DateTimeUtils class to implement common functionalities such as conversion between strings and int date. Thanks!
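The representation rxin describes can be sketched as days since the Unix epoch stored in an `Int`, converting to `java.sql.Date` only at the user-facing boundary. This is a hypothetical sketch, not Spark's actual DateTimeUtils: the class name is invented, and the UTC-millis assumption is mine (real code must account for time zones).

```scala
object DateTimeUtilsSketch {
  private val MillisPerDay = 86400000L

  // Internal representation: days since 1970-01-01.
  // Assumes the Date's millis are UTC-aligned; production code needs
  // proper time-zone handling.
  def fromJavaDate(date: java.sql.Date): Int =
    (date.getTime / MillisPerDay).toInt

  def toJavaDate(days: Int): java.sql.Date =
    new java.sql.Date(days * MillisPerDay)
}
```

Keeping the internal type a primitive `Int` makes date arithmetic and comparison cheap and leaves room for later datetime features, which is the simplification rxin is after.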