[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71227419 Woohoo, looks like this is passing tests! The earlier failure was due to a known flaky streaming test. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5383][SQL] Multi alias names support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4182#issuecomment-71230268 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26028/ Test PASSed.
[GitHub] spark pull request: [SPARK-5384][mllib] Vectors.sqdist return inco...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4183#issuecomment-71232329 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26029/ Test PASSed.
[GitHub] spark pull request: [SPARK-5384][mllib] Vectors.sqdist return inco...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4183#issuecomment-71232323 [Test build #26029 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26029/consoleFull) for PR 4183 at commit [`54cbf97`](https://github.com/apache/spark/commit/54cbf97b3b08136ac77d7f2e6265aec9c5206a4b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4809] Rework Guava library shading.
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3658#issuecomment-71236254 Ping.
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-71237900 [Test build #26031 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26031/consoleFull) for PR 3916 at commit [`23aa2a9`](https://github.com/apache/spark/commit/23aa2a9c7a0e39987bc487c51e9ad70ecb972e8f). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5384][mllib] Vectors.sqdist return inco...
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/4183 [SPARK-5384][mllib] Vectors.sqdist return inconsistent result for sparse/dense vectors when the vectors have different lengths JIRA issue: https://issues.apache.org/jira/browse/SPARK-5384 Currently `Vectors.sqdist` returns inconsistent results for sparse/dense vectors when the vectors have different lengths; please refer to the JIRA for a sample. PR scope: unify the sqdist logic for dense/sparse vectors to fix the inconsistency, and also remove the possible sparse-to-dense conversion in the original code. For reviewers: maybe we should first discuss what the correct behavior is. 1. Must vectors passed to sqdist have the same length, as in Breeze? 2. If they can have different lengths, what is the correct result for sqdist? (Should the extra part be included in the calculation?) I'll update the PR with more optimization and additional unit tests afterwards. Thanks. You can merge this pull request into a Git repository by running: $ git pull https://github.com/hhbyyh/spark fixDouble Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4183.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4183 commit 54cbf97b3b08136ac77d7f2e6265aec9c5206a4b Author: Yuhao Yang hhb...@gmail.com Date: 2015-01-24T16:03:37Z fix Vectors.sqdist inconsistence
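To make the two behaviors under discussion concrete, here is a minimal sketch (not MLlib's actual implementation) of option 2 above: the shorter vector's missing tail is treated as zeros, so both vectors are effectively compared over the longer length. All names here are illustrative.

```scala
// Hypothetical sketch of "option 2": squared Euclidean distance between
// two vectors of possibly different lengths, treating the missing tail
// of the shorter vector as zeros. Illustrative only, NOT MLlib's code.
object SqDistSketch {
  def sqdist(a: Array[Double], b: Array[Double]): Double = {
    val n = math.max(a.length, b.length)
    var sum = 0.0
    var i = 0
    while (i < n) {
      val x = if (i < a.length) a(i) else 0.0  // pad shorter vector with 0.0
      val y = if (i < b.length) b(i) else 0.0
      val d = x - y
      sum += d * d
      i += 1
    }
    sum
  }
}
```

Under option 1 (Breeze-style), the same call with mismatched lengths would instead throw an error; the choice is which contract `Vectors.sqdist` should commit to.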
[GitHub] spark pull request: [SPARK-5384][mllib] Vectors.sqdist return inco...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4183#issuecomment-71220720 [Test build #26029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26029/consoleFull) for PR 4183 at commit [`54cbf97`](https://github.com/apache/spark/commit/54cbf97b3b08136ac77d7f2e6265aec9c5206a4b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71225754 [Test build #26027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26027/consoleFull) for PR 4155 at commit [`c334255`](https://github.com/apache/spark/commit/c3342552e03d690ac4beea939b5abd13363698c4). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` class OutputCommitCoordinatorActor(outputCommitCoordinator: OutputCommitCoordinator)`
[GitHub] spark pull request: [Minor][streaming][MQTT streaming] some trivia...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4178#issuecomment-71229958 [Test build #26030 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26030/consoleFull) for PR 4178 at commit [`66919a3`](https://github.com/apache/spark/commit/66919a34ab1838f0f0dbc2ee76903532fa5117b8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71225766 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26027/ Test PASSed.
[GitHub] spark pull request: [SPARK-5063] More helpful error messages for s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-71225070 [Test build #26026 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26026/consoleFull) for PR 3884 at commit [`a943e00`](https://github.com/apache/spark/commit/a943e00fd76d1b84a598fa449b5abd99074c2c62). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5063] More helpful error messages for s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-71225077 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26026/ Test PASSed.
[GitHub] spark pull request: [SPARK-5383][SQL] Multi alias names support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4182#issuecomment-71230259 [Test build #26028 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26028/consoleFull) for PR 4182 at commit [`9b7e7c9`](https://github.com/apache/spark/commit/9b7e7c9aa02a2a29eab1c7ba08ee681543904d19). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Alias(child: Expression, names: Seq[String])`
[GitHub] spark pull request: [SPARK-5291][CORE] Add timestamp and reason wh...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4082#issuecomment-71248520 @ksakellis
[GitHub] spark pull request: [Minor][streaming][MQTT streaming] some trivia...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4178#issuecomment-71241536 [Test build #26030 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26030/consoleFull) for PR 4178 at commit [`66919a3`](https://github.com/apache/spark/commit/66919a34ab1838f0f0dbc2ee76903532fa5117b8). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5063] More helpful error messages for s...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-71244412 Scala changes look ok to me; I'm not super familiar with the pyspark internals, but the check on `rdd.py` surprised me because I thought RDDs were actually serialized, at least on the Scala side.
[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23473037 --- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala --- @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler + +import scala.collection.mutable +import scala.concurrent.duration.FiniteDuration + +import akka.actor.{PoisonPill, ActorRef, Actor} --- End diff -- super nit: sort imports (here and elsewhere)
[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71252591 I had this (unfounded) notion that tasks knew whether they were speculative or not, and thus the non-speculative ones would be able to avoid this extra hop to the driver and just commit things. But it seems that's not the case (and it sort of makes sense, in case the speculative task finishes first), so I guess this approach is fine. One thing that worries me a bit is that I've been told before that akka actors' `onReceive` methods are single-threaded (meaning they'll never be called concurrently, even for messages coming from different remote endpoints). That can become a bottleneck on really large jobs. If that's really true, we should probably look at decoupling the processing of the message from the `onReceive` method so that multiple executors can be serviced concurrently.
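The decoupling vanzin describes can be sketched without any actor framework: the (conceptually single-threaded) receive path only enqueues work onto a pool, so slow per-message processing no longer serializes requests from different executors. All names below (`CommitRequest`, `DecoupledReceiver`, `handleCommit`) are hypothetical, not Spark's actual API.

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Hypothetical sketch: decouple message handling from a single-threaded
// onReceive. The receive loop only submits work to a fixed pool, so
// handlers for different executors can run concurrently.
case class CommitRequest(stage: Int, task: Long, attempt: Long)

class DecoupledReceiver(poolSize: Int) {
  private val pool = Executors.newFixedThreadPool(poolSize)

  // Called from the single-threaded receive loop; returns immediately.
  def onReceive(msg: CommitRequest)(handleCommit: CommitRequest => Unit): Unit = {
    pool.submit(new Runnable {
      override def run(): Unit = handleCommit(msg) // runs off the receive thread
    })
  }

  def shutdown(): Unit = {
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
  }
}
```

The trade-off is that the handler itself must now be thread-safe, which is exactly the synchronization concern raised later in this thread.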
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-71255179 [Test build #26031 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26031/consoleFull) for PR 3916 at commit [`23aa2a9`](https://github.com/apache/spark/commit/23aa2a9c7a0e39987bc487c51e9ad70ecb972e8f). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5097][WIP] DataFrame as the common abst...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4173#issuecomment-71259112 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26033/ Test FAILed.
[GitHub] spark pull request: [SPARK-5097][WIP] DataFrame as the common abst...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4173#issuecomment-71259100 [Test build #26033 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26033/consoleFull) for PR 4173 at commit [`23b2c2d`](https://github.com/apache/spark/commit/23b2c2d1bfb6e6504e3357af5027af579020b22e). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23478673 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -106,18 +107,25 @@ class SparkHadoopWriter(@transient jobConf: JobConf) val taCtxt = getTaskContext() val cmtr = getOutputCommitter() if (cmtr.needsTaskCommit(taCtxt)) { - try { -cmtr.commitTask(taCtxt) -logInfo(taID + ": Committed") - } catch { -case e: IOException => { - logError("Error committing the output of task: " + taID.value, e) - cmtr.abortTask(taCtxt) - throw e + val outputCommitCoordinator = SparkEnv.get.outputCommitCoordinator + val conf = SparkEnv.get.conf + val canCommit: Boolean = outputCommitCoordinator.canCommit(jobID, splitID, attemptID) + if (canCommit) { --- End diff -- It would force a new task to recompute everything, but this does highlight that task 2 should throw an error, @JoshRosen?
[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71262056 > We do actually need the processing to be single threaded, as trying to coordinate synchronization on the centralized arbitration logic is a bit of a nightmare. I'm not so convinced; you'd only have a conflict if two tasks are concurrently asking to update the state of the same split ID. Otherwise, state updates can happen in parallel. E.g. if you know all the split IDs up front, you can initialize the data structure to hold all the state; when a commit request arrives, you only lock that particular state object, so requests that arrive for other split IDs can be processed in parallel. (If you don't know all the split IDs up front, you can use something simple like `ConcurrentHashMap` or `ConcurrentSkipListMap`, depending on what performance characteristics you want.)
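The per-split locking scheme vanzin proposes can be sketched in a few lines: arbitration state lives in a `ConcurrentHashMap` keyed by split ID, and only the state object for the requested split is locked, so commits for different splits never contend. This is an illustrative sketch, not Spark's actual coordinator; the class and field names are hypothetical.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical sketch of per-split commit arbitration with fine-grained
// locking: each split's state is created lazily via computeIfAbsent, and
// only that state object is synchronized on, so requests for different
// split IDs proceed in parallel.
class SplitCommitArbiter {
  private class SplitState { var authorizedAttempt: Option[Long] = None }
  private val states = new ConcurrentHashMap[Int, SplitState]()

  /** Grant commit permission to the first attempt that asks for a given split. */
  def canCommit(splitId: Int, attemptId: Long): Boolean = {
    val state = states.computeIfAbsent(splitId, _ => new SplitState)
    state.synchronized {
      state.authorizedAttempt match {
        case None    => state.authorizedAttempt = Some(attemptId); true
        case Some(a) => a == attemptId // only the already-authorized attempt may commit
      }
    }
  }
}
```

`computeIfAbsent` guarantees a single `SplitState` per key even under concurrent first requests, which is what makes the per-split lock safe without a global one.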
[GitHub] spark pull request: [SQL] SPARK-5309: Use Dictionary for Binary-S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4139#issuecomment-71246755 [Test build #26035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26035/consoleFull) for PR 4139 at commit [`f383c15`](https://github.com/apache/spark/commit/f383c15b64ad0d674c09b70dd632f9a93fce44f6). * This patch **does not merge cleanly**.
[GitHub] spark pull request: [Minor][streaming][MQTT streaming] some trivia...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4178#issuecomment-71241547 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26030/ Test PASSed.
[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23473132 --- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala --- @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +package org.apache.spark.scheduler + +import scala.collection.mutable +import scala.concurrent.duration.FiniteDuration + +import akka.actor.{PoisonPill, ActorRef, Actor} + +import org.apache.spark.{SparkConf, Logging} +import org.apache.spark.util.{AkkaUtils, ActorLogReceive} + +private[spark] sealed trait OutputCommitCoordinationMessage extends Serializable + +private[spark] case class StageStarted(stage: Int) extends OutputCommitCoordinationMessage +private[spark] case class StageEnded(stage: Int) extends OutputCommitCoordinationMessage +private[spark] case object StopCoordinator extends OutputCommitCoordinationMessage + +private[spark] case class AskPermissionToCommitOutput( +stage: Int, +task: Long, +taskAttempt: Long) +extends OutputCommitCoordinationMessage + +private[spark] case class TaskCompleted( +stage: Int, +task: Long, +attempt: Long, +successful: Boolean) +extends OutputCommitCoordinationMessage + +/** + * Authority that decides whether tasks can commit output to HDFS. + * + * This lives on the driver, but the actor allows the tasks that commit + * to Hadoop to invoke it. + */ +private[spark] class OutputCommitCoordinator(conf: SparkConf) extends Logging { + + // Initialized by SparkEnv + var coordinatorActor: Option[ActorRef] = None + private val timeout = AkkaUtils.askTimeout(conf) + private val maxAttempts = AkkaUtils.numRetries(conf) + private val retryInterval = AkkaUtils.retryWaitMs(conf) + + private type StageId = Int + private type TaskId = Long + private type TaskAttemptId = Long + + private val authorizedCommittersByStage: + mutable.Map[StageId, mutable.Map[TaskId, TaskAttemptId]] = mutable.HashMap() + + def stageStart(stage: StageId) { +sendToActor(StageStarted(stage)) + } + def stageEnd(stage: StageId) { --- End diff -- super nit: missing an empty line between methods.
[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23478213 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -808,6 +810,7 @@ class DAGScheduler( // will be posted, which should always come after a corresponding SparkListenerStageSubmitted // event. stage.latestInfo = StageInfo.fromStage(stage, Some(partitionsToCompute.size)) +outputCommitCoordinator.stageStart(stage.id) --- End diff -- I wonder if it wouldn't be better to use a `SparkListener` to reduce coupling. Although that would potentially introduce race conditions in the code (since `LiveListenerBus` fires events on a separate thread).
[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23478509 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -106,18 +107,25 @@ class SparkHadoopWriter(@transient jobConf: JobConf) val taCtxt = getTaskContext() val cmtr = getOutputCommitter() if (cmtr.needsTaskCommit(taCtxt)) { - try { -cmtr.commitTask(taCtxt) -logInfo(taID + ": Committed") - } catch { -case e: IOException => { - logError("Error committing the output of task: " + taID.value, e) - cmtr.abortTask(taCtxt) - throw e + val outputCommitCoordinator = SparkEnv.get.outputCommitCoordinator + val conf = SparkEnv.get.conf + val canCommit: Boolean = outputCommitCoordinator.canCommit(jobID, splitID, attemptID) + if (canCommit) { --- End diff -- Hmm. I wonder if this can be a problem. Given the following timeline: 1. task 1 starts 2. task 1 asks for permission to commit; it's granted 3. task 1 fails to commit 4. task 2 starts (doing the same work as task 1) 5. task 2 asks for permission to commit; it's denied Wouldn't this code force a new task to be run to recompute everything? Also, wouldn't task 2 actually report itself as successful, and break things, since there is a successful task for that particular split, but it was never committed?
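The race in that timeline can be reduced to a small sketch: if commit authorization is never revoked when the authorized attempt fails, a later attempt for the same split is wrongly denied. A coordinator therefore needs a failure callback that frees the slot. This is an illustrative model only; the class and method names are hypothetical, not Spark's.

```scala
// Hypothetical sketch of the timeline above: authorization for a split
// must be revoked when the authorized attempt fails to commit, so that
// a retry (task 2) can be granted instead of denied forever.
class CommitAuthorizer {
  private var authorized = Map.empty[Int, Long] // splitId -> authorized attemptId

  def canCommit(splitId: Int, attemptId: Long): Boolean = synchronized {
    authorized.get(splitId) match {
      case None    => authorized += splitId -> attemptId; true
      case Some(a) => a == attemptId
    }
  }

  /** Called when a task attempt ends in failure; frees the split's slot. */
  def attemptFailed(splitId: Int, attemptId: Long): Unit = synchronized {
    if (authorized.get(splitId).contains(attemptId)) authorized -= splitId
  }
}
```

Without the `attemptFailed` hook, step 5 of the timeline denies task 2 and the split can never be committed, which is exactly the problem being raised.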
[GitHub] spark pull request: [SPARK-5097][WIP] DataFrame as the common abst...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4173#issuecomment-71261373 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26034/ Test FAILed.
[GitHub] spark pull request: [SPARK-5097][WIP] DataFrame as the common abst...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4173#issuecomment-71261364 [Test build #26034 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26034/consoleFull) for PR 4173 at commit [`38df669`](https://github.com/apache/spark/commit/38df6699c77fcaeb505350bcc73c5614814efa5d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23473536

--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -19,12 +19,13 @@ package org.apache.spark.scheduler
 import scala.collection.mutable.{ArrayBuffer, HashSet, HashMap, Map}
 import scala.language.reflectiveCalls
-import scala.util.control.NonFatal
 import org.scalatest.{BeforeAndAfter, FunSuiteLike}
 import org.scalatest.concurrent.Timeouts
 import org.scalatest.time.SpanSugar._
+import org.mockito.Mockito.mock
--- End diff --

super nit: group with `org.scalatest`
[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23474074

--- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import scala.collection.mutable
+import scala.concurrent.duration.FiniteDuration
+
+import akka.actor.{PoisonPill, ActorRef, Actor}
+
+import org.apache.spark.{SparkConf, Logging}
+import org.apache.spark.util.{AkkaUtils, ActorLogReceive}
+
+private[spark] sealed trait OutputCommitCoordinationMessage extends Serializable
+
+private[spark] case class StageStarted(stage: Int) extends OutputCommitCoordinationMessage
+private[spark] case class StageEnded(stage: Int) extends OutputCommitCoordinationMessage
+private[spark] case object StopCoordinator extends OutputCommitCoordinationMessage
+
+private[spark] case class AskPermissionToCommitOutput(
+    stage: Int,
+    task: Long,
+    taskAttempt: Long)
+  extends OutputCommitCoordinationMessage
+
+private[spark] case class TaskCompleted(
+    stage: Int,
+    task: Long,
+    attempt: Long,
+    successful: Boolean)
+  extends OutputCommitCoordinationMessage
+
+/**
+ * Authority that decides whether tasks can commit output to HDFS.
+ *
+ * This lives on the driver, but the actor allows the tasks that commit
+ * to Hadoop to invoke it.
+ */
+private[spark] class OutputCommitCoordinator(conf: SparkConf) extends Logging {
+
+  // Initialized by SparkEnv
+  var coordinatorActor: Option[ActorRef] = None
+  private val timeout = AkkaUtils.askTimeout(conf)
+  private val maxAttempts = AkkaUtils.numRetries(conf)
+  private val retryInterval = AkkaUtils.retryWaitMs(conf)
+
+  private type StageId = Int
+  private type TaskId = Long
+  private type TaskAttemptId = Long
+
+  private val authorizedCommittersByStage:
+    mutable.Map[StageId, mutable.Map[TaskId, TaskAttemptId]] = mutable.HashMap()
+
+  def stageStart(stage: StageId) {
+    sendToActor(StageStarted(stage))
+  }
+
+  def stageEnd(stage: StageId) {
+    sendToActor(StageEnded(stage))
+  }
+
+  def canCommit(
+      stage: StageId,
+      task: TaskId,
+      attempt: TaskAttemptId): Boolean = {
+    askActor(AskPermissionToCommitOutput(stage, task, attempt))
+  }
+
+  def taskCompleted(
+      stage: StageId,
+      task: TaskId,
+      attempt: TaskAttemptId,
+      successful: Boolean) {
+    sendToActor(TaskCompleted(stage, task, attempt, successful))
+  }
+
+  def stop() {
--- End diff --

Minor, but I think it's slightly weird that this class mixes methods that should only be called from the driver (such as `stop`) and methods that executors can call safely. Perhaps a check here that this is only being called on the driver side?
[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71253931

I'm also concerned about the performance ramifications of this; we need to run performance benchmarks. However, the only critical path affected is tasks that are explicitly saving to a Hadoop file. When a task completes, the DAGScheduler sends a message to the OutputCommitCoordinator actor, so the DAGScheduler is not blocked by this logic.

We do actually need the processing to be single-threaded, as trying to coordinate synchronization on the centralized arbitration logic is a bit of a nightmare. I mean, we could allow multiple threads to access the internal state of OutputCommitCoordinator and implement appropriate synchronization logic.

I considered an optimization where the driver broadcasts to executors when tasks are being speculated; the executors of the original tasks would then know to check the commit authorization, and skip it for tasks that don't have speculated copies. There are a lot of race conditions that arise from that, though, which further underlines the need to centralize everything.
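The single-threaded design point above (an actor serializing all permission requests, so the shared table needs no locking) can be sketched in a few lines of plain Python. This is an illustrative stand-in for the actor pattern, not Spark's code; all names are hypothetical:

```python
import queue
import threading

# One worker thread owns the authorization table; callers send requests
# through a queue and block on a per-request reply queue (an "ask").
class SingleThreadedCoordinator:
    def __init__(self):
        self._requests = queue.Queue()
        self._authorized = {}  # (stage, partition) -> winning attempt id
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            key, attempt, reply = self._requests.get()
            # Only this thread ever touches _authorized, so no lock is needed.
            winner = self._authorized.setdefault(key, attempt)
            reply.put(winner == attempt)

    def can_commit(self, stage, partition, attempt):
        reply = queue.Queue(maxsize=1)
        self._requests.put(((stage, partition), attempt, reply))
        return reply.get()  # block until the worker answers

coord = SingleThreadedCoordinator()
print(coord.can_commit(0, 0, attempt=1))  # True: first attempt wins
print(coord.can_commit(0, 0, attempt=2))  # False: speculative copy denied
```

Funneling every decision through one thread trades a small amount of latency on the commit path for freedom from the synchronization bugs the comment describes.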
[GitHub] spark pull request: [SQL] SPARK-5309: Use Dictionary for Binary-S...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4139#issuecomment-71261303 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26035/ Test FAILed.
[GitHub] spark pull request: [SQL] SPARK-5309: Use Dictionary for Binary-S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4139#issuecomment-71261296 [Test build #26035 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26035/consoleFull) for PR 4139 at commit [`f383c15`](https://github.com/apache/spark/commit/f383c15b64ad0d674c09b70dd632f9a93fce44f6). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5063] More helpful error messages for s...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3884#discussion_r23470546

--- Diff: python/pyspark/rdd.py ---
@@ -141,6 +141,17 @@ def id(self):
     def __repr__(self):
         return self._jrdd.toString()
+    def __getnewargs__(self):
+        # This method is called when attempting to pickle an RDD, which is always an error:
+        raise Exception(
+            "It appears that you are attempting to broadcast an RDD or reference an RDD from an "
+            "action or transforamtion. RDD transformations and actions can only be invoked by the"
--- End diff --

typo: transforamtion
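For readers unfamiliar with the hook this diff relies on: `__getnewargs__` is consulted by `pickle` (protocol 2 and above) while serializing an object, so raising from it turns any accidental pickling into an immediate, descriptive error. A minimal stand-in class (not the real pyspark RDD) demonstrating the mechanism:

```python
import pickle

# Stand-in for a class that must never be serialized, such as an RDD
# captured in a closure. Raising in __getnewargs__ fails the pickle early
# with a message explaining the mistake.
class UnpicklableResource:
    def __getnewargs__(self):
        raise Exception(
            "It appears that you are attempting to serialize an object "
            "that can only be used on the driver.")

try:
    pickle.dumps(UnpicklableResource())
except Exception as e:
    print("pickling failed:", e)
```

This is the same trick the diff applies to `RDD`: the error surfaces at serialization time, on the driver, instead of as an opaque failure on an executor.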
[GitHub] spark pull request: [SPARK-5097][WIP] DataFrame as the common abst...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4173#issuecomment-71244959 [Test build #26033 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26033/consoleFull) for PR 4173 at commit [`23b2c2d`](https://github.com/apache/spark/commit/23b2c2d1bfb6e6504e3357af5027af579020b22e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5097][WIP] DataFrame as the common abst...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4173#issuecomment-71246679 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26032/ Test FAILed.
[GitHub] spark pull request: [SPARK-5097][WIP] DataFrame as the common abst...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4173#issuecomment-71246730 [Test build #26034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26034/consoleFull) for PR 4173 at commit [`38df669`](https://github.com/apache/spark/commit/38df6699c77fcaeb505350bcc73c5614814efa5d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-71255189 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26031/ Test PASSed.
[GitHub] spark pull request: [SPARK-5351][GraphX] Do not use Partitioner.de...
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/4136#issuecomment-71265542 @JoshRosen No, it doesn't seem to trigger the Snappy error! After the previous attempted fix (#1763, 9b225ac3072de522b40b46aba6df1f1c231f13ef), the GraphX unit tests (`for i in {1..10}; do sbt/sbt 'graphx/test:test-only org.apache.spark.graphx.*'; done`) would fail 3 out of 10 times, but they always succeed now. I think we can merge this! I'm just going to bisect to see what fixed the error.
[GitHub] spark pull request: [SPARK-5097][WIP] DataFrame as the common abst...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/4173#discussion_r23488297 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala --- @@ -0,0 +1,273 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the License); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +*http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an AS IS BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. 
+*/ + +package org.apache.spark.sql + +import scala.language.implicitConversions +import scala.reflect.ClassTag + +import com.fasterxml.jackson.core.JsonFactory + +import org.apache.spark.annotation.Experimental +import org.apache.spark.rdd.RDD +import org.apache.spark.storage.StorageLevel +import org.apache.spark.sql.catalyst.ScalaReflection +import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.{Literal = LiteralExpr} +import org.apache.spark.sql.catalyst.plans.{JoinType, Inner} +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.execution.LogicalRDD +import org.apache.spark.sql.json.JsonRDD +import org.apache.spark.sql.types.{NumericType, StructType} + + +class DataFrame( +val sqlContext: SQLContext, +val baseLogicalPlan: LogicalPlan, +operatorsEnabled: Boolean) + extends DataFrameSpecificApi with RDDApi[Row] { + + def this(sqlContext: Option[SQLContext], plan: Option[LogicalPlan]) = +this(sqlContext.orNull, plan.orNull, sqlContext.isDefined plan.isDefined) + + def this(sqlContext: SQLContext, plan: LogicalPlan) = this(sqlContext, plan, true) + + @transient + protected[sql] lazy val queryExecution = sqlContext.executePlan(baseLogicalPlan) + + @transient protected[sql] val logicalPlan: LogicalPlan = baseLogicalPlan match { +// For various commands (like DDL) and queries with side effects, we force query optimization to +// happen right away to let these side effects take place eagerly. 
+case _: Command | _: InsertIntoTable | _: CreateTableAsSelect[_] |_: WriteToFile = + LogicalRDD(queryExecution.analyzed.output, queryExecution.toRdd)(sqlContext) +case _ = + baseLogicalPlan + } + + private[this] implicit def toDataFrame(logicalPlan: LogicalPlan): DataFrame = { +new DataFrame(sqlContext, logicalPlan, true) + } + + protected[sql] def numericColumns: Seq[Expression] = { +schema.fields.filter(_.dataType.isInstanceOf[NumericType]).map { n = + logicalPlan.resolve(n.name, sqlContext.analyzer.resolver).get +} + } + + protected[sql] def resolve(colName: String): NamedExpression = { +logicalPlan.resolve(colName, sqlContext.analyzer.resolver).getOrElse( + throw new RuntimeException(sCannot resolve column name $colName)) + } + + def toSchemaRDD: DataFrame = this + + override def schema: StructType = queryExecution.analyzed.schema + + override def dtypes: Array[(String, String)] = schema.fields.map { field = +(field.name, field.dataType.toString) + } + + override def columns: Array[String] = schema.fields.map(_.name) + + override def printSchema(): Unit = println(schema.treeString) + + override def show(): Unit = { +??? + } + + override def join(right: DataFrame): DataFrame = { +Join(logicalPlan, right.logicalPlan, joinType = Inner, None) + } + + override def join(right: DataFrame, joinExprs: Column): DataFrame = { +Join(logicalPlan, right.logicalPlan, Inner, Some(joinExprs.expr)) + } + + override def join(right: DataFrame, joinType: String, joinExprs: Column): DataFrame = { +Join(logicalPlan, right.logicalPlan, JoinType(joinType), Some(joinExprs.expr)) + } + + override def sort(colName: String): DataFrame = { +Sort(Seq(SortOrder(apply(colName).expr, Ascending)), global = true, logicalPlan) + } + + @scala.annotation.varargs + override def sort(sortExpr: Column, sortExprs: Column*): DataFrame = { +
[GitHub] spark pull request: [SPARK-5384][mllib] Vectors.sqdist return inco...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4183#issuecomment-71276246 I agree that vectors must have the same length and we should check it. It may not be necessary to change the implementation, though: I saw a couple of performance issues in your code, for example unnecessary index lookups. I would suggest only adding the check in this PR. If you want to update the implementation, let's do it in another PR with a micro-benchmark.
[GitHub] spark pull request: [SPARK-5207] [MLLIB] StandardScalerModel mean ...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/4140#discussion_r23486231

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -61,20 +61,30 @@ class StandardScaler(withMean: Boolean, withStd: Boolean) extends Logging {
  * :: Experimental ::
  * Represents a StandardScaler model that can transform vectors.
  *
- * @param withMean whether to center the data before scaling
- * @param withStd whether to scale the data to have unit standard deviation
  * @param mean column mean values
  * @param variance column variance values
+ * @param withMean whether to center the data before scaling
+ * @param withStd whether to scale the data to have unit standard deviation
  */
 @Experimental
-class StandardScalerModel private[mllib] (
-    val withMean: Boolean,
-    val withStd: Boolean,
+class StandardScalerModel (
     val mean: Vector,
-    val variance: Vector) extends VectorTransformer {
+    val variance: Vector,
+    private var withMean: Boolean = false,
+    private var withStd: Boolean = true) extends VectorTransformer {
--- End diff --

Also, users will want to know whether `withMean` or `withStd` is used; do we really need to have them as private variables?
[GitHub] spark pull request: [SPARK-5207] [MLLIB] StandardScalerModel mean ...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/4140#issuecomment-71281849 For the unit-test part, is it possible not to change too much? Also, it will be easier to debug if the assertions are in the test instead of abstracted out; for example, the `validateConstant` function is not necessary, and it is probably easier to read with all the assert code inline in the test. Having the data as global variables is okay for me. Thanks.
[GitHub] spark pull request: [SPARK-5097][WIP] DataFrame as the common abst...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/4173#discussion_r23488167 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala --- @@ -0,0 +1,273 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the License); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +*http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an AS IS BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. 
+*/ + +package org.apache.spark.sql + +import scala.language.implicitConversions +import scala.reflect.ClassTag + +import com.fasterxml.jackson.core.JsonFactory + +import org.apache.spark.annotation.Experimental +import org.apache.spark.rdd.RDD +import org.apache.spark.storage.StorageLevel +import org.apache.spark.sql.catalyst.ScalaReflection +import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.{Literal = LiteralExpr} +import org.apache.spark.sql.catalyst.plans.{JoinType, Inner} +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.execution.LogicalRDD +import org.apache.spark.sql.json.JsonRDD +import org.apache.spark.sql.types.{NumericType, StructType} + + +class DataFrame( +val sqlContext: SQLContext, +val baseLogicalPlan: LogicalPlan, +operatorsEnabled: Boolean) + extends DataFrameSpecificApi with RDDApi[Row] { + + def this(sqlContext: Option[SQLContext], plan: Option[LogicalPlan]) = +this(sqlContext.orNull, plan.orNull, sqlContext.isDefined plan.isDefined) + + def this(sqlContext: SQLContext, plan: LogicalPlan) = this(sqlContext, plan, true) + + @transient + protected[sql] lazy val queryExecution = sqlContext.executePlan(baseLogicalPlan) + + @transient protected[sql] val logicalPlan: LogicalPlan = baseLogicalPlan match { +// For various commands (like DDL) and queries with side effects, we force query optimization to +// happen right away to let these side effects take place eagerly. 
+case _: Command | _: InsertIntoTable | _: CreateTableAsSelect[_] |_: WriteToFile = + LogicalRDD(queryExecution.analyzed.output, queryExecution.toRdd)(sqlContext) +case _ = + baseLogicalPlan + } + + private[this] implicit def toDataFrame(logicalPlan: LogicalPlan): DataFrame = { +new DataFrame(sqlContext, logicalPlan, true) + } + + protected[sql] def numericColumns: Seq[Expression] = { +schema.fields.filter(_.dataType.isInstanceOf[NumericType]).map { n = + logicalPlan.resolve(n.name, sqlContext.analyzer.resolver).get +} + } + + protected[sql] def resolve(colName: String): NamedExpression = { +logicalPlan.resolve(colName, sqlContext.analyzer.resolver).getOrElse( + throw new RuntimeException(sCannot resolve column name $colName)) + } + + def toSchemaRDD: DataFrame = this + + override def schema: StructType = queryExecution.analyzed.schema + + override def dtypes: Array[(String, String)] = schema.fields.map { field = +(field.name, field.dataType.toString) + } + + override def columns: Array[String] = schema.fields.map(_.name) + + override def printSchema(): Unit = println(schema.treeString) + + override def show(): Unit = { +??? + } + + override def join(right: DataFrame): DataFrame = { +Join(logicalPlan, right.logicalPlan, joinType = Inner, None) + } + + override def join(right: DataFrame, joinExprs: Column): DataFrame = { +Join(logicalPlan, right.logicalPlan, Inner, Some(joinExprs.expr)) + } + + override def join(right: DataFrame, joinType: String, joinExprs: Column): DataFrame = { +Join(logicalPlan, right.logicalPlan, JoinType(joinType), Some(joinExprs.expr)) + } + + override def sort(colName: String): DataFrame = { --- End diff -- support sort by multiple columns --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature
[GitHub] spark pull request: [SPARK-5097][WIP] DataFrame as the common abst...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/4173#discussion_r23488501 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala --- @@ -0,0 +1,273 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the License); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +*http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an AS IS BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. 
+*/ + +package org.apache.spark.sql + +import scala.language.implicitConversions +import scala.reflect.ClassTag + +import com.fasterxml.jackson.core.JsonFactory + +import org.apache.spark.annotation.Experimental +import org.apache.spark.rdd.RDD +import org.apache.spark.storage.StorageLevel +import org.apache.spark.sql.catalyst.ScalaReflection +import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.{Literal = LiteralExpr} +import org.apache.spark.sql.catalyst.plans.{JoinType, Inner} +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.execution.LogicalRDD +import org.apache.spark.sql.json.JsonRDD +import org.apache.spark.sql.types.{NumericType, StructType} + + +class DataFrame( +val sqlContext: SQLContext, +val baseLogicalPlan: LogicalPlan, +operatorsEnabled: Boolean) + extends DataFrameSpecificApi with RDDApi[Row] { + + def this(sqlContext: Option[SQLContext], plan: Option[LogicalPlan]) = +this(sqlContext.orNull, plan.orNull, sqlContext.isDefined plan.isDefined) + + def this(sqlContext: SQLContext, plan: LogicalPlan) = this(sqlContext, plan, true) + + @transient + protected[sql] lazy val queryExecution = sqlContext.executePlan(baseLogicalPlan) + + @transient protected[sql] val logicalPlan: LogicalPlan = baseLogicalPlan match { +// For various commands (like DDL) and queries with side effects, we force query optimization to +// happen right away to let these side effects take place eagerly. 
+case _: Command | _: InsertIntoTable | _: CreateTableAsSelect[_] |_: WriteToFile = + LogicalRDD(queryExecution.analyzed.output, queryExecution.toRdd)(sqlContext) +case _ = + baseLogicalPlan + } + + private[this] implicit def toDataFrame(logicalPlan: LogicalPlan): DataFrame = { +new DataFrame(sqlContext, logicalPlan, true) + } + + protected[sql] def numericColumns: Seq[Expression] = { +schema.fields.filter(_.dataType.isInstanceOf[NumericType]).map { n = + logicalPlan.resolve(n.name, sqlContext.analyzer.resolver).get +} + } + + protected[sql] def resolve(colName: String): NamedExpression = { +logicalPlan.resolve(colName, sqlContext.analyzer.resolver).getOrElse( + throw new RuntimeException(sCannot resolve column name $colName)) + } + + def toSchemaRDD: DataFrame = this + + override def schema: StructType = queryExecution.analyzed.schema + + override def dtypes: Array[(String, String)] = schema.fields.map { field = +(field.name, field.dataType.toString) + } + + override def columns: Array[String] = schema.fields.map(_.name) + + override def printSchema(): Unit = println(schema.treeString) + + override def show(): Unit = { +??? + } + + override def join(right: DataFrame): DataFrame = { +Join(logicalPlan, right.logicalPlan, joinType = Inner, None) + } + + override def join(right: DataFrame, joinExprs: Column): DataFrame = { +Join(logicalPlan, right.logicalPlan, Inner, Some(joinExprs.expr)) + } + + override def join(right: DataFrame, joinType: String, joinExprs: Column): DataFrame = { +Join(logicalPlan, right.logicalPlan, JoinType(joinType), Some(joinExprs.expr)) + } + + override def sort(colName: String): DataFrame = { +Sort(Seq(SortOrder(apply(colName).expr, Ascending)), global = true, logicalPlan) + } + + @scala.annotation.varargs + override def sort(sortExpr: Column, sortExprs: Column*): DataFrame = { +
[GitHub] spark pull request: [SPARK-5291][CORE] Add timestamp and reason wh...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4082#issuecomment-71272446 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-71284930 [Test build #26036 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26036/consoleFull) for PR 1290 at commit [`d18e9b5`](https://github.com/apache/spark/commit/d18e9b5460019970d5bcbb5a0e816aff5a05bf39). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5097][WIP] DataFrame as the common abst...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/4173#discussion_r23488078 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala --- @@ -0,0 +1,273 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import scala.language.implicitConversions
+import scala.reflect.ClassTag
+
+import com.fasterxml.jackson.core.JsonFactory
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.sql.catalyst.ScalaReflection
+import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.{Literal => LiteralExpr}
+import org.apache.spark.sql.catalyst.plans.{JoinType, Inner}
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.execution.LogicalRDD
+import org.apache.spark.sql.json.JsonRDD
+import org.apache.spark.sql.types.{NumericType, StructType}
+
+
+class DataFrame(
+    val sqlContext: SQLContext,
+    val baseLogicalPlan: LogicalPlan,
+    operatorsEnabled: Boolean)
+  extends DataFrameSpecificApi with RDDApi[Row] {
+
+  def this(sqlContext: Option[SQLContext], plan: Option[LogicalPlan]) =
+    this(sqlContext.orNull, plan.orNull, sqlContext.isDefined && plan.isDefined)
+
+  def this(sqlContext: SQLContext, plan: LogicalPlan) = this(sqlContext, plan, true)
+
+  @transient
+  protected[sql] lazy val queryExecution = sqlContext.executePlan(baseLogicalPlan)
+
+  @transient protected[sql] val logicalPlan: LogicalPlan = baseLogicalPlan match {
+    // For various commands (like DDL) and queries with side effects, we force query optimization to
+    // happen right away to let these side effects take place eagerly.
+    case _: Command | _: InsertIntoTable | _: CreateTableAsSelect[_] | _: WriteToFile =>
+      LogicalRDD(queryExecution.analyzed.output, queryExecution.toRdd)(sqlContext)
+    case _ =>
+      baseLogicalPlan
+  }
+
+  private[this] implicit def toDataFrame(logicalPlan: LogicalPlan): DataFrame = {
+    new DataFrame(sqlContext, logicalPlan, true)
+  }
+
+  protected[sql] def numericColumns: Seq[Expression] = {
+    schema.fields.filter(_.dataType.isInstanceOf[NumericType]).map { n =>
+      logicalPlan.resolve(n.name, sqlContext.analyzer.resolver).get
+    }
+  }
+
+  protected[sql] def resolve(colName: String): NamedExpression = {
+    logicalPlan.resolve(colName, sqlContext.analyzer.resolver).getOrElse(
+      throw new RuntimeException(s"Cannot resolve column name $colName"))
+  }
+
+  def toSchemaRDD: DataFrame = this
+
+  override def schema: StructType = queryExecution.analyzed.schema
+
+  override def dtypes: Array[(String, String)] = schema.fields.map { field =>
+    (field.name, field.dataType.toString)
+  }
+
+  override def columns: Array[String] = schema.fields.map(_.name)
+
+  override def printSchema(): Unit = println(schema.treeString)
+
+  override def show(): Unit = {
+    ???
+  }
+
+  override def join(right: DataFrame): DataFrame = {
+    Join(logicalPlan, right.logicalPlan, joinType = Inner, None)
+  }
+
+  override def join(right: DataFrame, joinExprs: Column): DataFrame = {
+    Join(logicalPlan, right.logicalPlan, Inner, Some(joinExprs.expr))
+  }
+
+  override def join(right: DataFrame, joinType: String, joinExprs: Column): DataFrame = {
--- End diff --

It's easier to do in Python/R if putting joinType at the end
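To illustrate the point about argument order, here is a hedged, Spark-free Scala sketch (the `Plan` type and method body are invented for illustration, not the actual DataFrame API): with the optional `joinType` last, a single method with a default value covers both call styles, which maps naturally onto keyword arguments in Python/R wrappers.

```scala
// Illustrative only: a stand-in for a logical plan.
case class Plan(desc: String)

// Optional joinType last means a default argument can supply the common case.
def join(right: Plan, joinExprs: String, joinType: String = "inner"): Plan =
  Plan(s"$joinType join on $joinExprs with ${right.desc}")

// Callers may omit the trailing argument...
val a = join(Plan("t2"), "id")
// ...or override it by position (or by name in Scala/Python/R).
val b = join(Plan("t2"), "id", "outer")
```

With `joinType` in the middle instead, every caller supplying `joinExprs` would also have to spell out the join type, and dynamic-language wrappers could not forward keyword defaults as cleanly.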
[GitHub] spark pull request: [SPARK-5207] [MLLIB] StandardScalerModel mean ...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/4140#discussion_r23485163 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -61,20 +61,30 @@ class StandardScaler(withMean: Boolean, withStd: Boolean) extends Logging {
  * :: Experimental ::
  * Represents a StandardScaler model that can transform vectors.
  *
- * @param withMean whether to center the data before scaling
- * @param withStd whether to scale the data to have unit standard deviation
  * @param mean column mean values
  * @param variance column variance values
+ * @param withMean whether to center the data before scaling
+ * @param withStd whether to scale the data to have unit standard deviation
  */
 @Experimental
-class StandardScalerModel private[mllib] (
-    val withMean: Boolean,
-    val withStd: Boolean,
+class StandardScalerModel (
     val mean: Vector,
-    val variance: Vector) extends VectorTransformer {
+    val variance: Vector,
+    private var withMean: Boolean = false,
+    private var withStd: Boolean = true) extends VectorTransformer {
--- End diff --

The default argument is not friendly for Java though; why don't we add another constructor which takes only mean and variance?
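A minimal sketch of the alternative being suggested, using invented names rather than the real MLlib classes: an auxiliary constructor gives Java callers the two-argument form that Scala default arguments cannot, since Java has no way to omit defaulted parameters.

```scala
// Hypothetical stand-in for the MLlib model (Array[Double] instead of Vector).
class ScalerModel(
    val mean: Array[Double],
    val variance: Array[Double],
    val withMean: Boolean,
    val withStd: Boolean) {

  // Java-friendly auxiliary constructor: fills in the common defaults
  // (no centering, unit-variance scaling) without default arguments.
  def this(mean: Array[Double], variance: Array[Double]) =
    this(mean, variance, false, true)
}

val m = new ScalerModel(Array(0.0), Array(1.0))
```

From Java this would be callable as `new ScalerModel(mean, variance)`, whereas a default-argument-only primary constructor would force Java callers to pass all four values.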
[GitHub] spark pull request: [SPARK-5291][CORE] Add timestamp and reason wh...
Github user ksakellis commented on the pull request: https://github.com/apache/spark/pull/4082#issuecomment-71282577 LGTM - nice addition.
[GitHub] spark pull request: SPARK-984 [BUILD] SPARK_TOOLS_JAR not set if m...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/4181 SPARK-984 [BUILD] SPARK_TOOLS_JAR not set if multiple tools jars exists Given the discussion in https://issues.apache.org/jira/browse/SPARK-984, this seems to be the outcome, but I'm not 100% sure if this is still the desired resolution. Simpler than modifying the scripts to deal with multiple tools assemblies if in fact these tools are not run specially this way by `spark-class`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-984 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4181.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4181 commit 83590eae39a7cb3ef13d3060e0f001564c7aed73 Author: Sean Owen so...@cloudera.com Date: 2015-01-23T12:27:35Z Remove SPARK_TOOLS_JAR usages
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-71189621 [Test build #26025 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26025/consoleFull) for PR 3519 at commit [`12151e6`](https://github.com/apache/spark/commit/12151e6b40e70c5d0a8dde8a6e4d600709eb0f12). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-984 [BUILD] SPARK_TOOLS_JAR not set if m...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4181#issuecomment-71194064 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26024/ Test FAILed.
[GitHub] spark pull request: SPARK-984 [BUILD] SPARK_TOOLS_JAR not set if m...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4181#issuecomment-71194057 [Test build #26024 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26024/consoleFull) for PR 4181 at commit [`83590ea`](https://github.com/apache/spark/commit/83590eae39a7cb3ef13d3060e0f001564c7aed73). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-984 [BUILD] SPARK_TOOLS_JAR not set if m...
Github user srowen closed the pull request at: https://github.com/apache/spark/pull/4181
[GitHub] spark pull request: SPARK-984 [BUILD] SPARK_TOOLS_JAR not set if m...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4181#issuecomment-71194299 Ah. This makes MiMa stop working. OK, this isn't an option!
[GitHub] spark pull request: [SPARK-5097][WIP] DataFrame as the common abst...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/4173 [SPARK-5097][WIP] DataFrame as the common abstraction for structured data This is early work in progress. I am submitting the PR mainly because I wanted to get Jenkins to run through the tests so I don't have to do that on my machine. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark df1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4173.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4173 commit 08d82010d974b70ab44715a879785488356b408f Author: Reynold Xin r...@databricks.com Date: 2015-01-22T07:47:19Z Checkpoint: SQL module compiles! commit 3ccf3217d482f9c38d8122d185bb0a041e772d0e Author: Reynold Xin r...@databricks.com Date: 2015-01-22T08:04:32Z SQLContext minor patch. commit 83e872140e75c1b353479f0c7a6ff3501f609646 Author: Reynold Xin r...@databricks.com Date: 2015-01-22T08:17:22Z Fixed test cases in SQL except ParquetIOSuite. commit 9e4a7d063e0cdf9ef83793eeb4808f290130b435 Author: Reynold Xin r...@databricks.com Date: 2015-01-22T08:19:29Z Fixed compilation error. commit fc5acc50f3227ae90f86d6684945b200c96efced Author: Reynold Xin r...@databricks.com Date: 2015-01-22T08:44:59Z Hive module. commit feb43ef0e98d72a1372e4f3d5b1a6c811a8a13bb Author: Reynold Xin r...@databricks.com Date: 2015-01-23T08:02:09Z Made MLlib and examples compile
[GitHub] spark pull request: [SPARK-5351][GraphX] Do not use Partitioner.de...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4136#issuecomment-71160721 [Test build #26004 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26004/consoleFull) for PR 4136 at commit [`0a2f32b`](https://github.com/apache/spark/commit/0a2f32b0283b4fe319a23f7f4541d1531ddcbab2). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5214][Test] Add a test to demonstrate E...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/4174 [SPARK-5214][Test] Add a test to demonstrate EventLoop can be stopped in the event loop thread You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark SPARK-5214-unittest Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4174.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4174 commit f0b18f940ce4b49711b5d74c4bca4a8391241bb7 Author: zsxwing zsxw...@gmail.com Date: 2015-01-23T08:17:23Z Add a test to demonstrate EventLoop can be stopped in the event loop thread
[GitHub] spark pull request: [SPARK-4233] [SQL] WIP:Simplify the UDAF API (...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3247#issuecomment-71162789 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26013/ Test FAILed.
[GitHub] spark pull request: [SPARK-4233] [SQL] WIP:Simplify the UDAF API (...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3247#issuecomment-71162787 [Test build #26013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26013/consoleFull) for PR 3247 at commit [`feb00c8`](https://github.com/apache/spark/commit/feb00c891d3ddebc056345831d2e8a30e46d6ed4). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class UnresolvedFunction(` * `trait AggregateFunction ` * `trait AggregateExpression extends Expression with AggregateFunction ` * `abstract class UnaryAggregateExpression extends UnaryExpression with AggregateExpression ` * `case class Min(` * `case class Average(child: Expression, distinct: Boolean = false)` * `case class Max(child: Expression, distinct: Boolean = false)` * `case class Count(child: Expression)` * `case class CountDistinct(children: Seq[Expression])` * `case class Sum(child: Expression, distinct: Boolean = false)` * `case class First(child: Expression, distinct: Boolean = false)` * `case class Last(child: Expression, distinct: Boolean = false)` * `sealed case class AggregateFunctionBind(` * `sealed class InputBufferSeens(` * `sealed trait Aggregate ` * `sealed trait PreShuffle extends Aggregate ` * `sealed trait PostShuffle extends Aggregate ` * `case class AggregatePreShuffle(` * `case class AggregatePostShuffle(` * `case class DistinctAggregate(`
[GitHub] spark pull request: [SPARK-4233] [SQL] WIP:Simplify the UDAF API (...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3247#issuecomment-71162714 [Test build #26013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26013/consoleFull) for PR 3247 at commit [`feb00c8`](https://github.com/apache/spark/commit/feb00c891d3ddebc056345831d2e8a30e46d6ed4). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71166652 [Test build #26015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26015/consoleFull) for PR 4170 at commit [`d714e8b`](https://github.com/apache/spark/commit/d714e8bb6b699e5ec2a315df65cee0f4cf7765e5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3650][GraphX] There will be an ArrayInd...
GitHub user Leolh opened a pull request: https://github.com/apache/spark/pull/4176 [SPARK-3650][GraphX] There will be an ArrayIndexOutOfBoundsException if ... ...the format of the source file is wrong There will be an ArrayIndexOutOfBoundsException if the format of the source file is wrong You can merge this pull request into a Git repository by running: $ git pull https://github.com/Leolh/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4176.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4176 commit 23767f1239341146df49dc4d4c4956d7a3b48e0f Author: Leolh leosand...@gmail.com Date: 2015-01-23T09:27:02Z [SPARK-3650][GraphX] There will be an ArrayIndexOutOfBoundsException if the format of the source file is wrong
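The failure mode this PR targets, and one defensive alternative, can be sketched as follows. This is an illustration only, not the actual GraphX loader code: indexing into a split line without checking its length throws ArrayIndexOutOfBoundsException on malformed input, while checking the field count first lets the caller skip or report the bad line.

```scala
// Parse one edge-list line of the form "srcId dstId"; return None for
// lines that do not have at least two whitespace-separated fields.
// (Only arity is guarded here; non-numeric fields would still throw.)
def parseEdge(line: String): Option[(Long, Long)] = {
  val parts = line.split("\\s+").filter(_.nonEmpty)
  if (parts.length >= 2) Some((parts(0).toLong, parts(1).toLong))
  else None
}

val ok = parseEdge("1 2")   // well-formed line
val bad = parseEdge("1")    // malformed: would have crashed parts(1)
```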
[GitHub] spark pull request: [SPARK-3298][SQL] Add flag control overwrite r...
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/4175#issuecomment-71167869 /cc @scwf @chenghao-intel
[GitHub] spark pull request: SPARK-5382: Use SPARK_CONF_DIR in spark-class ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4179#issuecomment-71294854 @andrewor14 since you reviewed the other PR for `SPARK_CONF_DIR`, can you take a quick look at this and #4177 to see if we want to pull it in for 1.2.1?
[GitHub] spark pull request: [SPARK-5351][GraphX] Do not use Partitioner.de...
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/4136#issuecomment-71297997 Oh, thanks! Looks like that was the problem all along; stopping the SparkContext fixes the problem. I'm going to merge this with the amended test now.
[GitHub] spark pull request: [SPARK-5351][GraphX] Do not use Partitioner.de...
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/4136#issuecomment-71286155 @JoshRosen Actually, it seems the test failures still occur, but only when I add a [unit test](https://github.com/apache/spark/commit/9b225ac3072de522b40b46aba6df1f1c231f13ef#diff-3ade47bc293ef06e43c25f1ac1f6783bR354) that sets spark.default.parallelism. Adding the test causes subsequent tests within the same run to fail with exceptions like

```
java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
```

and

```
java.io.IOException: PARSING_ERROR(2)
```

The exception traces always occur in TorrentBroadcast. It seems like setting spark.default.parallelism is causing some kind of side effect that corrupts broadcasts in later unit tests, which is strange since (1) each unit test should have its own SparkContext and therefore its own temp directory, and (2) I'm only passing spark.default.parallelism to SparkConf/SparkContext, not setting it as a system property.
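The resolution reported earlier in this thread, always stopping the per-test SparkContext, is the standard try/finally resource discipline. A hedged, Spark-free sketch of that pattern (all names here are invented stand-ins, not Spark classes):

```scala
// Stand-in for a context whose leftover state can poison later tests.
class FakeContext(val conf: Map[String, String]) {
  var stopped = false
  def stop(): Unit = stopped = true
}

// Loan pattern: the context is created per test and stopped even if the
// test body throws, so no state leaks into subsequent tests.
def withContext[T](conf: Map[String, String])(body: FakeContext => T): T = {
  val ctx = new FakeContext(conf)
  try body(ctx)
  finally ctx.stop()
}

var observed: FakeContext = null
val parallelism = withContext(Map("spark.default.parallelism" -> "8")) { ctx =>
  observed = ctx
  ctx.conf("spark.default.parallelism").toInt
}
```

The key property is that the cleanup runs on every exit path, which is exactly what a forgotten `sc.stop()` in a test suite fails to guarantee.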
[GitHub] spark pull request: Bug fix for SPARK-5242: ec2/spark_ec2.py lauc...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/4038#issuecomment-71286989 @voukka @nchammas - This high-level goal looks fine to me. However, the function get_hostname is being called on all instances (it's inside a loop) in many cases. I wonder if we can do something more lightweight by exploiting the fact that you typically want to use the same kind of resolution for all machines. What this will mean is that for the very first machine we will try all four options and then just save which field was used -- then the function just picks the appropriate field going forward. Will this solve your use case? Or are there use cases where we need to do this for every instance?
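The probe-once idea above can be sketched as follows. This is a hedged illustration with invented field names and types, not the actual spark_ec2.py code: try the candidate address fields on the first instance, cache whichever one is populated, and reuse that choice for every later instance.

```scala
// Stand-in for an EC2 instance description (field names are invented).
case class Instance(fields: Map[String, String])

val candidateFields =
  Seq("public_dns_name", "public_ip", "private_dns_name", "private_ip")

// Which field resolved on the first instance; reused for all the rest.
var chosenField: Option[String] = None

def hostname(inst: Instance): Option[String] = {
  val field = chosenField.orElse {
    // First call only: probe all candidates and cache the one that works.
    chosenField = candidateFields.find(f => inst.fields.get(f).exists(_.nonEmpty))
    chosenField
  }
  field.flatMap(inst.fields.get)
}

val a = hostname(Instance(Map("public_dns_name" -> "", "public_ip" -> "1.2.3.4")))
val b = hostname(Instance(Map("public_ip" -> "5.6.7.8")))  // no re-probing
```

This trades one probing pass on the first machine for a single map lookup on each subsequent one, matching the assumption that all instances in a cluster resolve the same way.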
[GitHub] spark pull request: [SPARK-4983]exception handling about adding ta...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/3986#issuecomment-71289144 @nchammas @GenTang - The `logging.basicConfig` seems to have been around since the very beginning [1]. I don't know much about Python so I can't recommend keeping it or removing it. @JoshRosen can comment on that. Other than that this solution looks fine to me. It is unfortunate that we have so many custom sleep calls across the file, but I don't think there is much else we can do given the EC2 API we have right now. [1] https://github.com/mesos/spark/blob/08c50ad1fcf323f62c80dfeb8f1caaf164211e0b/ec2/spark_ec2.py#L538
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-71290964 [Test build #26036 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26036/consoleFull) for PR 1290 at commit [`d18e9b5`](https://github.com/apache/spark/commit/d18e9b5460019970d5bcbb5a0e816aff5a05bf39). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class OutputCanvas2D(wd: Int, ht: Int) extends Canvas ` * `class OutputFrame2D( title: String ) extends Frame( title ) ` * `class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas ` * `class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title) ` * `trait ANNClassifierHelper `
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-71290975 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26036/
[GitHub] spark pull request: [SPARK-5063] More helpful error messages for s...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-71292584 @vanzin Thanks for looking this over. The Python `RDD` objects themselves are never actually serialized and are used internally in a way that's slightly different than in Scala/Java Spark. In the existing code, any attempt to serialize instances of those Python classes throws an exception in the `__getnewargs__` method, which is why I was able to add new exceptions there. I'm going to fix the spelling error, take one final look over this, and commit it so we can get it into the first 1.2.1 RC. I saw a couple of mailing list questions yesterday that could have been prevented by this patch, which illustrates why I really want to get this into our next maintenance release.
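The `__getnewargs__` trick described here can be sketched as follows; the class and message below are illustrative stand-ins, not PySpark's actual code:

```python
import pickle

class FakeRDD:
    """Illustrative stand-in for PySpark's RDD (not the real class).

    Pickle (protocol 2 and above) consults __getnewargs__ while
    serializing an object, so raising here turns any accidental
    serialization attempt into a descriptive error instead of a
    confusing low-level one.
    """
    def __getnewargs__(self):
        raise RuntimeError(
            "It appears that you are attempting to broadcast an RDD or "
            "reference an RDD from an action or transformation.")

try:
    pickle.dumps(FakeRDD())
except RuntimeError as exc:
    print("serialization blocked:", exc)
```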
[GitHub] spark pull request: [SPARK-5063] More helpful error messages for s...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-71298244 Thank you @JoshRosen for working on usability issues like this.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user ilganeli closed the pull request at: https://github.com/apache/spark/pull/3518
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-71299062 Hey @pwendell - not a problem. The solutions are similar but Reynold's has fewer moving parts. I appreciate the recognition.
[GitHub] spark pull request: [SPARK-5351][GraphX] Do not use Partitioner.de...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4136#issuecomment-71293352 @ankurdave The exception from the new unit test sounds suspiciously similar to https://issues.apache.org/jira/browse/SPARK-4133. Your new test creates a new `sc` local variable and never stops it, so if that test runs first then its leaked context will keep running and will interfere with contexts created in the other tests. Because some SparkSQL tests could not pass without it, our unit tests set `spark.driver.allowMultipleContexts=true` to disable the multiple-contexts check, so this might be hard to notice. If you have `unit-tests.log`, though, I'd take a look to see whether there are any warning messages about multiple contexts. I'd check whether those failures still persist after properly cleaning up the SparkContext created in your new test.
[GitHub] spark pull request: [SPARK-5063] More helpful error messages for s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-71294206 [Test build #26037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26037/consoleFull) for PR 3884 at commit [`a38774b`](https://github.com/apache/spark/commit/a38774b8892a85184520078a2187e9ce2a190038). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5063] More helpful error messages for s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-71297060 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26037/
[GitHub] spark pull request: [SPARK-5063] More helpful error messages for s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-71297055 [Test build #26037 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26037/consoleFull) for PR 3884 at commit [`a38774b`](https://github.com/apache/spark/commit/a38774b8892a85184520078a2187e9ce2a190038).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5351][GraphX] Do not use Partitioner.de...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4136
[GitHub] spark pull request: [SPARK-5063] More helpful error messages for s...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3884
[GitHub] spark pull request: [SPARK-5063] More helpful error messages for s...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3884#issuecomment-71295041 Alright, I've merged this into `master` (1.3.0) and `branch-1.2` (1.2.1).
[GitHub] spark pull request: [SPARK-5207] [MLLIB] StandardScalerModel mean ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4140#issuecomment-71297633 [Test build #26038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26038/consoleFull) for PR 4140 at commit [`997d2e0`](https://github.com/apache/spark/commit/997d2e0a3bbfd1be6c0a556393bbcfbd18404f77). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5207] [MLLIB] StandardScalerModel mean ...
Github user ogeagla commented on the pull request: https://github.com/apache/spark/pull/4140#issuecomment-71297662 @dbtsai that makes sense. I've changed this back in the latest commit.
[GitHub] spark pull request: [SPARK-5351][GraphX] Do not use Partitioner.de...
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/4136#issuecomment-71298562 Merged into master and branch-1.2.
[GitHub] spark pull request: [SPARK-5097][WIP] DataFrame as the common abst...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4173#issuecomment-71159715 [Test build #26008 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26008/consoleFull) for PR 4173 at commit [`feb43ef`](https://github.com/apache/spark/commit/feb43ef0e98d72a1372e4f3d5b1a6c811a8a13bb). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5374][CORE] abstract RDD's DAG graph it...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4134#issuecomment-71159737 Thanks for doing it. I took a quick look at this. While it does reduce the LOC, I feel the change is not necessary and actually makes the code harder to understand with the closures. Do we really want something like this?
[GitHub] spark pull request: [SPARK-5214][Test] Add a test to demonstrate E...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4174#issuecomment-71161124 [Test build #26009 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26009/consoleFull) for PR 4174 at commit [`7aaa2d7`](https://github.com/apache/spark/commit/7aaa2d73d559ef6f0b2a18f14800727994e39a4e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5097][WIP] DataFrame as the common abst...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4173#issuecomment-71161215 [Test build #26010 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26010/consoleFull) for PR 4173 at commit [`1532e1e`](https://github.com/apache/spark/commit/1532e1e97209b200a03e9a093de289228e77a288).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5097][WIP] DataFrame as the common abst...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4173#issuecomment-71161217 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26010/
[GitHub] spark pull request: [SPARK-5097][WIP] DataFrame as the common abst...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4173#issuecomment-71162544 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26011/
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure mapStage.pendingta...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/4055#discussion_r23437961
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -106,7 +106,22 @@ private[spark] abstract class Task[T](val stageId: Int, var partitionId: Int) ex
     if (interruptThread && taskThread != null) {
       taskThread.interrupt()
     }
-  }
+  }
+
+  override def hashCode(): Int = {
+    val state = Seq(stageId, partitionId)
+    state.map(_.hashCode()).foldLeft(0)((a, b) => 31 * a + b)
--- End diff --
Maybe a better way is `(stageId + partitionId) * (stageId + partitionId + 1) / 2 + partitionId`. See http://en.wikipedia.org/wiki/Pairing_function#Cantor_pairing_function
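A quick check (in Python, for illustration only) that the suggested Cantor pairing formula assigns distinct codes to distinct `(stageId, partitionId)` pairs:

```python
def cantor_pair(stage_id, partition_id):
    # Cantor pairing function: a bijection from pairs of non-negative
    # integers to non-negative integers, so distinct (stageId, partitionId)
    # pairs never collide (until the result overflows a JVM Int).
    s = stage_id + partition_id
    return s * (s + 1) // 2 + partition_id

codes = {cantor_pair(s, p) for s in range(100) for p in range(100)}
print(len(codes))  # 10000 distinct codes for 10000 distinct pairs
```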
[GitHub] spark pull request: [SPARK-3298][SQL] Add flag control overwrite r...
GitHub user OopsOutOfMemory opened a pull request: https://github.com/apache/spark/pull/4175 [SPARK-3298][SQL] Add flag control overwrite registerAsTable / registerTempTable https://issues.apache.org/jira/browse/SPARK-3298 Add a flag `allowOverwrite` to control `registerTempTable`. By default it is `true`, meaning registering a table will overwrite any previous table of the same name (like a `var` tempTable). If set to `false`, the registerTempTable command checks whether the table name already exists and, if it does, throws a table-already-exists exception; you should then drop the table first and register it again (like a `final` tempTable). You can merge this pull request into a Git repository by running: $ git pull https://github.com/OopsOutOfMemory/spark register Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4175.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4175
commit 49613a2f9dbd53c189cc54991f778bc55c1ec918 Author: OopsOutOfMemory victorshen...@126.com Date: 2015-01-23T08:09:38Z initial commit
commit 6fb569451dd0b880f9865a67c2851071dba59fdb Author: OopsOutOfMemory victorshen...@126.com Date: 2015-01-23T09:09:00Z refine test suite; correct inconsistency
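A hypothetical sketch of the proposed semantics (the class, method, and message below are illustrative, not Spark's actual catalog API):

```python
class TempTableCatalog:
    """Illustrative model of the proposed registerTempTable behavior;
    this is not Spark's real catalog code."""

    def __init__(self):
        self._tables = {}

    def register_temp_table(self, name, table, allow_overwrite=True):
        # The default (True) matches today's behavior: re-registering a
        # name silently replaces the previous table. With False, an
        # existing name is an error and must be dropped first.
        if not allow_overwrite and name in self._tables:
            raise ValueError(
                f"Temporary table '{name}' already exists; "
                f"drop it before registering it again.")
        self._tables[name] = table

catalog = TempTableCatalog()
catalog.register_temp_table("people", "v1")
catalog.register_temp_table("people", "v2")  # overwrite allowed by default
try:
    catalog.register_temp_table("people", "v3", allow_overwrite=False)
except ValueError as exc:
    print(exc)
```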
[GitHub] spark pull request: [SPARK-5262] [SQL] coalesce should allow NullT...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/4057#issuecomment-71167347 Yes, I moved my work to FunctionArgumentConversion, and since #4040 is reverted due to conflicts, I added the code together here. So I leave Coalesce() untouched, since we would have the same type in Coalesce for sure. I'll change the title accordingly.
[GitHub] spark pull request: [Minor][streaming][MQTT streaming] some trivia...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4178#issuecomment-71181359 [Test build #26020 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26020/consoleFull) for PR 4178 at commit [`5857989`](https://github.com/apache/spark/commit/5857989426db9cc51e34bf09942101750fff60ea).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5364] [SQL] HiveQL transform doesn't su...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4158#issuecomment-71181584 @chenghao-intel overall it looks good to me, aside from some small comments.