special case of custom partitioning

2014-03-06 Thread Manoj Awasthi
Hi All, I have a three-machine cluster. I have two RDDs, each consisting of (K,V) pairs. The RDDs have just three keys: 'a', 'b' and 'c'. // list1 - List(('a',1), ('b',2), val rdd1 = sc.parallelize(list1).groupByKey(new HashPartitioner(3)) // list2 - List(('a',2), ('b',7),
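Spark's `HashPartitioner` puts a key in partition `nonNegativeMod(key.hashCode, numPartitions)`, and null keys go to partition 0. The stdlib sketch below emulates that rule in Python. It assumes string or single-character keys whose Java `hashCode` equals `String.hashCode`; a Scala `Char` hashes to its code point, which is the same value as the hash of the one-character string. With three partitions, 'a', 'b' and 'c' happen to land in three distinct partitions. In general two keys can collide, so only an explicit lookup table guarantees one key per partition. `explicit_partition` below is a hypothetical example of such a table. Either way, a partitioner only chooses a partition index. It does not pin that partition to a particular machine.

```python
NUM_PARTITIONS = 3

def java_string_hash(s):
    """Java's String.hashCode(): sum of s[i] * 31^(n-1-i), as a signed 32-bit int.

    Matches Java for text made of BMP characters (one UTF-16 unit each).
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h & 0x80000000 else h

def hash_partition(key, num_partitions):
    """Emulate HashPartitioner.getPartition for string keys."""
    if key is None:
        return 0
    # Python's % is already non-negative for a positive modulus,
    # which is what Spark's nonNegativeMod guarantees on the JVM.
    return java_string_hash(key) % num_partitions

# Hypothetical custom partitioner: a fixed key -> partition table.
ASSIGNMENT = {"a": 0, "b": 1, "c": 2}

def explicit_partition(key):
    return ASSIGNMENT[key]

for key in ["a", "b", "c"]:
    print(key, hash_partition(key, NUM_PARTITIONS), explicit_partition(key))
# prints: a 1 0 / b 2 1 / c 0 2
```

In PySpark, a function like `explicit_partition` can be passed as the `partitionFunc` argument of `RDD.partitionBy`. In Scala, the same idea is a `Partitioner` subclass that overrides `numPartitions` and `getPartition`.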

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-06 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/93 SPARK-1162 Added top in python. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1162/pyspark-top-takeOrdered
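SPARK-1162 adds `top` to PySpark; the branch name suggests `takeOrdered` as well. A common way to compute a distributed top-k is to keep a size-k min-heap per partition and then merge the per-partition heaps. That is the role Spark's Scala `BoundedPriorityQueue` plays. The sketch below does this with `heapq`, using plain lists as stand-in partitions. The class is modeled on Spark's `BoundedPriorityQueue`, not copied from it, and it is not the code in the pull request. An empty partition simply contributes an empty queue.

```python
import heapq

class BoundedPriorityQueue:
    """Keeps the `maxsize` largest items seen so far in a min-heap."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.heap = []

    def add(self, item):
        if self.maxsize <= 0:
            return self
        if len(self.heap) < self.maxsize:
            heapq.heappush(self.heap, item)
        elif item > self.heap[0]:
            # Evict the smallest of the items currently kept.
            heapq.heapreplace(self.heap, item)
        return self

    def merge(self, other):
        for item in other.heap:
            self.add(item)
        return self

    def sorted_desc(self):
        return sorted(self.heap, reverse=True)

def top(partitions, num):
    """Build a bounded heap per partition, then merge them into one."""
    result = BoundedPriorityQueue(num)
    for part in partitions:
        local = BoundedPriorityQueue(num)  # would be built on a worker
        for x in part:
            local.add(x)
        result.merge(local)  # the reduce step
    return result.sorted_desc()

print(top([[5, 1, 9], [], [7, 3, 8]], 3))  # [9, 8, 7]
```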

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/93#issuecomment-36887773 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/93#issuecomment-36887770 Merged build triggered.

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/93#issuecomment-36892161 Merged build finished.

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/93#issuecomment-36892162 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13023/

ALS solve.solvePositive

2014-03-06 Thread Debasish Das
Hi, I am running ALS on a sparse problem (10M x 1M) and I am getting the following error: org.jblas.exceptions.LapackArgumentException: LAPACK DPOSV: Leading minor of order i of A is not positive definite. at org.jblas.SimpleBlas.posv(SimpleBlas.java:373) at

Re: ALS solve.solvePositive

2014-03-06 Thread Sebastian Schelter
I'm not sure about the mathematical details, but I found in some experiments with Mahout that the matrix there was also not positive definite. Therefore, we chose QR decomposition to solve the linear system. --sebastian On 03/06/2014 03:44 PM, Debasish Das wrote: Hi, I am running ALS on a

QR decomposition in Spark ALS

2014-03-06 Thread Debasish Das
Hi Sebastian, Yes, Mahout ALS and Oryx run fine on the same matrix because Sean calls QR decomposition. But the ALS objective should give us a strictly positive definite matrix. I am thinking more on it. There are some random factor assignment steps, but that also initializes factors with

Re: QR decomposition in Spark ALS

2014-03-06 Thread Sean Owen
Hmm, Will Xt*X be positive definite in all cases? For example it's not if X has linearly independent rows? (I'm not going to guarantee 100% that I haven't missed something there.) Even though your data is huge, if it was generated by some synthetic process, maybe it is very low rank? QR

Re: QR decomposition in Spark ALS

2014-03-06 Thread Matei Zaharia
Xt*X should mathematically always be positive semi-definite, so the only way this might be bad is if it’s not invertible due to linearly dependent rows. This might happen due to the initialization or possibly due to numerical issues, though it seems unlikely. Maybe it also happens if some users
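Matei's point can be checked numerically. X^T X is always positive semi-definite, and it is positive definite exactly when X has linearly independent columns. Cholesky factorization is what the thread attributes to `solvePositive`/DPOSV, and it needs strict positive definiteness; otherwise it hits a non-positive pivot. The pure-Python sketch below uses no jblas, and its names are illustrative. It covers three cases:

- A full-rank X succeeds.
- An X whose second column is a multiple of the first fails at order 2.
- An ALS-style case, a user with 2 ratings and 3 latent factors, fails at order 3, because the 3x3 Gram matrix has rank at most 2.

```python
import math

def gram(X):
    """Return X^T X for a dense matrix X given as a list of rows."""
    n = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]

def cholesky(A, tol=1e-12):
    """Return lower-triangular L with A = L L^T.

    Raises ValueError when a pivot is not positive (A only semi-definite).
    The absolute tolerance is a simplification; real code scales it.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    raise ValueError(f"leading minor of order {i + 1} is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

X_ok = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # X^T X = [[2, 1], [1, 2]]
X_dep = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # column 2 = 2 * column 1
X_few = [[0.5, 1.0, -0.3], [0.2, -0.7, 0.9]]  # 2 ratings, 3 factors

L_ok = cholesky(gram(X_ok))
for name, X in [("X_dep", X_dep), ("X_few", X_few)]:
    try:
        cholesky(gram(X))
    except ValueError as err:
        print(name, err)
```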

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-03-06 Thread Konstantin Boudnik
On Tue, Feb 25, 2014 at 03:20PM, Evan Chan wrote: The correct way to exclude dependencies in SBT is actually to declare a dependency as provided. I'm not familiar with Maven or its Yes, I believe this would be equivalent to the maven exclusion of an artifact's transitive deps. Cos

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-03-06 Thread Konstantin Boudnik
With all due respect, Patrick, this approach is seeking trouble. Proactively ;) Cos On Tue, Feb 25, 2014 at 04:09PM, Patrick Wendell wrote: What I mean is this. AFAIK the shader plug-in is primarily designed for creating uber jars which contain spark and all dependencies. But since Spark

Re: QR decomposition in Spark ALS

2014-03-06 Thread Debasish Das
Matei, If the data has linearly dependent rows, ALS should have a fallback mechanism. Either remove the rows and then call BLAS posv or call BLAS gesv or Breeze QR decomposition. I can share the analysis over email. Thanks. Deb On Thu, Mar 6, 2014 at 9:39 AM, Matei Zaharia
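Debasish's first fallback is to drop the dependent directions and solve the remaining positive definite system. It can be sketched as a Cholesky factorization that skips non-positive pivots instead of raising. For a consistent system such as the normal equations X^T X x = X^T y, this returns a least-squares solution with the dropped unknowns set to zero. That is not the minimum-norm solution an SVD would give. A QR-based solve, the route Sebastian and Sean mention, preferably with column pivoting for rank-deficient input, is the more robust choice numerically. `solve_psd`, `normal_equations` and the tolerance are assumptions for illustration, not MLlib code.

```python
import math

def normal_equations(X, y):
    """Return (X^T X, X^T y) for a dense matrix X given as a list of rows."""
    n = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n)]
    return A, b

def solve_psd(A, b, tol=1e-10):
    """Solve A x = b for symmetric PSD A, dropping unknowns whose pivot <= tol."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    active = [True] * n
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    active[i] = False  # dependent direction: fix x[i] = 0
                else:
                    L[i][i] = math.sqrt(s)
            elif active[j]:
                L[i][j] = s / L[j][j]  # dropped columns of L stay zero
    z = [0.0] * n  # forward substitution: L z = b
    for i in range(n):
        if active[i]:
            z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n  # back substitution: L^T x = z
    for i in reversed(range(n)):
        if active[i]:
            x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

X_dep = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # second column = 2 * first
A, b = normal_equations(X_dep, [1.0, 2.0, 3.0])
print(solve_psd(A, b))  # approximately [1.0, 0.0]
```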

Re: QR decomposition in Spark ALS

2014-03-06 Thread Matei Zaharia
But Sean, because that matrix is not invertible, you can’t solve it. That’s why I’m saying, as long as it is solvable, it will be positive definite too, and in that case solvePositive is optimized for this use case (I believe it does Cholesky decomposition). Matei On Mar 6, 2014, at 9:58 AM,

Re: QR decomposition in Spark ALS

2014-03-06 Thread Matei Zaharia
Yup, this would definitely be fine. I’d like to understand when this happens though, I imagine it might be if a user / product has no ratings (though we should certainly try to run well in that case). Matei On Mar 6, 2014, at 10:00 AM, Debasish Das debasish.da...@gmail.com wrote: Matei,

[GitHub] spark pull request: [WIP] SPARK-1192: The document for most of the...

2014-03-06 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/85#discussion_r10355654 --- Diff: core/src/main/scala/org/apache/spark/util/AkkaUtils.scala --- @@ -108,6 +108,6 @@ private[spark] object AkkaUtils { /** Returns the

Re: QR decomposition in Spark ALS

2014-03-06 Thread Sean Owen
Yes in this case you end up with the least-squares solution. I don't see a problem with that; it's a corner case anyway and the best you can do. The QR decomposition will handle it either way, finding the exact solution when it exists. I think it's slower than Cholesky? (yes I am guessing that is

[GitHub] spark pull request: [WIP] SPARK-1192: The document for most of the...

2014-03-06 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/85#discussion_r10355799 --- Diff: core/src/main/scala/org/apache/spark/util/AkkaUtils.scala --- @@ -108,6 +108,6 @@ private[spark] object AkkaUtils { /** Returns the

Re: scala.collection.immutable.Nil$ cannot be cast to org.apache.spark.util.BoundedPriorityQueue

2014-03-06 Thread yao
Hi Fabrizio, Can someone explain to me why I get a SparkConf not serializable error? First, SparkConf is not serializable, and that's what the exception tells you. Why are you stuck in this situation? Well, that must be because some of your classes require a SparkConf. In your case, that's

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/42#discussion_r10356042 --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala --- @@ -164,9 +167,18 @@ object SparkEnv extends Logging { } }

[GitHub] spark pull request: Patch for SPARK-942

2014-03-06 Thread kellrott
Github user kellrott commented on the pull request: https://github.com/apache/spark/pull/50#issuecomment-36931153 I think I've covered all the formatting requests. Any other issues?

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/42#issuecomment-36932992 Build triggered.

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/42#issuecomment-36932993 Build started.

[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/94#issuecomment-36932975 Merged build triggered.

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10359483 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala --- @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: SPARK-1187, Added missing Python APIs

2014-03-06 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/75#issuecomment-36934603 I played around with these and it looks good to me. Thanks!

Re: scala.collection.immutable.Nil$ cannot be cast to org.apache.spark.util.BoundedPriorityQueue

2014-03-06 Thread Fabrizio Milo aka misto
Thank you for the reply! That makes sense :) On Thu, Mar 6, 2014 at 11:11 AM, yao yaosheng...@gmail.com wrote: Hi Fabrizio, Can someone explain to me why I get a SparkConf not serializable error? First, SparkConf is not serializable, and that's what the exception tells you. Why are you stuck in

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/42#discussion_r10360310 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventBus.scala --- @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10360358 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,915 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10360441 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,915 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10360465 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,915 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-1197. Change yarn-standalone to yarn-clu...

2014-03-06 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/95#discussion_r10360526 --- Diff: docs/running-on-yarn.md --- @@ -82,35 +84,30 @@ For example: ./bin/spark-class org.apache.spark.deploy.yarn.Client \

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10360528 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,915 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10360539 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,915 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10360640 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala --- @@ -0,0 +1,25 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-1189: Add Security to Spark - Akka, Http...

2014-03-06 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/33#discussion_r10361204 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -135,6 +135,8 @@ class SparkContext( val isLocal = (master == "local" ||

[GitHub] spark pull request: SPARK-1187, Added missing Python APIs

2014-03-06 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/75

[GitHub] spark pull request: SPARK-1197. Change yarn-standalone to yarn-clu...

2014-03-06 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/95#discussion_r10361302 --- Diff: docs/running-on-yarn.md --- @@ -82,35 +84,30 @@ For example: ./bin/spark-class org.apache.spark.deploy.yarn.Client \

[GitHub] spark pull request: SPARK-1197. Change yarn-standalone to yarn-clu...

2014-03-06 Thread yaoshengzhe
Github user yaoshengzhe commented on the pull request: https://github.com/apache/spark/pull/95#issuecomment-36938296 @pwendell I agree with what you're saying. One more question: is it possible to move all these string constants into some class?

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/42#discussion_r10361480 --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala --- @@ -27,28 +28,58 @@ import org.apache.spark.ui.jobs.JobProgressUI import

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/42#discussion_r10361549 --- Diff: core/src/main/scala/org/apache/spark/ui/UIReloader.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: SPARK-1197. Change yarn-standalone to yarn-clu...

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/95#issuecomment-36939034 Merged build finished.

[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/94#issuecomment-36939045 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13024/

[GitHub] spark pull request: SPARK-1197. Change yarn-standalone to yarn-clu...

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/95#issuecomment-36939035 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13026/

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/42#issuecomment-36939031 Build finished.

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/42#discussion_r10361689 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala --- @@ -30,16 +32,23 @@ import org.apache.spark.scheduler._ *

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/42#discussion_r10361732 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala --- @@ -30,16 +32,23 @@ import org.apache.spark.scheduler._ *

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-36939213 Merged build triggered.

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-36939354 Merged build started.

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-36939470 One or more automated tests failed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13027/

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-36939469 Merged build finished.

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/42#discussion_r10362120 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -625,6 +653,30 @@ private[spark] class Master(host: String, port: Int,

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-06 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/93#discussion_r10362394 --- Diff: python/pyspark/rdd.py --- @@ -628,6 +669,26 @@ def mergeMaps(m1, m2): m1[k] += v return m1

Re: Spark Streaming and Storehaus -- example?

2014-03-06 Thread Paul Brown
I'd hazard that this is a generic issue. The store is in the context of the driver code, not the worker code, and that's why Spark is trying to send it off to a worker for execution. It's not serializable (and shouldn't be...), so that fails. Try making a Scala object that lives on the worker
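Paul's suggestion is the usual pattern. Do not capture a driver-side client in the function you ship to workers. Instead, look the client up from a per-process singleton that is created lazily where the code runs; in Scala that singleton is an `object`. Spark serializes closures before sending them to executors, and this sketch uses Python's `pickle` as a stand-in for that serializer. `Store` is a hypothetical client holding an unpicklable lock, much as a real client would hold sockets or a connection pool. `CapturingTask` fails to serialize. `LazyTask` serializes and creates its store on first use.

```python
import pickle
import threading

class Store:
    """Hypothetical client; the lock stands in for sockets or connection pools."""

    def __init__(self):
        self._lock = threading.Lock()  # locks cannot be pickled
        self.data = {}

    def put(self, key, value):
        with self._lock:
            self.data[key] = value

class CapturingTask:
    """Anti-pattern: the task object carries the driver's store with it."""

    def __init__(self, store):
        self.store = store

    def __call__(self, record):
        self.store.put(*record)

_worker_store = None
_store_guard = threading.Lock()  # module-level, so it is never pickled

def get_store():
    """Create at most one Store per process, on first use (like a Scala object)."""
    global _worker_store
    with _store_guard:
        if _worker_store is None:
            _worker_store = Store()
        return _worker_store

class LazyTask:
    """Pattern: the task carries no store; it looks one up where it runs."""

    def __call__(self, record):
        get_store().put(*record)

def serializable(obj):
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False

print(serializable(CapturingTask(Store())))  # False
print(serializable(LazyTask()))              # True
```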

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/42#discussion_r10362811 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala --- @@ -19,33 +19,43 @@ package org.apache.spark.scheduler import

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/42#discussion_r10362915 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -978,6 +971,11 @@ class DAGScheduler( logDebug("Additional

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/42#discussion_r10363030 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -625,6 +653,30 @@ private[spark] class Master(host: String, port: Int,

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/42#discussion_r10363140 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala --- @@ -53,27 +62,28 @@ private[spark] class JobProgressListener(val sc:

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/42#discussion_r10363478 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -68,6 +70,11 @@ class TaskMetrics extends Serializable { * here

[GitHub] spark pull request: SPARK-1189: Add Security to Spark - Akka, Http...

2014-03-06 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/33#issuecomment-36943591 Hey @tgravescs this looks good to me. Is there anything else you'd like to address before merging this? If not, feel free to merge it into master.

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/42#discussion_r10363587 --- Diff: core/src/main/scala/org/apache/spark/ui/UISparkListener.scala --- @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/42#issuecomment-36943803 I think it would be good if the persisted UI noted somewhere that the associated application is dead -- maybe you could add this in ui/UIUtils.scala, in the header

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/42#discussion_r10363792 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -978,6 +971,11 @@ class DAGScheduler( logDebug("Additional

[GitHub] spark pull request: MLI-2: Start adding k-fold cross validation to...

2014-03-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/18#discussion_r10363910 --- Diff: core/src/test/scala/org/apache/spark/util/random/RandomSamplerSuite.scala --- @@ -48,6 +48,20 @@ class RandomSamplerSuite extends FunSuite with

[GitHub] spark pull request: MLI-2: Start adding k-fold cross validation to...

2014-03-06 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/18#issuecomment-36944266 LGTM, except the extra empty line. Do you mind creating a Spark JIRA for this PR?
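One way to produce k cross-validation folds is to draw one uniform random number per element and reuse it across all folds. Fold i's test set gets every element whose draw falls in [i/k, (i+1)/k), and its training set gets the rest. Every draw lands in exactly one interval, so the test sets partition the data. This sketch is our assumption of how a range-based random sampler with a complement mode can yield train/test pairs. It is not the code in the pull request, and `k_fold` and its seed handling are illustrative.

```python
import random

def k_fold(data, k, seed=42):
    """Return k (train, test) pairs whose test sets partition `data`."""
    rng = random.Random(seed)
    draws = [rng.random() for _ in data]  # one draw per element, shared by all folds
    folds = []
    for i in range(k):
        lb, ub = i / k, (i + 1) / k
        test = [x for x, u in zip(data, draws) if lb <= u < ub]
        train = [x for x, u in zip(data, draws) if not (lb <= u < ub)]  # complement
        folds.append((train, test))
    return folds

for train, test in k_fold(list(range(20)), 4):
    print(len(train), len(test))
```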

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-36945861 I see, regarding the memory part, it sounds like we could do it in bash, but it might be kind of painful. We could do the following: - Look for just the driver memory

[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...

2014-03-06 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/94#discussion_r10364651 --- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala --- @@ -59,6 +61,20 @@ class PipedRDD[T: ClassTag]( val currentEnvVars =

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/79#issuecomment-36946317 Merged build started.

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/79#issuecomment-36946316 Merged build triggered.

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/79#issuecomment-36946454 Merged build finished.

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/79#issuecomment-36946455 One or more automated tests failed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13028/

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/79#issuecomment-36946054 Jenkins, add to whitelist and test this please

[GitHub] spark pull request: Spark 1165 rdd.intersection in python and java

2014-03-06 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/80#discussion_r10364898 --- Diff: python/pyspark/rdd.py --- @@ -319,6 +319,22 @@ def union(self, other): return RDD(self_copy._jrdd.union(other_copy._jrdd), self.ctx,
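Intersection can be built from primitives Spark already has. Map each element to a `(value, None)` pair, cogroup the two datasets, keep the keys whose groups are non-empty on both sides, and drop the values. The result has no duplicates even when the inputs do. The stdlib sketch below simulates `cogroup` with a dictionary over in-memory lists. It illustrates the technique and is not the diff under review. As with Spark keys, elements must be hashable.

```python
from collections import defaultdict

def cogroup(left, right):
    """Group two lists of (key, value) pairs: key -> (left values, right values)."""
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    return groups

def intersection(xs, ys):
    """Distinct elements present in both inputs: tag, cogroup, filter, keys."""
    grouped = cogroup([(x, None) for x in xs], [(y, None) for y in ys])
    return [key for key, (lvals, rvals) in grouped.items() if lvals and rvals]

print(sorted(intersection([1, 2, 2, 3], [2, 3, 3, 4])))  # [2, 3]
```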

[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...

2014-03-06 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/94#issuecomment-36946598 hey @tgravescs, good catch on this one. In terms of abstracting, the most abstract we could do would be to add a method to a `Partition` called `getPipeEnvVars` or
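The per-partition hook Patrick describes can be sketched outside Spark. Before launching the piped command for a partition, merge that partition's own variables into a copy of the parent environment. `FilePartition` and its `pipe_env_vars` method are hypothetical stand-ins for a Hadoop-backed partition and a `getPipeEnvVars`-style method. `map_input_file` is the variable named in SPARK-1195. The child is a small Python script, so the example runs anywhere Python does.

```python
import os
import subprocess
import sys

class FilePartition:
    """Hypothetical partition backed by one input file (stand-in for a Hadoop split)."""

    def __init__(self, path, records):
        self.path = path
        self.records = records

    def pipe_env_vars(self):
        # Illustrative analogue of a per-partition getPipeEnvVars hook.
        return {"map_input_file": self.path}

def pipe_partition(partition, command):
    """Run `command` once for the partition, streaming records through stdin."""
    env = dict(os.environ)
    env.update(partition.pipe_env_vars())
    result = subprocess.run(
        command,
        input="".join(r + "\n" for r in partition.records),
        capture_output=True,
        text=True,
        env=env,
        check=True,
    )
    return result.stdout.splitlines()

# A child process that tags every input line with the file it came from.
TAGGER = [sys.executable, "-c",
          "import os, sys\n"
          "for line in sys.stdin:\n"
          "    print(os.environ['map_input_file'] + ':' + line.rstrip())"]

print(pipe_partition(FilePartition("/data/part-00000", ["a", "b"]), TAGGER))
```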

[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...

2014-03-06 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/94#discussion_r10364522 --- Diff: core/src/test/scala/org/apache/spark/PipedRDDSuite.scala --- @@ -89,4 +97,37 @@ class PipedRDDSuite extends FunSuite with SharedSparkContext {

[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...

2014-03-06 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/94#discussion_r10364545 --- Diff: core/src/test/scala/org/apache/spark/PipedRDDSuite.scala --- @@ -89,4 +97,37 @@ class PipedRDDSuite extends FunSuite with SharedSparkContext {

Re: [GitHub] spark pull request: MLI-2: Start adding k-fold cross validation to...

2014-03-06 Thread Holden Karau
Sure, unique from MLI-2? On Thu, Mar 6, 2014 at 2:15 PM, mengxr g...@git.apache.org wrote: Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/18#issuecomment-36944266 LGTM, except the extra empty line. Do you mind creating a Spark JIRA for this

[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...

2014-03-06 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/94#discussion_r10365444 --- Diff: core/src/test/scala/org/apache/spark/PipedRDDSuite.scala --- @@ -89,4 +97,37 @@ class PipedRDDSuite extends FunSuite with SharedSparkContext {

[GitHub] spark pull request: Patch for SPARK-942

2014-03-06 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/50#issuecomment-36947949 I've created SPARK-1201 (https://spark-project.atlassian.net/browse/SPARK-1201) to cover optimizations in cases other than DISK_ONLY.

[GitHub] spark pull request: Example for cassandra CQL read/write from spar...

2014-03-06 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/87#issuecomment-36948309 Jenkins, test this please.

[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...

2014-03-06 Thread tgravescs
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/94#issuecomment-36948463 Adding a routine to HadoopPartition sounds good.

[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...

2014-03-06 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/21#issuecomment-36948553 @ankurdave does this look okay to you?

[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...

2014-03-06 Thread ankurdave
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/21#issuecomment-36948661 Looks good. Thanks!

RE: Spark Streaming and Storehaus -- example?

2014-03-06 Thread Jim Donahue
Thanks, that's just what I needed to fix the problem! Looks like it's singing and dancing now ... Jim -----Original Message----- From: Paul Brown [mailto:p...@mult.ifario.us] Sent: Thursday, March 06, 2014 1:50 PM To: dev@spark.apache.org Subject: Re: Spark Streaming and Storehaus -- example?

[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...

2014-03-06 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/21#issuecomment-36949572 Actually - no ...

[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...

2014-03-06 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/21#issuecomment-36949585 We should use the primitive hashmap - otherwise it is pretty slow

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/42#discussion_r10366581 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -978,6 +971,11 @@ class DAGScheduler( logDebug(Additional

[GitHub] spark pull request: Patch for SPARK-942

2014-03-06 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/50

[GitHub] spark pull request: SPARK-1193. Fix indentation in pom.xmls

2014-03-06 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/91#issuecomment-36950404 Thanks we can merge this. Want #33 to go in first since I think this will conflict with it.

[GitHub] spark pull request: Example for cassandra CQL read/write from spar...

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/87#issuecomment-36950985 Merged build triggered.

[GitHub] spark pull request: Example for cassandra CQL read/write from spar...

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/87#issuecomment-36950986 Merged build started.

[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-03-06 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/30#discussion_r10366986 --- Diff: python/setup.py --- @@ -0,0 +1,30 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-03-06 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/30#issuecomment-36951182 Hey @JoshRosen mind taking a look at this I think @sryza has tested it on YARN. But personally don't know enough about python packaging to look it over with confidence.

[GitHub] spark pull request: GRAPH-1: Map side distinct in collect vertex i...

2014-03-06 Thread holdenk
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/21#issuecomment-36951277 Ok I'll switch it tonight. On Thu, Mar 6, 2014 at 3:09 PM, Reynold Xin notificati...@github.com wrote: We should use the primitive hashmap - otherwise

[GitHub] spark pull request: Fix #SPARK-1149 Bad partitioners can cause Spa...

2014-03-06 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/44#discussion_r10367306 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -847,6 +847,8 @@ class SparkContext( partitions: Seq[Int],

[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-03-06 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/30#issuecomment-36953290 Hey @sryza I tested this using a local standalone cluster and it didn't seem to work. The executors failed when they were asked to launch pyspark: ```

[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-03-06 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/30#issuecomment-36953787 Updated to 1.0.0 and removed incubating

[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...

2014-03-06 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/94#discussion_r10368475 --- Diff: core/src/test/scala/org/apache/spark/PipedRDDSuite.scala --- @@ -89,4 +97,37 @@ class PipedRDDSuite extends FunSuite with SharedSparkContext {

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-06 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/42#discussion_r10368525 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala --- @@ -30,16 +32,23 @@ import org.apache.spark.scheduler._ * class,
