Re: Emergency maintenance on Jenkins
Thanks for letting us know, Patrick.

- Henry

On Monday, June 9, 2014, Patrick Wendell pwend...@gmail.com wrote:

Just a heads up - due to an outage at UCB we've lost several of the Jenkins slaves. I'm trying to spin up new slaves on EC2 in order to compensate, but this might fail some ongoing builds.

The good news is if we do get it working with EC2 workers, then we will have burst capability in the future - e.g. on release deadlines. So it's not all bad!

- Patrick
Re: Emergency maintenance on Jenkins
No luck with this tonight - unfortunately our Python tests aren't working well with Python 2.6, and some other issues made it hard to get the EC2 worker up to speed. Hopefully we can have this up and running tomorrow.

- Patrick

On Mon, Jun 9, 2014 at 10:17 PM, Patrick Wendell pwend...@gmail.com wrote:

Just a heads up - due to an outage at UCB we've lost several of the Jenkins slaves. I'm trying to spin up new slaves on EC2 in order to compensate, but this might fail some ongoing builds.

The good news is if we do get it working with EC2 workers, then we will have burst capability in the future - e.g. on release deadlines. So it's not all bad!

- Patrick
Re: debugger
Hi Josh,

I came across this post when looking for a debugger or RDD visualization tool for Spark. I am using Spark 0.9.1 and upgrading soon to Spark 1.0. The links you posted are dead. Can you please direct me to how I can debug my existing Spark job? Will I need to edit my existing job's code in addition to setting any environment variables/parameters?

The problem: I am running Bagel on a very large graph, and when the job gets to the final step (saveAsTextFile) it can hang for days until I kill it. Often, if I simply rerun the job, it finishes in an hour, which is the expected amount of time.

Thanks!

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/debugger-tp284p6982.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
Re: Emergency maintenance on Jenkins
Hey, just to update people - as of around 1pm PT we were back up and running with Jenkins slaves on EC2. Sorry about the disruption.

- Patrick

On Tue, Jun 10, 2014 at 1:15 AM, Patrick Wendell pwend...@gmail.com wrote:

No luck with this tonight - unfortunately our Python tests aren't working well with Python 2.6, and some other issues made it hard to get the EC2 worker up to speed. Hopefully we can have this up and running tomorrow.

- Patrick

On Mon, Jun 9, 2014 at 10:17 PM, Patrick Wendell pwend...@gmail.com wrote:

Just a heads up - due to an outage at UCB we've lost several of the Jenkins slaves. I'm trying to spin up new slaves on EC2 in order to compensate, but this might fail some ongoing builds.

The good news is if we do get it working with EC2 workers, then we will have burst capability in the future - e.g. on release deadlines. So it's not all bad!

- Patrick
Run ScalaTest inside IntelliJ IDEA
Hi All,

I want to run a ScalaTest suite directly in IDEA, but the project does not seem to pass the make phase before the tests run. The errors are as follows:

/Users/yijie/code/apache.spark.master/core/src/main/scala/org/apache/spark/executor/MesosExecutorBackend.scala
Error:(44, 35) type mismatch;
 found   : org.apache.mesos.protobuf.ByteString
 required: com.google.protobuf.ByteString
    .setData(ByteString.copyFrom(data))
    ^

/Users/yijie/code/apache.spark.master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala
Error:(119, 35) type mismatch;
 found   : org.apache.mesos.protobuf.ByteString
 required: com.google.protobuf.ByteString
    .setData(ByteString.copyFrom(createExecArg()))
    ^
Error:(257, 35) type mismatch;
 found   : org.apache.mesos.protobuf.ByteString
 required: com.google.protobuf.ByteString
    .setData(ByteString.copyFrom(task.serializedTask))
    ^

Before running the test in IDEA, I built Spark with 'sbt/sbt assembly' and imported the projects into IDEA after 'sbt/sbt gen-idea'. I am able to run the tests in a terminal with 'sbt/sbt test'.

Is there anything I have left out in order to run/debug a test suite inside IDEA?

Best regards,
Yijie
Suggestion: rdd.compute()
Hi,

Regarding the scenario below: would it be nice to have an action method named something like 'compute()' that does nothing but compute/materialize all partitions of an RDD? It could also be useful for profiling.

-----Original Message-----
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr]
Sent: Wednesday, June 11, 2014 11:40 AM
To: u...@spark.apache.org
Subject: Question about RDD cache, unpersist, materialization

Hi,

What I (seem to) know about the RDD persisting API is as follows:
- cache() and persist() are not actions. They only mark the RDD.
- unpersist() is also not an action. It only removes the mark. But if the RDD is already in memory, it is unloaded.

And there seems to be no API to forcefully materialize the RDD without requesting data through an action method, for example first().

So, I am faced with the following scenario.

    {
        JavaRDD<T> rddUnion = sc.parallelize(new ArrayList<T>());  // create empty for merging
        for (int i = 0; i < 10; i++)
        {
            JavaRDD<T2> rdd = sc.textFile(inputFileNames[i]);
            rdd.cache();  // Since it will be used twice, cache.
            rdd.map(...).filter(...).saveAsTextFile(outputFileNames[i]);  // Transform and save, rdd materializes
            rddUnion = rddUnion.union(rdd.map(...).filter(...));  // Do another transform to T and merge by union
            rdd.unpersist();  // Now it seems not needed. (But needed actually)
        }
        // Here, rddUnion actually materializes, and needs all 10 rdds that were already unpersisted.
        // So, rebuilding all 10 rdds will occur.
        rddUnion.saveAsTextFile(mergedFileName);
    }

If rddUnion could be materialized before the rdd.unpersist() line and cache()d, the rdds in the loop would not be needed on rddUnion.saveAsTextFile().

Now what is the best strategy?
- Do not unpersist any of the 10 rdds in the loop.
- Materialize rddUnion in the loop by calling a 'light' action API, like first().
- Give up and just rebuild/reload all 10 rdds when saving rddUnion.

Is there some misunderstanding?

Thanks.
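The cost difference between the strategies listed above can be illustrated with a toy model. The sketch below is plain Java and deliberately does not use Spark's API: `Lazy`, `cache()`, `unpersist()`, `collect()`, and `totalLoads` are hypothetical stand-ins that only mimic the marking and materialization behaviour described in the message. It counts how often each input is loaded when `unpersist()` is called inside the loop versus after the merged result has been saved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Toy model of lazy evaluation with caching. NOT Spark's API: all names here
// are hypothetical stand-ins for the behaviour described in the message.
final class LazyCacheDemo {

    static final class Lazy<T> {
        private final Supplier<List<T>> source;
        private boolean cacheRequested = false; // set by cache(), cleared by unpersist()
        private List<T> cached = null;          // filled on first materialization if marked
        int loads = 0;                          // how many times the source was evaluated

        Lazy(Supplier<List<T>> source) { this.source = source; }

        void cache() { cacheRequested = true; } // only a marker, nothing is computed

        void unpersist() {                      // drops the marker and any cached data
            cacheRequested = false;
            cached = null;
        }

        List<T> collect() {                     // the "action": forces materialization
            if (cached != null) return cached;
            loads++;
            List<T> data = source.get();
            if (cacheRequested) cached = data;
            return data;
        }
    }

    // Runs the loop from the message over `inputs` fake files and returns the
    // total number of loads across all inputs.
    static int totalLoads(int inputs, boolean unpersistInsideLoop) {
        List<Lazy<String>> rdds = new ArrayList<>();
        for (int i = 0; i < inputs; i++) {
            final int id = i;
            Lazy<String> rdd = new Lazy<>(() -> Collections.singletonList("line from file " + id));
            rdd.cache();
            rdd.collect();                      // per-file save: first materialization
            rdds.add(rdd);                      // the union only keeps a lazy reference
            if (unpersistInsideLoop) rdd.unpersist();
        }
        for (Lazy<String> rdd : rdds) rdd.collect();   // saving the union needs every input
        for (Lazy<String> rdd : rdds) rdd.unpersist(); // safe to release everything now
        int total = 0;
        for (Lazy<String> rdd : rdds) total += rdd.loads;
        return total;
    }

    public static void main(String[] args) {
        System.out.println("unpersist inside loop: " + totalLoads(10, true));
        System.out.println("unpersist after save:  " + totalLoads(10, false));
    }
}
```

In this model, unpersisting inside the loop loads every input twice (20 loads for 10 inputs), while deferring `unpersist()` until after the merged save loads each input once (10 loads). That corresponds to the first strategy in the list; whether holding all inputs in memory is acceptable depends on the data size, which the toy model does not capture.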
Re: Constraint Solver for Spark
Sorry, the last one went out by mistake:

For users (0 to numUsers), isn't fullXtX the same? In the ALS formulation this is W^TW or H^TH, which should be the same for all users. Why are we reading userXtX(index) and adding it to fullXtX in the loop over all numUsers?

    // Solve the least-squares problem for each user and return the new feature vectors
    Array.range(0, numUsers).map { index =>
      // Compute the full XtX matrix from the lower-triangular part we got above
      fillFullMatrix(userXtX(index), fullXtX)
      // Add regularization
      var i = 0
      while (i < rank) {
        fullXtX.data(i * rank + i) += lambda
        i += 1
      }
      // Solve the resulting matrix, which is symmetric and positive-definite
      algo match {
        case ALSAlgo.Implicit =>
          Solve.solvePositive(fullXtX.addi(YtY.get.value), userXy(index)).data
        case ALSAlgo.Explicit =>
          Solve.solvePositive(fullXtX, userXy(index)).data
      }
    }

On Tue, Jun 10, 2014 at 8:56 PM, Debasish Das debasish.da...@gmail.com wrote:

Hi,

I am a bit confused with the code here:

    // Solve the least-squares problem for each user and return the new feature vectors
    Array.range(0, numUsers).map { index =>
      // Compute the full XtX matrix from the lower-triangular part we got above
      fillFullMatrix(userXtX(index), fullXtX)
      // Add regularization
      var i = 0
      while (i < rank) {
        fullXtX.data(i * rank + i) += lambda
        i += 1
      }
      // Solve the resulting matrix, which is symmetric and positive-definite
      algo match {
        case ALSAlgo.Implicit =>
          Solve.solvePositive(fullXtX.addi(YtY.get.value), userXy(index)).data
        case ALSAlgo.Explicit =>
          Solve.solvePositive(fullXtX, userXy(index)).data
      }
    }

On Fri, Jun 6, 2014 at 10:42 AM, Debasish Das debasish.da...@gmail.com wrote:

Hi Xiangrui,

It's not the linear constraint. It is a quadratic inequality with slack, a first-order Taylor approximation of the off-diagonal cross terms, and a cyclic coordinate descent, which we think will yield orthogonality... It's still in the works...

Also, we want to put an L1 constraint as a set of linear equations when solving for ALS...
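One possible reading of the loop quoted in this thread, not confirmed anywhere in the thread itself, is that `fullXtX` is a scratch buffer that `fillFullMatrix` overwrites for each user rather than a running sum, and that `userXtX(index)` differs per user because it is built from that user's ratings only. The plain-Java sketch below illustrates that reading. The row-by-row packing of the lower triangle, the flat `double[]` layout, and the class and helper names are assumptions made for illustration; this is not MLlib's actual code.

```java
import java.util.Arrays;

// Sketch: expand a packed lower triangle into a full symmetric matrix, then
// regularize. The packing order ((0,0), (1,0), (1,1), (2,0), ...) is an assumption.
final class FillFullMatrixDemo {

    // Overwrites every entry of `full` (rank x rank, stored flat) from `packed`.
    static void fillFullMatrix(double[] packed, double[] full, int rank) {
        int pos = 0;
        for (int i = 0; i < rank; i++) {
            for (int j = 0; j <= i; j++) {
                full[i * rank + j] = packed[pos]; // lower triangle
                full[j * rank + i] = packed[pos]; // mirrored upper triangle
                pos++;
            }
        }
    }

    // Adds lambda to the diagonal, mirroring the "Add regularization" step.
    static void addRegularization(double[] full, int rank, double lambda) {
        for (int i = 0; i < rank; i++) {
            full[i * rank + i] += lambda;
        }
    }

    public static void main(String[] args) {
        int rank = 2;
        double[] fullXtX = new double[rank * rank];              // one buffer reused for all users
        double[][] userXtX = { {1.0, 2.0, 3.0}, {4.0, 5.0, 6.0} }; // two users, packed triangles
        for (double[] packed : userXtX) {
            fillFullMatrix(packed, fullXtX, rank);               // overwrite, no accumulation
            addRegularization(fullXtX, rank, 0.5);
            System.out.println(Arrays.toString(fullXtX));
        }
    }
}
```

Under these assumptions the second user's matrix carries nothing over from the first, because every entry of the buffer is rewritten before the regularization term is added.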
I will create the JIRA... as I see it, this will evolve into a generic constraint solver for machine learning problems that have a QP structure... ALS is one example... another example is kernel SVMs...

I did not know that an lgpl solver can be added to the classpath... if it can be, then we should definitely add these in ALS.scala...

Thanks.
Deb

On Thu, Jun 5, 2014 at 11:31 PM, Xiangrui Meng men...@gmail.com wrote:

I don't quite understand why putting linear constraints can promote orthogonality. For the interfaces, if the subproblem is determined by Y^T Y and Y^T b for each iteration, then the least squares solver, the non-negative least squares solver, or your convex solver is simply a function (A, b) -> x. You can define it as an interface, and make the solver pluggable by adding a setter to ALS. If you want to use your lgpl solver, just include it in the classpath. Creating two separate files still seems unnecessary to me. Could you create a JIRA and we can move our discussion there? Thanks!

Best,
Xiangrui

On Thu, Jun 5, 2014 at 7:20 PM, Debasish Das debasish.da...@gmail.com wrote:

Hi Xiangrui,

For orthogonality properties in the factors we need a constraint solver other than the usual ones (l1, upper and lower bounds, l2, etc.).

The interface of the constraint solver is standard and I can add it in mllib optimization.

But I am not sure how I will call the gpl licensed ipm solver from mllib... assume the solver interface is as follows:

Qpsolver (densematrix h, array [double] f, int linearEquality, int linearInequality, bool lb, bool ub)

And then I have functions to update equalities, inequalities, bounds etc., followed by the run which generates the solution.

For l1 constraints I have to use the epigraph formulation, which needs a variable transformation before the solve.

I was thinking that for problems that do not need constraints people will use ALS.scala, and ConstrainedALS.scala will have the constrained formulations.

I can point you to the code once it is ready, and then you can guide me how to refactor it to mllib als?

Thanks.
Deb

Hi Deb,

Why do you want to make those methods public? If you only need to replace the solver for subproblems, you can try to make the solver pluggable. Now it supports least squares and non-negative least squares. You can define an interface for the subproblem solvers and maintain the IPM solver at your own code base, if the only information you need is Y^T Y and Y^T b. Btw, just curious, what is the use case for quadratic constraints?

Best,
Xiangrui

On Thu, Jun 5, 2014 at 3:38 PM, Debasish Das debasish.da...@gmail.com wrote:

Hi, We are adding a