Re: Emergency maintenance on Jenkins

2014-06-10 Thread Henry Saputra
Thanks for letting us know Patrick.

- Henry

On Monday, June 9, 2014, Patrick Wendell pwend...@gmail.com wrote:

 Just a heads up - due to an outage at UCB we've lost several of the
 Jenkins slaves. I'm trying to spin up new slaves on EC2 in order to
 compensate, but this might fail some ongoing builds.

 The good news is if we do get it working with EC2 workers, then we
 will have burst capability in the future - e.g. on release deadlines.
 So it's not all bad!

 - Patrick



Re: Emergency maintenance on Jenkins

2014-06-10 Thread Patrick Wendell
No luck with this tonight - unfortunately our Python tests aren't
working well with Python 2.6 and some other issues made it hard to get
the EC2 worker up to speed. Hopefully we can have this up and running
tomorrow.

- Patrick

On Mon, Jun 9, 2014 at 10:17 PM, Patrick Wendell pwend...@gmail.com wrote:
 Just a heads up - due to an outage at UCB we've lost several of the
 Jenkins slaves. I'm trying to spin up new slaves on EC2 in order to
 compensate, but this might fail some ongoing builds.

 The good news is if we do get it working with EC2 workers, then we
 will have burst capability in the future - e.g. on release deadlines.
 So it's not all bad!

 - Patrick


Re: debugger

2014-06-10 Thread DanielH
Hi Josh,

I came across this post while looking for a debugger or RDD visualization
tool for Spark. I am using Spark 0.9.1 and will upgrade to Spark 1.0 soon.
The links you posted are dead. Can you please point me to how I can debug
my existing Spark job?

Will I need to edit my existing job's code in addition to setting any
environment variables/parameters?

The problem: I am running Bagel on a very large graph, and when the job gets
to the final step (saveAsTextFile) it can hang for days until I kill it.
Often, if I simply rerun the job, it finishes in an hour, which is the
expected amount of time it should take.

Thanks!





Re: Emergency maintenance on Jenkins

2014-06-10 Thread Patrick Wendell
Hey just to update people - as of around 1pm PT we were back up and
running with Jenkins slaves on EC2. Sorry about the disruption.

- Patrick

On Tue, Jun 10, 2014 at 1:15 AM, Patrick Wendell pwend...@gmail.com wrote:
 No luck with this tonight - unfortunately our Python tests aren't
 working well with Python 2.6 and some other issues made it hard to get
 the EC2 worker up to speed. Hopefully we can have this up and running
 tomorrow.

 - Patrick

 On Mon, Jun 9, 2014 at 10:17 PM, Patrick Wendell pwend...@gmail.com wrote:
 Just a heads up - due to an outage at UCB we've lost several of the
 Jenkins slaves. I'm trying to spin up new slaves on EC2 in order to
 compensate, but this might fail some ongoing builds.

 The good news is if we do get it working with EC2 workers, then we
 will have burst capability in the future - e.g. on release deadlines.
 So it's not all bad!

 - Patrick


Run ScalaTest inside IntelliJ IDEA

2014-06-10 Thread 申毅杰
Hi All,

I want to run the ScalaTest suites inside IDEA directly, but the build does
not seem to pass the make phase before the tests run.
The problems are as follows:

/Users/yijie/code/apache.spark.master/core/src/main/scala/org/apache/spark/executor/MesosExecutorBackend.scala
Error:(44, 35) type mismatch;
 found   : org.apache.mesos.protobuf.ByteString
 required: com.google.protobuf.ByteString
  .setData(ByteString.copyFrom(data))
  ^
/Users/yijie/code/apache.spark.master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala
Error:(119, 35) type mismatch;
 found   : org.apache.mesos.protobuf.ByteString
 required: com.google.protobuf.ByteString
  .setData(ByteString.copyFrom(createExecArg()))
  ^
Error:(257, 35) type mismatch;
 found   : org.apache.mesos.protobuf.ByteString
 required: com.google.protobuf.ByteString
  .setData(ByteString.copyFrom(task.serializedTask))
  ^

Before running the tests in IDEA, I built Spark with ’sbt/sbt assembly’,
imported the projects into IDEA after running ’sbt/sbt gen-idea’,
and I am able to run the tests in the terminal with ’sbt/sbt test’.

Is there anything I left out in order to run/debug the test suites inside IDEA?

Best regards,
Yijie

Suggestion: rdd.compute()

2014-06-10 Thread innowireless TaeYun Kim
Hi,

Regarding the following scenario, would it be nice to have an action method
named something like 'compute()' that does nothing but compute/materialize
all partitions of an RDD?
It would also be useful for profiling.
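
For illustration, a minimal sketch of what such an action could look like as
a user-side helper today (the object and method names are hypothetical, not
an existing API); a no-op foreachPartition visits every partition without
returning data to the driver:

import org.apache.spark.rdd.RDD

object RddSyntax {
  implicit class MaterializeOps[T](val rdd: RDD[T]) extends AnyVal {
    // Run a no-op action over every partition, forcing computation (and
    // population of any cache) without collecting anything to the driver.
    def materialize(): RDD[T] = { rdd.foreachPartition(_ => ()); rdd }
  }
}

With import RddSyntax._ in scope, this would be used as
someRdd.cache().materialize().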


-Original Message-
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr] 
Sent: Wednesday, June 11, 2014 11:40 AM
To: u...@spark.apache.org
Subject: Question about RDD cache, unpersist, materialization

Hi,

What I (seem to) know about the RDD persisting API is as follows:
- cache() and persist() are not actions. They only mark the RDD.
- unpersist() is also not an action. It only removes the marking, but if the
RDD is already in memory, it is unloaded.

And there seems to be no API to forcefully materialize an RDD without
requesting data through an action method, for example first().

So, I am faced with the following scenario.

{
    JavaRDD<T> rddUnion = sc.parallelize(new ArrayList<T>());  // create an
    // empty RDD for merging
    for (int i = 0; i < 10; i++)
    {
        JavaRDD<T2> rdd = sc.textFile(inputFileNames[i]);
        rdd.cache();  // Since it will be used twice, cache.
        rdd.map(...).filter(...).saveAsTextFile(outputFileNames[i]);
        // Transform and save; rdd materializes here.
        rddUnion = rddUnion.union(rdd.map(...).filter(...));
        // Do another transform to T and merge by union.
        rdd.unpersist();  // Now it seems no longer needed. (But it actually is.)
    }
    // Here, rddUnion actually materializes, and it needs all 10 rdds that
    // were already unpersisted.
    // So, rebuilding all 10 rdds will occur.
    rddUnion.saveAsTextFile(mergedFileName);
}

If rddUnion could be cache()d and materialized before the rdd.unpersist()
line, the RDDs built in the loop would not be needed for
rddUnion.saveAsTextFile().

Now, what is the best strategy?
- Do not unpersist any of the 10 RDDs in the loop.
- Materialize rddUnion in the loop by calling a 'light' action, like first().
- Give up and just rebuild/reload all 10 RDDs when saving rddUnion.
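
As a rough Scala sketch of the second option (illustrative only: the helper
name mergeAndSave and the path + ".out" output naming are my own, and count()
stands in for the 'light' action, since it touches every partition):

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def mergeAndSave(sc: SparkContext, inputs: Seq[String], merged: String): Unit = {
  var rddUnion: RDD[String] = sc.parallelize(Seq.empty[String])
  for (path <- inputs) {
    val rdd = sc.textFile(path).cache()    // used twice below
    rdd.saveAsTextFile(path + ".out")      // first use: materializes rdd
    val next = rddUnion.union(rdd).cache()
    next.count()                           // light action: pulls the union
                                           // so far into the cache
    rdd.unpersist()                        // safe now; its data lives in next
    rddUnion.unpersist()                   // drop the previous union level
    rddUnion = next
  }
  rddUnion.saveAsTextFile(merged)          // no rebuild of the inputs
}

Unpersisting the previous union level keeps only one cached copy of the
merged data at a time; whether count() is cheap enough depends on the job.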

Is there some misunderstanding?

Thanks.




Re: Constraint Solver for Spark

2014-06-10 Thread Debasish Das
Sorry, the last one went out by mistake:

For users 0 to numUsers, isn't fullXtX the same? In the ALS formulation this
is W^T W or H^T H, which should be the same for all users. Why are we reading
userXtX(index) and adding it to fullXtX in the loop over all numUsers?

// Solve the least-squares problem for each user and return the new feature
// vectors
Array.range(0, numUsers).map { index =>
  // Compute the full XtX matrix from the lower-triangular part we got above
  fillFullMatrix(userXtX(index), fullXtX)

  // Add regularization
  var i = 0
  while (i < rank) {
    fullXtX.data(i * rank + i) += lambda
    i += 1
  }

  // Solve the resulting matrix, which is symmetric and positive-definite
  algo match {
    case ALSAlgo.Implicit =>
      Solve.solvePositive(fullXtX.addi(YtY.get.value), userXy(index)).data
    case ALSAlgo.Explicit =>
      Solve.solvePositive(fullXtX, userXy(index)).data
  }
}


On Tue, Jun 10, 2014 at 8:56 PM, Debasish Das debasish.da...@gmail.com
wrote:

 Hi,

 I am a bit confused with the code here:

 // Solve the least-squares problem for each user and return the new
 // feature vectors
 Array.range(0, numUsers).map { index =>
   // Compute the full XtX matrix from the lower-triangular part we got above
   fillFullMatrix(userXtX(index), fullXtX)

   // Add regularization
   var i = 0
   while (i < rank) {
     fullXtX.data(i * rank + i) += lambda
     i += 1
   }

   // Solve the resulting matrix, which is symmetric and positive-definite
   algo match {
     case ALSAlgo.Implicit =>
       Solve.solvePositive(fullXtX.addi(YtY.get.value), userXy(index)).data
     case ALSAlgo.Explicit =>
       Solve.solvePositive(fullXtX, userXy(index)).data
   }
 }


 On Fri, Jun 6, 2014 at 10:42 AM, Debasish Das debasish.da...@gmail.com
 wrote:

 Hi Xiangrui,

 It's not a linear constraint; it is a quadratic inequality with slack, a
 first-order Taylor approximation of the off-diagonal cross terms, and a
 cyclic coordinate descent, which we think will yield orthogonality... It's
 still in the works...

 Also, we want to put an L1 constraint as a set of linear equations when
 solving for ALS...

 I will create the JIRA... As I see it, this will evolve into a generic
 constraint solver for machine learning problems that have a QP
 structure... ALS is one example... another example is kernel SVMs...

 I did not know that an LGPL solver can be added to the classpath... if it
 can be, then we should definitely add these in ALS.scala...

 Thanks.
 Deb



 On Thu, Jun 5, 2014 at 11:31 PM, Xiangrui Meng men...@gmail.com wrote:

 I don't quite understand why putting linear constraints can promote
 orthogonality. For the interfaces, if the subproblem is determined by
 Y^T Y and Y^T b for each iteration, then the least squares solver, the
 non-negative least squares solver, or your convex solver is simply a
 function

 (A, b) -> x.

 You can define it as an interface, and make the solver pluggable by
 adding a setter to ALS. If you want to use your lgpl solver, just
 include it in the classpath. Creating two separate files still seems
 unnecessary to me. Could you create a JIRA and we can move our
 discussion there? Thanks!

 Best,
 Xiangrui
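
A minimal Scala sketch of the pluggable (A, b) -> x interface described above
(the trait and object names are hypothetical; Solve.solvePositive is the
jblas call from the ALS snippet quoted earlier in this thread):

import org.jblas.{DoubleMatrix, Solve}

// Each ALS subproblem hands the solver A = Y^T Y (plus regularization)
// and b = Y^T r for one user/product, and gets back the feature vector x.
trait SubproblemSolver {
  def solve(ata: DoubleMatrix, atb: DoubleMatrix): Array[Double]
}

// Default behavior, matching the existing positive-definite solve.
object PositiveDefiniteSolver extends SubproblemSolver {
  def solve(ata: DoubleMatrix, atb: DoubleMatrix): Array[Double] =
    Solve.solvePositive(ata, atb).data
}

A setter on ALS could then accept any SubproblemSolver, letting an IPM-based
constrained solver live in a separate (LGPL) artifact on the classpath.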

 On Thu, Jun 5, 2014 at 7:20 PM, Debasish Das debasish.da...@gmail.com
 wrote:
  Hi Xiangrui,
 
  For orthogonality properties in the factors we need a constraint solver
  other than the usual ones (L1, upper and lower bounds, L2, etc.).
 
  The interface of the constraint solver is standard and I can add it in
  mllib optimization.
 
  But I am not sure how I will call the GPL-licensed IPM solver from
  mllib... assume the solver interface is as follows:
 
  Qpsolver (densematrix h, array [double] f, int linearEquality, int
  linearInequality, bool lb, bool ub)
 
  And then I have functions to update the equalities, inequalities, bounds,
  etc., followed by the run which generates the solution.
 
  For L1 constraints I have to use the epigraph formulation, which needs a
  variable transformation before the solve.
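
A hedged Scala rendering of the interface sketched above (every name is my
own illustration of that signature; DoubleMatrix is the jblas type used in
ALS):

import org.jblas.DoubleMatrix

// Minimizes 0.5 * x^T H x + f^T x subject to the configured constraints.
abstract class QpSolver(
    h: DoubleMatrix,               // quadratic term H (dense)
    f: Array[Double],              // linear term f
    numLinearEqualities: Int,
    numLinearInequalities: Int,
    hasLowerBounds: Boolean,
    hasUpperBounds: Boolean) {
  def updateEqualities(aeq: DoubleMatrix, beq: Array[Double]): Unit
  def updateInequalities(ain: DoubleMatrix, bin: Array[Double]): Unit
  def updateBounds(lb: Array[Double], ub: Array[Double]): Unit
  def run(): Array[Double]         // produces the solution x
}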
 
  I was thinking that for the problems that do not need constraints, people
  will use ALS.scala, and ConstrainedALS.scala will have the constrained
  formulations.
 
  I can point you to the code once it is ready, and then you can guide me on
  how to refactor it into the mllib ALS?
 
  Thanks.
  Deb
  Hi Deb,
 
  Why do you want to make those methods public? If you only need to
  replace the solver for the subproblems, you can try to make the solver
  pluggable. Now it supports least squares and non-negative least
  squares. You can define an interface for the subproblem solvers and
  maintain the IPM solver at your own code base, if the only information
  you need is Y^T Y and Y^T b.
 
  Btw, just curious, what is the use case for quadratic constraints?
 
  Best,
  Xiangrui
 
  On Thu, Jun 5, 2014 at 3:38 PM, Debasish Das debasish.da...@gmail.com
 
  wrote:
  Hi,
 
  We are adding a