Re: Eigenvalue solver

2016-01-12 Thread David Hall
(I don't know anything spark specific, so I'm going to treat it like a
Breeze question...)

As I understand it, Spark uses ARPACK via Breeze for SVD, and presumably
the same approach can be used for EVD. Basically, you make a function that
multiplies your "matrix" (which might be represented
implicitly/distributed, whatever) by a breeze.linalg.DenseVector.

This is the Breeze implementation for sparse SVD (which is fully generic
and might be hard to follow if you're not used to Breeze/typeclass-heavy
Scala...)

https://github.com/dlwh/breeze/blob/aa958688c428db581d853fd92eb35e82f80d8b5c/math/src/main/scala/breeze/linalg/functions/svd.scala#L205-L205

The difference between SVD and EVD in ARPACK (to a first approximation) is
that you multiply by A.t * A (i.e., compute A.t * (A * x)) for SVD, and just A * x for EVD.

The basic idea is to implement a Breeze UFunc eig.Impl3 implicit following
the svd code (or you could just copy out the body of that function and
specialize it). The signature you're looking to implement is:

implicit def Eig_Sparse_Impl[Mat](
    implicit mul: OpMulMatrix.Impl2[Mat, DenseVector[Double], DenseVector[Double]],
    dimImpl: dim.Impl[Mat, (Int, Int)])
  : eig.Impl3[Mat, Int, Double, EigenvalueResult] = {

The type parameters of Impl3 are the matrix type, an Int (the number of
eigenvalues you want), a Double tolerance, and the result type. If you
implement this signature, then you can call eig on anything that can be
multiplied by a dense vector and that implements dim (to get the number of
outputs).
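
A minimal illustration of the matrix-free contract ARPACK relies on: all the solver ever needs is v => A*v plus the dimension. ARPACK itself runs implicitly restarted Lanczos/Arnoldi, which converges far faster; the power iteration below (plain, dependency-free Scala, with names of my own invention rather than Breeze's) is only meant to show that the two implicits in the signature, the matrix-vector product and dim, are genuinely all you need:

```scala
// Matrix-free dominant-eigenpair sketch: a power iteration that only
// sees the operator through a matvec closure and a dimension, the same
// two ingredients the eig.Impl3 signature asks for via implicits.
def powerIteration(
    matvec: Array[Double] => Array[Double], // plays the role of OpMulMatrix
    n: Int,                                 // plays the role of dim
    iters: Int = 100): (Double, Array[Double]) = {
  var v = Array.fill(n)(1.0 / math.sqrt(n))
  for (_ <- 1 to iters) {
    val av = matvec(v)
    val norm = math.sqrt(av.map(x => x * x).sum)
    v = av.map(_ / norm) // renormalize each step
  }
  // Rayleigh quotient v . (A v) estimates the dominant eigenvalue.
  val av = matvec(v)
  (v.zip(av).map { case (a, b) => a * b }.sum, v)
}
```

Note that nothing in the loop ever materializes A, so the "matrix" could just as well be a distributed operator.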

(You'll need to define the EigenvalueResult class to hold whatever you want.
I don't immediately know how to unpack ARPACK's answers, but you might look
at this scipy code:
https://github.com/thomasnat1/cdcNewsRanker/blob/71b0ff3989d5191dc6a78c40c4a7a9967cbb0e49/venv/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py#L1049
)

I'm happy to help more if you decide to go this route, here, or on the
scala-breeze google group, or on github.

-- David


On Tue, Jan 12, 2016 at 10:28 AM, Lydia Ickler 
wrote:

> Hi,
>
> I wanted to know if there are any implementations yet within the Machine
> Learning Library or generally that can efficiently solve eigenvalue
> problems?
> Or if not do you have suggestions on how to approach a parallel execution
> maybe with BLAS or Breeze?
>
> Thanks in advance!
> Lydia
>
>
> Von meinem iPhone gesendet
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: [mllib] Is there any bugs to divide a Breeze sparse vectors at Spark v1.3.0-rc3?

2015-03-18 Thread David Hall
sure.

On Wed, Mar 18, 2015 at 12:19 AM, Debasish Das debasish.da...@gmail.com
wrote:

 Hi David,

 We are stress testing breeze.optimize.proximal and nnls...if you are
 cutting a release now, we will need another release soon once we get the
 runtime optimizations in place and merged to breeze.

 Thanks.
 Deb
  On Mar 15, 2015 9:39 PM, David Hall david.lw.h...@gmail.com wrote:

 snapshot is pushed. If you verify I'll publish the new artifacts.

 On Sun, Mar 15, 2015 at 1:14 AM, Yu Ishikawa 
 yuu.ishikawa+sp...@gmail.com
 wrote:

  David Hall, the Breeze creator, told me that it's a bug, so I made a
  JIRA ticket about this issue. We need to upgrade Breeze from 0.11.1 to
  0.11.2 or later in order to fix the bug, once the new version of Breeze
  is released.
 
  [SPARK-6341] Upgrade breeze from 0.11.1 to 0.11.2 or later - ASF JIRA
  https://issues.apache.org/jira/browse/SPARK-6341
 
  Thanks,
  Yu Ishikawa
 
 
 
  -
  -- Yu Ishikawa
  --
  View this message in context:
 
 http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Is-there-any-bugs-to-divide-a-Breeze-sparse-vectors-at-Spark-v1-3-0-rc3-tp11056p11058.html
  Sent from the Apache Spark Developers List mailing list archive at
  Nabble.com.
 
 
 




Re: [mllib] Is there any bugs to divide a Breeze sparse vectors at Spark v1.3.0-rc3?

2015-03-17 Thread David Hall
ping?

On Sun, Mar 15, 2015 at 9:38 PM, David Hall david.lw.h...@gmail.com wrote:

 snapshot is pushed. If you verify I'll publish the new artifacts.

 On Sun, Mar 15, 2015 at 1:14 AM, Yu Ishikawa yuu.ishikawa+sp...@gmail.com
  wrote:

 David Hall, the Breeze creator, told me that it's a bug, so I made a
 JIRA ticket about this issue. We need to upgrade Breeze from 0.11.1 to
 0.11.2 or later in order to fix the bug, once the new version of Breeze
 is released.

 [SPARK-6341] Upgrade breeze from 0.11.1 to 0.11.2 or later - ASF JIRA
 https://issues.apache.org/jira/browse/SPARK-6341

 Thanks,
 Yu Ishikawa



 -
 -- Yu Ishikawa
 --





Re: [mllib] Is there any bugs to divide a Breeze sparse vectors at Spark v1.3.0-rc3?

2015-03-15 Thread David Hall
snapshot is pushed. If you verify I'll publish the new artifacts.

On Sun, Mar 15, 2015 at 1:14 AM, Yu Ishikawa yuu.ishikawa+sp...@gmail.com
wrote:

 David Hall, the Breeze creator, told me that it's a bug, so I made a
 JIRA ticket about this issue. We need to upgrade Breeze from 0.11.1 to
 0.11.2 or later in order to fix the bug, once the new version of Breeze
 is released.

 [SPARK-6341] Upgrade breeze from 0.11.1 to 0.11.2 or later - ASF JIRA
 https://issues.apache.org/jira/browse/SPARK-6341

 Thanks,
 Yu Ishikawa



 -
 -- Yu Ishikawa
 --




Re: Breeze Library usage in Spark

2014-10-03 Thread David Hall
yeah, breeze.storage.Zero was introduced in either 0.8 or 0.9.

On Fri, Oct 3, 2014 at 9:45 AM, Xiangrui Meng men...@gmail.com wrote:

 Did you add a different version of Breeze to the classpath? In Spark
 1.0 we use Breeze 0.7, and in Spark 1.1 we use 0.9. If the Breeze
 version you used is different from the one that comes with Spark, you might
 see a class-not-found error. -Xiangrui

 On Fri, Oct 3, 2014 at 4:22 AM, Priya Ch learnings.chitt...@gmail.com
 wrote:
  Hi Team,
 
   When I am trying to use DenseMatrix from the Breeze library in Spark,
   it's throwing the following error:
 
  java.lang.NoClassDefFoundError: breeze/storage/Zero
 
 
  Can someone help me on this ?
 
  Thanks,
  Padma Ch
 
 





Re: Is breeze thread safe in Spark?

2014-09-03 Thread David Hall
Mutating operations are not thread-safe. Operations that don't mutate
should be thread-safe. I can't speak to what Evan said, but I would guess
that the way they're using += should be safe.
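
To make the distinction concrete, here is a toy, dependency-free sketch (my own Vec class, not Breeze's implementation) of why + on a shared vector is safe while += is not: + allocates a fresh result, while += writes into the left operand's backing array in place.

```scala
// Toy vector illustrating allocate-vs-mutate semantics.
final class Vec(val data: Array[Double]) {
  def +(other: Vec): Vec = // allocates a new result; 'this' is untouched
    new Vec(data.zip(other.data).map { case (a, b) => a + b })
  def +=(other: Vec): Vec = { // mutates this.data in place
    var i = 0
    while (i < data.length) { data(i) += other.data(i); i += 1 }
    this
  }
}

val shared = new Vec(Array(1.0, 2.0))
val delta  = new Vec(Array(10.0, 10.0))
val sum = shared + delta // fine even if other threads are reading `shared`
shared += delta          // racy if another thread touches `shared` concurrently
```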


On Wed, Sep 3, 2014 at 11:58 AM, RJ Nowling rnowl...@gmail.com wrote:

 David,

 Can you confirm that += is not thread safe but + is?  I'm assuming +
 allocates a new object for the write, while += doesn't.

 Thanks!
 RJ


 On Wed, Sep 3, 2014 at 2:50 PM, David Hall d...@cs.berkeley.edu wrote:

 In Breeze we allocate separate work arrays for each call to
 LAPACK, so it should be fine. Concurrent modification isn't
 thread-safe of course, but things that ought to be thread-safe really
 should be.


 On Wed, Sep 3, 2014 at 10:41 AM, RJ Nowling rnowl...@gmail.com wrote:

 No, it's not in all cases. Since Breeze uses LAPACK under the hood,
 changes to shared memory from different threads are bad.

 There's actually a potential bug in the KMeans code where it uses +=
 instead of +.


 On Wed, Sep 3, 2014 at 1:26 PM, Ulanov, Alexander 
 alexander.ula...@hp.com
 wrote:

  Hi,
 
  Is breeze library called thread safe from Spark mllib code in case when
  native libs for blas and lapack are used? Might it be an issue when
 running
  Spark locally?
 
  Best regards, Alexander
 
 


 --
 em rnowl...@gmail.com
 c 954.496.2314





 --
 em rnowl...@gmail.com
 c 954.496.2314



Re: Linear CG solver

2014-06-27 Thread David Hall
I have no ideas on benchmarks, but breeze has a CG solver:
https://github.com/scalanlp/breeze/tree/master/math/src/main/scala/breeze/optimize/linear/ConjugateGradient.scala

https://github.com/scalanlp/breeze/blob/e2adad3b885736baf890b306806a56abc77a3ed3/math/src/test/scala/breeze/optimize/linear/ConjugateGradientTest.scala

It's based on the code from TRON, so I think it's targeted more at
norm-constrained solutions of the CG problem.
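
For orientation, the plain (unconstrained, unpreconditioned) CG recurrence that the trust-region variant builds on fits in a few lines and, like the eigensolver discussion above, only needs a matvec. This is an illustrative sketch in dependency-free Scala, not Breeze's ConjugateGradient:

```scala
// Matrix-free conjugate gradient for A x = b, A symmetric positive definite.
def cg(matvec: Array[Double] => Array[Double],
       b: Array[Double],
       maxIter: Int = 100,
       tol: Double = 1e-10): Array[Double] = {
  def dot(u: Array[Double], v: Array[Double]) =
    u.zip(v).map { case (a, c) => a * c }.sum
  val n = b.length
  val x = Array.fill(n)(0.0)
  val r = b.clone() // residual b - A x (x starts at 0)
  var p = r.clone() // search direction
  var rsOld = dot(r, r)
  var k = 0
  while (k < maxIter && rsOld > tol) {
    val ap = matvec(p)
    val alpha = rsOld / dot(p, ap)
    var i = 0
    while (i < n) { x(i) += alpha * p(i); r(i) -= alpha * ap(i); i += 1 }
    val rsNew = dot(r, r)
    p = r.zip(p).map { case (ri, pi) => ri + (rsNew / rsOld) * pi }
    rsOld = rsNew
    k += 1
  }
  x
}
```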

On Fri, Jun 27, 2014 at 5:54 PM, Debasish Das debasish.da...@gmail.com
wrote:

 Hi,

 I am looking for an efficient linear CG to be put inside the Quadratic
 Minimization algorithms we added for Spark mllib.

 With a good linear CG, we should be able to solve kernel SVMs with this
 solver in mllib...

 I use direct solves right now using cholesky decomposition which has higher
 complexity as matrix sizes become large...

 I found out some jblas example code:

 https://github.com/mikiobraun/jblas-examples/blob/master/src/CG.java

 I was wondering if mllib developers have any experience using this solver
 and if this is better than apache commons linear CG ?

 Thanks.
 Deb



Re: mllib vector templates

2014-05-05 Thread David Hall
On Mon, May 5, 2014 at 3:40 PM, DB Tsai dbt...@stanford.edu wrote:

 David,

 Could we use Int, Long, Float as the data feature spaces, and Double for
 optimizer?


Yes. Breeze doesn't allow operations on mixed types, so you'd need to
convert the Double vectors to Float if you wanted to, e.g., take a dot
product with the weights vector.

You might also be interested in FeatureVector, which is just a wrapper
around Array[Int] that emulates an indicator vector. It supports dot
products, axpy, etc.
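
The idea behind an indicator vector can be sketched in two dependency-free functions (my own helper names, not Breeze's API): when the vector is implicitly 1.0 at a handful of indices and 0.0 everywhere else, a dot product is just a lookup-and-sum, and axpy only touches the active indices:

```scala
// Indicator-vector operations: activeIndices stands in for a vector
// that is 1.0 at those positions and 0.0 elsewhere.
def dotIndicator(activeIndices: Array[Int], weights: Array[Double]): Double =
  activeIndices.map(i => weights(i)).sum

// y += a * indicator(activeIndices), mutating y in place.
def axpyIndicator(a: Double, activeIndices: Array[Int], y: Array[Double]): Unit =
  activeIndices.foreach(i => y(i) += a)
```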

-- David




 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai


 On Mon, May 5, 2014 at 3:06 PM, David Hall d...@cs.berkeley.edu wrote:

  Lbfgs and other optimizers would not work immediately, as they require
  vector spaces over double. Otherwise it should work.
  On May 5, 2014 3:03 PM, DB Tsai dbt...@stanford.edu wrote:
 
   Breeze could take any type (Int, Long, Double, and Float) in the matrix
   template.
  
  
   Sincerely,
  
   DB Tsai
   ---
   My Blog: https://www.dbtsai.com
   LinkedIn: https://www.linkedin.com/in/dbtsai
  
  
   On Mon, May 5, 2014 at 2:56 PM, Debasish Das debasish.da...@gmail.com
   wrote:
  
Is this a breeze issue or breeze can take templates on float /
 double ?
   
If breeze can take templates then it is a minor fix for Vectors.scala
   right
?
   
Thanks.
Deb
   
   
On Mon, May 5, 2014 at 2:45 PM, DB Tsai dbt...@stanford.edu wrote:
   
 +1  Would be nice that we can use different type in Vector.


 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai


 On Mon, May 5, 2014 at 2:41 PM, Debasish Das 
  debasish.da...@gmail.com
 wrote:

  Hi,
 
  Why mllib vector is using double as default ?
 
  /**
 
   * Represents a numeric vector, whose index type is Int and value
   type
is
  Double.
 
   */
 
  trait Vector extends Serializable {
 
 
/**
 
 * Size of the vector.
 
 */
 
def size: Int
 
 
/**
 
 * Converts the instance to a double array.
 
 */
 
def toArray: Array[Double]
 
  Don't we need a template on float/double ? This will give us
 memory
  savings...
 
  Thanks.
 
  Deb
 

   
  
 



Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-29 Thread David Hall
Yeah, that's probably the easiest, though obviously pretty hacky.

I'm surprised that the Hessian approximation isn't worse than it is. (As
in, I'd expect error messages.) It's obviously line searching much more, so
the approximation must be worse. You might be interested in this online
BFGS:
http://jmlr.org/proceedings/papers/v2/schraudolph07a/schraudolph07a.pdf

-- David


On Tue, Apr 29, 2014 at 3:30 PM, DB Tsai dbt...@stanford.edu wrote:

 I have a quick hack to understand the behavior of SLBFGS
 (stochastic LBFGS): I overrode the Breeze iterations method to get the
 current LBFGS step, to ensure that the objective function is the same during
 the line search step. David, the following is my code; do you have a better
 way to inject into it?

 https://github.com/dbtsai/spark/tree/dbtsai-lbfgshack

 Couple findings,

 1) miniBatch (using the RDD sample API) for each iteration is slower than full
 data training when the full data is cached, probably because sample is not
 efficient in Spark.

 2) Since in the line search steps, we use the same sample of data (the
 same objective function), the SLBFGS actually converges well.

 3) For the news20 dataset, with 0.05 miniBatch size, it takes 14 SLBFGS steps
 (29 data iterations, 74.5 seconds) to converge to loss < 0.10. For LBFGS
 with full data training, it takes 9 LBFGS steps (12 data iterations, 37.6
 seconds) to converge to loss < 0.10.

 It seems that as long as the noisy gradient happens in different SLBFGS
 steps, it still works.

 (ps, I also tried in line search step, I use different sample of data, and
 it just doesn't work as we expect.)



 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai


 On Mon, Apr 28, 2014 at 8:55 AM, David Hall d...@cs.berkeley.edu wrote:

 That's right.

 FWIW, caching should be automatic now, but it might be that the version of
 Breeze you're using doesn't do that yet.

 Also, in breeze.util._ there's an implicit that adds a tee method to
 iterator, and also a last method. Both are useful for things like this.

 -- David


 On Sun, Apr 27, 2014 at 11:53 PM, DB Tsai dbt...@stanford.edu wrote:

 I think I figured it out. Instead of calling minimize and recording the
 loss in the DiffFunction, I should do the following:

 val states = lbfgs.iterations(new CachedDiffFunction(costFun),
   initialWeights.toBreeze.toDenseVector)
 states.foreach(state => lossHistory.append(state.value))

 All the losses in states should be decreasing now. Am I right?



 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai


 On Sun, Apr 27, 2014 at 11:31 PM, DB Tsai dbt...@stanford.edu wrote:

 Also, how many rejected steps will terminate the optimization
 process? How is that related to numberOfImprovementFailures?

 Thanks.


 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai


 On Sun, Apr 27, 2014 at 11:28 PM, DB Tsai dbt...@stanford.edu wrote:

 Hi David,

 I'm recording the loss history in the DiffFunction implementation, and
 that's why the rejected step is also recorded in my loss history.

 Is there any API in Breeze LBFGS to get the history that already
 excludes the rejected steps? Or should I just call the iterations method and
 check iteratingShouldStop instead?

 Thanks.


 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai


 On Fri, Apr 25, 2014 at 3:10 PM, David Hall d...@cs.berkeley.eduwrote:

 LBFGS will not take a step that sends the objective value up. It
 might try a step that is too big and reject it, so if you're just 
 logging
 everything that gets tried by LBFGS, you could see that. The iterations
 method of the minimizer should never return an increasing objective 
 value.
 If you're regularizing, are you including the regularizer in the 
 objective
 value computation?

 GD is almost never worth your time.

 -- David

 On Fri, Apr 25, 2014 at 2:57 PM, DB Tsai dbt...@stanford.edu wrote:

 Another interesting benchmark.

 *News20 dataset - 0.14M row, 1,355,191 features, 0.034% non-zero
 elements.*

 LBFGS converges in 70 seconds, while GD seems to be not progressing.

 Dense feature vector will be too big to fit in the memory, so only
 conduct the sparse benchmark.

 I saw that sometimes the loss bumps up, which is weird to me. Since
 the cost function of logistic regression is convex, it should be
 monotonically decreasing. David, any suggestions?

 The detail figure:

 https://github.com/dbtsai/spark-lbfgs-benchmark/raw/0b774682e398b4f7e0ce01a69c44000eb0e73454/result/news20.pdf


 *Rcv1 dataset - 6.8M row, 677,399 features, 0.15% non-zero elements.*

 LBFGS converges in 25 seconds, while GD also seems to be not
 progressing

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-28 Thread David Hall
That's right.

FWIW, caching should be automatic now, but it might be that the version of
Breeze you're using doesn't do that yet.

Also, in breeze.util._ there's an implicit that adds a tee method to
iterator, and also a last method. Both are useful for things like this.

-- David

On Sun, Apr 27, 2014 at 11:53 PM, DB Tsai dbt...@stanford.edu wrote:

 I think I figured it out. Instead of calling minimize and recording the loss
 in the DiffFunction, I should do the following:

 val states = lbfgs.iterations(new CachedDiffFunction(costFun),
   initialWeights.toBreeze.toDenseVector)
 states.foreach(state => lossHistory.append(state.value))

 All the losses in states should be decreasing now. Am I right?



 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai


 On Sun, Apr 27, 2014 at 11:31 PM, DB Tsai dbt...@stanford.edu wrote:

 Also, how many rejected steps will terminate the optimization
 process? How is that related to numberOfImprovementFailures?

 Thanks.


 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai


 On Sun, Apr 27, 2014 at 11:28 PM, DB Tsai dbt...@stanford.edu wrote:

 Hi David,

 I'm recording the loss history in the DiffFunction implementation, and
 that's why the rejected step is also recorded in my loss history.

 Is there any API in Breeze LBFGS to get the history that already
 excludes the rejected steps? Or should I just call the iterations method and
 check iteratingShouldStop instead?

 Thanks.


 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai


 On Fri, Apr 25, 2014 at 3:10 PM, David Hall d...@cs.berkeley.eduwrote:

 LBFGS will not take a step that sends the objective value up. It might
 try a step that is too big and reject it, so if you're just logging
 everything that gets tried by LBFGS, you could see that. The iterations
 method of the minimizer should never return an increasing objective value.
 If you're regularizing, are you including the regularizer in the objective
 value computation?

 GD is almost never worth your time.

 -- David

 On Fri, Apr 25, 2014 at 2:57 PM, DB Tsai dbt...@stanford.edu wrote:

 Another interesting benchmark.

 *News20 dataset - 0.14M row, 1,355,191 features, 0.034% non-zero
 elements.*

 LBFGS converges in 70 seconds, while GD seems to be not progressing.

 Dense feature vector will be too big to fit in the memory, so only
 conduct the sparse benchmark.

 I saw that sometimes the loss bumps up, which is weird to me. Since
 the cost function of logistic regression is convex, it should be
 monotonically decreasing. David, any suggestions?

 The detail figure:

 https://github.com/dbtsai/spark-lbfgs-benchmark/raw/0b774682e398b4f7e0ce01a69c44000eb0e73454/result/news20.pdf


 *Rcv1 dataset - 6.8M row, 677,399 features, 0.15% non-zero elements.*

 LBFGS converges in 25 seconds, while GD also seems to be not
 progressing.

 Only conduct sparse benchmark for the same reason. I also saw the loss
 bumps up for unknown reason.

 The detail figure:

 https://github.com/dbtsai/spark-lbfgs-benchmark/raw/0b774682e398b4f7e0ce01a69c44000eb0e73454/result/rcv1.pdf


 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai


 On Thu, Apr 24, 2014 at 2:36 PM, DB Tsai dbt...@stanford.edu wrote:

 rcv1.binary is too sparse (0.15% non-zero elements), so dense format
 will not run due to out of memory. But sparse format runs really well.


 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai


 On Thu, Apr 24, 2014 at 1:54 PM, DB Tsai dbt...@stanford.edu wrote:

 I'm doing the timer in runMiniBatchSGD after  val numExamples =
 data.count()

 See the following. Running rcv1 dataset now, and will update soon.

 val startTime = System.nanoTime()
 for (i <- 1 to numIterations) {
   // Sample a subset (fraction miniBatchFraction) of the total data
   // and compute and sum up the subgradients on this subset (this is
   // one map-reduce)
   val (gradientSum, lossSum) = data.sample(false, miniBatchFraction, 42 + i)
     .aggregate((BDV.zeros[Double](weights.size), 0.0))(
       seqOp = (c, v) => (c, v) match { case ((grad, loss), (label, features)) =>
         val l = gradient.compute(features, label, weights,
           Vectors.fromBreeze(grad))
         (grad, loss + l)
       },
       combOp = (c1, c2) => (c1, c2) match { case ((grad1, loss1), (grad2, loss2)) =>
         (grad1 += grad2, loss1 + loss2)
       })

   /**
* NOTE(Xinghao): lossSum is computed using the weights from
 the previous iteration
* and regVal

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-25 Thread David Hall
LBFGS will not take a step that sends the objective value up. It might try
a step that is too big and reject it, so if you're just logging
everything that gets tried by LBFGS, you could see that. The iterations
method of the minimizer should never return an increasing objective value.
If you're regularizing, are you including the regularizer in the objective
value computation?
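
The distinction matters for logging: record losses from the minimizer's accepted states, not from inside the objective, because the objective is also evaluated at rejected line-search points. A toy gradient-descent state stream (plain Scala; the State shape and names here are illustrative, not Breeze's API) shows the pattern:

```scala
// Log from the optimizer's state stream, not the objective function.
final case class State(x: Double, value: Double)

def gdIterations(f: Double => Double, grad: Double => Double,
                 x0: Double, step: Double, n: Int): Iterator[State] =
  Iterator.iterate(State(x0, f(x0))) { s =>
    val x = s.x - step * grad(s.x) // one accepted descent step
    State(x, f(x))
  }.take(n)

// Accepted states only, so the recorded losses decrease monotonically.
val losses = gdIterations(x => x * x, x => 2 * x, x0 = 5.0, step = 0.1, n = 20)
  .map(_.value).toVector
```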

GD is almost never worth your time.

-- David

On Fri, Apr 25, 2014 at 2:57 PM, DB Tsai dbt...@stanford.edu wrote:

 Another interesting benchmark.

 *News20 dataset - 0.14M row, 1,355,191 features, 0.034% non-zero elements.*

 LBFGS converges in 70 seconds, while GD seems to be not progressing.

 Dense feature vector will be too big to fit in the memory, so only conduct
 the sparse benchmark.

 I saw that sometimes the loss bumps up, which is weird to me. Since the
 cost function of logistic regression is convex, it should be monotonically
 decreasing. David, any suggestions?

 The detail figure:

 https://github.com/dbtsai/spark-lbfgs-benchmark/raw/0b774682e398b4f7e0ce01a69c44000eb0e73454/result/news20.pdf


 *Rcv1 dataset - 6.8M row, 677,399 features, 0.15% non-zero elements.*

 LBFGS converges in 25 seconds, while GD also seems to be not progressing.

 Only conduct sparse benchmark for the same reason. I also saw the loss
 bumps up for unknown reason.

 The detail figure:

 https://github.com/dbtsai/spark-lbfgs-benchmark/raw/0b774682e398b4f7e0ce01a69c44000eb0e73454/result/rcv1.pdf


 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai


 On Thu, Apr 24, 2014 at 2:36 PM, DB Tsai dbt...@stanford.edu wrote:

 rcv1.binary is too sparse (0.15% non-zero elements), so dense format
 will not run due to out of memory. But sparse format runs really well.


 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai


 On Thu, Apr 24, 2014 at 1:54 PM, DB Tsai dbt...@stanford.edu wrote:

 I'm doing the timer in runMiniBatchSGD after  val numExamples =
 data.count()

 See the following. Running rcv1 dataset now, and will update soon.

 val startTime = System.nanoTime()
 for (i <- 1 to numIterations) {
   // Sample a subset (fraction miniBatchFraction) of the total data
   // and compute and sum up the subgradients on this subset (this is one
   // map-reduce)
   val (gradientSum, lossSum) = data.sample(false, miniBatchFraction, 42 + i)
     .aggregate((BDV.zeros[Double](weights.size), 0.0))(
       seqOp = (c, v) => (c, v) match { case ((grad, loss), (label, features)) =>
         val l = gradient.compute(features, label, weights,
           Vectors.fromBreeze(grad))
         (grad, loss + l)
       },
       combOp = (c1, c2) => (c1, c2) match { case ((grad1, loss1), (grad2, loss2)) =>
         (grad1 += grad2, loss1 + loss2)
       })

   /**
    * NOTE(Xinghao): lossSum is computed using the weights from the
    * previous iteration, and regVal is the regularization value computed
    * in the previous iteration as well.
    */
   stochasticLossHistory.append(lossSum / miniBatchSize + regVal)
   val update = updater.compute(
     weights, Vectors.fromBreeze(gradientSum / miniBatchSize),
     stepSize, i, regParam)
   weights = update._1
   regVal = update._2
   timeStamp.append(System.nanoTime() - startTime)
 }






 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai


 On Thu, Apr 24, 2014 at 1:44 PM, Xiangrui Meng men...@gmail.com wrote:

 I don't understand why sparse falls behind dense so much at the very
 first iteration. I didn't see count() is called in

 https://github.com/dbtsai/spark-lbfgs-benchmark/blob/master/src/main/scala/org/apache/spark/mllib/benchmark/BinaryLogisticRegression.scala
 . Maybe you have local uncommitted changes.

 Best,
 Xiangrui

 On Thu, Apr 24, 2014 at 11:26 AM, DB Tsai dbt...@stanford.edu wrote:
  Hi Xiangrui,
 
  Yes, I'm using yarn-cluster mode, and I did check # of executors I
 specified
  are the same as the actual running executors.
 
  For caching and materialization, I've the timer in optimizer after
 calling
  count(); as a result, the time for materialization in cache isn't in
 the
  benchmark.
 
  The difference you saw is actually from dense feature or sparse
 feature
  vector. For LBFGS and GD dense feature, you can see the first
 iteration
  takes the same time. It's true for GD.
 
  I'm going to run rcv1.binary which only has 0.15% non-zero elements to
  verify the hypothesis.
 
 
  Sincerely,
 
  DB Tsai
  ---
  My Blog: https://www.dbtsai.com
  LinkedIn: https://www.linkedin.com/in/dbtsai
 
 
  On Thu, Apr 24, 2014 at 1:09 AM, Xiangrui Meng men...@gmail.com
 wrote:
 
 

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread David Hall
On Wed, Apr 23, 2014 at 9:30 PM, Evan Sparks evan.spa...@gmail.com wrote:

 What is the number of non zeroes per row (and number of features) in the
 sparse case? We've hit some issues with breeze sparse support in the past
 but for sufficiently sparse data it's still pretty good.


Any chance you remember what the problems were? I'm sure it could be
better, but it's good to know where improvements need to happen.

-- David



  On Apr 23, 2014, at 9:21 PM, DB Tsai dbt...@stanford.edu wrote:
 
  Hi all,
 
  I'm benchmarking Logistic Regression in MLlib using the newly added
 optimizer LBFGS and GD. I'm using the same dataset and the same methodology
 in this paper, http://www.csie.ntu.edu.tw/~cjlin/papers/l1.pdf
 
   I want to know how Spark scales while adding workers, and how optimizers
  and input format (sparse or dense) impact performance.
 
  The benchmark code can be found here,
 https://github.com/dbtsai/spark-lbfgs-benchmark
 
  The first dataset I benchmarked is a9a which only has 2.2MB. I
 duplicated the dataset, and made it 762MB to have 11M rows. This dataset
 has 123 features and 11% of the data are non-zero elements.
 
  In this benchmark, all the dataset is cached in memory.
 
  As we expect, LBFGS converges faster than GD, and at some point, no
 matter how we push GD, it will converge slower and slower.
 
   However, it's surprising that sparse format runs slower than dense
  format. I did see that sparse format takes a significantly smaller amount of
  memory in caching the RDD, but sparse is 40% slower than dense. I think sparse
  should be fast: when we compute x^T w, since x is sparse, we can do it
  faster. I wonder if there is anything I'm doing wrong.
 
  The attachment is the benchmark result.
 
  Thanks.
 
  Sincerely,
 
  DB Tsai
  ---
  My Blog: https://www.dbtsai.com
  LinkedIn: https://www.linkedin.com/in/dbtsai



Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread David Hall
Was the weight vector sparse? The gradients? Or just the feature vectors?


On Wed, Apr 23, 2014 at 10:08 PM, DB Tsai dbt...@dbtsai.com wrote:

 The figure showing the Log-Likelihood vs Time can be found here.


 https://github.com/dbtsai/spark-lbfgs-benchmark/raw/fd703303fb1c16ef5714901739154728550becf4/result/a9a11M.pdf

 Let me know if you can not open it.

 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai


 On Wed, Apr 23, 2014 at 9:34 PM, Shivaram Venkataraman 
 shiva...@eecs.berkeley.edu wrote:

  I don't think the attachment came through in the list. Could you upload
  the results somewhere and link to them ?
 
 
  On Wed, Apr 23, 2014 at 9:32 PM, DB Tsai dbt...@dbtsai.com wrote:
 
  123 features per rows, and in average, 89% are zeros.
  On Apr 23, 2014 9:31 PM, Evan Sparks evan.spa...@gmail.com wrote:
 
   What is the number of non zeroes per row (and number of features) in
 the
   sparse case? We've hit some issues with breeze sparse support in the
  past
   but for sufficiently sparse data it's still pretty good.
  
On Apr 23, 2014, at 9:21 PM, DB Tsai dbt...@stanford.edu wrote:
   
Hi all,
   
I'm benchmarking Logistic Regression in MLlib using the newly added
   optimizer LBFGS and GD. I'm using the same dataset and the same
  methodology
   in this paper, http://www.csie.ntu.edu.tw/~cjlin/papers/l1.pdf
   
I want to know how Spark scale while adding workers, and how
  optimizers
   and input format (sparse or dense) impact performance.
   
The benchmark code can be found here,
   https://github.com/dbtsai/spark-lbfgs-benchmark
   
The first dataset I benchmarked is a9a which only has 2.2MB. I
   duplicated the dataset, and made it 762MB to have 11M rows. This
 dataset
   has 123 features and 11% of the data are non-zero elements.
   
In this benchmark, all the dataset is cached in memory.
   
As we expect, LBFGS converges faster than GD, and at some point, no
   matter how we push GD, it will converge slower and slower.
   
However, it's surprising that sparse format runs slower than dense
   format. I did see that sparse format takes significantly smaller
 amount
  of
   memory in caching RDD, but sparse is 40% slower than dense. I think
  sparse
   should be fast since when we compute x wT, since x is sparse, we can
 do
  it
   faster. I wonder if there is anything I'm doing wrong.
   
The attachment is the benchmark result.
   
Thanks.
   
Sincerely,
   
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
  
 
 
 



Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread David Hall
On Wed, Apr 23, 2014 at 10:18 PM, DB Tsai dbt...@dbtsai.com wrote:

 ps, it doesn't make sense to have the weight and gradient sparse unless
 there is a strong L1 penalty.


Sure, I was just checking the obvious things. Have you run it through a
profiler to see where the problem is?




 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai


 On Wed, Apr 23, 2014 at 10:17 PM, DB Tsai dbt...@dbtsai.com wrote:
  In mllib, the weight, and gradient are dense. Only feature is sparse.
 
  Sincerely,
 
  DB Tsai
  ---
  My Blog: https://www.dbtsai.com
  LinkedIn: https://www.linkedin.com/in/dbtsai
 
 
  On Wed, Apr 23, 2014 at 10:16 PM, David Hall d...@cs.berkeley.edu
 wrote:
  Was the weight vector sparse? The gradients? Or just the feature
 vectors?
 
 
  On Wed, Apr 23, 2014 at 10:08 PM, DB Tsai dbt...@dbtsai.com wrote:
 
  The figure showing the Log-Likelihood vs Time can be found here.
 
 
 
 https://github.com/dbtsai/spark-lbfgs-benchmark/raw/fd703303fb1c16ef5714901739154728550becf4/result/a9a11M.pdf
 
  Let me know if you can not open it.
 
  Sincerely,
 
  DB Tsai
  ---
  My Blog: https://www.dbtsai.com
  LinkedIn: https://www.linkedin.com/in/dbtsai
 
 
  On Wed, Apr 23, 2014 at 9:34 PM, Shivaram Venkataraman 
  shiva...@eecs.berkeley.edu wrote:
 
   I don't think the attachment came through in the list. Could you
   upload the results somewhere and link to them?


   On Wed, Apr 23, 2014 at 9:32 PM, DB Tsai dbt...@dbtsai.com wrote:

   123 features per row, and on average, 89% are zeros.
   On Apr 23, 2014 9:31 PM, Evan Sparks evan.spa...@gmail.com wrote:

What is the number of non-zeros per row (and number of features) in
the sparse case? We've hit some issues with breeze sparse support in
the past, but for sufficiently sparse data it's still pretty good.
   
 On Apr 23, 2014, at 9:21 PM, DB Tsai dbt...@stanford.edu wrote:

 Hi all,

 I'm benchmarking Logistic Regression in MLlib using the newly added
 optimizers LBFGS and GD. I'm using the same dataset and the same
 methodology as in this paper, http://www.csie.ntu.edu.tw/~cjlin/papers/l1.pdf

 I want to know how Spark scales while adding workers, and how
 optimizers and input format (sparse or dense) impact performance.

 The benchmark code can be found here,
 https://github.com/dbtsai/spark-lbfgs-benchmark

 The first dataset I benchmarked is a9a, which is only 2.2MB. I
 duplicated the dataset to make it 762MB with 11M rows. This dataset
 has 123 features, and 11% of the entries are non-zero.

 In this benchmark, the whole dataset is cached in memory.

 As we expected, LBFGS converges faster than GD, and at some point, no
 matter how we push GD, it converges slower and slower.

 However, it's surprising that the sparse format runs slower than the
 dense format. I did see that the sparse format takes a significantly
 smaller amount of memory when caching the RDD, but sparse is 40% slower
 than dense. I think sparse should be faster: when we compute w^T x,
 since x is sparse, we can do it faster. I wonder if there is anything
 I'm doing wrong.

 The attachment is the benchmark result.

 Thanks.

 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai
   
  
  
  
 
 



Re: RFC: varargs in Logging.scala?

2014-04-11 Thread David Hall
Another usage that's nice is:

logDebug {
  val timeS = timeMillis / 1000.0
  s"Time: $timeS"
}

which can be useful for more complicated expressions.


On Thu, Apr 10, 2014 at 5:55 PM, Michael Armbrust mich...@databricks.com wrote:

 BTW...

 You can do calculations in string interpolation:
 s"Time: ${timeMillis / 1000}s"

 Or use format strings.
 f"Float with two decimal places: $floatValue%.2f"

 More info:
 http://docs.scala-lang.org/overviews/core/string-interpolation.html


 On Thu, Apr 10, 2014 at 5:46 PM, Michael Armbrust mich...@databricks.com
 wrote:

  Hi Marcelo,
 
  Thanks for bringing this up here, as this has been a topic of debate
  recently.  Some thoughts below.
 
  ... all of them suffer from the fact that the log message needs to
  be built even though it might not be used.
 
 
  This is not true of the current implementation (and this is actually why
  Spark has a logging trait instead of just using a logger directly.)
 
  If you look at the original function signatures:
 
  protected def logDebug(msg: => String) ...


  The => implies that we are passing the msg by name instead of by value.
  Under the covers, scala is creating a closure that can be used to
  calculate the log message, only if it's actually required.  This does
  result in a significant performance improvement, but still requires
  allocating an object for the closure.  The bytecode is really something
  like this:

  val logMessage = new Function0[String]() { def apply() = "Log message " +
  someExpensiveComputation() }
  log.debug(logMessage)
 
 
  In Catalyst and Spark SQL we are using the scala-logging package, which
  uses macros to automatically rewrite all of your log statements.
 
  You write: logger.debug(s"Log message $someExpensiveComputation")

  You get:

  if (logger.debugEnabled) {
    val logMsg = "Log message " + someExpensiveComputation()
    logger.debug(logMsg)
  }
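
To make the by-name mechanics concrete, here is a small self-contained sketch (illustrative names, not Spark's actual Logging trait) showing that the message body only runs when the level is enabled:

```scala
// Hypothetical sketch of Spark-style lazy logging via a by-name
// parameter; names here are illustrative, not Spark's actual API.
object LazyLoggingSketch {
  var debugEnabled = false
  var messagesBuilt = 0   // counts how often the message was constructed

  def expensiveMessage(): String = {
    messagesBuilt += 1
    "state = " + (1 to 5).mkString(",")
  }

  // msg: => String is passed by name: its body only runs if we use msg
  def logDebug(msg: => String): Unit = {
    if (debugEnabled) println(msg)
  }

  def main(args: Array[String]): Unit = {
    logDebug(expensiveMessage())     // debug off: message never built
    assert(messagesBuilt == 0)
    debugEnabled = true
    logDebug(expensiveMessage())     // debug on: message built once
    assert(messagesBuilt == 1)
  }
}
```

The closure allocation the macro approach avoids is the `Function0` object the compiler creates for each by-name argument, even when the level is disabled.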
 
  IMHO, this is the cleanest option (and is supported by Typesafe).  Based
  on a micro-benchmark, it is also the fastest:
 
  std logging: 19885.48ms
  spark logging: 914.408ms
  scala logging: 729.779ms
 
  Once the dust settles from the 1.0 release, I'd be in favor of
  standardizing on scala-logging.
 
  Michael
 



Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-30 Thread David Hall
 0.07304737423337761
   Iteration 15: loss 0.9184205380342829, diff 0.0
   Iteration 16: loss 0.9184205380342829, diff 0.0
   Iteration 17: loss 0.8259870936519939, diff 0.1006438117513297
   Iteration 18: loss 0.8259870936519939, diff 0.0
   Iteration 19: loss 0.8259870936519939, diff 0.0
   Iteration 20: loss 0.6327447552109576, diff 0.233952934583647
   Iteration 21: loss 0.6327447552109576, diff 0.0
   Iteration 22: loss 0.6327447552109576, diff 0.0
   Iteration 23: loss 0.5534101162436362, diff 0.12538154276652747
   Iteration 24: loss 0.5534101162436362, diff 0.0
   Iteration 25: loss 0.5534101162436362, diff 0.0
   Iteration 26: loss 0.40450200866125635, diff 0.2690732137675816
   Iteration 27: loss 0.40450200866125635, diff 0.0
   Iteration 28: loss 0.40450200866125635, diff 0.0
   Iteration 29: loss 0.30788249908237314, diff 0.23885980452569502
  
   Sincerely,
  
   DB Tsai
   Machine Learning Engineer
   Alpine Data Labs
   --
   Web: http://alpinenow.com/
  
  
   On Wed, Mar 5, 2014 at 2:00 PM, David Hall d...@cs.berkeley.edu
  wrote:
   On Wed, Mar 5, 2014 at 1:57 PM, DB Tsai dbt...@alpinenow.com
 wrote:
  
   Hi David,
  
    On Tue, Mar 4, 2014 at 8:13 PM, dlwh david.lw.h...@gmail.com wrote:
 I'm happy to help fix any problems. I've verified at points that the
 implementation gives the exact same sequence of iterates for a few
 different functions (with a particular line search) as the c port of
 lbfgs. So I'm a little surprised it fails where Fortran succeeds...
 but only a little. This was fixed late last year.
    I'm working on a reproducible test case using the breeze vs fortran
    implementations to show the problem I've run into. The test will be
    in one of the test cases in my Spark fork; is it okay for you to
    investigate the issue? Or do I need to make it a standalone test?
  
  
  
   Um, as long as it wouldn't be too hard to pull out.
  
  
  
   Will send you the test later today.
  
   Thanks.
  
   Sincerely,
  
   DB Tsai
   Machine Learning Engineer
   Alpine Data Labs
   --
   Web: http://alpinenow.com/
  
 



Re: Making RDDs Covariant

2014-03-22 Thread David Hall
On Sat, Mar 22, 2014 at 8:59 AM, Pascal Voitot Dev 
pascal.voitot@gmail.com wrote:

 The problem I was talking about is when you try to use typeclass converters
 and make them contravariant/covariant for input/output. Something like:

 trait Reader[-I, +O] { def read(i: I): O }

 Doing this, you soon have implicit collisions and philosophical concerns
 about what it means to serialize/deserialize a Parent class and a Child
 class...



You should (almost) never make a typeclass param contravariant. It's almost
certainly not what you want:

https://issues.scala-lang.org/browse/SI-2509
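
A tiny sketch of the mechanism behind those collisions (a hypothetical Show typeclass, not from any library): contravariance makes an instance for a parent type usable as an instance for every child type, so parent and child instances start competing during implicit search.

```scala
// Hypothetical sketch of why contravariant typeclass parameters bite
// (cf. SI-2509): Show[-A] means Show[Animal] <: Show[Dog], so a parent
// instance satisfies every child's implicit lookup.
trait Show[-A] { def show(a: A): String }

object ContravarianceSketch {
  class Animal
  class Dog extends Animal

  val animalShow: Show[Animal] = new Show[Animal] {
    def show(a: Animal): String = "an animal"
  }

  // A Show[Animal] is assignable where a Show[Dog] is expected. If both
  // an Animal and a Dog instance were implicit, the compiler would have
  // two candidates for Show[Dog], resolved by specificity rules rather
  // than "most derived type", which is what surprises users.
  val dogShow: Show[Dog] = animalShow

  def main(args: Array[String]): Unit = {
    assert(dogShow.show(new Dog) == "an animal")
    println(dogShow.show(new Dog))
  }
}
```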

-- David


Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-06 Thread David Hall
On Thu, Mar 6, 2014 at 4:21 PM, DB Tsai dbt...@alpinenow.com wrote:

 Hi David,

 I can converge to the same result with your breeze LBFGS and Fortran
 implementations now. Probably, I made some mistakes when I tried
 breeze before. I apologize that I claimed it's not stable.

 See the test case in BreezeLBFGSSuite.scala
 https://github.com/AlpineNow/spark/tree/dbtsai-breezeLBFGS

 This is training multinomial logistic regression against iris dataset,
 and both optimizers can train the models with 98% training accuracy.


great to hear! There were some bugs in LBFGS about 6 months ago, so
depending on the last time you tried it, it might indeed have been bugged.



 There are two issues to use Breeze in Spark,

 1) When the gradientSum and lossSum are computed distributively in
 custom defined DiffFunction which will be passed into your optimizer,
 Spark will complain LBFGS class is not serializable. In
 BreezeLBFGS.scala, I've to convert RDD to array to make it work
 locally. It should be easy to fix by just having LBFGS to implement
 Serializable.


I'm not sure why Spark should be serializing LBFGS? Shouldn't it live on
the controller node? Or is this a per-node thing?

But no problem to make it serializable.



 2) Breeze computes redundant gradient and loss. See the following log
 from both Fortran and Breeze implementations.


Err, yeah. I should probably have LBFGS do this automatically, but there's
a CachedDiffFunction that gets rid of the redundant calculations.
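
For example, a sketch of that pattern, assuming Breeze's CachedDiffFunction and LBFGS signatures as of roughly this vintage:

```scala
// Sketch (assuming Breeze's optimize API) of wrapping an objective in
// CachedDiffFunction so repeated value/gradient queries at the same
// point are memoized instead of recomputed.
import breeze.linalg.{norm, DenseVector}
import breeze.optimize.{CachedDiffFunction, DiffFunction, LBFGS}

object CachedDiffSketch {
  def run(): DenseVector[Double] = {
    // f(x) = ||x||^2 with gradient 2x; minimum at the zero vector.
    val f = new DiffFunction[DenseVector[Double]] {
      def calculate(x: DenseVector[Double]): (Double, DenseVector[Double]) =
        (x dot x, x * 2.0)
    }
    // CachedDiffFunction caches the last (value, gradient) pair, so the
    // line search asking for f and grad f at the same x costs one
    // evaluation instead of two.
    val cached = new CachedDiffFunction(f)
    val lbfgs = new LBFGS[DenseVector[Double]](maxIter = 100, m = 5)
    lbfgs.minimize(cached, DenseVector(10.0, -3.0))
  }

  def main(args: Array[String]): Unit = {
    val xStar = run()
    assert(norm(xStar) < 1e-4)   // should land near the zero vector
    println(xStar)
  }
}
```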

-- David



 Thanks.

 Fortran:
 Iteration -1: loss 1.3862943611198926, diff 1.0
 Iteration 0: loss 1.5846343143210866, diff 0.14307193024217352
 Iteration 1: loss 1.1242501524477688, diff 0.29053004039012126
 Iteration 2: loss 1.0930151243303563, diff 0.027782962952189336
 Iteration 3: loss 1.054036932835569, diff 0.03566113127440601
 Iteration 4: loss 0.9907956302751622, diff 0.0507649459571
 Iteration 5: loss 0.9184205380342829, diff 0.07304737423337761
 Iteration 6: loss 0.8259870936519937, diff 0.10064381175132982
 Iteration 7: loss 0.6327447552109574, diff 0.23395293458364716
 Iteration 8: loss 0.5534101162436359, diff 0.1253815427665277
 Iteration 9: loss 0.4045020086612566, diff 0.26907321376758075
 Iteration 10: loss 0.3078824990823728, diff 0.23885980452569627

 Breeze:
 Iteration -1: loss 1.3862943611198926, diff 1.0
 Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS clinit
 WARNING: Failed to load implementation from:
 com.github.fommil.netlib.NativeSystemBLAS
 Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS clinit
 WARNING: Failed to load implementation from:
 com.github.fommil.netlib.NativeRefBLAS
 Iteration 0: loss 1.3862943611198926, diff 0.0
 Iteration 1: loss 1.5846343143210866, diff 0.14307193024217352
 Iteration 2: loss 1.1242501524477688, diff 0.29053004039012126
 Iteration 3: loss 1.1242501524477688, diff 0.0
 Iteration 4: loss 1.1242501524477688, diff 0.0
 Iteration 5: loss 1.0930151243303563, diff 0.027782962952189336
 Iteration 6: loss 1.0930151243303563, diff 0.0
 Iteration 7: loss 1.0930151243303563, diff 0.0
 Iteration 8: loss 1.054036932835569, diff 0.03566113127440601
 Iteration 9: loss 1.054036932835569, diff 0.0
 Iteration 10: loss 1.054036932835569, diff 0.0
 Iteration 11: loss 0.9907956302751622, diff 0.0507649459571
 Iteration 12: loss 0.9907956302751622, diff 0.0
 Iteration 13: loss 0.9907956302751622, diff 0.0
 Iteration 14: loss 0.9184205380342829, diff 0.07304737423337761
 Iteration 15: loss 0.9184205380342829, diff 0.0
 Iteration 16: loss 0.9184205380342829, diff 0.0
 Iteration 17: loss 0.8259870936519939, diff 0.1006438117513297
 Iteration 18: loss 0.8259870936519939, diff 0.0
 Iteration 19: loss 0.8259870936519939, diff 0.0
 Iteration 20: loss 0.6327447552109576, diff 0.233952934583647
 Iteration 21: loss 0.6327447552109576, diff 0.0
 Iteration 22: loss 0.6327447552109576, diff 0.0
 Iteration 23: loss 0.5534101162436362, diff 0.12538154276652747
 Iteration 24: loss 0.5534101162436362, diff 0.0
 Iteration 25: loss 0.5534101162436362, diff 0.0
 Iteration 26: loss 0.40450200866125635, diff 0.2690732137675816
 Iteration 27: loss 0.40450200866125635, diff 0.0
 Iteration 28: loss 0.40450200866125635, diff 0.0
 Iteration 29: loss 0.30788249908237314, diff 0.23885980452569502

 Sincerely,

 DB Tsai
 Machine Learning Engineer
 Alpine Data Labs
 --
 Web: http://alpinenow.com/


 On Wed, Mar 5, 2014 at 2:00 PM, David Hall d...@cs.berkeley.edu wrote:
  On Wed, Mar 5, 2014 at 1:57 PM, DB Tsai dbt...@alpinenow.com wrote:
 
  Hi David,
 
  On Tue, Mar 4, 2014 at 8:13 PM, dlwh david.lw.h...@gmail.com wrote:
   I'm happy to help fix any problems. I've verified at points that the
   implementation gives the exact same sequence of iterates for a few
   different functions (with a particular line search) as the c port of
   lbfgs. So I'm a little surprised it fails where Fortran succeeds...
   but only a little. This was fixed late last year.
  I'm working

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-05 Thread David Hall
On Wed, Mar 5, 2014 at 8:50 AM, Debasish Das debasish.da...@gmail.com wrote:

 Hi David,

 Few questions on breeze solvers:

 1. I feel the right place of adding useful things from RISO LBFGS (based on
 Professor Nocedal's fortran code) will be breeze. It will involve stress
 testing breeze LBFGS on large sparse datasets and contributing fixes to
 existing breeze LBFGS with the learning from RISO LBFGS.

 You agree on that right ?


That would be great.


 2. Normally for doing experiments like 1, I fix the line search. What's
 your preferred line search in breeze BFGS ? I will also use that.

 More Thuente and Strong wolfe with backtracking helped me in the past.


The default is a Strong Wolfe search:
https://github.com/scalanlp/breeze/blob/master/src/main/scala/breeze/optimize/StrongWolfe.scala



 3. The paper that you sent also says on L-BFGS-B is better on box
 constraints. I feel It's worthwhile to have both the solvers because many
 practical problems need box constraints or complex constraints can be
 reformulated in the realm of unconstrained and box constraints.


 Example use-cases for me are automatic feature extraction from photo/video
 frames using factorization.

 I will compare L-BFGS-B vs constrained QN method that you have in Breeze
 within an analysis similar to 1.


Great. Yeah I fully expect l-bfgs-b to be better on box constraints. It
turns out that this other method looked easier to implement and gave
reasonably good results even with box constraints.



 4. Do you have a road-map on adding CG solvers in breeze ? Linear CG solver
 to compare with BLAS posv seems like a good usecase for me in mllib ALS.
 DB sent a paper by Professor Ng which shows the effectiveness of CG and
 BFGS over SGD in the email chain.


We have a linear CG solver (
https://github.com/scalanlp/breeze/blob/master/src/main/scala/breeze/optimize/linear/ConjugateGradient.scala)
which is used in the Truncated Newton solver. (
https://github.com/scalanlp/breeze/blob/master/src/main/scala/breeze/optimize/TruncatedNewtonMinimizer.scala)
But I've not implemented nonlinear CG, and honestly hadn't planned on it,
per below.



 I believe on non-convex problems like Matrix Factorization, CG family might
 converge to a better solution than BFGS. No way to know till we experiment
 on the datasets :-)


I've never had good experience with CG, and a vision colleague of mine
found that LBFGS was better on his (non-convex) stuff, but I could be
easily persuaded.

Thanks!

-- David



 Thanks.
 Deb



 On Tue, Mar 4, 2014 at 8:13 PM, dlwh david.lw.h...@gmail.com wrote:

  Just subscribing to this list, so apologies for quoting weirdly and any
  other
  etiquette offenses.
 
 
  DB Tsai wrote
   Hi Deb,
  
   I had tried breeze L-BFGS algorithm, and when I tried it couple weeks
   ago, it's not as stable as the fortran implementation. I guessed the
   problem is in the line search related thing. Since we may bring breeze
   dependency for the sparse format support as you pointed out, we can
   just try to fix the L-BFGS in breeze, and we can get OWL-QN and
   L-BFGS-B.
  
   What do you think?
 
  I'm happy to help fix any problems. I've verified at points that the
  implementation gives the exact same sequence of iterates for a few
  different functions (with a particular line search) as the c port of
  lbfgs. So I'm a little surprised it fails where Fortran succeeds...
  but only a little. This was fixed late last year.
 
  OWL-QN seems to mostly be stable, but probably deserves more testing.
  Presumably it has whatever defects my LBFGS does. (It's really pretty
  straightforward to implement given an L-BFGS)
 
  We don't provide an L-BFGS-B implementation. We do have a more general
  constrained qn method based on
  http://jmlr.org/proceedings/papers/v5/schmidt09a/schmidt09a.pdf (which
  uses an L-BFGS type update as part of the algorithm). From the
  experiments in their paper, it's likely to not work as well for bound
  constraints, but can do things that lbfgsb can't.
 
  Again, let me know what I can help with.
 
  -- David Hall
 
 
   On Mon, Mar 3, 2014 at 3:52 PM, DB Tsai <dbtsai@> wrote:
   Hi Deb,
  
   a.  OWL-QN for solving L1 natively in BFGS
   Based on what I saw from
  
 
 https://github.com/tjhunter/scalanlp-core/blob/master/learn/src/main/scala/breeze/optimize/OWLQN.scala
   , it seems that it's not difficult to implement OWL-QN once LBFGS is
   done.
  
  
   b.  Bound constraints in BFGS : I saw you have converted the fortran
   code.
   Is there a license issue ? I can help in getting that up to speed as
   well.
   I tried to convert the code from Fortran L-BFGS-B implementation to
   java using f2j; the translated code is just a messy, and it just
   doesn't work at all. There is no license issue here. Any idea about
   how to approach this?
  
   c. Few variants of line searches : I will discuss on it.
   For the dbtsai-lbfgs branch seems like it already got merged by
 Jenkins.
   I don't think

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-05 Thread David Hall
On Wed, Mar 5, 2014 at 1:57 PM, DB Tsai dbt...@alpinenow.com wrote:

 Hi David,

 On Tue, Mar 4, 2014 at 8:13 PM, dlwh david.lw.h...@gmail.com wrote:
  I'm happy to help fix any problems. I've verified at points that the
  implementation gives the exact same sequence of iterates for a few
  different functions (with a particular line search) as the c port of
  lbfgs. So I'm a little surprised it fails where Fortran succeeds...
  but only a little. This was fixed late last year.
 I'm working on a reproducible test case using the breeze vs fortran
 implementations to show the problem I've run into. The test will be in
 one of the test cases in my Spark fork; is it okay for you to
 investigate the issue? Or do I need to make it a standalone test?



Um, as long as it wouldn't be too hard to pull out.



 Will send you the test later today.

 Thanks.

 Sincerely,

 DB Tsai
 Machine Learning Engineer
 Alpine Data Labs
 --
 Web: http://alpinenow.com/



Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-05 Thread David Hall
I did not. They would be nice to have.


On Wed, Mar 5, 2014 at 5:21 PM, Debasish Das debasish.da...@gmail.com wrote:

 David,

 There used to be standard BFGS testcases in Professor Nocedal's
 package...did you stress test the solver with them ?

 If not I will shoot him an email for them.

 Thanks.
 Deb



 On Wed, Mar 5, 2014 at 2:00 PM, David Hall d...@cs.berkeley.edu wrote:

  On Wed, Mar 5, 2014 at 1:57 PM, DB Tsai dbt...@alpinenow.com wrote:
 
   Hi David,
  
   On Tue, Mar 4, 2014 at 8:13 PM, dlwh david.lw.h...@gmail.com wrote:
I'm happy to help fix any problems. I've verified at points that the
implementation gives the exact same sequence of iterates for a few
different functions (with a particular line search) as the c port of
lbfgs. So I'm a little surprised it fails where Fortran succeeds...
but only a little. This was fixed late last year.
   I'm working on a reproducible test case using the breeze vs fortran
   implementations to show the problem I've run into. The test will be
   in one of the test cases in my Spark fork; is it okay for you to
   investigate the issue? Or do I need to make it a standalone test?
  
 
 
  Um, as long as it wouldn't be too hard to pull out.
 
 
  
   Will send you the test later today.
  
   Thanks.
  
   Sincerely,
  
   DB Tsai
   Machine Learning Engineer
   Alpine Data Labs
   --
   Web: http://alpinenow.com/