On Sun, Mar 30, 2014 at 2:01 PM, Debasish Das debasish.da...@gmail.com wrote:
Hi David,
I have started to experiment with BFGS solvers for Spark GLM over large
scale data...
I am also looking to add a good QP solver in breeze that can be used in
Spark ALS for constraint solves...More details on that soon...
I could not load the breeze 0.7 code into Eclipse...There is a folder called
natives in master but there is no code in that...all the code is in
src/main/scala...
I added the eclipse plugin:
addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "1.6.0")
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.2.0")
But it seems the project is set to use idea...
Could you please explain the dev methodology for breeze? My idea is to do
solver work in breeze as that's the right place and get it into Spark
through Xiangrui's WIP on Sparse data and breeze support...
It would be great to have a QP Solver: I don't know if you know about this
library: http://www.joptimizer.com/
I'm not quite sure what you mean by dev methodology. If you just mean how
to get code into Breeze, just send a PR to scalanlp/breeze. Unit tests are
good for something nontrivial like this. Maybe some basic documentation.
Thanks.
Deb
On Fri, Mar 7, 2014 at 12:46 AM, DB Tsai dbt...@alpinenow.com wrote:
Hi Xiangrui,
I think it doesn't matter whether we use Fortran/Breeze/RISO for
optimizers since optimization only takes 1% of the time. Most of the
time is spent in the parallel gradientSum and lossSum computation.
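
To illustrate why the distributed part dominates, here is a minimal Python sketch of the map-and-sum pattern behind a gradientSum/lossSum step. It is not the code from the thread: it uses binary logistic regression instead of the multinomial model discussed here, plain lists stand in for RDD partitions, and the names partition_loss_and_grad and combine are hypothetical.

```python
import math
from functools import reduce

def partition_loss_and_grad(weights, rows):
    """Loss and gradient contribution of one partition (binary logistic loss)."""
    loss = 0.0
    grad = [0.0] * len(weights)
    for features, label in rows:
        margin = sum(w * x for w, x in zip(weights, features))
        p = 1.0 / (1.0 + math.exp(-margin))  # predicted probability of label 1
        loss += -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
        for j, x in enumerate(features):
            grad[j] += (p - label) * x
    return loss, grad

def combine(a, b):
    """Sum two (loss, gradient) pairs, as a reduce step would."""
    return a[0] + b[0], [u + v for u, v in zip(a[1], b[1])]

# Toy data split into two "partitions"; each row is (features, label).
partitions = [
    [([1.0, 0.0], 1), ([0.0, 1.0], 0)],
    [([1.0, 1.0], 1)],
]
weights = [0.0, 0.0]

loss_sum, gradient_sum = reduce(
    combine, (partition_loss_and_grad(weights, part) for part in partitions)
)
print(loss_sum, gradient_sum)
```

Every optimizer iteration needs one such pass over all the data, while the optimizer update itself only touches a weight-sized vector, which fits the 1% figure quoted above.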
Sincerely,
DB Tsai
Machine Learning Engineer
Alpine Data Labs
--
Web: http://alpinenow.com/
On Thu, Mar 6, 2014 at 7:10 PM, Xiangrui Meng men...@gmail.com wrote:
Hi DB,
Thanks for doing the comparison! What were the running times for
fortran/breeze/riso?
Best,
Xiangrui
On Thu, Mar 6, 2014 at 4:21 PM, DB Tsai dbt...@alpinenow.com wrote:
Hi David,
I can converge to the same result with your breeze LBFGS and Fortran
implementations now. I probably made some mistakes when I tried
breeze before. I apologize for claiming it was not stable.
See the test case in BreezeLBFGSSuite.scala
https://github.com/AlpineNow/spark/tree/dbtsai-breezeLBFGS
This trains multinomial logistic regression on the iris dataset,
and both optimizers can train the models with 98% training accuracy.
There are two issues with using Breeze in Spark:
1) When the gradientSum and lossSum are computed in a distributed way in a
custom DiffFunction that is passed into your optimizer,
Spark complains that the LBFGS class is not serializable. In
BreezeLBFGS.scala, I have to convert the RDD to an array to make it work
locally. It should be easy to fix by having LBFGS implement
Serializable.
2) Breeze computes the gradient and loss redundantly. See the following logs
from both the Fortran and Breeze implementations.
Thanks.
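
One generic way to limit the cost of issue 2 is to memoize the most recent evaluation, so that a repeated request at the same point does not trigger another pass over the data. The Python sketch below shows the idea only. It is not Breeze's API: the class CachedObjective and the toy function quadratic are hypothetical, and it assumes the repeated evaluations happen at the same point as the previous call.

```python
class CachedObjective:
    """Wraps a (loss, gradient) function and reuses the last result
    when it is called again at the same point."""

    def __init__(self, f):
        self._f = f
        self._last_x = None
        self._last_result = None
        self.evaluations = 0  # counts real (uncached) evaluations

    def __call__(self, x):
        if self._last_x is not None and x == self._last_x:
            return self._last_result  # same point as last time: skip the work
        self.evaluations += 1
        self._last_x = list(x)  # copy so later mutation of x cannot corrupt the cache
        self._last_result = self._f(x)
        return self._last_result

def quadratic(x):
    """Toy objective standing in for an expensive distributed computation."""
    loss = sum(v * v for v in x)
    grad = [2.0 * v for v in x]
    return loss, grad

objective = CachedObjective(quadratic)
objective([1.0, 2.0])
objective([1.0, 2.0])  # served from the cache
objective([0.5, 1.0])  # new point, evaluated for real
print(objective.evaluations)
```

Only the single most recent point is cached, which keeps memory use at one weight vector plus one gradient.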
Fortran:
Iteration -1: loss 1.3862943611198926, diff 1.0
Iteration 0: loss 1.5846343143210866, diff 0.14307193024217352
Iteration 1: loss 1.1242501524477688, diff 0.29053004039012126
Iteration 2: loss 1.0930151243303563, diff 0.027782962952189336
Iteration 3: loss 1.054036932835569, diff 0.03566113127440601
Iteration 4: loss 0.9907956302751622, diff 0.0507649459571
Iteration 5: loss 0.9184205380342829, diff 0.07304737423337761
Iteration 6: loss 0.8259870936519937, diff 0.10064381175132982
Iteration 7: loss 0.6327447552109574, diff 0.23395293458364716
Iteration 8: loss 0.5534101162436359, diff 0.1253815427665277
Iteration 9: loss 0.4045020086612566, diff 0.26907321376758075
Iteration 10: loss 0.3078824990823728, diff 0.23885980452569627
Breeze:
Iteration -1: loss 1.3862943611198926, diff 1.0
Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
Iteration 0: loss 1.3862943611198926, diff 0.0
Iteration 1: loss 1.5846343143210866, diff 0.14307193024217352
Iteration 2: loss 1.1242501524477688, diff 0.29053004039012126
Iteration 3: loss 1.1242501524477688, diff 0.0
Iteration 4: loss 1.1242501524477688, diff 0.0
Iteration 5: loss 1.0930151243303563, diff 0.027782962952189336
Iteration 6: loss 1.0930151243303563, diff 0.0
Iteration 7: loss 1.0930151243303563, diff 0.0
Iteration 8: loss 1.054036932835569, diff 0.03566113127440601
Iteration 9: loss 1.054036932835569, diff 0.0
Iteration 10: loss 1.054036932835569, diff 0.0
Iteration 11: loss 0.9907956302751622, diff 0.0507649459571
Iteration 12: loss 0.9907956302751622, diff 0.0
Iteration 13: loss 0.9907956302751622, diff 0.0
Iteration 14: loss 0.9184205380342829, diff