On Sun, Mar 30, 2014 at 2:01 PM, Debasish Das <debasish.da...@gmail.com> wrote:
> Hi David,
>
> I have started to experiment with BFGS solvers for Spark GLM over large
> scale data...
>
> I am also looking to add a good QP solver in breeze that can be used in
> Spark ALS for constraint solves...More details on that soon...
>
> I could not load up breeze 0.7 code onto eclipse...There is a folder called
> natives in the master but there is no code in that....all the code is in
> src/main/scala...
>
> I added the eclipse plugin:
>
> addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "1.6.0")
>
> addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.2.0")
>
> But it seems the project is set to use idea...
>
> Could you please explain the dev methodology for breeze ? My idea is to do
> solver work in breeze as that's the right place and get it into Spark
> through Xiangrui's WIP on Sparse data and breeze support...
>

It would be great to have a QP Solver: I don't know if you know about this
library: http://www.joptimizer.com/

I'm not quite sure what you mean by dev methodology. If you just mean how
to get code into Breeze, just send a PR to scalanlp/breeze. Unit tests are
good for something nontrivial like this. Maybe some basic documentation.

>
> Thanks.
> Deb
>
>
>
> On Fri, Mar 7, 2014 at 12:46 AM, DB Tsai <dbt...@alpinenow.com> wrote:
>
> > Hi Xiangrui,
> >
> > I think it doesn't matter whether we use Fortran/Breeze/RISO for
> > optimizers since optimization only takes << 1% of time. Most of the
> > time is in gradientSum and lossSum parallel computation.
> >
> > Sincerely,
> >
> > DB Tsai
> > Machine Learning Engineer
> > Alpine Data Labs
> > --------------------------------------
> > Web: http://alpinenow.com/
> >
> >
> > On Thu, Mar 6, 2014 at 7:10 PM, Xiangrui Meng <men...@gmail.com> wrote:
> > > Hi DB,
> > >
> > > Thanks for doing the comparison! What were the running times for
> > > fortran/breeze/riso?
> > >
> > > Best,
> > > Xiangrui
> > >
> > > On Thu, Mar 6, 2014 at 4:21 PM, DB Tsai <dbt...@alpinenow.com> wrote:
> > >> Hi David,
> > >>
> > >> I can converge to the same result with your breeze LBFGS and Fortran
> > >> implementations now. Probably, I made some mistakes when I tried
> > >> breeze before. I apologize that I claimed it's not stable.
> > >>
> > >> See the test case in BreezeLBFGSSuite.scala
> > >> https://github.com/AlpineNow/spark/tree/dbtsai-breezeLBFGS
> > >>
> > >> This is training multinomial logistic regression against iris dataset,
> > >> and both optimizers can train the models with 98% training accuracy.
> > >>
> > >> There are two issues to use Breeze in Spark,
> > >>
> > >> 1) When the gradientSum and lossSum are computed distributively in
> > >> custom defined DiffFunction which will be passed into your optimizer,
> > >> Spark will complain LBFGS class is not serializable. In
> > >> BreezeLBFGS.scala, I've to convert RDD to array to make it work
> > >> locally. It should be easy to fix by just having LBFGS to implement
> > >> Serializable.
> > >>
> > >> 2) Breeze computes redundant gradient and loss. See the following log
> > >> from both Fortran and Breeze implementations.
> > >>
> > >> Thanks.
> > >>
> > >> Fortran:
> > >> Iteration -1: loss 1.3862943611198926, diff 1.0
> > >> Iteration 0: loss 1.5846343143210866, diff 0.14307193024217352
> > >> Iteration 1: loss 1.1242501524477688, diff 0.29053004039012126
> > >> Iteration 2: loss 1.0930151243303563, diff 0.027782962952189336
> > >> Iteration 3: loss 1.054036932835569, diff 0.03566113127440601
> > >> Iteration 4: loss 0.9907956302751622, diff 0.05999907649459571
> > >> Iteration 5: loss 0.9184205380342829, diff 0.07304737423337761
> > >> Iteration 6: loss 0.8259870936519937, diff 0.10064381175132982
> > >> Iteration 7: loss 0.6327447552109574, diff 0.23395293458364716
> > >> Iteration 8: loss 0.5534101162436359, diff 0.1253815427665277
> > >> Iteration 9: loss 0.4045020086612566, diff 0.26907321376758075
> > >> Iteration 10: loss 0.3078824990823728, diff 0.23885980452569627
> > >>
> > >> Breeze:
> > >> Iteration -1: loss 1.3862943611198926, diff 1.0
> > >> Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS <clinit>
> > >> WARNING: Failed to load implementation from:
> > >> com.github.fommil.netlib.NativeSystemBLAS
> > >> Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS <clinit>
> > >> WARNING: Failed to load implementation from:
> > >> com.github.fommil.netlib.NativeRefBLAS
> > >> Iteration 0: loss 1.3862943611198926, diff 0.0
> > >> Iteration 1: loss 1.5846343143210866, diff 0.14307193024217352
> > >> Iteration 2: loss 1.1242501524477688, diff 0.29053004039012126
> > >> Iteration 3: loss 1.1242501524477688, diff 0.0
> > >> Iteration 4: loss 1.1242501524477688, diff 0.0
> > >> Iteration 5: loss 1.0930151243303563, diff 0.027782962952189336
> > >> Iteration 6: loss 1.0930151243303563, diff 0.0
> > >> Iteration 7: loss 1.0930151243303563, diff 0.0
> > >> Iteration 8: loss 1.054036932835569, diff 0.03566113127440601
> > >> Iteration 9: loss 1.054036932835569, diff 0.0
> > >> Iteration 10: loss 1.054036932835569, diff 0.0
> > >> Iteration 11: loss 0.9907956302751622, diff 0.05999907649459571
> > >> Iteration 12: loss 0.9907956302751622, diff 0.0
> > >> Iteration 13: loss 0.9907956302751622, diff 0.0
> > >> Iteration 14: loss 0.9184205380342829, diff 0.07304737423337761
> > >> Iteration 15: loss 0.9184205380342829, diff 0.0
> > >> Iteration 16: loss 0.9184205380342829, diff 0.0
> > >> Iteration 17: loss 0.8259870936519939, diff 0.1006438117513297
> > >> Iteration 18: loss 0.8259870936519939, diff 0.0
> > >> Iteration 19: loss 0.8259870936519939, diff 0.0
> > >> Iteration 20: loss 0.6327447552109576, diff 0.233952934583647
> > >> Iteration 21: loss 0.6327447552109576, diff 0.0
> > >> Iteration 22: loss 0.6327447552109576, diff 0.0
> > >> Iteration 23: loss 0.5534101162436362, diff 0.12538154276652747
> > >> Iteration 24: loss 0.5534101162436362, diff 0.0
> > >> Iteration 25: loss 0.5534101162436362, diff 0.0
> > >> Iteration 26: loss 0.40450200866125635, diff 0.2690732137675816
> > >> Iteration 27: loss 0.40450200866125635, diff 0.0
> > >> Iteration 28: loss 0.40450200866125635, diff 0.0
> > >> Iteration 29: loss 0.30788249908237314, diff 0.23885980452569502
> > >>
> > >> Sincerely,
> > >>
> > >> DB Tsai
> > >> Machine Learning Engineer
> > >> Alpine Data Labs
> > >> --------------------------------------
> > >> Web: http://alpinenow.com/
> > >>
> > >>
> > >> On Wed, Mar 5, 2014 at 2:00 PM, David Hall <d...@cs.berkeley.edu> wrote:
> > >>> On Wed, Mar 5, 2014 at 1:57 PM, DB Tsai <dbt...@alpinenow.com> wrote:
> > >>>
> > >>>> Hi David,
> > >>>>
> > >>>> On Tue, Mar 4, 2014 at 8:13 PM, dlwh <david.lw.h...@gmail.com> wrote:
> > >>>> > I'm happy to help fix any problems. I've verified at points that
> > >>>> > the implementation gives the exact same sequence of iterates for
> > >>>> > a few different functions (with a particular line search) as the
> > >>>> > c port of lbfgs. So I'm a little surprised it fails where Fortran
> > >>>> > succeeds... but only a little. This was fixed late last year.
> > >>>> I'm working on a reproducible test case using breeze vs fortran
> > >>>> implementation to show the problem I've run into. The test will be in
> > >>>> one of the test cases in my Spark fork, is it okay for you to
> > >>>> investigate the issue? Or do I need to make it as a standalone test?
> > >>>>
> > >>>
> > >>>
> > >>> Um, as long as it wouldn't be too hard to pull out.
> > >>>
> > >>>
> > >>>>
> > >>>> Will send you the test later today.
> > >>>>
> > >>>> Thanks.
> > >>>>
> > >>>> Sincerely,
> > >>>>
> > >>>> DB Tsai
> > >>>> Machine Learning Engineer
> > >>>> Alpine Data Labs
> > >>>> --------------------------------------
> > >>>> Web: http://alpinenow.com/
> > >>>>
> > >
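
On issue 2 in the quoted message (the Breeze log repeats each loss value with diff 0.0, which suggests the objective is being evaluated more than once at the same point): since gradientSum and lossSum dominate the running time, one possible workaround is to cache the most recent evaluation so that a repeated call at an identical point costs nothing. The sketch below is plain Scala with no Breeze or Spark dependency. CachedObjective, calculate, evaluations and the toy quadratic objective are hypothetical names chosen for illustration, not Breeze API, and the sketch assumes the repeated values really do come from calls at the same point.

```scala
// Hypothetical helper, not part of Breeze or Spark: remembers the most
// recent evaluation so that a repeated call at the same point is free.
final class CachedObjective(f: Array[Double] => (Double, Array[Double])) {
  // Last point evaluated together with its (loss, gradient) result.
  private var cache: Option[(Array[Double], (Double, Array[Double]))] = None

  // Counts only the calls that actually reach the underlying function.
  var evaluations: Int = 0

  def calculate(x: Array[Double]): (Double, Array[Double]) = cache match {
    case Some((cachedX, result)) if java.util.Arrays.equals(cachedX, x) =>
      result // same point as last time: reuse the stored loss and gradient
    case _ =>
      val result = f(x)
      evaluations += 1
      // Defensive copy of x, in case the caller mutates its array in place.
      cache = Some((x.clone(), result))
      result
  }
}

object CachedObjectiveDemo {
  // Toy objective f(x) = sum of x_i^2 with gradient 2x; it stands in for
  // the expensive distributed lossSum/gradientSum computation.
  def quadratic(x: Array[Double]): (Double, Array[Double]) =
    (x.map(v => v * v).sum, x.map(v => 2 * v))

  def main(args: Array[String]): Unit = {
    val obj = new CachedObjective(quadratic)
    val x = Array(1.0, 2.0)
    obj.calculate(x)               // evaluated
    obj.calculate(x)               // served from the cache
    obj.calculate(Array(0.5, 0.5)) // new point, evaluated again
    println(s"underlying evaluations: ${obj.evaluations}")
  }
}
```

The cache holds a single point on purpose: it only helps when the optimizer asks again for the point it has just evaluated, and it keeps memory use flat. Note that the cached gradient array is returned as-is, so a caller that modifies it in place would corrupt later cache hits.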