Hi guys,

The latest PR uses the L-BFGS implementation from Breeze, which was
introduced by Xiangrui's sparse input format work in SPARK-1212.

https://github.com/apache/spark/pull/353

Now, it works with the new sparse framework!

Any feedback would be greatly appreciated.

Thanks.

Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Thu, Apr 3, 2014 at 5:02 PM, DB Tsai <dbt...@alpinenow.com> wrote:
> ---------- Forwarded message ----------
> From: David Hall <d...@cs.berkeley.edu>
> Date: Sat, Mar 15, 2014 at 10:02 AM
> Subject: Re: MLLib - Thoughts about refactoring Updater for LBFGS?
> To: DB Tsai <dbt...@alpinenow.com>
>
>
> On Fri, Mar 7, 2014 at 10:56 PM, DB Tsai <dbt...@alpinenow.com> wrote:
>>
>> Hi David,
>>
>> Once you finish, please let me know which version of Breeze makes LBFGS
>> serializable and has CachedDiffFunction built into LBFGS. I'll then
>> update the Spark PR to use the Breeze implementation instead of the
>> RISO implementation.
>
>
> The current master (0.7-SNAPSHOT) has these problems fixed.
>
>>
>>
>> Thanks.
>>
>> Sincerely,
>>
>> DB Tsai
>> Machine Learning Engineer
>> Alpine Data Labs
>> --------------------------------------
>> Web: http://alpinenow.com/
>>
>>
>> On Thu, Mar 6, 2014 at 4:26 PM, David Hall <d...@cs.berkeley.edu> wrote:
>> > On Thu, Mar 6, 2014 at 4:21 PM, DB Tsai <dbt...@alpinenow.com> wrote:
>> >
>> >> Hi David,
>> >>
>> >> I can now converge to the same result with both your Breeze LBFGS and
>> >> the Fortran implementation. I probably made some mistakes when I tried
>> >> Breeze before; I apologize for claiming it was not stable.
>> >>
>> >> See the test case in BreezeLBFGSSuite.scala
>> >> https://github.com/AlpineNow/spark/tree/dbtsai-breezeLBFGS
>> >>
>> >> The test trains multinomial logistic regression on the iris dataset,
>> >> and both optimizers reach 98% training accuracy.
>> >>
>> >
>> > great to hear! There were some bugs in LBFGS about 6 months ago, so
>> > depending on the last time you tried it, it might indeed have been
>> > bugged.
>> >
>> >
>> >>
>> >> There are two issues with using Breeze in Spark:
>> >>
>> >> 1) When gradientSum and lossSum are computed in a distributed fashion
>> >> inside the custom DiffFunction that is passed to your optimizer,
>> >> Spark complains that the LBFGS class is not serializable. In
>> >> BreezeLBFGS.scala, I have to convert the RDD to an array to make it
>> >> work locally. It should be easy to fix by having LBFGS implement
>> >> Serializable.
>> >>
>> >
>> > I'm not sure why Spark should be serializing LBFGS? Shouldn't it live on
>> > the controller node? Or is this a per-node thing?
>> >
>> > But no problem to make it serializable.
>> >
>> >
>> >>
>> >> 2) Breeze computes the gradient and loss redundantly. See the logs
>> >> below from the Fortran and Breeze implementations.
>> >>
>> >
>> > Err, yeah. I should probably have LBFGS do this automatically, but
>> > there's a CachedDiffFunction that gets rid of the redundant
>> > calculations.
>> >
>> > -- David
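
To make the caching idea above concrete, here is a minimal sketch in
Python. It is not Breeze's CachedDiffFunction and I have not checked how
Breeze implements it; the names CachedLossGrad and quadratic are
hypothetical. It only shows the general technique: remember the last point
and its (loss, gradient), so that a repeated query at the same point does
not trigger another full pass over the data.

```python
from typing import Callable, Optional, Sequence, Tuple

Vector = Tuple[float, ...]
LossAndGrad = Tuple[float, Vector]


class CachedLossGrad:
    """Hypothetical wrapper: caches the most recent (loss, gradient) pair."""

    def __init__(self, f: Callable[[Vector], LossAndGrad]) -> None:
        self._f = f
        self._last_x: Optional[Vector] = None
        self._last_result: Optional[LossAndGrad] = None
        self.evaluations = 0  # how many times the wrapped function really ran

    def __call__(self, x: Sequence[float]) -> LossAndGrad:
        key = tuple(x)
        # Recompute only when the query point differs from the cached one.
        if self._last_result is None or key != self._last_x:
            self._last_result = self._f(key)
            self._last_x = key
            self.evaluations += 1
        return self._last_result


def quadratic(x: Vector) -> LossAndGrad:
    # Stand-in for an expensive distributed loss/gradient computation.
    loss = sum(v * v for v in x)
    grad = tuple(2.0 * v for v in x)
    return loss, grad


cached = CachedLossGrad(quadratic)
cached((1.0, 2.0))
cached((1.0, 2.0))  # same point: served from the cache
cached((1.0, 2.0))  # same point: served from the cache
cached((0.5, 1.0))  # new point: recomputed
print(cached.evaluations)
```

With a wrapper of this kind, the three identical evaluations per step that
show up in the Breeze log would cost one real computation each, assuming
the optimizer asks for the same point every time.
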
>> >
>> >
>> >>
>> >> Thanks.
>> >>
>> >> Fortran:
>> >> Iteration -1: loss 1.3862943611198926, diff 1.0
>> >> Iteration 0: loss 1.5846343143210866, diff 0.14307193024217352
>> >> Iteration 1: loss 1.1242501524477688, diff 0.29053004039012126
>> >> Iteration 2: loss 1.0930151243303563, diff 0.027782962952189336
>> >> Iteration 3: loss 1.054036932835569, diff 0.03566113127440601
>> >> Iteration 4: loss 0.9907956302751622, diff 0.05999907649459571
>> >> Iteration 5: loss 0.9184205380342829, diff 0.07304737423337761
>> >> Iteration 6: loss 0.8259870936519937, diff 0.10064381175132982
>> >> Iteration 7: loss 0.6327447552109574, diff 0.23395293458364716
>> >> Iteration 8: loss 0.5534101162436359, diff 0.1253815427665277
>> >> Iteration 9: loss 0.4045020086612566, diff 0.26907321376758075
>> >> Iteration 10: loss 0.3078824990823728, diff 0.23885980452569627
>> >>
>> >> Breeze:
>> >> Iteration -1: loss 1.3862943611198926, diff 1.0
>> >> Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS <clinit>
>> >> WARNING: Failed to load implementation from:
>> >> com.github.fommil.netlib.NativeSystemBLAS
>> >> Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS <clinit>
>> >> WARNING: Failed to load implementation from:
>> >> com.github.fommil.netlib.NativeRefBLAS
>> >> Iteration 0: loss 1.3862943611198926, diff 0.0
>> >> Iteration 1: loss 1.5846343143210866, diff 0.14307193024217352
>> >> Iteration 2: loss 1.1242501524477688, diff 0.29053004039012126
>> >> Iteration 3: loss 1.1242501524477688, diff 0.0
>> >> Iteration 4: loss 1.1242501524477688, diff 0.0
>> >> Iteration 5: loss 1.0930151243303563, diff 0.027782962952189336
>> >> Iteration 6: loss 1.0930151243303563, diff 0.0
>> >> Iteration 7: loss 1.0930151243303563, diff 0.0
>> >> Iteration 8: loss 1.054036932835569, diff 0.03566113127440601
>> >> Iteration 9: loss 1.054036932835569, diff 0.0
>> >> Iteration 10: loss 1.054036932835569, diff 0.0
>> >> Iteration 11: loss 0.9907956302751622, diff 0.05999907649459571
>> >> Iteration 12: loss 0.9907956302751622, diff 0.0
>> >> Iteration 13: loss 0.9907956302751622, diff 0.0
>> >> Iteration 14: loss 0.9184205380342829, diff 0.07304737423337761
>> >> Iteration 15: loss 0.9184205380342829, diff 0.0
>> >> Iteration 16: loss 0.9184205380342829, diff 0.0
>> >> Iteration 17: loss 0.8259870936519939, diff 0.1006438117513297
>> >> Iteration 18: loss 0.8259870936519939, diff 0.0
>> >> Iteration 19: loss 0.8259870936519939, diff 0.0
>> >> Iteration 20: loss 0.6327447552109576, diff 0.233952934583647
>> >> Iteration 21: loss 0.6327447552109576, diff 0.0
>> >> Iteration 22: loss 0.6327447552109576, diff 0.0
>> >> Iteration 23: loss 0.5534101162436362, diff 0.12538154276652747
>> >> Iteration 24: loss 0.5534101162436362, diff 0.0
>> >> Iteration 25: loss 0.5534101162436362, diff 0.0
>> >> Iteration 26: loss 0.40450200866125635, diff 0.2690732137675816
>> >> Iteration 27: loss 0.40450200866125635, diff 0.0
>> >> Iteration 28: loss 0.40450200866125635, diff 0.0
>> >> Iteration 29: loss 0.30788249908237314, diff 0.23885980452569502
>> >>
>> >> Sincerely,
>> >>
>> >> DB Tsai
>> >> Machine Learning Engineer
>> >> Alpine Data Labs
>> >> --------------------------------------
>> >> Web: http://alpinenow.com/
>> >>
>> >>
>> >> On Wed, Mar 5, 2014 at 2:00 PM, David Hall <d...@cs.berkeley.edu> wrote:
>> >> > On Wed, Mar 5, 2014 at 1:57 PM, DB Tsai <dbt...@alpinenow.com> wrote:
>> >> >
>> >> >> Hi David,
>> >> >>
>> >> >> On Tue, Mar 4, 2014 at 8:13 PM, dlwh <david.lw.h...@gmail.com> wrote:
>> >> >> > I'm happy to help fix any problems. I've verified at points that the
>> >> >> > implementation gives the exact same sequence of iterates for a few
>> >> >> > different functions (with a particular line search) as the c port of
>> >> >> > lbfgs. So I'm a little surprised it fails where Fortran succeeds... but
>> >> >> > only a little. This was fixed late last year.
>> >> >> I'm working on a reproducible test case that compares the Breeze and
>> >> >> Fortran implementations to show the problem I've run into. It will
>> >> >> be one of the test cases in my Spark fork. Is it okay for you to
>> >> >> investigate the issue there, or do I need to make it a standalone
>> >> >> test?
>> >> >>
>> >> >
>> >> >
>> >> > Um, as long as it wouldn't be too hard to pull out.
>> >> >
>> >> >
>> >> >>
>> >> >> Will send you the test later today.
>> >> >>
>> >> >> Thanks.
>> >> >>
>> >> >> Sincerely,
>> >> >>
>> >> >> DB Tsai
>> >> >> Machine Learning Engineer
>> >> >> Alpine Data Labs
>> >> >> --------------------------------------
>> >> >> Web: http://alpinenow.com/
>> >> >>
>> >>
>
>
>
