Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-30 Thread David Hall
On Sun, Mar 30, 2014 at 2:01 PM, Debasish Das debasish.da...@gmail.com wrote:

 Hi David,

 I have started to experiment with BFGS solvers for Spark GLM over large
 scale data...

 I am also looking to add a good QP solver in breeze that can be used in
 Spark ALS for constraint solves...More details on that soon...

 I could not load the breeze 0.7 code into Eclipse...There is a folder called
 natives in master but there is no code in it...all the code is in
 src/main/scala...

 I added the eclipse plugin:

 addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "1.6.0")

 addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.2.0")

 But it seems the project is set to use idea...

 Could you please explain the dev methodology for breeze? My idea is to do
 solver work in breeze as that's the right place and get it into Spark
 through Xiangrui's WIP on Sparse data and breeze support...


It would be great to have a QP solver. I don't know if you know about this
library: http://www.joptimizer.com/

I'm not quite sure what you mean by dev methodology. If you just mean how
to get code into Breeze, just send a PR to scalanlp/breeze. Unit tests are
good for something nontrivial like this. Maybe some basic documentation.



 Thanks.
 Deb



 On Fri, Mar 7, 2014 at 12:46 AM, DB Tsai dbt...@alpinenow.com wrote:

  Hi Xiangrui,
 
  I think it doesn't matter whether we use Fortran/Breeze/RISO for
  optimizers since optimization only takes  1% of time. Most of the
  time is in gradientSum and lossSum parallel computation.
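
To make the cost argument concrete, here is a minimal sketch of the kind of loss and gradient sum the message refers to. It uses binary logistic regression rather than the multinomial model discussed in the thread, and plain Scala collections stand in for an RDD; in Spark the fold would be a distributed aggregation over partitions. Only the terms gradientSum and lossSum come from the thread; every other name is hypothetical.

```scala
object LossGradientSum {
  // Hypothetical data point: label in {0, 1} plus a dense feature vector.
  final case class Point(label: Double, features: Array[Double])

  // Logistic loss and gradient contributed by a single point.
  def pointLossGrad(w: Array[Double], p: Point): (Double, Array[Double]) = {
    val margin = w.zip(p.features).map { case (wi, xi) => wi * xi }.sum
    val prob = 1.0 / (1.0 + math.exp(-margin))
    val loss = if (p.label > 0.5) -math.log(prob) else -math.log(1.0 - prob)
    val grad = p.features.map(_ * (prob - p.label))
    (loss, grad)
  }

  // One pass over the data accumulates lossSum and gradientSum together.
  // This pass touches every point, so it dominates each optimizer iteration.
  def lossAndGradient(w: Array[Double], data: Seq[Point]): (Double, Array[Double]) = {
    val zero = (0.0, Array.fill(w.length)(0.0))
    data.foldLeft(zero) { case ((lossSum, gradSum), p) =>
      val (l, g) = pointLossGrad(w, p)
      (lossSum + l, gradSum.zip(g).map { case (a, b) => a + b })
    }
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(Point(1.0, Array(1.0, 2.0)), Point(0.0, Array(1.0, -1.0)))
    val (loss, grad) = lossAndGradient(Array(0.0, 0.0), data)
    println(s"lossSum = $loss, gradientSum = ${grad.mkString("[", ", ", "]")}")
  }
}
```

The optimizer itself only sees the two returned sums, which is why its own bookkeeping is cheap compared with the pass over the data.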
 
  Sincerely,
 
  DB Tsai
  Machine Learning Engineer
  Alpine Data Labs
  --
  Web: http://alpinenow.com/
 
 
  On Thu, Mar 6, 2014 at 7:10 PM, Xiangrui Meng men...@gmail.com wrote:
   Hi DB,
  
   Thanks for doing the comparison! What were the running times for
   fortran/breeze/riso?
  
   Best,
   Xiangrui
  
   On Thu, Mar 6, 2014 at 4:21 PM, DB Tsai dbt...@alpinenow.com wrote:
   Hi David,
  
   I can now converge to the same result with both your breeze LBFGS and
   the Fortran implementation. I probably made some mistakes when I tried
   breeze before. I apologize for claiming it was not stable.
  
   See the test case in BreezeLBFGSSuite.scala
   https://github.com/AlpineNow/spark/tree/dbtsai-breezeLBFGS
  
   This trains multinomial logistic regression on the iris dataset, and
   both optimizers train models with 98% training accuracy.
  
   There are two issues with using Breeze in Spark:
  
   1) When gradientSum and lossSum are computed in a distributed fashion
   inside a custom DiffFunction that is passed to your optimizer, Spark
   complains that the LBFGS class is not serializable. In
   BreezeLBFGS.scala, I have to convert the RDD to an array to make it
   work locally. It should be easy to fix by having LBFGS implement
   Serializable.
  
   2) Breeze computes the gradient and loss redundantly. See the following
   logs from the Fortran and Breeze implementations.
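
As an illustration of issue 1, the sketch below shows the general mechanism using only JDK serialization: an object whose class does not implement Serializable cannot be written to an ObjectOutputStream, while the same class marked Serializable can. The two optimizer classes are hypothetical stand-ins, not Breeze's LBFGS, and the sketch assumes the failure in Spark comes from the task closure capturing a reference to the optimizer.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationSketch {
  // Hypothetical stand-ins for an optimizer class; not Breeze's actual LBFGS.
  class PlainOptimizer(val maxIter: Int)
  class SerializableOptimizer(val maxIter: Int) extends Serializable

  // Returns true if Java serialization of `obj` succeeds.
  def canSerialize(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    println(canSerialize(new PlainOptimizer(10)))        // false
    println(canSerialize(new SerializableOptimizer(10))) // true
  }
}
```

Another way to avoid the problem, under the same assumption, is to keep the optimizer out of the closure entirely so that only the data-parallel function is shipped to the workers.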
  
   Thanks.
  
   Fortran:
   Iteration -1: loss 1.3862943611198926, diff 1.0
   Iteration 0: loss 1.5846343143210866, diff 0.14307193024217352
   Iteration 1: loss 1.1242501524477688, diff 0.29053004039012126
   Iteration 2: loss 1.0930151243303563, diff 0.027782962952189336
   Iteration 3: loss 1.054036932835569, diff 0.03566113127440601
   Iteration 4: loss 0.9907956302751622, diff 0.0507649459571
   Iteration 5: loss 0.9184205380342829, diff 0.07304737423337761
   Iteration 6: loss 0.8259870936519937, diff 0.10064381175132982
   Iteration 7: loss 0.6327447552109574, diff 0.23395293458364716
   Iteration 8: loss 0.5534101162436359, diff 0.1253815427665277
   Iteration 9: loss 0.4045020086612566, diff 0.26907321376758075
   Iteration 10: loss 0.3078824990823728, diff 0.23885980452569627
  
   Breeze:
   Iteration -1: loss 1.3862943611198926, diff 1.0
   Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS <clinit>
   WARNING: Failed to load implementation from:
   com.github.fommil.netlib.NativeSystemBLAS
   Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS <clinit>
   WARNING: Failed to load implementation from:
   com.github.fommil.netlib.NativeRefBLAS
   Iteration 0: loss 1.3862943611198926, diff 0.0
   Iteration 1: loss 1.5846343143210866, diff 0.14307193024217352
   Iteration 2: loss 1.1242501524477688, diff 0.29053004039012126
   Iteration 3: loss 1.1242501524477688, diff 0.0
   Iteration 4: loss 1.1242501524477688, diff 0.0
   Iteration 5: loss 1.0930151243303563, diff 0.027782962952189336
   Iteration 6: loss 1.0930151243303563, diff 0.0
   Iteration 7: loss 1.0930151243303563, diff 0.0
   Iteration 8: loss 1.054036932835569, diff 0.03566113127440601
   Iteration 9: loss 1.054036932835569, diff 0.0
   Iteration 10: loss 1.054036932835569, diff 0.0
   Iteration 11: loss 0.9907956302751622, diff 0.0507649459571
   Iteration 12: loss 0.9907956302751622, diff 0.0
   Iteration 13: loss 0.9907956302751622, diff 0.0
   Iteration 14: loss 0.9184205380342829, diff 
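
In the Breeze log above, several loss values repeat with diff 0.0, which is what one would expect if the objective is evaluated again at a point it has already seen. If that is the cause, one possible mitigation is to wrap the objective so that it remembers the last point and its result. The sketch below is a minimal version in plain Scala; the Objective trait and both classes are hypothetical and do not reproduce Breeze's DiffFunction API.

```scala
object CachedEvaluation {
  // Minimal stand-in for a differentiable objective: returns (loss, gradient).
  trait Objective {
    def calculate(x: Vector[Double]): (Double, Vector[Double])
  }

  // Remembers the most recent point and its result, so a repeated call at
  // the same point returns the stored value instead of re-evaluating.
  final class LastPointCache(inner: Objective) extends Objective {
    private var last: Option[(Vector[Double], (Double, Vector[Double]))] = None

    def calculate(x: Vector[Double]): (Double, Vector[Double]) = last match {
      case Some((px, result)) if px == x => result
      case _ =>
        val result = inner.calculate(x)
        last = Some((x, result))
        result
    }
  }

  // f(x) = sum of x_i^2, with a counter to expose how often it is evaluated.
  final class CountingQuadratic extends Objective {
    var calls = 0
    def calculate(x: Vector[Double]): (Double, Vector[Double]) = {
      calls += 1
      (x.map(v => v * v).sum, x.map(2.0 * _))
    }
  }

  def main(args: Array[String]): Unit = {
    val f = new CountingQuadratic
    val cached = new LastPointCache(f)
    val x = Vector(1.0, 2.0)
    cached.calculate(x)
    cached.calculate(x)
    cached.calculate(x)
    println(s"evaluations of the underlying objective: ${f.calls}")
  }
}
```

When each evaluation is a full pass over a distributed dataset, skipping repeated evaluations at the same point matters far more than any saving inside the optimizer.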

Re: [VOTE] Release Apache Spark 0.9.1 (RC3)

2014-03-30 Thread prabeesh k
+1
tested on Ubuntu12.04 64bit


On Mon, Mar 31, 2014 at 3:56 AM, Matei Zaharia matei.zaha...@gmail.com wrote:

 +1 tested on Mac OS X.

 Matei

 On Mar 27, 2014, at 1:32 AM, Tathagata Das tathagata.das1...@gmail.com
 wrote:

  Please vote on releasing the following candidate as Apache Spark version 0.9.1
 
  A draft of the release notes along with the CHANGES.txt file is
  attached to this e-mail.
 
  The tag to be voted on is v0.9.1-rc3 (commit 4c43182b):
 
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4c43182b6d1b0b7717423f386c0214fe93073208
 
  The release files, including signatures, digests, etc. can be found at:
  http://people.apache.org/~tdas/spark-0.9.1-rc3/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/tdas.asc
 
  The staging repository for this release can be found at:
  https://repository.apache.org/content/repositories/orgapachespark-1009/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~tdas/spark-0.9.1-rc3-docs/
 
  Please vote on releasing this package as Apache Spark 0.9.1!
 
  The vote is open until Sunday, March 30, at 10:00 UTC and passes if
  a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 0.9.1
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
  CHANGES.txt RELEASE_NOTES.txt




Re: [VOTE] Release Apache Spark 0.9.1 (RC3)

2014-03-30 Thread Patrick Wendell
TD - I downloaded and did some local testing. Looks good to me!

+1

You should cast your own vote - at that point it's enough to pass.

- Patrick


On Sun, Mar 30, 2014 at 9:47 PM, prabeesh k prabsma...@gmail.com wrote:

 +1
 tested on Ubuntu12.04 64bit

