Re: [VOTE] Release Apache Spark 0.9.1 (RC3)

2014-03-30 Thread Patrick Wendell
TD - I downloaded and did some local testing. Looks good to me!

+1

You should cast your own vote - at that point it's enough to pass.

- Patrick


On Sun, Mar 30, 2014 at 9:47 PM, prabeesh k  wrote:

> +1
> tested on Ubuntu12.04 64bit
>
>
> On Mon, Mar 31, 2014 at 3:56 AM, Matei Zaharia wrote:
>
> > +1 tested on Mac OS X.
> >
> > Matei
> >
> > On Mar 27, 2014, at 1:32 AM, Tathagata Das wrote:
> >
> > > Please vote on releasing the following candidate as Apache Spark version 0.9.1
> > >
> > > A draft of the release notes along with the CHANGES.txt file is
> > > attached to this e-mail.
> > >
> > > The tag to be voted on is v0.9.1-rc3 (commit 4c43182b):
> > > https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4c43182b6d1b0b7717423f386c0214fe93073208
> > >
> > > The release files, including signatures, digests, etc. can be found at:
> > > http://people.apache.org/~tdas/spark-0.9.1-rc3/
> > >
> > > Release artifacts are signed with the following key:
> > > https://people.apache.org/keys/committer/tdas.asc
> > >
> > > The staging repository for this release can be found at:
> > > https://repository.apache.org/content/repositories/orgapachespark-1009/
> > >
> > > The documentation corresponding to this release can be found at:
> > > http://people.apache.org/~tdas/spark-0.9.1-rc3-docs/
> > >
> > > Please vote on releasing this package as Apache Spark 0.9.1!
> > >
> > > The vote is open until Sunday, March 30, at 10:00 UTC and passes if
> > > a majority of at least 3 +1 PMC votes are cast.
> > >
> > > [ ] +1 Release this package as Apache Spark 0.9.1
> > > [ ] -1 Do not release this package because ...
> > >
> > > To learn more about Apache Spark, please see
> > > http://spark.apache.org/
> > > 
> >
> >
>


Re: [VOTE] Release Apache Spark 0.9.1 (RC3)

2014-03-30 Thread prabeesh k
+1
tested on Ubuntu12.04 64bit


On Mon, Mar 31, 2014 at 3:56 AM, Matei Zaharia wrote:

> +1 tested on Mac OS X.
>
> Matei
>
> On Mar 27, 2014, at 1:32 AM, Tathagata Das wrote:
>
> > Please vote on releasing the following candidate as Apache Spark version 0.9.1
> >
> > A draft of the release notes along with the CHANGES.txt file is
> > attached to this e-mail.
> >
> > The tag to be voted on is v0.9.1-rc3 (commit 4c43182b):
> > https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4c43182b6d1b0b7717423f386c0214fe93073208
> >
> > The release files, including signatures, digests, etc. can be found at:
> > http://people.apache.org/~tdas/spark-0.9.1-rc3/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/tdas.asc
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1009/
> >
> > The documentation corresponding to this release can be found at:
> > http://people.apache.org/~tdas/spark-0.9.1-rc3-docs/
> >
> > Please vote on releasing this package as Apache Spark 0.9.1!
> >
> > The vote is open until Sunday, March 30, at 10:00 UTC and passes if
> > a majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 0.9.1
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see
> > http://spark.apache.org/
> > 
>
>


Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-30 Thread David Hall
On Sun, Mar 30, 2014 at 2:01 PM, Debasish Das wrote:

> Hi David,
>
> I have started to experiment with BFGS solvers for Spark GLM over large
> scale data...
>
> I am also looking to add a good QP solver in breeze that can be used in
> Spark ALS for constraint solves...More details on that soon...
>
> I could not load up breeze 0.7 code onto eclipse...There is a folder called
> natives in the master but there is no code in that...all the code is in
> src/main/scala...
>
> I added the eclipse plugin:
>
> addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "1.6.0")
>
> addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.2.0")
>
> But it seems the project is set to use idea...
>
> Could you please explain the dev methodology for breeze? My idea is to do
> solver work in breeze as that's the right place and get it into Spark
> through Xiangrui's WIP on Sparse data and breeze support...
>

It would be great to have a QP Solver: I don't know if you know about this
library: http://www.joptimizer.com/

I'm not quite sure what you mean by dev methodology. If you just mean how
to get code into Breeze, just send a PR to scalanlp/breeze. Unit tests are
good for something nontrivial like this. Maybe some basic documentation.


>
> Thanks.
> Deb
>
>
>
> On Fri, Mar 7, 2014 at 12:46 AM, DB Tsai  wrote:
>
> > Hi Xiangrui,
> >
> > I think it doesn't matter whether we use Fortran/Breeze/RISO for
> > optimizers since optimization only takes << 1% of time. Most of the
> > time is in gradientSum and lossSum parallel computation.
> >
> > Sincerely,
> >
> > DB Tsai
> > Machine Learning Engineer
> > Alpine Data Labs
> > --
> > Web: http://alpinenow.com/
> >
> >
> > On Thu, Mar 6, 2014 at 7:10 PM, Xiangrui Meng  wrote:
> > > Hi DB,
> > >
> > > Thanks for doing the comparison! What were the running times for
> > > fortran/breeze/riso?
> > >
> > > Best,
> > > Xiangrui
> > >
> > > On Thu, Mar 6, 2014 at 4:21 PM, DB Tsai  wrote:
> > >> Hi David,
> > >>
> > >> I can converge to the same result with your breeze LBFGS and Fortran
> > >> implementations now. Probably, I made some mistakes when I tried
> > >> breeze before. I apologize that I claimed it's not stable.
> > >>
> > >> See the test case in BreezeLBFGSSuite.scala
> > >> https://github.com/AlpineNow/spark/tree/dbtsai-breezeLBFGS
> > >>
> > >> This is training multinomial logistic regression against iris dataset,
> > >> and both optimizers can train the models with 98% training accuracy.
> > >>
> > > >> There are two issues with using Breeze in Spark:
> > > >>
> > > >> 1) When the gradientSum and lossSum are computed in a distributed way
> > > >> inside a custom DiffFunction that is passed into your optimizer,
> > > >> Spark complains that the LBFGS class is not serializable. In
> > > >> BreezeLBFGS.scala, I have to convert the RDD to an array to make it
> > > >> work locally. It should be easy to fix by just having LBFGS implement
> > > >> Serializable.
> > >>
> > >> 2) Breeze computes redundant gradient and loss. See the following log
> > >> from both Fortran and Breeze implementations.
> > >>
> > >> Thanks.
> > >>
> > >> Fortran:
> > >> Iteration -1: loss 1.3862943611198926, diff 1.0
> > >> Iteration 0: loss 1.5846343143210866, diff 0.14307193024217352
> > >> Iteration 1: loss 1.1242501524477688, diff 0.29053004039012126
> > >> Iteration 2: loss 1.0930151243303563, diff 0.027782962952189336
> > >> Iteration 3: loss 1.054036932835569, diff 0.03566113127440601
> > >> Iteration 4: loss 0.9907956302751622, diff 0.0507649459571
> > >> Iteration 5: loss 0.9184205380342829, diff 0.07304737423337761
> > >> Iteration 6: loss 0.8259870936519937, diff 0.10064381175132982
> > >> Iteration 7: loss 0.6327447552109574, diff 0.23395293458364716
> > >> Iteration 8: loss 0.5534101162436359, diff 0.1253815427665277
> > >> Iteration 9: loss 0.4045020086612566, diff 0.26907321376758075
> > >> Iteration 10: loss 0.3078824990823728, diff 0.23885980452569627
> > >>
> > >> Breeze:
> > >> Iteration -1: loss 1.3862943611198926, diff 1.0
> > >> Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS 
> > >> WARNING: Failed to load implementation from:
> > >> com.github.fommil.netlib.NativeSystemBLAS
> > >> Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS 
> > >> WARNING: Failed to load implementation from:
> > >> com.github.fommil.netlib.NativeRefBLAS
> > >> Iteration 0: loss 1.3862943611198926, diff 0.0
> > >> Iteration 1: loss 1.5846343143210866, diff 0.14307193024217352
> > >> Iteration 2: loss 1.1242501524477688, diff 0.29053004039012126
> > >> Iteration 3: loss 1.1242501524477688, diff 0.0
> > >> Iteration 4: loss 1.1242501524477688, diff 0.0
> > >> Iteration 5: loss 1.0930151243303563, diff 0.027782962952189336
> > >> Iteration 6: loss 1.0930151243303563, diff 0.0
> > >> Iteration 7: loss 1.0930151243303563, diff 0.0
> > >> Iteration 8: loss 1.054036932835569, diff 0.03566113127440601
> > >> Iteration 9: loss 1.054036932835569, diff 0.0
> > >> Iteration 10: loss 1.05403693
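
Issue 1 in DB Tsai's quoted message (Spark complaining that the LBFGS class is not serializable) can be reproduced without Spark or Breeze. The sketch below is only an illustration under stated assumptions: `Optimizer`, `SerializableOptimizer`, `Task` and `canSerialize` are hypothetical names, not Breeze or Spark classes, and plain Java serialization stands in for what happens when a task holding a reference to the optimizer is shipped to a worker.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureSerializationSketch {
  // Hypothetical stand-in for an optimizer that does not extend Serializable.
  final class Optimizer(val maxIter: Int)

  // The same class marked Serializable, which is the fix suggested in the thread.
  final class SerializableOptimizer(val maxIter: Int) extends Serializable

  // Hypothetical stand-in for a task that drags the optimizer along with it,
  // the way a closure captures the objects it refers to.
  final class Task(val opt: AnyRef) extends Serializable

  // Returns true if Java serialization accepts the object graph.
  def canSerialize(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    // The task itself is Serializable, but the optimizer it references is not.
    println(canSerialize(new Task(new Optimizer(10))))
    // Once the optimizer is Serializable, the whole graph serializes.
    println(canSerialize(new Task(new SerializableOptimizer(10))))
  }
}
```

The other common way around this kind of error is to keep the optimizer on the driver and make sure the function sent to the workers only references the data it needs, so the optimizer never enters the serialized object graph.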

Re: [VOTE] Release Apache Spark 0.9.1 (RC3)

2014-03-30 Thread Matei Zaharia
+1 tested on Mac OS X.

Matei

On Mar 27, 2014, at 1:32 AM, Tathagata Das  wrote:

> Please vote on releasing the following candidate as Apache Spark version 0.9.1
> 
> A draft of the release notes along with the CHANGES.txt file is
> attached to this e-mail.
> 
> The tag to be voted on is v0.9.1-rc3 (commit 4c43182b):
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4c43182b6d1b0b7717423f386c0214fe93073208
> 
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~tdas/spark-0.9.1-rc3/
> 
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/tdas.asc
> 
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1009/
> 
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~tdas/spark-0.9.1-rc3-docs/
> 
> Please vote on releasing this package as Apache Spark 0.9.1!
> 
> The vote is open until Sunday, March 30, at 10:00 UTC and passes if
> a majority of at least 3 +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Spark 0.9.1
> [ ] -1 Do not release this package because ...
> 
> To learn more about Apache Spark, please see
> http://spark.apache.org/
> 



Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-30 Thread Debasish Das
Hi David,

I have started to experiment with BFGS solvers for Spark GLM over large
scale data...

I am also looking to add a good QP solver in breeze that can be used in
Spark ALS for constraint solves...More details on that soon...

I could not load up breeze 0.7 code onto eclipse...There is a folder called
natives in the master but there is no code in that...all the code is in
src/main/scala...

I added the eclipse plugin:

addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "1.6.0")

addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.2.0")

But it seems the project is set to use idea...

Could you please explain the dev methodology for breeze? My idea is to do
solver work in breeze as that's the right place and get it into Spark
through Xiangrui's WIP on Sparse data and breeze support...

Thanks.
Deb



On Fri, Mar 7, 2014 at 12:46 AM, DB Tsai  wrote:

> Hi Xiangrui,
>
> I think it doesn't matter whether we use Fortran/Breeze/RISO for
> optimizers since optimization only takes << 1% of time. Most of the
> time is in gradientSum and lossSum parallel computation.
>
> Sincerely,
>
> DB Tsai
> Machine Learning Engineer
> Alpine Data Labs
> --
> Web: http://alpinenow.com/
>
>
> On Thu, Mar 6, 2014 at 7:10 PM, Xiangrui Meng  wrote:
> > Hi DB,
> >
> > Thanks for doing the comparison! What were the running times for
> > fortran/breeze/riso?
> >
> > Best,
> > Xiangrui
> >
> > On Thu, Mar 6, 2014 at 4:21 PM, DB Tsai  wrote:
> >> Hi David,
> >>
> >> I can converge to the same result with your breeze LBFGS and Fortran
> >> implementations now. Probably, I made some mistakes when I tried
> >> breeze before. I apologize that I claimed it's not stable.
> >>
> >> See the test case in BreezeLBFGSSuite.scala
> >> https://github.com/AlpineNow/spark/tree/dbtsai-breezeLBFGS
> >>
> >> This is training multinomial logistic regression against iris dataset,
> >> and both optimizers can train the models with 98% training accuracy.
> >>
> >> There are two issues with using Breeze in Spark:
> >>
> >> 1) When the gradientSum and lossSum are computed in a distributed way
> >> inside a custom DiffFunction that is passed into your optimizer,
> >> Spark complains that the LBFGS class is not serializable. In
> >> BreezeLBFGS.scala, I have to convert the RDD to an array to make it
> >> work locally. It should be easy to fix by just having LBFGS implement
> >> Serializable.
> >>
> >> 2) Breeze computes redundant gradient and loss. See the following log
> >> from both Fortran and Breeze implementations.
> >>
> >> Thanks.
> >>
> >> Fortran:
> >> Iteration -1: loss 1.3862943611198926, diff 1.0
> >> Iteration 0: loss 1.5846343143210866, diff 0.14307193024217352
> >> Iteration 1: loss 1.1242501524477688, diff 0.29053004039012126
> >> Iteration 2: loss 1.0930151243303563, diff 0.027782962952189336
> >> Iteration 3: loss 1.054036932835569, diff 0.03566113127440601
> >> Iteration 4: loss 0.9907956302751622, diff 0.0507649459571
> >> Iteration 5: loss 0.9184205380342829, diff 0.07304737423337761
> >> Iteration 6: loss 0.8259870936519937, diff 0.10064381175132982
> >> Iteration 7: loss 0.6327447552109574, diff 0.23395293458364716
> >> Iteration 8: loss 0.5534101162436359, diff 0.1253815427665277
> >> Iteration 9: loss 0.4045020086612566, diff 0.26907321376758075
> >> Iteration 10: loss 0.3078824990823728, diff 0.23885980452569627
> >>
> >> Breeze:
> >> Iteration -1: loss 1.3862943611198926, diff 1.0
> >> Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS 
> >> WARNING: Failed to load implementation from:
> >> com.github.fommil.netlib.NativeSystemBLAS
> >> Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS 
> >> WARNING: Failed to load implementation from:
> >> com.github.fommil.netlib.NativeRefBLAS
> >> Iteration 0: loss 1.3862943611198926, diff 0.0
> >> Iteration 1: loss 1.5846343143210866, diff 0.14307193024217352
> >> Iteration 2: loss 1.1242501524477688, diff 0.29053004039012126
> >> Iteration 3: loss 1.1242501524477688, diff 0.0
> >> Iteration 4: loss 1.1242501524477688, diff 0.0
> >> Iteration 5: loss 1.0930151243303563, diff 0.027782962952189336
> >> Iteration 6: loss 1.0930151243303563, diff 0.0
> >> Iteration 7: loss 1.0930151243303563, diff 0.0
> >> Iteration 8: loss 1.054036932835569, diff 0.03566113127440601
> >> Iteration 9: loss 1.054036932835569, diff 0.0
> >> Iteration 10: loss 1.054036932835569, diff 0.0
> >> Iteration 11: loss 0.9907956302751622, diff 0.0507649459571
> >> Iteration 12: loss 0.9907956302751622, diff 0.0
> >> Iteration 13: loss 0.9907956302751622, diff 0.0
> >> Iteration 14: loss 0.9184205380342829, diff 0.07304737423337761
> >> Iteration 15: loss 0.9184205380342829, diff 0.0
> >> Iteration 16: loss 0.9184205380342829, diff 0.0
> >> Iteration 17: loss 0.8259870936519939, diff 0.1006438117513297
> >> Iteration 18: loss 0.8259870936519939, diff 0.0
> >> Iteration 19: loss 0.8259870936519939, diff 0.0
> >> Iteration 20: loss 0.6327447552109576, diff 0.233952934583647
>
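
The redundancy described in issue 2 can be read directly off the quoted logs. Judging from the numbers (the thread does not state this), the `diff` column looks like the relative change in loss between consecutive evaluations, |prev - cur| / |prev|, and a `diff` of 0.0 marks an evaluation that returned the same loss as the one before it. The sketch below checks that reading on the first ten Breeze log entries; `LbfgsLogCheck`, `relDiff` and `redundant` are hypothetical helper names for illustration only.

```scala
object LbfgsLogCheck {
  // Losses copied from the Breeze log quoted above (iterations -1 through 8).
  val breezeLosses: Vector[Double] = Vector(
    1.3862943611198926, 1.3862943611198926, 1.5846343143210866,
    1.1242501524477688, 1.1242501524477688, 1.1242501524477688,
    1.0930151243303563, 1.0930151243303563, 1.0930151243303563,
    1.054036932835569)

  // Relative change between two consecutive losses. This formula is an
  // assumption inferred from the logged values, not taken from the thread.
  def relDiff(prev: Double, cur: Double): Double =
    math.abs(prev - cur) / math.abs(prev)

  // Counts evaluations whose loss is identical to the previous one,
  // i.e. the entries logged with diff 0.0.
  def redundant(losses: Seq[Double]): Int =
    losses.zip(losses.drop(1)).count { case (a, b) => a == b }

  def main(args: Array[String]): Unit = {
    breezeLosses.zip(breezeLosses.drop(1)).foreach { case (a, b) =>
      println(s"loss $b, diff ${relDiff(a, b)}")
    }
    println(s"repeated evaluations: ${redundant(breezeLosses)} of ${breezeLosses.size - 1}")
  }
}
```

Running the same count over the Fortran losses gives zero repeats, which is consistent with the observation that the Breeze run needs roughly two to three times as many logged evaluations to reach the same loss values.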