Re: [VOTE] Release Apache Spark 0.9.1 (RC3)

2014-03-31 Thread Tom Graves
I should probably pull this off into another thread, but going forward can we 
try to not have the release votes end on a weekend? Since we only seem to give 
3 days, it makes it really hard for anyone who is offline for the weekend to 
try it out. Either that or extend the voting for more than 3 days.

Tom
On Monday, March 31, 2014 12:50 AM, Patrick Wendell pwend...@gmail.com wrote:
 
TD - I downloaded and did some local testing. Looks good to me!

+1

You should cast your own vote - at that point it's enough to pass.

- Patrick



On Sun, Mar 30, 2014 at 9:47 PM, prabeesh k prabsma...@gmail.com wrote:

 +1
 tested on Ubuntu12.04 64bit


 On Mon, Mar 31, 2014 at 3:56 AM, Matei Zaharia matei.zaha...@gmail.com
 wrote:

  +1 tested on Mac OS X.
 
  Matei
 
  On Mar 27, 2014, at 1:32 AM, Tathagata Das tathagata.das1...@gmail.com
  wrote:
 
   Please vote on releasing the following candidate as Apache Spark
   version 0.9.1
  
   A draft of the release notes along with the CHANGES.txt file is
   attached to this e-mail.
  
   The tag to be voted on is v0.9.1-rc3 (commit 4c43182b):
  
 
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4c43182b6d1b0b7717423f386c0214fe93073208
  
   The release files, including signatures, digests, etc. can be found at:
   http://people.apache.org/~tdas/spark-0.9.1-rc3/
  
   Release artifacts are signed with the following key:
   https://people.apache.org/keys/committer/tdas.asc
  
   The staging repository for this release can be found at:
  
 https://repository.apache.org/content/repositories/orgapachespark-1009/
  
   The documentation corresponding to this release can be found at:
   http://people.apache.org/~tdas/spark-0.9.1-rc3-docs/
  
   Please vote on releasing this package as Apache Spark 0.9.1!
  
   The vote is open until Sunday, March 30, at 10:00 UTC and passes if
   a majority of at least 3 +1 PMC votes are cast.
  
   [ ] +1 Release this package as Apache Spark 0.9.1
   [ ] -1 Do not release this package because ...
  
   To learn more about Apache Spark, please see
   http://spark.apache.org/
   CHANGES.txt RELEASE_NOTES.txt
 
 


Re: [VOTE] Release Apache Spark 0.9.1 (RC3)

2014-03-31 Thread Patrick Wendell
Yeah good point. Let's just extend this vote another few days?


On Mon, Mar 31, 2014 at 8:12 AM, Tom Graves tgraves...@yahoo.com wrote:

 I should probably pull this off into another thread, but going forward can
 we try to not have the release votes end on a weekend? Since we only seem
 to give 3 days, it makes it really hard for anyone who is offline for the
 weekend to try it out. Either that or extend the voting for more than 3
 days.

 Tom
 On Monday, March 31, 2014 12:50 AM, Patrick Wendell pwend...@gmail.com
 wrote:

 TD - I downloaded and did some local testing. Looks good to me!

 +1

 You should cast your own vote - at that point it's enough to pass.

 - Patrick



 On Sun, Mar 30, 2014 at 9:47 PM, prabeesh k prabsma...@gmail.com wrote:

  +1
  tested on Ubuntu12.04 64bit
 
 
  On Mon, Mar 31, 2014 at 3:56 AM, Matei Zaharia matei.zaha...@gmail.com
  wrote:
 
   +1 tested on Mac OS X.
  
   Matei
  
   On Mar 27, 2014, at 1:32 AM, Tathagata Das 
 tathagata.das1...@gmail.com
   wrote:
  
Please vote on releasing the following candidate as Apache Spark
version 0.9.1
   
A draft of the release notes along with the CHANGES.txt file is
attached to this e-mail.
   
The tag to be voted on is v0.9.1-rc3 (commit 4c43182b):
   
  
 
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4c43182b6d1b0b7717423f386c0214fe93073208
   
The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~tdas/spark-0.9.1-rc3/
   
Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/tdas.asc
   
The staging repository for this release can be found at:
   
  https://repository.apache.org/content/repositories/orgapachespark-1009/
   
The documentation corresponding to this release can be found at:
http://people.apache.org/~tdas/spark-0.9.1-rc3-docs/
   
Please vote on releasing this package as Apache Spark 0.9.1!
   
The vote is open until Sunday, March 30, at 10:00 UTC and passes if
a majority of at least 3 +1 PMC votes are cast.
   
[ ] +1 Release this package as Apache Spark 0.9.1
[ ] -1 Do not release this package because ...
   
To learn more about Apache Spark, please see
http://spark.apache.org/
CHANGES.txt RELEASE_NOTES.txt
  
  
 



Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-31 Thread Debasish Das
I added Eclipse support in my qp branch:

https://github.com/debasish83/breeze/tree/qp

For the QP solver, I will look into http://www.joptimizer.com/

Right now my plan is to use Professor Boyd's ECOS solver, which is
designed along very similar lines but has also been tested on cone
programs...

https://github.com/ifa-ethz/ecos

Any idea whether I should add native C code using jniloader as the first
version, or rewrite it in breeze.optimize style and call netlib-java
for native support (LDL, Cholesky, etc.)?

I still have to think about how much cone support we will need...In ALS, for
example, X^TX = I and Y^TY = I are interesting constraints for
orthogonality...and they are quadratic constraints...With BFGS and CG, it
is difficult to handle quadratic constraints...



On Sun, Mar 30, 2014 at 4:40 PM, David Hall d...@cs.berkeley.edu wrote:

 On Sun, Mar 30, 2014 at 2:01 PM, Debasish Das debasish.da...@gmail.com
 wrote:

  Hi David,
 
  I have started to experiment with BFGS solvers for Spark GLM over large
  scale data...
 
  I am also looking to add a good QP solver in breeze that can be used in
  Spark ALS for constraint solves...More details on that soon...
 
  I could not load the breeze 0.7 code into Eclipse...There is a folder called
  natives in master but there is no code in it...all the code is in
  src/main/scala...
 
  I added the eclipse plugin:
 
  addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "1.6.0")
 
  addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.2.0")
 
  But it seems the project is set to use idea...
 
  Could you please explain the dev methodology for breeze? My idea is to do
  the solver work in breeze, as that's the right place, and get it into Spark
  through Xiangrui's WIP on sparse data and breeze support...
 

 It would be great to have a QP Solver: I don't know if you know about this
 library: http://www.joptimizer.com/

 I'm not quite sure what you mean by dev methodology. If you just mean how
 to get code into Breeze, just send a PR to scalanlp/breeze. Unit tests are
 good for something nontrivial like this. Maybe some basic documentation.


 
  Thanks.
  Deb
 
 
 
  On Fri, Mar 7, 2014 at 12:46 AM, DB Tsai dbt...@alpinenow.com wrote:
 
   Hi Xiangrui,
  
   I think it doesn't matter whether we use Fortran/Breeze/RISO for
   optimizers since optimization only takes  1% of time. Most of the
   time is in gradientSum and lossSum parallel computation.
  
   Sincerely,
  
   DB Tsai
   Machine Learning Engineer
   Alpine Data Labs
   --
   Web: http://alpinenow.com/
  
  
   On Thu, Mar 6, 2014 at 7:10 PM, Xiangrui Meng men...@gmail.com
 wrote:
Hi DB,
   
Thanks for doing the comparison! What were the running times for
fortran/breeze/riso?
   
Best,
Xiangrui
   
On Thu, Mar 6, 2014 at 4:21 PM, DB Tsai dbt...@alpinenow.com
 wrote:
Hi David,
   
I can converge to the same result with your breeze LBFGS and Fortran
implementations now. Probably, I made some mistakes when I tried
breeze before. I apologize that I claimed it's not stable.
   
See the test case in BreezeLBFGSSuite.scala
https://github.com/AlpineNow/spark/tree/dbtsai-breezeLBFGS
   
This trains multinomial logistic regression on the iris dataset,
and both optimizers can train the models with 98% training accuracy.
   
There are two issues with using Breeze in Spark:
   
1) When gradientSum and lossSum are computed in a distributed fashion in a
custom DiffFunction that is passed into your optimizer, Spark complains
that the LBFGS class is not serializable. In BreezeLBFGS.scala, I have to
convert the RDD to an array to make it work locally. It should be easy to
fix by having LBFGS implement Serializable.
   
2) Breeze computes redundant gradient and loss. See the following log
from both Fortran and Breeze implementations.
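
On issue 1, here is a minimal sketch of the failure mode, assuming Spark ships closures with its default Java serialization: an object whose class does not implement Serializable cannot be written to an ObjectOutputStream, while one that does can. PlainOptimizer and SerializableOptimizer are hypothetical stand-ins for illustration only; they are not Breeze's LBFGS.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-ins for an optimizer class (not Breeze's actual LBFGS).
class PlainOptimizer(val maxIter: Int)
class SerializableOptimizer(val maxIter: Int) extends Serializable

object SerializationCheck {
  // Returns true if Java serialization of obj succeeds, false if the
  // object (or something it references) is not Serializable.
  def canSerialize(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    println(canSerialize(new PlainOptimizer(10)))
    println(canSerialize(new SerializableOptimizer(10)))
  }
}
```

A check like this can be run locally against any object that a distributed DiffFunction would capture, before submitting a job.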
   
Thanks.
   
Fortran:
Iteration -1: loss 1.3862943611198926, diff 1.0
Iteration 0: loss 1.5846343143210866, diff 0.14307193024217352
Iteration 1: loss 1.1242501524477688, diff 0.29053004039012126
Iteration 2: loss 1.0930151243303563, diff 0.027782962952189336
Iteration 3: loss 1.054036932835569, diff 0.03566113127440601
Iteration 4: loss 0.9907956302751622, diff 0.0507649459571
Iteration 5: loss 0.9184205380342829, diff 0.07304737423337761
Iteration 6: loss 0.8259870936519937, diff 0.10064381175132982
Iteration 7: loss 0.6327447552109574, diff 0.23395293458364716
Iteration 8: loss 0.5534101162436359, diff 0.1253815427665277
Iteration 9: loss 0.4045020086612566, diff 0.26907321376758075
Iteration 10: loss 0.3078824990823728, diff 0.23885980452569627
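
As a reading aid for the log above: the diff column is close to the relative change in loss between consecutive iterations, |prev - curr| / |prev|. This is inferred from the printed numbers, not from the code that produced them; the plain formula agrees with the logged values to about six decimal places, so the exact expression used may differ slightly. The names below are illustrative.

```scala
object RelativeLossDiff {
  // Relative change between consecutive losses: |prev - curr| / |prev|.
  // Assumed formula, inferred from the logged values.
  def relativeDiff(prev: Double, curr: Double): Double =
    math.abs(prev - curr) / math.abs(prev)

  def main(args: Array[String]): Unit = {
    // First three loss values from the Fortran log above.
    val losses = Seq(1.3862943611198926, 1.5846343143210866, 1.1242501524477688)
    losses.sliding(2).foreach { case Seq(prev, curr) =>
      println(f"loss $curr%.6f, diff ${relativeDiff(prev, curr)}%.6f")
    }
  }
}
```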
   
Breeze:
Iteration -1: loss 1.3862943611198926, diff 1.0
Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS clinit
WARNING: Failed to load implementation from:
com.github.fommil.netlib.NativeSystemBLAS
Mar 

Re: [VOTE] Release Apache Spark 0.9.1 (RC3)

2014-03-31 Thread Tathagata Das
Yes, let's extend the vote for two more days from now. So the vote is open
till *Wednesday, April 02, at 20:00 UTC*

On that note, my +1

TD




On Mon, Mar 31, 2014 at 9:57 AM, Patrick Wendell pwend...@gmail.com wrote:

 Yeah good point. Let's just extend this vote another few days?


 On Mon, Mar 31, 2014 at 8:12 AM, Tom Graves tgraves...@yahoo.com wrote:

  I should probably pull this off into another thread, but going forward can
  we try to not have the release votes end on a weekend? Since we only seem
  to give 3 days, it makes it really hard for anyone who is offline for the
  weekend to try it out. Either that or extend the voting for more than 3
  days.
 
  Tom
  On Monday, March 31, 2014 12:50 AM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  TD - I downloaded and did some local testing. Looks good to me!
 
  +1
 
  You should cast your own vote - at that point it's enough to pass.
 
  - Patrick
 
 
 
  On Sun, Mar 30, 2014 at 9:47 PM, prabeesh k prabsma...@gmail.com
 wrote:
 
   +1
   tested on Ubuntu12.04 64bit
  
  
   On Mon, Mar 31, 2014 at 3:56 AM, Matei Zaharia 
 matei.zaha...@gmail.com
   wrote:
  
+1 tested on Mac OS X.
   
Matei
   
On Mar 27, 2014, at 1:32 AM, Tathagata Das 
  tathagata.das1...@gmail.com
wrote:
   
 Please vote on releasing the following candidate as Apache Spark
 version 0.9.1

 A draft of the release notes along with the CHANGES.txt file is
 attached to this e-mail.

 The tag to be voted on is v0.9.1-rc3 (commit 4c43182b):

   
  
 
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4c43182b6d1b0b7717423f386c0214fe93073208

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~tdas/spark-0.9.1-rc3/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/tdas.asc

 The staging repository for this release can be found at:

  
 https://repository.apache.org/content/repositories/orgapachespark-1009/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~tdas/spark-0.9.1-rc3-docs/

 Please vote on releasing this package as Apache Spark 0.9.1!

 The vote is open until Sunday, March 30, at 10:00 UTC and passes if
 a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 0.9.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/
 CHANGES.txt RELEASE_NOTES.txt
   
   
  
 



Re: Calling Spark enthusiasts in NYC

2014-03-31 Thread shenyan zhen
I'll be very interested.
Quick intro of myself: I code Java during the day and Scala at
night.
 On Mar 31, 2014 1:23 PM, Andy Konwinski andykonwin...@gmail.com wrote:

 Hi folks,

 We have seen a lot of community growth outside of the Bay Area and we are
 looking to help spur even more!

 For starters, the organizers of the Spark meetups here in the Bay Area want
 to help anybody that is interested in setting up a meetup in a new city.

 Some amazing Spark champions have stepped forward in Seattle, Vancouver,
 Boulder/Denver, and a few other areas already.

 Right now, we are looking to connect with you Spark enthusiasts in NYC
 about helping to run an inaugural Spark Meetup in your area.

 You can reply to me directly if you are interested and I can tell you about
 all of the resources we have to offer (speakers from the core community, a
 budget for food, help scheduling, etc.), and let's make this happen!

 Andy



Re: [VOTE] Release Apache Spark 0.9.1 (RC3)

2014-03-31 Thread Kevin Markey
I had specifically requested that the ASM shading be included in the RC, 
hence my testing focused on that, but I ran other tests as well.  Tested 
with a build of our project, running one of our applications from that 
build in yarn-standalone on a pseudocluster, and successfully 
redeploying and bringing up a web app that is integrated with Spark.  It 
is the latter where most ASM conflicts have typically occurred.  
Successful build and passed both tests. So, my vote:


+1

One test which I'd like to run but can't because of unrelated library 
conflicts would have been to remove various ASM exclusions from other 
libraries, recompiling and redeploying.  But I'd incur the wrath of the 
rest of my team doing that, especially after a full day of tracking down 
yet another (totally unrelated) library conflict.


Thanks for this maintenance release.

Kevin Markey


On 03/31/2014 12:32 PM, Tathagata Das wrote:

Yes, let's extend the vote for two more days from now. So the vote is open
till *Wednesday, April 02, at 20:00 UTC*

On that note, my +1

TD




On Mon, Mar 31, 2014 at 9:57 AM, Patrick Wendell pwend...@gmail.com wrote:


Yeah good point. Let's just extend this vote another few days?


On Mon, Mar 31, 2014 at 8:12 AM, Tom Graves tgraves...@yahoo.com wrote:


I should probably pull this off into another thread, but going forward can
we try to not have the release votes end on a weekend? Since we only seem
to give 3 days, it makes it really hard for anyone who is offline for the
weekend to try it out. Either that or extend the voting for more than 3
days.

Tom
On Monday, March 31, 2014 12:50 AM, Patrick Wendell pwend...@gmail.com
wrote:

TD - I downloaded and did some local testing. Looks good to me!

+1

You should cast your own vote - at that point it's enough to pass.

- Patrick



On Sun, Mar 30, 2014 at 9:47 PM, prabeesh k prabsma...@gmail.com

wrote:

+1
tested on Ubuntu12.04 64bit


On Mon, Mar 31, 2014 at 3:56 AM, Matei Zaharia 

matei.zaha...@gmail.com

wrote:
+1 tested on Mac OS X.

Matei

On Mar 27, 2014, at 1:32 AM, Tathagata Das 

tathagata.das1...@gmail.com

wrote:


Please vote on releasing the following candidate as Apache Spark
version 0.9.1

A draft of the release notes along with the CHANGES.txt file is
attached to this e-mail.

The tag to be voted on is v0.9.1-rc3 (commit 4c43182b):


https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4c43182b6d1b0b7717423f386c0214fe93073208

The release files, including signatures, digests, etc. can be found at:

http://people.apache.org/~tdas/spark-0.9.1-rc3/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/tdas.asc

The staging repository for this release can be found at:


https://repository.apache.org/content/repositories/orgapachespark-1009/

The documentation corresponding to this release can be found at:
http://people.apache.org/~tdas/spark-0.9.1-rc3-docs/

Please vote on releasing this package as Apache Spark 0.9.1!

The vote is open until Sunday, March 30, at 10:00 UTC and passes if
a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 0.9.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/
CHANGES.txt RELEASE_NOTES.txt






Re: [VOTE] Release Apache Spark 0.9.1 (RC3)

2014-03-31 Thread Andrew Or
+1 tested on OSX


On Mon, Mar 31, 2014 at 4:33 PM, Kevin Markey kevin.mar...@oracle.com wrote:

 I had specifically requested that the ASM shading be included in the RC,
 hence my testing focused on that, but I ran other tests as well.  Tested
 with a build of our project, running one of our applications from that
 build in yarn-standalone on a pseudocluster, and successfully redeploying
 and bringing up a web app that is integrated with Spark.  It is the latter
 where most ASM conflicts have typically occurred.  Successful build and
 passed both tests. So, my vote:

 +1

 One test which I'd like to run but can't because of unrelated library
 conflicts would have been to remove various ASM exclusions from other
 libraries, recompiling and redeploying.  But I'd incur the wrath of the
 rest of my team doing that, especially after a full day of tracking down
 yet another (totally unrelated) library conflict.

 Thanks for this maintenance release.

 Kevin Markey



 On 03/31/2014 12:32 PM, Tathagata Das wrote:

 Yes, let's extend the vote for two more days from now. So the vote is open
 till *Wednesday, April 02, at 20:00 UTC*

 On that note, my +1

 TD




 On Mon, Mar 31, 2014 at 9:57 AM, Patrick Wendell pwend...@gmail.com
 wrote:

  Yeah good point. Let's just extend this vote another few days?


 On Mon, Mar 31, 2014 at 8:12 AM, Tom Graves tgraves...@yahoo.com
 wrote:

 I should probably pull this off into another thread, but going forward can
 we try to not have the release votes end on a weekend? Since we only seem
 to give 3 days, it makes it really hard for anyone who is offline for the
 weekend to try it out. Either that or extend the voting for more than 3
 days.

 Tom
 On Monday, March 31, 2014 12:50 AM, Patrick Wendell pwend...@gmail.com
 
 wrote:

 TD - I downloaded and did some local testing. Looks good to me!

 +1

 You should cast your own vote - at that point it's enough to pass.

 - Patrick



 On Sun, Mar 30, 2014 at 9:47 PM, prabeesh k prabsma...@gmail.com

 wrote:

 +1
 tested on Ubuntu12.04 64bit


 On Mon, Mar 31, 2014 at 3:56 AM, Matei Zaharia 

 matei.zaha...@gmail.com

 wrote:
 +1 tested on Mac OS X.

 Matei

 On Mar 27, 2014, at 1:32 AM, Tathagata Das 

 tathagata.das1...@gmail.com

 wrote:

 Please vote on releasing the following candidate as Apache Spark
 version 0.9.1

 A draft of the release notes along with the CHANGES.txt file is
 attached to this e-mail.

 The tag to be voted on is v0.9.1-rc3 (commit 4c43182b):

 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4c43182b6d1b0b7717423f386c0214fe93073208

 The release files, including signatures, digests, etc. can be found at:

 http://people.apache.org/~tdas/spark-0.9.1-rc3/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/tdas.asc

 The staging repository for this release can be found at:

 https://repository.apache.org/content/repositories/orgapachespark-1009/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~tdas/spark-0.9.1-rc3-docs/

 Please vote on releasing this package as Apache Spark 0.9.1!

 The vote is open until Sunday, March 30, at 10:00 UTC and passes if
 a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 0.9.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/
 CHANGES.txt RELEASE_NOTES.txt