Re: [VOTE] Release Apache Spark 0.9.1 (RC3)
I should probably pull this off into another thread, but going forward can we try not to have the release votes end on a weekend? Since we only seem to give 3 days, it makes it really hard for anyone who is offline for the weekend to try it out. Either that or extend the voting for more than 3 days. Tom On Monday, March 31, 2014 12:50 AM, Patrick Wendell pwend...@gmail.com wrote: TD - I downloaded and did some local testing. Looks good to me! +1 You should cast your own vote - at that point it's enough to pass. - Patrick On Sun, Mar 30, 2014 at 9:47 PM, prabeesh k prabsma...@gmail.com wrote: +1 tested on Ubuntu 12.04 64-bit On Mon, Mar 31, 2014 at 3:56 AM, Matei Zaharia matei.zaha...@gmail.com wrote: +1 tested on Mac OS X. Matei On Mar 27, 2014, at 1:32 AM, Tathagata Das tathagata.das1...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 0.9.1. A draft of the release notes along with the CHANGES.txt file is attached to this e-mail. The tag to be voted on is v0.9.1-rc3 (commit 4c43182b): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4c43182b6d1b0b7717423f386c0214fe93073208 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~tdas/spark-0.9.1-rc3/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/tdas.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1009/ The documentation corresponding to this release can be found at: http://people.apache.org/~tdas/spark-0.9.1-rc3-docs/ Please vote on releasing this package as Apache Spark 0.9.1! The vote is open until Sunday, March 30, at 10:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 0.9.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ CHANGES.txt, RELEASE_NOTES.txt
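The announcement quoted in Tom's message says the vote passes "if a majority of at least 3 +1 PMC votes are cast." The Scala sketch below encodes one reading of that rule: at least three binding (PMC) +1 votes, and more binding +1 than binding -1 votes. As far as we understand, this matches how the ASF release policy describes majority approval. The `Vote` type and the voter names are hypothetical. Non-binding votes are recorded but not counted.

```scala
// Hypothetical vote record: value is +1 or -1; `binding` marks a PMC member's vote.
final case class Vote(voter: String, value: Int, binding: Boolean)

object VoteTally {
  /** One reading of "majority of at least 3 +1 PMC votes":
    * at least three binding +1s and more binding +1s than binding -1s. */
  def passes(votes: Seq[Vote]): Boolean = {
    val binding = votes.filter(_.binding) // non-binding votes are ignored
    val plus = binding.count(_.value == 1)
    val minus = binding.count(_.value == -1)
    plus >= 3 && plus > minus
  }

  def main(args: Array[String]): Unit = {
    val votes = Seq(
      Vote("pmc-a", 1, binding = true),
      Vote("pmc-b", 1, binding = true),
      Vote("contributor-c", 1, binding = false)
    )
    println(passes(votes))                                     // false: only two binding +1s
    println(passes(votes :+ Vote("pmc-d", 1, binding = true))) // true
  }
}
```

This is why Patrick's note that the release manager's own +1 would be "enough to pass" matters: the count hinges on binding votes only.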
Re: [VOTE] Release Apache Spark 0.9.1 (RC3)
Yeah good point. Let's just extend this vote another few days? On Mon, Mar 31, 2014 at 8:12 AM, Tom Graves tgraves...@yahoo.com wrote:
Re: MLLib - Thoughts about refactoring Updater for LBFGS?
I added Eclipse support in my qp branch: https://github.com/debasish83/breeze/tree/qp For the QP solver I will look into this solver: http://www.joptimizer.com/ Right now my plan is to use Professor Boyd's ECOS solver, which is designed along very similar lines but has been tested on cone programs as well: https://github.com/ifa-ethz/ecos Any idea whether I should add the C native code through jniloader as a first version, or rewrite it in breeze.optimize style and use netlib-java calls for native support (LDL, Cholesky, etc.)? I still have to think about how much cone support we will need. In ALS, for example, X^T X = I and Y^T Y = I are interesting orthogonality constraints, and they are quadratic constraints; with BFGS and CG it is difficult to handle quadratic constraints. On Sun, Mar 30, 2014 at 4:40 PM, David Hall d...@cs.berkeley.edu wrote: On Sun, Mar 30, 2014 at 2:01 PM, Debasish Das debasish.da...@gmail.com wrote: Hi David, I have started to experiment with BFGS solvers for Spark GLM over large-scale data. I am also looking to add a good QP solver in Breeze that can be used in Spark ALS for constrained solves; more details on that soon. I could not load the Breeze 0.7 code into Eclipse. There is a folder called natives in master, but there is no code in it; all the code is in src/main/scala. I added the Eclipse plugin: addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "1.6.0") addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.2.0") But it seems the project is set up to use IDEA. Could you please explain the dev methodology for Breeze? My idea is to do the solver work in Breeze, as that's the right place, and get it into Spark through Xiangrui's WIP on sparse data and Breeze support. It would be great to have a QP solver. I don't know if you know about this library: http://www.joptimizer.com/ I'm not quite sure what you mean by dev methodology. If you just mean how to get code into Breeze, just send a PR to scalanlp/breeze.
Unit tests are good for something nontrivial like this. Maybe some basic documentation. Thanks. Deb On Fri, Mar 7, 2014 at 12:46 AM, DB Tsai dbt...@alpinenow.com wrote: Hi Xiangrui, I think it doesn't matter whether we use Fortran/Breeze/RISO for optimizers, since optimization only takes 1% of the time. Most of the time is spent in the parallel computation of gradientSum and lossSum. Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs -- Web: http://alpinenow.com/ On Thu, Mar 6, 2014 at 7:10 PM, Xiangrui Meng men...@gmail.com wrote: Hi DB, Thanks for doing the comparison! What were the running times for fortran/breeze/riso? Best, Xiangrui On Thu, Mar 6, 2014 at 4:21 PM, DB Tsai dbt...@alpinenow.com wrote: Hi David, I can now converge to the same result with your Breeze LBFGS and the Fortran implementation. Probably I made some mistakes when I tried Breeze before. I apologize for claiming it's not stable. See the test case in BreezeLBFGSSuite.scala https://github.com/AlpineNow/spark/tree/dbtsai-breezeLBFGS This trains multinomial logistic regression on the iris dataset, and both optimizers train models with 98% training accuracy. There are two issues with using Breeze in Spark: 1) When the gradientSum and lossSum are computed distributively in a custom-defined DiffFunction that is passed into your optimizer, Spark complains that the LBFGS class is not serializable. In BreezeLBFGS.scala, I have to convert the RDD to an array to make it work locally. It should be easy to fix by just having LBFGS implement Serializable. 2) Breeze computes redundant gradients and losses. See the following log from both the Fortran and Breeze implementations. Thanks.
Fortran:
Iteration -1: loss 1.3862943611198926, diff 1.0
Iteration 0: loss 1.5846343143210866, diff 0.14307193024217352
Iteration 1: loss 1.1242501524477688, diff 0.29053004039012126
Iteration 2: loss 1.0930151243303563, diff 0.027782962952189336
Iteration 3: loss 1.054036932835569, diff 0.03566113127440601
Iteration 4: loss 0.9907956302751622, diff 0.0507649459571
Iteration 5: loss 0.9184205380342829, diff 0.07304737423337761
Iteration 6: loss 0.8259870936519937, diff 0.10064381175132982
Iteration 7: loss 0.6327447552109574, diff 0.23395293458364716
Iteration 8: loss 0.5534101162436359, diff 0.1253815427665277
Iteration 9: loss 0.4045020086612566, diff 0.26907321376758075
Iteration 10: loss 0.3078824990823728, diff 0.23885980452569627
Breeze:
Iteration -1: loss 1.3862943611198926, diff 1.0
Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
Mar
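DB Tsai's first issue is a Java-serialization failure. Spark ships closures to executors with Java serialization. A loss/gradient function that holds a reference to a non-serializable object therefore cannot be shipped, presumably here because the function references the LBFGS instance. The Spark-free Scala sketch below reproduces the failure and the proposed fix, which is marking the class `Serializable`. `PlainOptimizer`, `SerializableOptimizer` and `ClosureSerialization` are hypothetical stand-ins, not Breeze classes.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-ins for an optimizer instance captured by a
// loss/gradient function; these are NOT Breeze's actual LBFGS class.
class PlainOptimizer(val maxIter: Int)
class SerializableOptimizer(val maxIter: Int) extends Serializable

object ClosureSerialization {
  /** True if `obj` can be written with Java serialization, the mechanism
    * Spark uses when shipping closures to executors. */
  def isSerializable(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  // Each returned function captures its optimizer argument, the way a
  // DiffFunction may hold a reference to the optimizer that built it.
  def capturePlain(opt: PlainOptimizer): Int => Int =
    i => i * opt.maxIter
  def captureSerializable(opt: SerializableOptimizer): Int => Int =
    i => i * opt.maxIter

  def main(args: Array[String]): Unit = {
    println(isSerializable(capturePlain(new PlainOptimizer(100))))               // false
    println(isSerializable(captureSerializable(new SerializableOptimizer(100)))) // true
  }
}
```

Making the optimizer serializable may not be an option. A common alternative is to copy the fields the function needs into local vals, so the closure no longer references the optimizer object at all.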
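DB Tsai notes that most of each iteration's time goes into computing gradientSum and lossSum in parallel, not into the optimizer update itself. The plain-Scala sketch below shows the shape of that computation. It swaps in binary logistic regression rather than the multinomial model from the test suite, and uses Scala collections as stand-ins for RDD partitions. All names are hypothetical. In Spark, the per-partition pass would run on the executors (for example via `mapPartitions`), and the merge would run via `reduce`.

```scala
object DistributedLogisticSketch {
  // (label in {0, 1}, dense feature vector); hypothetical representation.
  type Example = (Double, Array[Double])
  type Sums = (Double, Array[Double]) // (lossSum, gradientSum)

  private def dot(w: Array[Double], x: Array[Double]): Double = {
    var s = 0.0
    var i = 0
    while (i < w.length) { s += w(i) * x(i); i += 1 }
    s
  }

  // log(1 + exp(m)) without overflow for large positive m.
  private def log1pExp(m: Double): Double =
    if (m > 0) m + math.log1p(math.exp(-m)) else math.log1p(math.exp(m))

  /** Per-partition pass: one loop over the examples, O(n * d) work,
    * which is the part that dominates each optimizer iteration. */
  def partitionSums(part: Seq[Example], w: Array[Double]): Sums = {
    val grad = new Array[Double](w.length)
    var loss = 0.0
    for ((y, x) <- part) {
      val margin = dot(w, x)
      loss += log1pExp(margin) - y * margin           // logistic loss
      val mult = 1.0 / (1.0 + math.exp(-margin)) - y  // sigmoid(margin) - y
      var i = 0
      while (i < x.length) { grad(i) += mult * x(i); i += 1 }
    }
    (loss, grad)
  }

  /** Merge two partial results (associative up to floating-point rounding). */
  def merge(a: Sums, b: Sums): Sums =
    (a._1 + b._1, a._2.zip(b._2).map { case (u, v) => u + v })

  /** Local stand-in for mapPartitions followed by reduce on an RDD. */
  def lossAndGradient(partitions: Seq[Seq[Example]], w: Array[Double]): Sums =
    partitions.map(p => partitionSums(p, w)).reduce(merge)

  def main(args: Array[String]): Unit = {
    val parts = Seq(Seq((1.0, Array(1.0, 2.0))), Seq((0.0, Array(3.0, 4.0))))
    val (loss, grad) = lossAndGradient(parts, Array(0.0, 0.0))
    println(loss)               // 2 * ln(2), since every margin is 0 at w = 0
    println(grad.mkString(",")) // 1.0,1.0
  }
}
```

The optimizer only ever sees the final (loss, gradient) pair. This is consistent with DB's point that the choice of Fortran, Breeze or RISO matters little next to the cost of the distributed sums.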
Re: [VOTE] Release Apache Spark 0.9.1 (RC3)
Yes, let's extend the vote for two more days from now. So the vote is open till *Wednesday, April 02, at 20:00 UTC*. On that note, my +1. TD On Mon, Mar 31, 2014 at 9:57 AM, Patrick Wendell pwend...@gmail.com wrote:
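Tom's concern can be checked with `java.time`. The original deadline (Sunday, March 30, 2014 at 10:00 UTC) falls on a weekend; TD's extended one (Wednesday, April 2 at 20:00 UTC) does not. The `avoidWeekend` helper in this sketch is a hypothetical policy, not an Apache or Spark rule. It pushes a weekend deadline to the following Monday at the same time of day.

```scala
import java.time.{DayOfWeek, ZoneOffset, ZonedDateTime}

object VoteDeadline {
  def isWeekend(t: ZonedDateTime): Boolean =
    t.getDayOfWeek == DayOfWeek.SATURDAY || t.getDayOfWeek == DayOfWeek.SUNDAY

  // Hypothetical policy: a deadline on Saturday or Sunday moves to the
  // following Monday at the same time of day; weekdays are unchanged.
  def avoidWeekend(deadline: ZonedDateTime): ZonedDateTime =
    deadline.getDayOfWeek match {
      case DayOfWeek.SATURDAY => deadline.plusDays(2)
      case DayOfWeek.SUNDAY   => deadline.plusDays(1)
      case _                  => deadline
    }

  def main(args: Array[String]): Unit = {
    val original = ZonedDateTime.of(2014, 3, 30, 10, 0, 0, 0, ZoneOffset.UTC)
    val extended = ZonedDateTime.of(2014, 4, 2, 20, 0, 0, 0, ZoneOffset.UTC)
    println(original.getDayOfWeek)  // SUNDAY
    println(extended.getDayOfWeek)  // WEDNESDAY
    println(avoidWeekend(original)) // 2014-03-31T10:00Z
  }
}
```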
Re: Calling Spark enthusiasts in NYC
I'd be very interested. A quick intro: I code Java during the day and Scala at night. On Mar 31, 2014 1:23 PM, Andy Konwinski andykonwin...@gmail.com wrote: Hi folks, We have seen a lot of community growth outside of the Bay Area and we are looking to help spur even more! For starters, the organizers of the Spark meetups here in the Bay Area want to help anybody that is interested in setting up a meetup in a new city. Some amazing Spark champions have stepped forward in Seattle, Vancouver, Boulder/Denver, and a few other areas already. Right now, we are looking to connect with you Spark enthusiasts in NYC about helping to run an inaugural Spark Meetup in your area. You can reply to me directly if you are interested and I can tell you about all of the resources we have to offer (speakers from the core community, a budget for food, help scheduling, etc.), and let's make this happen! Andy
Re: [VOTE] Release Apache Spark 0.9.1 (RC3)
I had specifically requested that the ASM shading be included in the RC, so my testing focused on that, but I ran other tests as well. I tested with a build of our project: I ran one of our applications from that build in yarn-standalone mode on a pseudo-cluster, and I redeployed and brought up a web app that is integrated with Spark. The latter is where most ASM conflicts have typically occurred. The build succeeded and both tests passed. So, my vote: +1 One test I'd like to run, but can't because of unrelated library conflicts, would be to remove various ASM exclusions from other libraries, then recompile and redeploy. But I'd incur the wrath of the rest of my team doing that, especially after a full day of tracking down yet another (totally unrelated) library conflict. Thanks for this maintenance release. Kevin Markey On 03/31/2014 12:32 PM, Tathagata Das wrote:
Re: [VOTE] Release Apache Spark 0.9.1 (RC3)
+1 tested on OS X On Mon, Mar 31, 2014 at 4:33 PM, Kevin Markey kevin.mar...@oracle.com wrote: