I think users would benefit a lot from 1) to 3) and would be dismayed if we could not maintain data consistency between releases (maybe just point releases?). This could require us to build and ship migration tools along with any release that changes these formats.
4) and 5) are related, and the question is which is more important if we can't do both. Since a lot of users are using the CLI, I think backwards compatibility is pretty important there. This is especially the case for the MiA examples. The book is really our user manual, and many people will be turned off if gratuitous API changes make the book obsolete as a learning tool. Of course, the book has plenty of API usage examples which need to keep compatibility too.

Our 1.0 release will have a lot of solid implementations of scalable machine learning software, but not everything is at the same level of maturity. I think it is critical that we adopt a maturity scheme so that we can realistically make changes to evolving algorithms while making reasonable guarantees about stable code. Moving still-evolving implementations to a separate source tree would certainly make their status visible, but I wonder about the mechanics: do we need a parallel contrib universe (with math, core, integration, examples subtrees?) or would the annotations work better? I kind of favor the annotations, as the former seems like too much dependency plumbing.

And, of course, defining the content of 1.0 is still something we need to do. That is a separate thread TBD.

-----Original Message-----
From: Isabel Drost [mailto:isa...@apache.org]
Sent: Saturday, October 29, 2011 8:46 PM
To: dev@mahout.apache.org
Subject: Towards 1.0 - Defining backwards compatibility guarantees

Mahout seems to be at a stage where we have covered most of the interesting machine learning problems, where it is being used in production by quite a few developers - hey, we even got a book that is now available in a printed version. Maybe it's time to start taking first steps towards a 1.0 release.

One* important step in my opinion is to define what kind of backwards compatibility guarantees we want to give our users - and what guarantees our users really need - after releasing 1.0.
Just a rough list below - feel free to extend, shrink and change:

1) Data input formats - people probably do not want to re-generate vectors from their original data every time they use a new Mahout version.

2) Model formats - people probably do not want to have to retrain a model only to make it work with the latest and greatest features of a new Mahout release.

3) Model output - when upgrading, users probably want to receive model output that can be integrated into their system the same way as with the older release.

4) APIs - I don't see us keeping all interfaces or even abstract classes stable. However, users should know which APIs we consider "public facing" and will likely keep stable. Maybe an annotation makes that clear?

5) Command line scripts - is there a significant user base relying on the bin/mahout script to warrant working towards keeping that stable between releases?

Most likely I've forgotten about other vital pieces - just wanted to kick off that discussion.

Isabel

* though not the only one - others include but are not limited to the time frame for which we offer support for any given release.
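
To make the annotation idea from point 4) and from the reply a bit more concrete, here is a minimal sketch of what a maturity marker could look like. Nothing in it is existing Mahout API: the names MaturityDemo, Maturity, Level, StableRecommender and EvolvingClusterer are invented purely for illustration, and treating unannotated classes as EXPERIMENTAL is just one possible default.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical sketch only; none of these types exist in Mahout.
public class MaturityDemo {

  /** Illustrative maturity levels; the actual set would be up for discussion. */
  public enum Level { STABLE, EVOLVING, EXPERIMENTAL }

  /** Marks a type with the compatibility promise it carries. */
  @Documented
  @Retention(RetentionPolicy.RUNTIME) // kept at runtime so tools can read it reflectively
  @Target(ElementType.TYPE)
  public @interface Maturity {
    Level value();
  }

  /** Placeholder for a "public facing" class we would promise to keep stable. */
  @Maturity(Level.STABLE)
  public static class StableRecommender {}

  /** Placeholder for an implementation that is still allowed to change. */
  @Maturity(Level.EVOLVING)
  public static class EvolvingClusterer {}

  /** Returns the declared level, or EXPERIMENTAL if the type carries no annotation. */
  public static Level levelOf(Class<?> type) {
    Maturity m = type.getAnnotation(Maturity.class);
    return m == null ? Level.EXPERIMENTAL : m.value();
  }

  public static void main(String[] args) {
    System.out.println(levelOf(StableRecommender.class));
    System.out.println(levelOf(EvolvingClusterer.class));
    System.out.println(levelOf(String.class)); // unannotated type
  }
}
```

Because the annotation is @Documented it would show up in the javadoc, and because it is retained at runtime a build check could, for example, flag signature changes on STABLE types between releases. That would give the visibility of a separate contrib tree without the extra dependency plumbing.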