I think users would benefit a lot from 1) to 3) and would be dismayed if we 
could not maintain data consistency between releases (maybe just point 
releases?). This could require us to build and ship migration tools along with 
any release that changes these formats.
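
To make the migration-tool idea a bit more concrete, here is a minimal sketch of 
a version-stamped file header that a tool could check before deciding whether a 
file needs converting. Everything in it is hypothetical: the class name, the 
header layout, and the version numbers are assumptions for illustration, not 
Mahout's actual vector or model format.

```java
import java.nio.ByteBuffer;

public class FormatVersionDemo {

  /** Hypothetical format version written by the current release. */
  public static final int CURRENT_VERSION = 2;

  /** Serializes a version header, the value count, and then the values. */
  public static byte[] write(int version, double[] values) {
    ByteBuffer buf = ByteBuffer.allocate(4 + 4 + 8 * values.length);
    buf.putInt(version);
    buf.putInt(values.length);
    for (double v : values) {
      buf.putDouble(v);
    }
    return buf.array();
  }

  /** Reads only the leading version field. */
  public static int readVersion(byte[] data) {
    return ByteBuffer.wrap(data).getInt();
  }

  /** True if the data was written by an older format version. */
  public static boolean needsMigration(byte[] data) {
    return readVersion(data) < CURRENT_VERSION;
  }

  public static void main(String[] args) {
    byte[] old = write(1, new double[] {1.0, 2.0});
    System.out.println("version=" + readVersion(old)
        + " needsMigration=" + needsMigration(old));
  }
}
```

With a header like this, a release that changes a format could ship a small 
converter that only touches files whose version is older than its own.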

4) and 5) are related, and the question is which matters more if we can't do 
both. Since a lot of users are using the CLI, I think backwards compatibility 
is pretty important there. This is especially the case for the MiA examples. 
The book is really our user manual and many people will be turned off if 
gratuitous API changes make the book obsolete as a learning tool. Of course, 
the book has plenty of API usage examples which need to keep compatibility too. 

Our 1.0 release will have a lot of solid implementations of scalable machine 
learning software, but not everything is at the same level of maturity. I think 
it is critical that we adopt a maturity scheme so that we can realistically 
make changes to evolving algorithms while making reasonable guarantees about 
stable code. Moving still-evolving implementations to a separate source tree 
would certainly make their status visible, but I wonder about the mechanics: do 
we need a parallel contrib universe (with math, core, integration, examples 
subtrees?) or would the annotations work better? I kind of favor the 
annotations, as the former seems like too much dependency plumbing.
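
As a rough illustration of the annotation approach, a maturity marker could 
look something like the sketch below. The annotation name, the level names, and 
the example classes are all made up for this sketch; nothing here is an 
existing Mahout API, and treating unmarked classes as experimental is just one 
possible default.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class MaturityDemo {

  /** Hypothetical maturity levels; the names are illustrative only. */
  public enum Level { STABLE, EVOLVING, EXPERIMENTAL }

  /** Hypothetical marker stating what compatibility a class promises. */
  @Documented
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  public @interface Maturity {
    Level value();
  }

  /** Stand-in for an implementation we would promise to keep stable. */
  @Maturity(Level.STABLE)
  public static class StableClusterer {}

  /** Stand-in for an implementation that may still change. */
  @Maturity(Level.EVOLVING)
  public static class NewClassifier {}

  /** Stand-in for code nobody has classified yet. */
  public static class Unmarked {}

  /** Returns the declared level, treating unmarked classes as experimental. */
  public static Level levelOf(Class<?> type) {
    Maturity m = type.getAnnotation(Maturity.class);
    return m == null ? Level.EXPERIMENTAL : m.value();
  }

  public static void main(String[] args) {
    System.out.println(levelOf(StableClusterer.class));
    System.out.println(levelOf(NewClassifier.class));
    System.out.println(levelOf(Unmarked.class));
  }
}
```

Because the marker is retained at runtime and marked as documented, it would 
show up in the javadoc and could also be checked by a build-time tool, without 
moving any code into a separate tree.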

And, of course, defining the content of 1.0 is still something we need to do. 
That is a separate thread TBD. 

-----Original Message-----
From: Isabel Drost [mailto:isa...@apache.org] 
Sent: Saturday, October 29, 2011 8:46 PM
To: dev@mahout.apache.org
Subject: Towards 1.0 - Defining backwards compatibility guarantees


Mahout seems to be at a stage where we have covered most of the interesting 
machine learning problems, where it is being used in production by quite a few 
developers - hey, we even got a book that is now available in a printed version.

Maybe it's time to start taking first steps towards a 1.0 release. One* 
important step in my opinion is to define what kind of backwards compatibility 
guarantees we want to give our users - and what guarantees our users really need
- after releasing 1.0.

Just a rough list below - feel free to extend, shrink and change:

1) Data input formats - people probably do not want to re-generate vectors from 
their original data every time they use a new Mahout version.

2) Model formats - people probably do not want to have to retrain a model only 
to make it work with the latest and greatest features of a new Mahout release.

3) Model output - when upgrading, users probably want to receive model output 
that is then integrated in their system the same way as with the older release.

4) APIs - I don't see us keeping all interfaces or even abstract classes 
stable. However users should know which APIs we consider "public facing" and 
will likely keep stable. Maybe an annotation makes that clear?

5) Command line scripts - is there a significant user base relying on the 
bin/mahout script to warrant working towards keeping that stable between 
releases?

Most likely I've forgotten about other vital pieces - just wanted to kick off 
that discussion.


Isabel


* though not the only one - others include but are not limited to the time 
frame for which we offer support for any given release.
