On Nov 1, 2011, at 8:09 AM, Grant Ingersoll wrote:

> FWIW, in Lucene, we do the following:
>
> 1. All minor versions within a major release can read prior versions' indexes
> within the same major release. That is, 3.4 can read a 3.3 index. However,
> 3.3 cannot read a 3.4 index. When a user reads a 3.3 index w/ 3.4, it is
> silently upgraded to 3.4. I think this versioning scheme should work well
> for us too when it comes to models. In the new 4.x line, we have a Codec
> system which will make it fairly easy for any version to read any other
> version.
This assumes, of course, that a model is upgradeable in format; I haven't thought about whether that applies to us or not.

>
> 2. For APIs, we typically mark things as @lucene.experimental if we think
> they may change within minor releases. We also mark things as deprecated
> that are going away. Deprecated items are then removed on the next major
> release. The upgrade path is usually to go to x.9, remove all deprecations
> and then go to x+1.0.
>
> We also communicate to users via release notes when we purposefully break
> back compat.
>
> For the most part this works and I would recommend we take similar steps.
> First steps would be to start versioning our models and perhaps our input
> formats. I suspect we could simply take the Lucene code for this (it's a
> timestamp plus something else that I forget, I think)
>
> -Grant
>
> On Oct 29, 2011, at 11:45 PM, Isabel Drost wrote:
>
>> Mahout seems to be at a stage where we have covered most of the interesting
>> machine learning problems, where it is being used in production by quite
>> some developers - hey, we even got a book that is now available in a printed
>> version.
>>
>> Maybe it's time to start taking first steps towards a 1.0 release. One*
>> important step in my opinion is to define what kind of backwards
>> compatibility guarantees we want to give our users - and what guarantees our
>> users really need - after releasing 1.0.
>>
>> Just a rough list below - feel free to extend, shrink and change:
>>
>> 1) Data input formats - people probably do not want to re-generate vectors
>> from their original data every time they use a new Mahout version.
>>
>> 2) Model formats - people probably do not want to have to retrain a model
>> only to make it work with the latest and greatest features of a new Mahout
>> release.
>>
>> 3) Model output - when upgrading users probably want to receive model output
>> that is then integrated in their system the same way as with the older
>> release.
>>
>> 4) APIs - I don't see us keeping all interfaces or even abstract classes
>> stable. However users should know which APIs we consider "public facing" and
>> will likely keep stable. Maybe an annotation makes that clear?
>>
>> 5) Command line scripts - is there a significant user base relying on the
>> bin/mahout script to warrant working towards keeping that stable between
>> releases?
>>
>> Most likely I've forgotten about other vital pieces - just wanted to kick
>> off that discussion.
>>
>>
>> Isabel
>>
>>
>> * though not the only one - others include but are not limited to the time
>> frame for which we offer support for any given release.

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
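
P.S. To make the minor-version rule from point 1 concrete for models, here is a minimal sketch in Java. It is only an illustration under assumptions: ModelVersion, parse, canRead and upgrade are hypothetical names, not existing Mahout or Lucene classes, and the sketch covers only the "same major, newer-or-equal minor" rule, not the 4.x Codec approach.

```java
// Hypothetical sketch: ModelVersion is NOT an existing Mahout or Lucene class.
// It encodes the rule "3.4 can read a 3.3 model, but 3.3 cannot read a 3.4 model".
final class ModelVersion {
    final int major;
    final int minor;

    ModelVersion(int major, int minor) {
        if (major < 0 || minor < 0) {
            throw new IllegalArgumentException("version parts must be non-negative");
        }
        this.major = major;
        this.minor = minor;
    }

    /** Parses a "major.minor" string such as "3.4". */
    static ModelVersion parse(String text) {
        String[] parts = text.trim().split("\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected major.minor, got: " + text);
        }
        return new ModelVersion(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }

    /** True if code at this version may read a model written at the given version. */
    boolean canRead(ModelVersion written) {
        return major == written.major && minor >= written.minor;
    }

    /**
     * Returns the version a model carries after being read by this version:
     * a readable older model is silently upgraded to the reader's version.
     */
    ModelVersion upgrade(ModelVersion written) {
        if (!canRead(written)) {
            throw new IllegalStateException(this + " cannot read a " + written + " model");
        }
        return this;
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }
}
```

With this, ModelVersion.parse("3.4").canRead(ModelVersion.parse("3.3")) is true, the reverse is false, and a 4.0 reader rejects a 3.4 model. Whether a real model can actually be upgraded in place is the open question raised above; the sketch only shows the version check.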