On Nov 1, 2011, at 8:09 AM, Grant Ingersoll wrote:

> FWIW, in Lucene, we do the following:
> 
> 1. All minor versions within a major release can read prior versions' indexes 
> within the same major release.  That is, 3.4 can read a 3.3 index.  However, 
> 3.3 cannot read a 3.4 index.  When a user reads a 3.3 index w/ 3.4, it is 
> silently upgraded to 3.4.  I think this versioning scheme should work well 
> for us too when it comes to models.  In the new 4.x line, we have a Codec 
> system which will make it fairly easy for any version to read any other 
> version.

This assumes, of course, that a model's format is upgradeable. I haven't yet 
thought about whether that holds for us.
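
To make the read rule in point 1 concrete, here is a minimal sketch of the 
check, assuming a model carries a "major.minor" version string. ModelVersion 
and its methods are hypothetical names for illustration; this is not Lucene's 
actual code and nothing like it exists in Mahout yet.

```java
// Hypothetical sketch: ModelVersion is not an existing Mahout or Lucene class.
final class ModelVersion {
    final int major;
    final int minor;

    ModelVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Parses strings of the form "major.minor", e.g. "3.4".
    static ModelVersion parse(String s) {
        String[] parts = s.split("\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected major.minor, got: " + s);
        }
        return new ModelVersion(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }

    // Code at this version can read a model written at 'written' if both share
    // the major version and the model is not newer than the code.
    boolean canRead(ModelVersion written) {
        return major == written.major && written.minor <= minor;
    }
}
```

With this, ModelVersion.parse("3.4").canRead(ModelVersion.parse("3.3")) holds, 
while the reverse does not. The silent upgrade on read is a separate step and 
is not covered here.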

> 
> 2. For APIs, we typically mark things as @lucene.experimental if we think 
> they may change within minor releases.  We also mark things as deprecated 
> that are going away.  Deprecated items are then removed on the next major 
> release.  The upgrade path is usually to go to x.9, remove all deprecations 
> and then go to x+1.0.
> 
> We also communicate to users via release notes when we purposely break 
> back compat.
> 
> For the most part this works and I would recommend we take similar steps.  
> First steps would be to start versioning our models and perhaps our input 
> formats.  I suspect we could simply take the Lucene code for this (it's a 
> timestamp plus something else that I forget, I think).
> 
> -Grant
> 
> On Oct 29, 2011, at 11:45 PM, Isabel Drost wrote:
> 
>> 
>> Mahout seems to be at a stage where we have covered most of the interesting 
>> machine learning problems, where it is being used in production by quite 
>> some developers - hey, we even got a book that is now available in a printed 
>> version.
>> 
>> Maybe it's time to start taking first steps towards a 1.0 release. One* 
>> important step in my opinion is to define what kind of backwards 
>> compatibility guarantees we want to give our users - and what guarantees 
>> our users really need - after releasing 1.0.
>> 
>> Just a rough list below - feel free to extend, shrink and change:
>> 
>> 1) Data input formats - people probably do not want to re-generate vectors 
>> from their original data every time they use a new Mahout version.
>> 
>> 2) Model formats - people probably do not want to have to retrain a model 
>> only to make it work with the latest and greatest features of a new Mahout 
>> release.
>> 
>> 3) Model output - when upgrading, users probably want to receive model 
>> output that is then integrated in their system the same way as with the 
>> older release.
>> 
>> 4) APIs - I don't see us keeping all interfaces or even abstract classes 
>> stable. However, users should know which APIs we consider "public facing" 
>> and will likely keep stable. Maybe an annotation makes that clear?
>> 
>> 5) Command line scripts - is there a significant user base relying on the 
>> bin/mahout script to warrant working towards keeping that stable between 
>> releases?
>> 
>> Most likely I've forgotten about other vital pieces - just wanted to kick 
>> off that discussion.
>> 
>> 
>> Isabel
>> 
>> 
>> * though not the only one - others include but are not limited to the time 
>> frame for which we offer support for any given release.
> 
> --------------------------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
> 
> 
> 
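
On the annotation idea in Isabel's point 4 and the @lucene.experimental 
marker mentioned above: a rough sketch of what Java marker annotations could 
look like. PublicApi, Experimental, ModelReader and StreamingModelReader are 
all made-up names for illustration, not existing Mahout or Lucene types. 
(Lucene's own marker is a Javadoc tag rather than a Java annotation, so this 
is a variation on the idea, not a copy of it.)

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker: the API is considered public facing and kept stable.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface PublicApi {}

// Hypothetical marker: the API may change within minor releases.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface Experimental {}

// Example of a stable interface with one method on its way out.
@PublicApi
interface ModelReader {
    double[] read(String path);

    // Standard Java deprecation: kept until the next major release, then removed.
    @Deprecated
    double[] readLegacy(String path);
}

// Example of an interface users should not rely on yet.
@Experimental
interface StreamingModelReader {}
```

Runtime retention is a choice made here so that tooling or tests could list 
which types carry which marker; source-only retention would also work if the 
markers are meant purely as documentation.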

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com


