6) Quick access to the online algorithms

The servlet implementation in Taste is simple. It should be possible to
package many of the online algorithms in one big servlet. Call it "Mahout
Online"?

One problem here is that uploading and downloading data for each operation
is not practical. Mahout Online would be much more useful if it had direct
file system access to user data. Yes, this is a security problem :)

Lance

On Sun, Oct 30, 2011 at 7:01 PM, Lance Norskog <goks...@gmail.com> wrote:

> 2) Model formats
> Proposal: a few common structures with higher-level conventions about how
> to compose them.
>
> For matrix data, the R "dataframe" is a time-tested format for dense
> vectors, matrices and tensors. Something like this that also handles most
> sparsity cases would allow ditching a lot of hard-coded formats.
>
> We would need a counterpart format for discrete data structures like
> graphs, FPGrowth, etc. If there are none in the public sphere, here is one:
> an object with two lists, each with a label. This can represent one node or
> edge of a graph. To read in the graph you would need to fill hashtables from
> the labels. Add a double and you have a weighted graph. Call it a "bundle".
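
The "bundle" described above could be sketched roughly as follows. This is
only one possible reading of the proposal, not an existing Mahout class: the
names Bundle, BundleGraph, edge and read are hypothetical, and the choice of
string elements and of "from"/"to" as the two labels is an assumption made
for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical "bundle": two labeled lists plus a weight.
// Not a Mahout class; names and field types are assumptions.
final class Bundle {
    final String firstLabel;
    final List<String> first;
    final String secondLabel;
    final List<String> second;
    final double weight; // the added double that makes the graph weighted

    Bundle(String firstLabel, List<String> first,
           String secondLabel, List<String> second, double weight) {
        this.firstLabel = firstLabel;
        this.first = first;
        this.secondLabel = secondLabel;
        this.second = second;
        this.weight = weight;
    }

    // Convenience factory: a single weighted edge as a bundle.
    static Bundle edge(String from, String to, double weight) {
        return new Bundle("from", Arrays.asList(from),
                          "to", Arrays.asList(to), weight);
    }
}

final class BundleGraph {
    // Reads a stream of bundles into hashtables:
    // source -> (target -> weight).
    static Map<String, Map<String, Double>> read(List<Bundle> bundles) {
        Map<String, Map<String, Double>> adjacency = new HashMap<>();
        for (Bundle b : bundles) {
            for (String src : b.first) {
                Map<String, Double> out =
                    adjacency.computeIfAbsent(src, k -> new HashMap<>());
                for (String dst : b.second) {
                    out.put(dst, b.weight);
                }
            }
        }
        return adjacency;
    }

    public static void main(String[] args) {
        List<Bundle> stream = Arrays.asList(
            Bundle.edge("a", "b", 0.5),
            Bundle.edge("a", "c", 2.0),
            Bundle.edge("b", "c", 1.0));
        System.out.println(read(stream));
    }
}
```

A bundle whose first list holds one node and whose second list holds all of
its neighbours would represent a node rather than an edge, and the same
read method would handle it unchanged.
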
>
> FPGrowth uses a more complex data structure. This provides 2 use cases:
> 1) a hard use case for composing its data with a simpler object, because
> you have to save the simple objects with metadata that lets you read and
> reconstitute.
> 2) a simpler use case is saving "flattened" variations of the full data
> structure as a stream of bundles.
>
>
> On Sat, Oct 29, 2011 at 8:45 PM, Isabel Drost <isa...@apache.org> wrote:
>
>>
>> Mahout seems to be at a stage where we have covered most of the
>> interesting machine learning problems, where it is being used in
>> production by quite some developers - hey, we even got a book that is
>> now available in a printed version.
>>
>> Maybe it's time to start taking first steps towards a 1.0 release. One*
>> important step in my opinion is to define what kind of backwards
>> compatibility guarantees we want to give our users - and what guarantees
>> our users really need - after releasing 1.0.
>>
>> Just a rough list below - feel free to extend, shrink and change:
>>
>> 1) Data input formats - people probably do not want to re-generate
>> vectors from their original data every time they use a new Mahout
>> version.
>>
>> 2) Model formats - people probably do not want to have to retrain a
>> model only to make it work with the latest and greatest features of a
>> new Mahout release.
>>
>> 3) Model output - when upgrading, users probably want to receive model
>> output that is then integrated in their system the same way as with the
>> older release.
>>
>> 4) APIs - I don't see us keeping all interfaces or even abstract
>> classes stable. However users should know which APIs we consider
>> "public facing" and will likely keep stable. Maybe an annotation makes
>> that clear?
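
The annotation idea in point 4 might look something like the sketch below.
It is purely illustrative: PublicApi, ExampleRecommender, InternalHelper and
PublicApiDemo are hypothetical names, not existing Mahout types, and the
"since" attribute is an assumption about what such a marker could carry.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.List;

// Hypothetical marker for types and methods that are meant to stay stable.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface PublicApi {
    // First release in which the element is considered stable (assumed attribute).
    String since() default "1.0";
}

// A made-up public-facing interface carrying the marker.
@PublicApi(since = "1.0")
interface ExampleRecommender {
    List<Long> recommend(long userId, int howMany);
}

// A made-up internal class with no stability promise.
final class InternalHelper {
}

final class PublicApiDemo {
    // True if the type carries the stability marker.
    static boolean isPublicApi(Class<?> type) {
        return type.isAnnotationPresent(PublicApi.class);
    }

    public static void main(String[] args) {
        System.out.println("ExampleRecommender stable: "
            + isPublicApi(ExampleRecommender.class));
        System.out.println("InternalHelper stable: "
            + isPublicApi(InternalHelper.class));
    }
}
```

Because the marker is retained at runtime and is @Documented, it would show
up in Javadoc and could also be checked by a build-time compatibility test.
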
>>
>> 5) Command line scripts - is there a significant user base relying on the
>> bin/mahout script to warrant working towards keeping that stable between
>> releases?
>>
>> Most likely I've forgotten about other vital pieces - just wanted to
>> kick off that discussion.
>>
>>
>> Isabel
>>
>>
>> * though not the only one - others include but are not limited to the
>> time frame for which we offer support for any given release.
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
>


-- 
Lance Norskog
goks...@gmail.com
