On Wed, Feb 17, 2010 at 7:10 PM, Jason Surratt <[email protected]> wrote:

> I've spent a bit of time looking over Drew's Avro stuff as well as
> http://issues.apache.org/jira/browse/MAHOUT-262, SVM and the SGD
> implementations. Is it the intent for classifiers to use
> SingleLabelVectorWritable as the input value during the map step at some
> point in the future? If so, I'm happy to write up some code around Naive
> Bayes and an input format to do just that -- maybe it'll be useful to
> someone else.
>

We definitely want a common input format for all algorithms (where it
makes sense).  The two candidates are honest-to-goodness sparse or dense
vectors versus something like a document.  Since integrating the conversion
from document to vector directly into the algorithm saves a huge amount of
effort, it looks like all algorithms will need to support both.

Doing that without lots of effort in each algorithm is the trick that Robin
and Drew are working on just now.  Your contributions would be invaluable
(you are a real live user!).
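
A minimal sketch of the "support both without per-algorithm effort" idea, in
Python rather than Mahout's Java and not taken from any Mahout code.  Every
name here (SparseVector, hash_encode, as_vector, l1_norm) is hypothetical, and
token hashing is only one assumed way to turn a document into a vector:

```python
import zlib
from collections import Counter
from typing import Dict, Union

# Hypothetical sparse vector type: feature index -> weight.
SparseVector = Dict[int, float]


def hash_encode(document: str, num_features: int = 1000) -> SparseVector:
    """Convert a document to a sparse vector by hashing each token to an index."""
    counts = Counter(document.lower().split())
    vector: SparseVector = {}
    for token, count in counts.items():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        index = zlib.crc32(token.encode("utf-8")) % num_features
        vector[index] = vector.get(index, 0.0) + float(count)
    return vector


def as_vector(item: Union[str, SparseVector],
              num_features: int = 1000) -> SparseVector:
    """Shared adapter: convert documents, pass ready-made vectors through."""
    if isinstance(item, str):
        return hash_encode(item, num_features)
    return item


def l1_norm(item: Union[str, SparseVector]) -> float:
    """Stand-in for an algorithm; it only ever sees vectors via as_vector."""
    return sum(abs(weight) for weight in as_vector(item).values())
```

The conversion lives in one shared adapter, so a stand-in algorithm such as
l1_norm accepts either a document string or a prebuilt vector without carrying
any conversion logic of its own.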


> There is a lot of code and JIRAs to take in so I apologize if I'm missing
> something.
>

No problem.  It is an exciting project that way.

-- 
Ted Dunning, CTO
DeepDyve
