Ted,

Thanks for the speedy response!

> This sounds great.  I would suggest you test the naive Bayes, complementary
> Naive Bayes, SVM and SGD implementations.  Given that naive Bayes has worked
> well on a sample, you will probably be very happy with SVM and SGD since
> they handle very large cardinality well.

Thanks! I'll be sure to try the other classifiers once I get NB working.

> You will need to vectorize your input.  Since you have many columns, you may
> want to look at Drew's document style stuff.  See
> https://issues.apache.org/jira/browse/MAHOUT-274

I've spent a bit of time looking over Drew's Avro stuff as well as
http://issues.apache.org/jira/browse/MAHOUT-262 and the SVM and SGD
implementations. Is the plan for the classifiers to eventually use
SingleLabelVectorWritable as the input value in the map step? If so, I'm happy
to write some code for Naive Bayes, along with an input format that does just
that. Maybe it'll be useful to someone else.

There's a lot of code and a lot of JIRAs to take in, so I apologize if I'm
missing something.

Cheers!

-jason
