Ted,

Thanks for the speedy response!

> This sounds great. I would suggest you test the naive Bayes, complementary
> Naive Bayes, SVM and SGD implementations. Given that naive Bayes has worked
> well on a sample, you will probably be very happy with SVM and SGD since
> they handle very large cardinality well.

Thanks! I'll be sure and try the other classifiers after I get NB working.

> You will need to vectorize your input. Since you have many columns, you may
> want to look at Drew's document style stuff. See
> https://issues.apache.org/jira/browse/MAHOUT-274

I've spent a bit of time looking over Drew's Avro stuff as well as
http://issues.apache.org/jira/browse/MAHOUT-262, SVM and the SGD
implementations. Is it the intent for classifiers to use
SingleLabelVectorWritable as the input value during the map step at some
point in the future? If so, I'm happy to write up some code around Naive
Bayes and an input format to do just that -- maybe it'll be useful to
someone else.

There is a lot of code and JIRAs to take in so I apologize if I'm missing
something.

Cheers!
-jason
