On May 8, 2012, at 12:43 PM, Jake Mannix wrote:

> On Tue, May 8, 2012 at 9:31 AM, Ted Dunning <[email protected]> wrote:
> 
>> It is frustrating to consider losing Bayes, but I would consider keeping
>> it if only to decrease the number of questions on the list about why the
>> examples from the book don't work.
>> 
> 
> Could maybe someone just sit down and rewrite it?  Naive Bayes is not a
> particularly difficult thing to implement, even distributed (it's like,
> word-count, basically.  Ok, maybe it's more like counting collocations,
> but still!).
> 
> It would be pretty silly to not have an NB impl (although I agree that it's
> even worse to have a broken or clunky one).

I agree.  The vector-based one is a rewrite, so we probably should just go from
there.  Not sure it is broken, but Robin is the primary person familiar with it,
and in the past I've pinged the list about its state (and tried to get
explanations of certain parts of it) and not gotten answers.

With all of these Hadoop algorithms, the other thing we really need is to make
them programmatically easier to integrate.  The Driver mode is not too bad for
testing, etc., but it makes them harder to integrate, as others have pointed out.
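
To make the "it's like word-count" point concrete, here is a rough single-machine sketch of multinomial Naive Bayes written as plain functions that can be called from other code instead of through a driver.  This is not the existing implementation or its API: the names `train` and `classify`, the `alpha` smoothing parameter, and the toy documents are all made up for illustration, and add-one (Laplace) smoothing is an assumption on my part.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Count documents per label and word occurrences per label.

    `docs` is an iterable of (label, tokens) pairs.  Training is nothing
    more than counting, which is why it resembles word-count.
    """
    label_counts = Counter()            # label -> number of documents
    word_counts = defaultdict(Counter)  # label -> word -> occurrences
    vocab = set()                       # every word seen in training
    for label, tokens in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def classify(model, tokens, alpha=1.0):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in label_counts.items():
        counts = word_counts[label]
        # Smoothed denominator: words under this label plus alpha per vocab entry.
        denom = sum(counts.values()) + alpha * len(vocab)
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            if tok in vocab:  # skip words never seen in training
                score += math.log((counts[tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, only to show the calling convention.
model = train([
    ("spam", ["buy", "cheap", "pills"]),
    ("spam", ["cheap", "pills", "now"]),
    ("ham", ["meeting", "at", "noon"]),
    ("ham", ["lunch", "at", "noon"]),
])
print(classify(model, ["cheap", "pills"]))
```

In a distributed version the two counters in `train` would presumably be the output of a counting job, with `classify` left as ordinary library code that an application can call directly.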
