On Sep 29, 2009, at 8:47 AM, Sandra Clover wrote:
Hi, I'm using Mahout 0.1 for document classification (using the
distributed Bayesian Network) and I'm getting some answers back. I
have noticed one thing that is really bugging me. I'm wondering if you
can help, please:
Problem: there are two overloads of the classify() method in the API.
The first returns just one answer (according to the API it returns
"the single best category"). The second says that it will "return the
top numResults, ranked by score". I have compared the results of both
and noticed that the single best category does not appear *at all* in
the list of categories returned by the second method! Strange, no? I
would have expected it to come top of the list. I have gone as deep as
20 for numResults and have not even seen the best category. Has anyone
encountered this before? I would appreciate any comments, suggestions,
or experiences you may like to share.
Thanks,
Sandra.
That sounds like a bug. Can you try out the trunk version of Mahout
and see if it is still there? A lot of the classification code has
been reworked recently (I'm not even sure at the moment that those two
classify methods are still in the code!)
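For anyone checking this against trunk, the invariant Sandra expects is
that the single best category equals the first entry of the top-N list
whenever both are computed from the same scores. The sketch below
models that with a plain map of category to score. RankingCheck,
bestCategory and topCategories are hypothetical names, not Mahout's
API, and it assumes a higher score means a better match and that
scores are distinct (ties could be broken differently by the two
paths). One way a mismatch like the reported one could arise, offered
only as a guess, is that the two code paths disagree on whether higher
or lower scores are better.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical model of the two classify() variants; NOT Mahout's API.
// Assumes higher score = better match.
public class RankingCheck {

    // "Single best category": the category with the highest score.
    static String bestCategory(Map<String, Double> scores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (best == null || e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }

    // "Top numResults, ranked by score": categories by descending score.
    static List<String> topCategories(Map<String, Double> scores, int numResults) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(numResults, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up scores for illustration only.
        Map<String, Double> scores = Map.of("sports", 0.7, "politics", 0.2, "tech", 0.1);
        String best = bestCategory(scores);
        List<String> top = topCategories(scores, 2);
        // If both methods rank by the same scores, the best category must lead the list.
        if (!top.isEmpty() && !top.get(0).equals(best)) {
            throw new AssertionError("inconsistent: best=" + best + " top=" + top);
        }
        System.out.println(best + " " + top); // prints "sports [sports, politics]"
    }
}
```

Running the same check against the real classifier output, rather than
a toy map, would show whether trunk still has the problem.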