Hi Sandra,

I tested the priority queue implementation, and there does seem to be a problem with Hadoop's priority queue implementation:

  import org.apache.hadoop.util.PriorityQueue;

  PriorityQueue<ClassifierResult> queue = new ClassifierResultPriorityQueue(3);
  queue.insert(new ClassifierResult("label1", 5));
  queue.insert(new ClassifierResult("label2", 4));
  queue.insert(new ClassifierResult("label3", 3));
  queue.insert(new ClassifierResult("label4", 2));
  queue.insert(new ClassifierResult("label5", 1));
  assertEquals("Incorrect Size", 3, queue.size());
  log.info(queue.pop().toString());
  log.info(queue.pop().toString());
  log.info(queue.pop().toString());

Output:

  09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest: ClassifierResult{category='label3', score=3.0}
  09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest: ClassifierResult{category='label4', score=2.0}
  09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest: ClassifierResult{category='label5', score=1.0}

label1 and label2 were missing. I couldn't explain this behaviour, so I changed it to java.util.PriorityQueue, and it is working now.

On Wed, Sep 30, 2009 at 6:43 PM, Sandra Clover <sclo...@consultant.com> wrote:

> Hi Robin,
>
> Thanks for the reply, for updating the documentation, and for your advice. I'll try the trunk version. To answer your question, I am using Mahout version 0.1 and Hadoop 0.19.2. Hope this helps.
>
> Thanks again, Robin.
> Sandra.
>
> ----- Original Message -----
> From: "Robin Anil"
> To: mahout-u...@lucene.apache.org
> Subject: Re: Classify() method results anomoly - help!
> Date: Wed, 30 Sep 2009 18:08:05 +0530
>
> Hi Sandra, those scores are indicative of the relative score, not the probability. Thanks for bringing this to our notice; I will fix the documentation. You may try the trunk and see whether the former error still occurs. Also, could you tell me the version of Hadoop you are using?
>
> On Wed, Sep 30, 2009 at 5:30 PM, Sandra Clover wrote:
>
> > Thanks Grant, I'll look into that. I've also been having a look at the numbers returned from the getScore() method. I have noticed a range from 0 to around 20000.243434+ with numbers in between like: 1659.930763537123. According to the API documentation for this method: "The label and the associated score(Usually probabilty)". This does not look like probability to me. I was kind of expecting an answer between 0 and 1 or 0 and 100 or something like that.
> > Are these results typical or indicative of some sort of bug? Once again, comments/suggestions appreciated.
> > Sandra.
> >
> > ----- Original Message -----
> > From: "Grant Ingersoll"
> > To: mahout-u...@lucene.apache.org
> > Subject: Re: Classify() method results anomoly - help!
> > Date: Tue, 29 Sep 2009 16:02:46 -0400
> >
> > On Sep 29, 2009, at 8:47 AM, Sandra Clover wrote:
> >
> > > Hi,
> > >
> > > I'm using Mahout 0.1 for document classification (using the distributed Bayesian Network) and I'm getting some answers back. I have noticed one thing that is really bugging me, and I'm wondering whether you can help.
> > >
> > > Problem: Concerning the Classify() method, there are 2 constructors in the API. The first one returns just one answer (according to the API it returns "the single best category"). The second constructor says that it will "return the top numResults, ranked by score". My problem is that I have compared the results of both techniques, and I have noticed that the single best category does not appear at *all* in the range of categories given by the second constructor! Strange, no? I would have expected it to come top of the list. I have gone to a value of 20 deep in the numResults level and have not even seen the best category. Has anyone encountered this before? I would appreciate any comments/suggestions/user experience that you may like to share.
> > >
> > > Thanks,
> > > Sandra.
> >
> > That sounds like a bug. Can you try out the trunk version of Mahout and see if it is still there? A lot of the classification stuff has been reworked recently (I'm not even sure at the moment that those two classify methods are even still in the code!)
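
For reference, the java.util.PriorityQueue replacement mentioned at the top of this thread could look roughly like the sketch below. It is not the code that went into Mahout: ScoredLabel is a hypothetical stand-in for ClassifierResult, and TopNResults is a made-up name. The idea is a min-heap on score that evicts its lowest entry whenever it grows past N, so the N highest scores survive. One possible explanation for the Hadoop result (an assumption, I have not checked ClassifierResultPriorityQueue): as far as I understand it, Hadoop's bounded queue retains the elements that rank highest according to lessThan(), so a lessThan() written the other way round would retain the lowest scores instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for Mahout's ClassifierResult; only label and score are modelled.
final class ScoredLabel {
  final String label;
  final double score;

  ScoredLabel(String label, double score) {
    this.label = label;
    this.score = score;
  }

  @Override
  public String toString() {
    return "ScoredLabel{label='" + label + "', score=" + score + '}';
  }
}

// Bounded "top N by score" collector built on java.util.PriorityQueue (name is made up).
final class TopNResults {
  private final int maxSize;
  // Min-heap on score: the head is always the lowest score currently kept.
  private final PriorityQueue<ScoredLabel> heap;

  TopNResults(int maxSize) {
    if (maxSize < 1) {
      throw new IllegalArgumentException("maxSize must be at least 1");
    }
    this.maxSize = maxSize;
    this.heap = new PriorityQueue<>(
        maxSize, Comparator.comparingDouble((ScoredLabel r) -> r.score));
  }

  // Adds a result, then drops the lowest-scoring entry if the bound is exceeded.
  void insert(ScoredLabel result) {
    heap.offer(result);
    if (heap.size() > maxSize) {
      heap.poll();
    }
  }

  int size() {
    return heap.size();
  }

  // Returns the kept results, highest score first, without modifying the heap.
  List<ScoredLabel> bestFirst() {
    List<ScoredLabel> sorted = new ArrayList<>(heap);
    sorted.sort(Comparator.comparingDouble((ScoredLabel r) -> r.score).reversed());
    return sorted;
  }

  public static void main(String[] args) {
    TopNResults top = new TopNResults(3);
    // Same data as the test earlier in the thread: label1..label5 with scores 5..1.
    for (int i = 1; i <= 5; i++) {
      top.insert(new ScoredLabel("label" + i, 6 - i));
    }
    for (ScoredLabel r : top.bestFirst()) {
      System.out.println(r);
    }
  }
}
```

With the five results from the test above and a bound of 3, this keeps label1, label2 and label3. Evicting the head of a min-heap is the usual way to get a bounded top-N out of java.util.PriorityQueue, which has no maximum size of its own.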