[jira] Commented: (MAHOUT-138) Convert main() methods to use Commons CLI for argument processing

2009-10-07 Thread Isabel Drost (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762898#action_12762898
 ] 

Isabel Drost commented on MAHOUT-138:
-

Sean, you can easily follow what is going on with this issue on the subversion 
commit panel:

https://issues.apache.org/jira/browse/MAHOUT-138?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel

 Convert main() methods to use Commons CLI for argument processing
 -

 Key: MAHOUT-138
 URL: https://issues.apache.org/jira/browse/MAHOUT-138
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.2
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 0.3

 Attachments: MAHOUT-138.patch, MAHOUT-138_fuzzyKMeansJob.patch


 Commons CLI is in the classpath and makes it much easier to handle command 
 line args; done right, they are also more self-documenting.  We should 
 convert our main() methods to use CLI.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Classify() method results anomoly - help!

2009-10-07 Thread Robin Anil
Hi Sandra,
 I tested the priority queue implementation, and there does seem to be a
problem with Hadoop's priority queue implementation:
import org.apache.hadoop.util.PriorityQueue;

PriorityQueue<ClassifierResult> queue = new ClassifierResultPriorityQueue(3);
queue.insert(new ClassifierResult("label1", 5));
queue.insert(new ClassifierResult("label2", 4));
queue.insert(new ClassifierResult("label3", 3));
queue.insert(new ClassifierResult("label4", 2));
queue.insert(new ClassifierResult("label5", 1));

assertEquals("Incorrect Size", 3, queue.size());
log.info(queue.pop().toString());
log.info(queue.pop().toString());
log.info(queue.pop().toString());

09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest:
ClassifierResult{category='label3', score=3.0}
09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest:
ClassifierResult{category='label4', score=2.0}
09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest:
ClassifierResult{category='label5', score=1.0}
label1 and label2 were missing. I couldn't explain this behaviour.

I changed it to java.util.PriorityQueue, so it's working now.


On Wed, Sep 30, 2009 at 6:43 PM, Sandra Clover <sclo...@consultant.com> wrote:

 Hi Robin, thanks for the reply, for updating the documentation, and for
 your advice. I'll try the trunk version. To answer your question, I am
 using Mahout version 0.1 and Hadoop 0.19.2. Hope this helps. Thanks
 again, Robin. Sandra.

  - Original Message -
  From: Robin Anil
  To: mahout-u...@lucene.apache.org
  Subject: Re: Classify() method results anomoly - help!
   Date: Wed, 30 Sep 2009 18:08:05 +0530


  Hi Sandra, those scores indicate a relative score, not a probability.
  Thanks for bringing this to our notice; I will fix the documentation.
  You may try the trunk and see whether the former error still occurs.
  Also, could you tell me which version of Hadoop you are using?



   On Wed, Sep 30, 2009 at 5:30 PM, Sandra Clover wrote:

   Thanks Grant, I'll look into that. I've also been having a look at the
   numbers returned from the getScore() method. I have noticed a range
   from 0 to around 2.243434+ with numbers in between like
   1659.930763537123. According to the API documentation for this method:
   "The label and the associated score(Usually probabilty)". This does not
   look like probability to me. I was kind of expecting an answer between 0
   and 1, or 0 and 100, or something like that. Are these results typical or
   indicative of some sort of bug? Once again, comments/suggestions
   appreciated. Sandra.
  
  
  
   - Original Message -
   From: Grant Ingersoll
   To: mahout-u...@lucene.apache.org
   Subject: Re: Classify() method results anomoly - help!
   Date: Tue, 29 Sep 2009 16:02:46 -0400
  
  
  
   On Sep 29, 2009, at 8:47 AM, Sandra Clover wrote:
  
Hi, I'm using Mahout 0.1 for document classification (using the
distributed Bayesian Network) and I'm getting some answers back. I
have noticed one thing that is really bugging me. I'm wondering, can you
help please?
Problem: concerning the Classify() method, there are 2 constructors in
the API. The first one returns just one answer (according to the API it
returns "the single best category"). The second constructor says that
it will "return the top numResults, ranked by score". My problem is that
I have compared and contrasted the results of both techniques. I have
noticed that the single best category does not appear at *all* in the
range of categories given by the second constructor! Strange, no? I would
have expected it to come top of the list. I have gone 20 deep in the
numResults level and have not even seen the best category. Has anyone
encountered this before? I would appreciate any
comments/suggestions/user-experience that you may like to share.
Thanks,
Sandra.
   
  
   That sounds like a bug. Can you try out the trunk version of
   Mahout and see if it is still there? A lot of the classification
   stuff has been reworked recently (I'm not even sure at the moment
   that those two classify methods are even still in the code!)
  
  
  





[jira] Created: (MAHOUT-186) Classifier PriorityQueue returns erroneous results

2009-10-07 Thread Robin Anil (JIRA)
Classifier PriorityQueue returns erroneous results
--

 Key: MAHOUT-186
 URL: https://issues.apache.org/jira/browse/MAHOUT-186
 Project: Mahout
  Issue Type: Bug
Affects Versions: 0.1, 0.2
Reporter: Robin Anil
Assignee: Robin Anil
 Fix For: 0.2


A simple test fails 

import org.apache.hadoop.util.PriorityQueue;

PriorityQueue<ClassifierResult> queue = new ClassifierResultPriorityQueue(3);
queue.insert(new ClassifierResult("label1", 5));
queue.insert(new ClassifierResult("label2", 4));
queue.insert(new ClassifierResult("label3", 3));
queue.insert(new ClassifierResult("label4", 2));
queue.insert(new ClassifierResult("label5", 1));

assertEquals("Incorrect Size", 3, queue.size());
log.info(queue.pop().toString());
log.info(queue.pop().toString());
log.info(queue.pop().toString());

09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest: 
ClassifierResult{category='label3', score=3.0}
09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest: 
ClassifierResult{category='label4', score=2.0}
09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest: 
ClassifierResult{category='label5', score=1.0}

Expected label1 and label2 at the top




[jira] Work started: (MAHOUT-186) Classifier PriorityQueue returns erroneous results

2009-10-07 Thread Robin Anil (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-186 started by Robin Anil.

 Classifier PriorityQueue returns erroneous results
 --

 Key: MAHOUT-186
 URL: https://issues.apache.org/jira/browse/MAHOUT-186
 Project: Mahout
  Issue Type: Bug
Affects Versions: 0.1, 0.2
Reporter: Robin Anil
Assignee: Robin Anil
 Fix For: 0.2





[jira] Updated: (MAHOUT-186) Classifier PriorityQueue returns erroneous results

2009-10-07 Thread Robin Anil (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robin Anil updated MAHOUT-186:
--

Attachment: MAHOUT-186.patch

Fix:
Added PriorityQueue Test. 

Used java.util.PriorityQueue instead of the org.apache.hadoop.util.PriorityQueue



 Classifier PriorityQueue returns erroneous results
 --

 Key: MAHOUT-186
 URL: https://issues.apache.org/jira/browse/MAHOUT-186
 Project: Mahout
  Issue Type: Bug
Affects Versions: 0.1, 0.2
Reporter: Robin Anil
Assignee: Robin Anil
 Fix For: 0.2

 Attachments: MAHOUT-186.patch





[jira] Updated: (MAHOUT-186) Classifier PriorityQueue returns erroneous results

2009-10-07 Thread Robin Anil (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robin Anil updated MAHOUT-186:
--

Status: Patch Available  (was: In Progress)

 Classifier PriorityQueue returns erroneous results
 --

 Key: MAHOUT-186
 URL: https://issues.apache.org/jira/browse/MAHOUT-186
 Project: Mahout
  Issue Type: Bug
Affects Versions: 0.1, 0.2
Reporter: Robin Anil
Assignee: Robin Anil
 Fix For: 0.2

 Attachments: MAHOUT-186.patch





[jira] Updated: (MAHOUT-148) Convert Classification Algs to use richer Writable syntax

2009-10-07 Thread Robin Anil (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robin Anil updated MAHOUT-148:
--

Attachment: MAHOUT-148.patch

Verified by running all combinations of

Bayes|CBayes
hdfs|hbase 
sequential|mapreduce
for both training and testing.

Noticed a slight improvement in running time of various map/reduce jobs (20% 
decrease for 20newsgroups dataset)



 Convert Classification Algs to use richer Writable syntax
 -

 Key: MAHOUT-148
 URL: https://issues.apache.org/jira/browse/MAHOUT-148
 Project: Mahout
  Issue Type: Improvement
  Components: Classification
Affects Versions: 0.1, 0.2
Reporter: Grant Ingersoll
Assignee: Robin Anil
 Fix For: 0.2

 Attachments: MAHOUT-148-Work-In-Progress.patch, MAHOUT-148.patch


 Much of the classification capability relies on parsing values out of the 
 Text object just to determine what type of thing is being used.  We should 
 try to avoid having to do string manipulation for this kind of thing and 
 instead encapsulate it in Writable instances.  This should make things 
 perform faster and bring stronger typing to the problem, which should make it 
 easier to understand and debug the code.
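
As a rough illustration of the idea in the description above, the sketch below stores a record's fields in typed binary form instead of packing them into a delimited Text value that must be parsed again later. FeatureWeight and its fields are hypothetical names, not taken from the MAHOUT-148 patch. The class uses only java.io so that it runs without Hadoop; its write/readFields pair is meant to mirror the two methods of org.apache.hadoop.io.Writable, which a real job would implement.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical typed record; a real Hadoop job would declare "implements Writable".
final class FeatureWeight {
    String label = "";
    String feature = "";
    double weight;

    FeatureWeight() { }

    FeatureWeight(String label, String feature, double weight) {
        this.label = label;
        this.feature = feature;
        this.weight = weight;
    }

    // Serializes the fields in a fixed order; no delimiters or string parsing involved.
    void write(DataOutput out) throws IOException {
        out.writeUTF(label);
        out.writeUTF(feature);
        out.writeDouble(weight);
    }

    // Reads the fields back in the same order they were written.
    void readFields(DataInput in) throws IOException {
        label = in.readUTF();
        feature = in.readUTF();
        weight = in.readDouble();
    }

    // Round-trips a record through a byte array, standing in for the map-to-reduce hop.
    static FeatureWeight roundTrip(FeatureWeight original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            FeatureWeight copy = new FeatureWeight();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a type like this, the reducer reads weight as a double directly rather than splitting a string and calling Double.parseDouble on one of the pieces.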




[jira] Updated: (MAHOUT-148) Convert Classification Algs to use richer Writable syntax

2009-10-07 Thread Robin Anil (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robin Anil updated MAHOUT-148:
--

Status: Patch Available  (was: In Progress)

 Convert Classification Algs to use richer Writable syntax
 -

 Key: MAHOUT-148
 URL: https://issues.apache.org/jira/browse/MAHOUT-148
 Project: Mahout
  Issue Type: Improvement
  Components: Classification
Affects Versions: 0.1, 0.2
Reporter: Grant Ingersoll
Assignee: Robin Anil
 Fix For: 0.2

 Attachments: MAHOUT-148-Work-In-Progress.patch, MAHOUT-148.patch





[jira] Commented: (MAHOUT-186) Classifier PriorityQueue returns erroneous results

2009-10-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763035#action_12763035
 ] 

Sean Owen commented on MAHOUT-186:
--

Not sure what's up with the Hadoop class, but it sure makes sense to use the 
standard PriorityQueue class. Why do we need a custom subclass at all? It seems 
like this can be done with a regular PriorityQueue, a Comparator, and use of 
the standard PriorityQueue methods. That is, do we need getTopResults(), for 
example?
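
A minimal sketch of what that could look like, using only java.util.PriorityQueue and a Comparator, with no subclass. The Scored class and the topN helper are hypothetical stand-ins for illustration, not Mahout's ClassifierResult or the contents of MAHOUT-186.patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TopResults {
    // Hypothetical scored label; stands in for a classifier result.
    static final class Scored {
        final String label;
        final double score;

        Scored(String label, double score) {
            this.label = label;
            this.score = score;
        }
    }

    // Ascending by score, so the queue head is always the weakest retained result.
    static final Comparator<Scored> BY_SCORE = new Comparator<Scored>() {
        @Override
        public int compare(Scored a, Scored b) {
            return Double.compare(a.score, b.score);
        }
    };

    // Returns the n highest-scoring items, best first.
    static List<Scored> topN(Iterable<Scored> items, int n) {
        PriorityQueue<Scored> queue = new PriorityQueue<Scored>(Math.max(1, n), BY_SCORE);
        for (Scored item : items) {
            queue.offer(item);
            if (queue.size() > n) {
                queue.poll(); // evict the current lowest score
            }
        }
        List<Scored> best = new ArrayList<Scored>(queue.size());
        while (!queue.isEmpty()) {
            best.add(queue.poll()); // poll() returns items in ascending score order
        }
        Collections.reverse(best);
        return best;
    }
}
```

Fed the five labels from the failing test with n = 3, this keeps label1, label2 and label3, which is the result the issue description expects.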

 Classifier PriorityQueue returns erroneous results
 --

 Key: MAHOUT-186
 URL: https://issues.apache.org/jira/browse/MAHOUT-186
 Project: Mahout
  Issue Type: Bug
Affects Versions: 0.1, 0.2
Reporter: Robin Anil
Assignee: Robin Anil
 Fix For: 0.2

 Attachments: MAHOUT-186.patch





[jira] Commented: (MAHOUT-170) Enable Java compile optimize flag during build

2009-10-07 Thread Robin Anil (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763085#action_12763085
 ] 

Robin Anil commented on MAHOUT-170:
---

HBase does JVM tuning out of the box by enabling the concurrent mark-sweep GC in 
hbase-env.sh.

For sequential versions we can enable it from the shell script.
For Hadoop jobs to get the benefit, it has to be put in hadoop-env.sh or in the 
mapred.child.java.opts conf parameter.

 Enable Java compile optimize flag during build
 --

 Key: MAHOUT-170
 URL: https://issues.apache.org/jira/browse/MAHOUT-170
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.2
Reporter: Robin Anil
 Fix For: 0.2

 Attachments: optimize.patch


 In the Maven compiler plugin, enable the optimize=true flag.




[jira] Commented: (MAHOUT-138) Convert main() methods to use Commons CLI for argument processing

2009-10-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763109#action_12763109
 ] 

Sean Owen commented on MAHOUT-138:
--

I see, there was a commit from Isabel. Is it done, then? Isabel, you had 
suggested moving this to 0.3, so I suppose you're saying it's not done, but I 
wonder what the delta is then.

Grant, I tend to agree with quick review and commits, since patches very quickly 
go stale. But my question, I suppose, was: if you don't want to mark this for 
0.3, who is waiting to do what, for how long, on this, if it is to block 0.2?

This isn't my patch at all; I'm not involved.

 Convert main() methods to use Commons CLI for argument processing
 -

 Key: MAHOUT-138
 URL: https://issues.apache.org/jira/browse/MAHOUT-138
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.2
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 0.3

 Attachments: MAHOUT-138.patch, MAHOUT-138_fuzzyKMeansJob.patch





[jira] Commented: (MAHOUT-186) Classifier PriorityQueue returns erroneous results

2009-10-07 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763124#action_12763124
 ] 

Ted Dunning commented on MAHOUT-186:



I don't quite understand the last comment, but generally if you want the top n 
items in descending order, you keep a descending queue as you say in order to 
make insertion efficient.  It is generally good to cache the score of the least 
element to speed comparisons even a little bit more.

Then when you want the results, you can just fill a list in reverse order 

or just do this:

List r = new ArrayList(priorityQueue);
Collections.reverse(r);

Since this is pretty simple, I think I misunderstood the question.
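
The two ideas in this comment, a bounded queue plus a cached score for the least element, can be sketched roughly as follows. TopScores is a hypothetical helper written for illustration, not code from the thread. Because java.util.PriorityQueue documents that its iterator makes no ordering guarantee, the sketch sorts the copied list instead of only reversing it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper that retains the n largest scores seen so far.
final class TopScores {
    private final int n;
    // Natural ordering: the head of the queue is the least retained score.
    private final PriorityQueue<Double> queue;
    // Cached copy of the head once the queue is full; avoids peek() and unboxing per candidate.
    private double least = Double.NEGATIVE_INFINITY;

    TopScores(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
        this.queue = new PriorityQueue<Double>(n);
    }

    void add(double score) {
        if (queue.size() < n) {
            queue.add(score);
        } else if (score > least) {
            queue.poll();      // drop the current least element
            queue.add(score);
        } else {
            return;            // cheap rejection using the cached threshold
        }
        if (queue.size() == n) {
            least = queue.peek();
        }
    }

    // Returns the retained scores, largest first.
    List<Double> descending() {
        List<Double> result = new ArrayList<Double>(queue);
        Collections.sort(result, Collections.<Double>reverseOrder());
        return result;
    }
}
```

Most candidates in a long stream fall below the threshold, so the common path is a single primitive comparison and no heap operation.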

 Classifier PriorityQueue returns erroneous results
 --

 Key: MAHOUT-186
 URL: https://issues.apache.org/jira/browse/MAHOUT-186
 Project: Mahout
  Issue Type: Bug
Affects Versions: 0.1, 0.2
Reporter: Robin Anil
Assignee: Robin Anil
 Fix For: 0.2

 Attachments: MAHOUT-186.patch





[jira] Commented: (MAHOUT-186) Classifier PriorityQueue returns erroneous results

2009-10-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763162#action_12763162
 ] 

Sean Owen commented on MAHOUT-186:
--

I will make up an alternate patch that either shows what I mean or shows me I'm 
wrong. My central question is, what requires a custom subclass of 
PriorityQueue? I understand that the new List() thing doesn't give the items 
in order but that doesn't imply a subclass is needed.

 Classifier PriorityQueue returns erroneous results
 --

 Key: MAHOUT-186
 URL: https://issues.apache.org/jira/browse/MAHOUT-186
 Project: Mahout
  Issue Type: Bug
Affects Versions: 0.1, 0.2
Reporter: Robin Anil
Assignee: Robin Anil
 Fix For: 0.2

 Attachments: MAHOUT-186.patch





[jira] Commented: (MAHOUT-186) Classifier PriorityQueue returns erroneous results

2009-10-07 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763172#action_12763172
 ] 

Ted Dunning commented on MAHOUT-186:


You are right that I should code up an example before speaking.  But it does 
seem, against all odds, that what I was suggesting works.

Here is a test case that illustrates what I meant.  I am still not sure what 
everybody is saying:

{noformat}
package com.infovell.logging.test;

import junit.framework.TestCase;

import java.util.PriorityQueue;
import java.util.Random;
import java.util.List;
import java.util.ArrayList;
import java.util.Collections;

public class FooTest extends TestCase {
  public void testQueue() {
    PriorityQueue<Double> pq = new PriorityQueue<Double>(10);
    Random gen = new Random(123L);
    for (int i = 0; i < 1000; i++) {
      double x = gen.nextDouble();
      if (pq.size() < 10 || x > pq.peek()) {
        pq.add(x);
        while (pq.size() > 10) {
          pq.remove();
        }
      }
    }

    List<Double> r = new ArrayList<Double>(pq);
    Collections.reverse(r);
    System.out.printf("%s\n", r);
    assertEquals(0.994991252160446, r.get(0), 1e-7);
    assertEquals(0.9881699208527764, r.get(9), 1e-7);
  }
}
{noformat}

 Classifier PriorityQueue returns erroneous results
 --

 Key: MAHOUT-186
 URL: https://issues.apache.org/jira/browse/MAHOUT-186
 Project: Mahout
  Issue Type: Bug
Affects Versions: 0.1, 0.2
Reporter: Robin Anil
Assignee: Robin Anil
 Fix For: 0.2

 Attachments: MAHOUT-186.patch





[jira] Commented: (MAHOUT-138) Convert main() methods to use Commons CLI for argument processing

2009-10-07 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763265#action_12763265
 ] 

Grant Ingersoll commented on MAHOUT-138:


I think we just need to go through the various main() methods and see what is 
left.

 Convert main() methods to use Commons CLI for argument processing
 -

 Key: MAHOUT-138
 URL: https://issues.apache.org/jira/browse/MAHOUT-138
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.2
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 0.3

 Attachments: MAHOUT-138.patch, MAHOUT-138_fuzzyKMeansJob.patch





[jira] Updated: (MAHOUT-157) Frequent Pattern Mining using Parallel FP-Growth

2009-10-07 Thread Robin Anil (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robin Anil updated MAHOUT-157:
--

Attachment: MAHOUT-157-Oct-8.pfpgrowth.patch

Implementation of Top K Parallel FPGrowth using the optimised algorithm 
detailed above. 
This implementation uses custom Writable classes instead of Text. 


Testing and verification of the results are still needed, but code-wise the 
implementation is done.

 Frequent Pattern Mining using Parallel FP-Growth
 

 Key: MAHOUT-157
 URL: https://issues.apache.org/jira/browse/MAHOUT-157
 Project: Mahout
  Issue Type: New Feature
  Components: Frequent Itemset/Association Rule Mining
Affects Versions: 0.2
Reporter: Robin Anil
Assignee: Robin Anil
 Fix For: 0.2

 Attachments: MAHOUT-157-August-17.patch, MAHOUT-157-August-24.patch, 
 MAHOUT-157-August-31.patch, MAHOUT-157-August-6.patch, 
 MAHOUT-157-Combinations-BSD-License.patch, 
 MAHOUT-157-Combinations-BSD-License.patch, 
 MAHOUT-157-inProgress-August-5.patch, MAHOUT-157-Oct-1.patch, 
 MAHOUT-157-Oct-8.pfpgrowth.patch, MAHOUT-157-September-10.patch, 
 MAHOUT-157-September-18.patch, MAHOUT-157-September-5.patch


 Implement: http://infolab.stanford.edu/~echang/recsys08-69.pdf
