[jira] Commented: (MAHOUT-138) Convert main() methods to use Commons CLI for argument processing
[ https://issues.apache.org/jira/browse/MAHOUT-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762898#action_12762898 ] Isabel Drost commented on MAHOUT-138: - Sean, you can easily follow what is going on with this issue on the subversion commit panel: https://issues.apache.org/jira/browse/MAHOUT-138?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel Convert main() methods to use Commons CLI for argument processing - Key: MAHOUT-138 URL: https://issues.apache.org/jira/browse/MAHOUT-138 Project: Mahout Issue Type: Improvement Affects Versions: 0.2 Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 0.3 Attachments: MAHOUT-138.patch, MAHOUT-138_fuzzyKMeansJob.patch Commons CLI is in the classpath and makes it much easier to handle command line args and they are more self-documenting when done right. We should convert our main methods to use CLI -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Classify() method results anomoly - help!
Hi Sandra,

I tested the priority queue implementation, and there does seem to be a problem with Hadoop's priority queue implementation:

    import org.apache.hadoop.util.PriorityQueue;

    PriorityQueue<ClassifierResult> queue = new ClassifierResultPriorityQueue(3);
    queue.insert(new ClassifierResult("label1", 5));
    queue.insert(new ClassifierResult("label2", 4));
    queue.insert(new ClassifierResult("label3", 3));
    queue.insert(new ClassifierResult("label4", 2));
    queue.insert(new ClassifierResult("label5", 1));
    assertEquals("Incorrect Size", 3, queue.size());
    log.info(queue.pop().toString());
    log.info(queue.pop().toString());
    log.info(queue.pop().toString());

    09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest: ClassifierResult{category='label3', score=3.0}
    09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest: ClassifierResult{category='label4', score=2.0}
    09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest: ClassifierResult{category='label5', score=1.0}

label1 and label2 were missing. I couldn't explain this behaviour, so I changed it to java.util.PriorityQueue, and it is working now.

On Wed, Sep 30, 2009 at 6:43 PM, Sandra Clover sclo...@consultant.com wrote:

Hi Robin,

Thanks for the reply, for updating the documentation, and for your advice. I'll try the trunk version. To answer your question, I am using Mahout version 0.1 and Hadoop 0.19.2. Hope this helps...

Thanks again, Robin.
Sandra.

- Original Message -
From: Robin Anil
To: mahout-u...@lucene.apache.org
Subject: Re: Classify() method results anomoly - help!
Date: Wed, 30 Sep 2009 18:08:05 +0530

Hi Sandra, those scores are indicative of the relative score, not the probability. Thanks for bringing this to our notice; I will fix the documentation. You may try the trunk and see if the former error still occurs. Also, could you tell me the version of Hadoop you are using?

On Wed, Sep 30, 2009 at 5:30 PM, Sandra Clover wrote:

Thanks Grant, I'll look into that. I've been having a look at the numbers returned from the getScore() method also. I have noticed a range from 0 to around 2.243434+ with numbers in between like: 1659.930763537123

According to the API documentation for this method: "The label and the associated score(Usually probabilty)." This does not look like probability to me. I was kind of expecting an answer between 0 and 1, or 0 and 100, or something like that. Are these results typical, or indicative of some sort of bug?

Once again, comments/suggestions appreciated.
Sandra.

- Original Message -
From: Grant Ingersoll
To: mahout-u...@lucene.apache.org
Subject: Re: Classify() method results anomoly - help!
Date: Tue, 29 Sep 2009 16:02:46 -0400

On Sep 29, 2009, at 8:47 AM, Sandra Clover wrote:

Hi, I'm using Mahout 0.1 for document classification (using the distributed Bayesian Network) and I'm getting some answers back. I have noticed one thing that is really bugging me. I'm wondering, can you help please?

Problem: Concerning the Classify() method, there are 2 constructors in the API. The first one returns just one answer (according to the API it returns "the single best category"). The second constructor says that it will "return the top numResults, ranked by score".

My problem is that I have compared and contrasted the results of both techniques. I have noticed that the single best category does not appear at *all* in the range of categories given by the second constructor! Strange, no? I would have expected it to come top of the list. I have gone to a value of 20 deep in the numResults level and have not even seen the best category.

Has anyone encountered this before? I would appreciate any comments/suggestions/user-experience that you may like to share.

Thanks, Sandra.

That sounds like a bug. Can you try out the trunk version of Mahout and see if it is still there? A lot of the classification stuff has been reworked recently (I'm not even sure at the moment that those two classify methods are even still in the code!)
[jira] Created: (MAHOUT-186) Classifier PriorityQueue returns erroneous results
Classifier PriorityQueue returns erroneous results
    Key: MAHOUT-186
    URL: https://issues.apache.org/jira/browse/MAHOUT-186
    Project: Mahout
    Issue Type: Bug
    Affects Versions: 0.1, 0.2
    Reporter: Robin Anil
    Assignee: Robin Anil
    Fix For: 0.2

A simple test fails:

    import org.apache.hadoop.util.PriorityQueue;

    PriorityQueue<ClassifierResult> queue = new ClassifierResultPriorityQueue(3);
    queue.insert(new ClassifierResult("label1", 5));
    queue.insert(new ClassifierResult("label2", 4));
    queue.insert(new ClassifierResult("label3", 3));
    queue.insert(new ClassifierResult("label4", 2));
    queue.insert(new ClassifierResult("label5", 1));
    assertEquals("Incorrect Size", 3, queue.size());
    log.info(queue.pop().toString());
    log.info(queue.pop().toString());
    log.info(queue.pop().toString());

    09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest: ClassifierResult{category='label3', score=3.0}
    09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest: ClassifierResult{category='label4', score=2.0}
    09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest: ClassifierResult{category='label5', score=1.0}

Expected label1 and label2 at the top.
[jira] Work started: (MAHOUT-186) Classifier PriorityQueue returns erroneous results
[ https://issues.apache.org/jira/browse/MAHOUT-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on MAHOUT-186 started by Robin Anil.

Classifier PriorityQueue returns erroneous results
    Key: MAHOUT-186
    URL: https://issues.apache.org/jira/browse/MAHOUT-186
    Project: Mahout
    Issue Type: Bug
    Affects Versions: 0.1, 0.2
    Reporter: Robin Anil
    Assignee: Robin Anil
    Fix For: 0.2
[jira] Updated: (MAHOUT-186) Classifier PriorityQueue returns erroneous results
[ https://issues.apache.org/jira/browse/MAHOUT-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robin Anil updated MAHOUT-186:
    Attachment: MAHOUT-186.patch

Fix: Added a PriorityQueue test. Used java.util.PriorityQueue instead of org.apache.hadoop.util.PriorityQueue.

Classifier PriorityQueue returns erroneous results
    Key: MAHOUT-186
    URL: https://issues.apache.org/jira/browse/MAHOUT-186
    Project: Mahout
    Issue Type: Bug
    Affects Versions: 0.1, 0.2
    Reporter: Robin Anil
    Assignee: Robin Anil
    Fix For: 0.2
    Attachments: MAHOUT-186.patch
[jira] Updated: (MAHOUT-186) Classifier PriorityQueue returns erroneous results
[ https://issues.apache.org/jira/browse/MAHOUT-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robin Anil updated MAHOUT-186:
    Status: Patch Available (was: In Progress)

Classifier PriorityQueue returns erroneous results
    Key: MAHOUT-186
    URL: https://issues.apache.org/jira/browse/MAHOUT-186
    Project: Mahout
    Issue Type: Bug
    Affects Versions: 0.1, 0.2
    Reporter: Robin Anil
    Assignee: Robin Anil
    Fix For: 0.2
    Attachments: MAHOUT-186.patch
[jira] Updated: (MAHOUT-148) Convert Classification Algs to use richer Writable syntax
[ https://issues.apache.org/jira/browse/MAHOUT-148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-148: -- Attachment: MAHOUT-148.patch Verified by running all combinations of Bayes|CBayes hdfs|hbase sequential|mapreduce both Training and Testing. Noticed a slight improvement in running time of various map/reduce jobs (20% decrease for 20newsgroups dataset) Convert Classification Algs to use richer Writable syntax - Key: MAHOUT-148 URL: https://issues.apache.org/jira/browse/MAHOUT-148 Project: Mahout Issue Type: Improvement Components: Classification Affects Versions: 0.1, 0.2 Reporter: Grant Ingersoll Assignee: Robin Anil Fix For: 0.2 Attachments: MAHOUT-148-Work-In-Progress.patch, MAHOUT-148.patch Much of the classification capabilities relies on parsing values out from the Text object just to determine what type of thing is being used. We should try to avoid having to do string manipulation for this kind of thing and instead encapsulate it in Writable instances. This should make things perform faster and bring stronger typing to the problem, which should make it easier to understand and debug the code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
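The idea in the issue description, carrying the record type as a typed field instead of parsing it out of a string, might look roughly like the sketch below. It is not taken from the MAHOUT-148 patch: `TypedKey`, `Kind`, and `roundTrip` are hypothetical names, and the class only mirrors the `write(DataOutput)` / `readFields(DataInput)` method shapes of Hadoop's `Writable` interface so that it compiles without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class TypedKeySketch {

    /** Hypothetical record kinds that string-based code would encode as a text prefix. */
    enum Kind { LABEL_WEIGHT, FEATURE_WEIGHT }

    /** Hypothetical typed key: the kind is a field, not something parsed out of a string. */
    static final class TypedKey {
        Kind kind;
        String name;

        TypedKey() { }

        TypedKey(Kind kind, String name) {
            this.kind = kind;
            this.name = name;
        }

        // Same method shapes as Hadoop's Writable interface.
        void write(DataOutput out) throws IOException {
            out.writeByte(kind.ordinal());
            out.writeUTF(name);
        }

        void readFields(DataInput in) throws IOException {
            kind = Kind.values()[in.readByte()];
            name = in.readUTF();
        }
    }

    /** Serializes a key to bytes and reads it back, as a framework would between map and reduce. */
    static TypedKey roundTrip(TypedKey key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            TypedKey copy = new TypedKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TypedKey copy = roundTrip(new TypedKey(Kind.FEATURE_WEIGHT, "hadoop"));
        System.out.println(copy.kind + " " + copy.name);
    }
}
```

A reducer receiving such a key can switch on `kind` directly, with no string splitting or prefix checks.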
[jira] Updated: (MAHOUT-148) Convert Classification Algs to use richer Writable syntax
[ https://issues.apache.org/jira/browse/MAHOUT-148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robin Anil updated MAHOUT-148:
    Status: Patch Available (was: In Progress)

Convert Classification Algs to use richer Writable syntax
    Key: MAHOUT-148
    URL: https://issues.apache.org/jira/browse/MAHOUT-148
    Project: Mahout
    Issue Type: Improvement
    Components: Classification
    Affects Versions: 0.1, 0.2
    Reporter: Grant Ingersoll
    Assignee: Robin Anil
    Fix For: 0.2
    Attachments: MAHOUT-148-Work-In-Progress.patch, MAHOUT-148.patch
[jira] Commented: (MAHOUT-186) Classifier PriorityQueue returns erroneous results
[ https://issues.apache.org/jira/browse/MAHOUT-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763035#action_12763035 ]

Sean Owen commented on MAHOUT-186:

Not sure what's up with the Hadoop class, but it sure makes sense to use the standard PriorityQueue class. Why do we need a custom subclass at all? It seems like this can be done with a regular PriorityQueue, a Comparator, and the standard PriorityQueue methods. That is, do we need getTopResults(), for example?

Classifier PriorityQueue returns erroneous results
    Key: MAHOUT-186
    URL: https://issues.apache.org/jira/browse/MAHOUT-186
    Project: Mahout
    Issue Type: Bug
    Affects Versions: 0.1, 0.2
    Reporter: Robin Anil
    Assignee: Robin Anil
    Fix For: 0.2
    Attachments: MAHOUT-186.patch
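The approach suggested in the comment above (a plain java.util.PriorityQueue plus a Comparator, with no custom subclass) could be sketched as follows. This is an illustration under assumptions, not the code from MAHOUT-186.patch: `Result` is a hypothetical stand-in for Mahout's ClassifierResult, and `topLabels` is an invented helper name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopNSketch {

    /** Hypothetical stand-in for Mahout's ClassifierResult: a label plus a score. */
    static final class Result {
        final String label;
        final double score;

        Result(String label, double score) {
            this.label = label;
            this.score = score;
        }
    }

    /** Ascending by score, so the queue head is always the weakest result kept so far. */
    static final Comparator<Result> BY_SCORE = new Comparator<Result>() {
        @Override
        public int compare(Result a, Result b) {
            return Double.compare(a.score, b.score);
        }
    };

    /** Returns the labels of the n highest-scoring results, best first. */
    static List<String> topLabels(List<Result> results, int n) {
        List<String> labels = new ArrayList<String>();
        if (n <= 0) {
            return labels;
        }
        PriorityQueue<Result> queue = new PriorityQueue<Result>(n, BY_SCORE);
        for (Result r : results) {
            if (queue.size() < n) {
                queue.add(r);
            } else if (r.score > queue.peek().score) {
                queue.poll(); // evict the current weakest result
                queue.add(r);
            }
        }
        // poll() yields ascending scores, so collect and then reverse for best-first order.
        while (!queue.isEmpty()) {
            labels.add(queue.poll().label);
        }
        Collections.reverse(labels);
        return labels;
    }

    public static void main(String[] args) {
        List<Result> results = new ArrayList<Result>();
        results.add(new Result("label1", 5));
        results.add(new Result("label2", 4));
        results.add(new Result("label3", 3));
        results.add(new Result("label4", 2));
        results.add(new Result("label5", 1));
        System.out.println(topLabels(results, 3)); // prints [label1, label2, label3]
    }
}
```

With the five results from the failing test and n = 3, this keeps label1 and label2, which is the behaviour the issue expects.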
[jira] Commented: (MAHOUT-170) Enable Java compile optimize flag during build
[ https://issues.apache.org/jira/browse/MAHOUT-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763085#action_12763085 ]

Robin Anil commented on MAHOUT-170:

HBase does JVM tuning out of the box by enabling Concurrent GC Sweep in hbase-env.sh. For the sequential versions we can enable it from the shell script. For Hadoop jobs to get the benefit, it has to be put in hadoop-env.sh or in the mapred.child.java.opts conf parameter.

Enable Java compile optimize flag during build
    Key: MAHOUT-170
    URL: https://issues.apache.org/jira/browse/MAHOUT-170
    Project: Mahout
    Issue Type: Improvement
    Affects Versions: 0.2
    Reporter: Robin Anil
    Fix For: 0.2
    Attachments: optimize.patch

In the Maven compiler plugin, enable the optimize=true flag.
[jira] Commented: (MAHOUT-138) Convert main() methods to use Commons CLI for argument processing
[ https://issues.apache.org/jira/browse/MAHOUT-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763109#action_12763109 ]

Sean Owen commented on MAHOUT-138:

I see, there was a commit from Isabel. Is it done then? Isabel, you had suggested moving this to 0.3, so I suppose you're saying it's not done, but I wonder what the delta is then.

Grant, I tend to agree with quick review and commits, since patches very quickly go stale. But my question, I suppose, was: if you don't want to mark this for 0.3, who is waiting to do what, and for how long, if it is to block 0.2? This isn't my patch at all; I'm not involved.

Convert main() methods to use Commons CLI for argument processing
    Key: MAHOUT-138
    URL: https://issues.apache.org/jira/browse/MAHOUT-138
    Project: Mahout
    Issue Type: Improvement
    Affects Versions: 0.2
    Reporter: Grant Ingersoll
    Assignee: Grant Ingersoll
    Priority: Minor
    Fix For: 0.3
    Attachments: MAHOUT-138.patch, MAHOUT-138_fuzzyKMeansJob.patch
[jira] Commented: (MAHOUT-186) Classifier PriorityQueue returns erroneous results
[ https://issues.apache.org/jira/browse/MAHOUT-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763124#action_12763124 ]

Ted Dunning commented on MAHOUT-186:

I don't quite understand the last comment, but generally if you want the top n items in descending order, you keep a descending queue as you say in order to make insertion efficient. It is generally good to cache the score of the least element to speed comparisons even a little bit more. Then when you want the results, you can just fill a list in reverse order or just do this:

    List r = new ArrayList(priorityQueue);
    Collections.reverse(r);

Since this is pretty simple, I think I misunderstood the question.

Classifier PriorityQueue returns erroneous results
    Key: MAHOUT-186
    URL: https://issues.apache.org/jira/browse/MAHOUT-186
    Project: Mahout
    Issue Type: Bug
    Affects Versions: 0.1, 0.2
    Reporter: Robin Anil
    Assignee: Robin Anil
    Fix For: 0.2
    Attachments: MAHOUT-186.patch
[jira] Commented: (MAHOUT-186) Classifier PriorityQueue returns erroneous results
[ https://issues.apache.org/jira/browse/MAHOUT-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763162#action_12763162 ]

Sean Owen commented on MAHOUT-186:

I will make up an alternate patch that either shows what I mean or shows me I'm wrong. My central question is, what requires a custom subclass of PriorityQueue? I understand that the new List() thing doesn't give the items in order, but that doesn't imply a subclass is needed.

Classifier PriorityQueue returns erroneous results
    Key: MAHOUT-186
    URL: https://issues.apache.org/jira/browse/MAHOUT-186
    Project: Mahout
    Issue Type: Bug
    Affects Versions: 0.1, 0.2
    Reporter: Robin Anil
    Assignee: Robin Anil
    Fix For: 0.2
    Attachments: MAHOUT-186.patch
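The ordering point raised in the comment above can be illustrated with a small sketch. The java.util.PriorityQueue documentation states that its iterator is not guaranteed to traverse elements in any particular order, so copying the queue into a list preserves the elements but not a sorted sequence; sorting the copy is one way to get descending order without a subclass. The class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class QueueOrderSketch {

    /** Copies the queue as-is; the list follows the heap's internal layout, not sorted order. */
    static List<Double> heapCopy(PriorityQueue<Double> queue) {
        return new ArrayList<Double>(queue);
    }

    /** Copies the queue and sorts the copy explicitly, largest value first. The queue is left untouched. */
    static List<Double> descending(PriorityQueue<Double> queue) {
        List<Double> sorted = new ArrayList<Double>(queue);
        Collections.sort(sorted, Collections.reverseOrder());
        return sorted;
    }

    public static void main(String[] args) {
        PriorityQueue<Double> queue = new PriorityQueue<Double>();
        Collections.addAll(queue, 0.3, 0.9, 0.1, 0.5);
        System.out.println("heap copy:  " + heapCopy(queue));
        System.out.println("descending: " + descending(queue));
    }
}
```

Only the `descending` result has a guaranteed order; the heap copy should be treated as an unordered collection of the same elements.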
[jira] Commented: (MAHOUT-186) Classifier PriorityQueue returns erroneous results
[ https://issues.apache.org/jira/browse/MAHOUT-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763172#action_12763172 ]

Ted Dunning commented on MAHOUT-186:

You are right that I should code up an example before speaking. But it does seem that, against all odds, what I was suggesting works. Here is a test case that illustrates what I meant. I am still not sure what everybody is saying:

{noformat}
package com.infovell.logging.test;

import junit.framework.TestCase;

import java.util.PriorityQueue;
import java.util.Random;
import java.util.List;
import java.util.ArrayList;
import java.util.Collections;

public class FooTest extends TestCase {
  public void testQueue() {
    PriorityQueue<Double> pq = new PriorityQueue<Double>(10);
    Random gen = new Random(123L);
    for (int i = 0; i < 1000; i++) {
      double x = gen.nextDouble();
      // keep only the 10 largest values seen so far
      if (pq.size() < 10 || x > pq.peek()) {
        pq.add(x);
        while (pq.size() > 10) {
          pq.remove();
        }
      }
    }
    List<Double> r = new ArrayList<Double>(pq);
    Collections.reverse(r);
    System.out.printf("%s\n", r);
    assertEquals(0.994991252160446, r.get(0), 1e-7);
    assertEquals(0.9881699208527764, r.get(9), 1e-7);
  }
}
{noformat}

Classifier PriorityQueue returns erroneous results
    Key: MAHOUT-186
    URL: https://issues.apache.org/jira/browse/MAHOUT-186
    Project: Mahout
    Issue Type: Bug
    Affects Versions: 0.1, 0.2
    Reporter: Robin Anil
    Assignee: Robin Anil
    Fix For: 0.2
    Attachments: MAHOUT-186.patch
[jira] Commented: (MAHOUT-138) Convert main() methods to use Commons CLI for argument processing
[ https://issues.apache.org/jira/browse/MAHOUT-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763265#action_12763265 ]

Grant Ingersoll commented on MAHOUT-138:

I think we just need to go through the various main() methods and see what is left.

Convert main() methods to use Commons CLI for argument processing
    Key: MAHOUT-138
    URL: https://issues.apache.org/jira/browse/MAHOUT-138
    Project: Mahout
    Issue Type: Improvement
    Affects Versions: 0.2
    Reporter: Grant Ingersoll
    Assignee: Grant Ingersoll
    Priority: Minor
    Fix For: 0.3
    Attachments: MAHOUT-138.patch, MAHOUT-138_fuzzyKMeansJob.patch
[jira] Updated: (MAHOUT-157) Frequent Pattern Mining using Parallel FP-Growth
[ https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-157: -- Attachment: MAHOUT-157-Oct-8.pfpgrowth.patch Implementation of Top K Parallel FPGrowth using the optimised algorithm detailed above. This implementation uses Custom Writable Classes instead of Text. Need to do testing and verification of results. But code wise the implementation is done Frequent Pattern Mining using Parallel FP-Growth Key: MAHOUT-157 URL: https://issues.apache.org/jira/browse/MAHOUT-157 Project: Mahout Issue Type: New Feature Components: Frequent Itemset/Association Rule Mining Affects Versions: 0.2 Reporter: Robin Anil Assignee: Robin Anil Fix For: 0.2 Attachments: MAHOUT-157-August-17.patch, MAHOUT-157-August-24.patch, MAHOUT-157-August-31.patch, MAHOUT-157-August-6.patch, MAHOUT-157-Combinations-BSD-License.patch, MAHOUT-157-Combinations-BSD-License.patch, MAHOUT-157-inProgress-August-5.patch, MAHOUT-157-Oct-1.patch, MAHOUT-157-Oct-8.pfpgrowth.patch, MAHOUT-157-September-10.patch, MAHOUT-157-September-18.patch, MAHOUT-157-September-5.patch Implement: http://infolab.stanford.edu/~echang/recsys08-69.pdf -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.