Sorry, cocoon vs. commons.

On Wed, Jan 4, 2012 at 2:24 PM, Lance Norskog <[email protected]> wrote:
> I have a separate solution: strip the quoted text. Quoted text in the
> emails spams the term vectors; just plain TF-IDF is not enough to
> combat this. Lucene has a lot of tools besides TF-IDF.
>
> I have a patch, gotta start the JIRA. Also added more measurements to
> the confusion matrix. I want to get a good measurement of the
> performance on each producer and consumer, not just a global ratio.
> 'testnb' gives 80% but one of the false boxes has a 1. This is bogus.
> (I'm using your complete corpus of commons vs. cocoon, classifying
> dev vs. user.)
>
> On Wed, Jan 4, 2012 at 6:57 AM, Grant Ingersoll (Updated) (JIRA)
> <[email protected]> wrote:
>>
>> [
>> https://issues.apache.org/jira/browse/MAHOUT-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>> ]
>>
>> Grant Ingersoll updated MAHOUT-939:
>> -----------------------------------
>>
>> Attachment: MAHOUT-939.patch
>>
>> Here's a start on this. Added some more construction options to the
>> AdaptiveLogisticRegression class. Still testing what values to use in
>> TrainASFEmail, but thought I would put this up for now.
>>
>>> ASF Email SGD Examples don't produce good results
>>> -------------------------------------------------
>>>
>>> Key: MAHOUT-939
>>> URL: https://issues.apache.org/jira/browse/MAHOUT-939
>>> Project: Mahout
>>> Issue Type: Bug
>>> Affects Versions: 0.6
>>> Reporter: Grant Ingersoll
>>> Assignee: Grant Ingersoll
>>> Labels: MAHOUT_INTRO_CONTRIBUTE
>>> Fix For: 0.7
>>>
>>> Attachments: MAHOUT-939.patch
>>>
>>>
>>> The SGD examples for the ASF email don't work all that well currently in
>>> terms of quality. Also, need to determine how much memory is required for
>>> vectors of cardinality size 100K.
>>
>> --
>> This message is automatically generated by JIRA.
>> If you think it was sent incorrectly, please contact your JIRA
>> administrators:
>> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>
>
> --
> Lance Norskog
> [email protected]

--
Lance Norskog
[email protected]
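
For illustration only, here is a rough sketch of the "strip the quoted text" idea from the message above, written in Python rather than as the actual patch. The function name strip_quoted_text and the attribution pattern are assumptions made for this example; they are not part of Mahout or Lucene, and the patch mentioned above may work quite differently. The sketch drops lines that start with ">", single-line "On ... wrote:" attributions, and everything after a "--" signature delimiter, so only the newly written text would reach the term vectors.

```python
import re

# Hypothetical helper for illustration; not taken from the MAHOUT-939 patch.
# Matches a single-line reply attribution such as "On <date>, <name> wrote:".
# Attributions that wrap across two lines are not handled by this sketch.
ATTRIBUTION = re.compile(r"^On .+ wrote:\s*$")


def strip_quoted_text(body: str) -> str:
    """Return only the text the sender wrote, without quoted replies."""
    kept = []
    for line in body.splitlines():
        # A bare "--" (optionally followed by spaces) starts the signature.
        if line.rstrip() == "--":
            break
        stripped = line.lstrip()
        # Quoted lines from earlier messages start with one or more ">".
        if stripped.startswith(">"):
            continue
        # Drop the "On ... wrote:" line that introduces the quote.
        if ATTRIBUTION.match(stripped):
            continue
        kept.append(line)
    return "\n".join(kept).strip()
```

Running this over each message body before tokenizing would keep a long reply chain from contributing the same terms to every message in the thread. Whether that is enough on its own is an open question; the sketch does not deal with top-posted forwards or HTML mail.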
