[ 
https://issues.apache.org/jira/browse/MAHOUT-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lance Norskog updated MAHOUT-939:
---------------------------------

    Attachment: strip_reject.patch

This patch includes the MAHOUT-941 code for stripping quoted text, and adds an 
option for rejecting messages that contain specific lines. The rejecter lets 
you identify and remove "spam" messages.
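
To make the two steps concrete, here is a rough sketch in Python of what 
stripping quoted text and rejecting on specific lines could look like. It is 
not the code in strip_reject.patch: the function names and the two regular 
expressions are assumptions made up for illustration.

```python
import re

# Hypothetical patterns, chosen for illustration; the patch may use other rules.
QUOTE_LINE = re.compile(r"^\s*>")                  # "> quoted reply text"
ATTRIBUTION_LINE = re.compile(r"^On .+ wrote:\s*$")  # "On <date>, <name> wrote:"


def strip_quoted(body):
    """Drop lines that look like quoted replies or their attribution line."""
    kept = [
        line
        for line in body.splitlines()
        if not QUOTE_LINE.match(line) and not ATTRIBUTION_LINE.match(line)
    ]
    return "\n".join(kept)


def should_reject(body, reject_patterns):
    """Return True if any line of the message matches any reject pattern."""
    return any(
        pattern.search(line)
        for line in body.splitlines()
        for pattern in reject_patterns
    )
```

A message would be passed through should_reject first and, if kept, through 
strip_quoted before it is turned into a vector.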

For the Apache mail archives, we can add a job properties file that lists the 
patterns to reject, such as build notifications.
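
As a sketch of how such a file could be read, the Python below parses a 
properties-style listing into compiled patterns. The key prefix 
"reject.pattern." and the two example patterns are hypothetical; the format 
the patch actually expects is not described in this message.

```python
import re

# Hypothetical file contents; key names and patterns are illustrative only.
EXAMPLE_PROPS = """\
# Reject automated mail on the Apache lists
reject.pattern.1=^Build failed in Jenkins
reject.pattern.2=^This message is automatically generated by JIRA
"""


def load_reject_patterns(props_text, prefix="reject.pattern."):
    """Compile the value of every key that starts with the given prefix."""
    patterns = []
    for raw_line in props_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip().startswith(prefix):
            patterns.append(re.compile(value.strip()))
    return patterns
```

Keeping the patterns in a file rather than in code means each mailing list can 
carry its own reject list without a rebuild.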
                
> ASF Email Classification Examples don't always produce good results
> -------------------------------------------------------------------
>
>                 Key: MAHOUT-939
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-939
>             Project: Mahout
>          Issue Type: Bug
>    Affects Versions: 0.6
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>              Labels: MAHOUT_INTRO_CONTRIBUTE
>             Fix For: 0.7
>
>         Attachments: MAHOUT-939.patch, MAHOUT-939.patch, MAHOUT-939.patch, 
> strip_reject.patch
>
>
> The ASF email classification examples currently produce poor-quality results 
> when there are more than a few labels. We also need to determine how much 
> memory is required for vectors of cardinality 100K.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
