[ 
https://issues.apache.org/jira/browse/MAHOUT-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182326#comment-13182326
 ] 

Lance Norskog commented on MAHOUT-939:
--------------------------------------

I tested on the Apache commons/cocoon dataset with rejectionPercent 20 and 
mapreduce splits, using the SGD option. The key sequence was 3-3-2: 
classification, sgd, 2 labels.


39903 test files
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :       1198        3.0023%
Incorrectly Classified Instances        :      38705       96.9977%
Total Classified Instances              :      39903

=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       <--Classified as
861     19244    |  20105       a     = commons_apache_org
19461   337      |  19798       b     = cocoon_apache_org



Avg. Log-likelihood: -2.17279868506715 25%-ile: -2.572078768074387 75%-ile: -1.8312213190676514
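
As a sanity check on the figures above, here is a minimal Python sketch (not 
Mahout code; the variable names are illustrative) that recomputes the summary 
from the confusion matrix. It assumes rows are the actual label and columns are 
the assigned label, which is how the "<--Classified as" header reads. With only 
two labels, an accuracy near 3% means almost every prediction lands on the 
opposite label, so the sketch also computes what the accuracy would be if the 
two predicted labels were exchanged. Whether a label mix-up is the actual cause 
is an assumption; the output alone does not confirm it.

```python
# Confusion matrix copied from the output above.
# Assumption: outer key = actual label, inner key = label assigned by the classifier.
matrix = {
    "commons_apache_org": {"commons_apache_org": 861, "cocoon_apache_org": 19244},
    "cocoon_apache_org": {"commons_apache_org": 19461, "cocoon_apache_org": 337},
}

# Total number of classified instances (sum of every cell).
total = sum(sum(row.values()) for row in matrix.values())

# Diagonal cells: actual label and assigned label agree.
correct = sum(matrix[label][label] for label in matrix)

# Off-diagonal cells: these would be the correct ones if the two
# predicted labels were exchanged.
swapped_correct = total - correct

accuracy_pct = 100.0 * correct / total
swapped_accuracy_pct = 100.0 * swapped_correct / total

print(f"total instances      : {total}")
print(f"correct (as reported): {correct} ({accuracy_pct:.4f}%)")
print(f"correct if swapped   : {swapped_correct} ({swapped_accuracy_pct:.4f}%)")
```

The recomputed counts and percentages agree with the Summary block: the 
diagonal accounts for the 1198 correct instances and the off-diagonal for the 
38705 incorrect ones.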

                
> ASF Email Classification Examples don't always produce good results
> -------------------------------------------------------------------
>
>                 Key: MAHOUT-939
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-939
>             Project: Mahout
>          Issue Type: Bug
>    Affects Versions: 0.6
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>              Labels: MAHOUT_INTRO_CONTRIBUTE
>             Fix For: 0.7
>
>         Attachments: MAHOUT-939.patch, MAHOUT-939.patch, MAHOUT-939.patch, 
> strip_reject.patch
>
>
> The classification examples for the ASF email don't work all that well 
> currently in terms of quality when it comes to more than a few labels.  Also, 
> need to determine how much memory is required for vectors of cardinality size 
> 100K.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
