[ https://issues.apache.org/jira/browse/MAHOUT-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182326#comment-13182326 ]
Lance Norskog commented on MAHOUT-939:
--------------------------------------
I tested on the Apache commons/cocoon dataset with rejectionPercent 20 and
mapreduce splits, using the SGD option. The key sequence was 3-3-2
(classification, sgd, 2 labels).
The test set contained 39903 files.
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 1198 3.0023%
Incorrectly Classified Instances : 38705 96.9977%
Total Classified Instances : 39903
=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       <--Classified as
861     19244   |  20105  a = commons_apache_org
19461   337     |  19798  b = cocoon_apache_org
Avg. Log-likelihood: -2.17279868506715
25%-ile: -2.572078768074387
75%-ile: -1.8312213190676514
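
The summary figures can be rechecked from the confusion matrix alone. The sketch below is not part of Mahout; it is a small standalone Python check that assumes rows are the actual labels and columns the predicted labels, as the "Classified as" header suggests. The helper names (accuracy, swap_predicted_labels) are made up for illustration. It also computes what the accuracy would be if the two predicted labels were exchanged, which is only a hypothetical reading of the numbers, not something confirmed in this comment.

```python
# Confusion matrix copied from the summary above.
# Assumption: rows = actual label, columns = predicted label
# (a = commons_apache_org, b = cocoon_apache_org).
matrix = [
    [861, 19244],   # actual a
    [19461, 337],   # actual b
]


def accuracy(m):
    """Fraction of instances on the diagonal (predicted == actual)."""
    total = sum(sum(row) for row in m)
    correct = sum(m[i][i] for i in range(len(m)))
    return correct / total


def swap_predicted_labels(m):
    """Hypothetical helper: exchange the two predicted-label columns."""
    return [list(reversed(row)) for row in m]


total = sum(sum(row) for row in matrix)
reported = accuracy(matrix)
swapped = accuracy(swap_predicted_labels(matrix))

print(f"total instances:        {total}")
print(f"accuracy as reported:   {reported:.4%}")
print(f"accuracy if a/b swapped: {swapped:.4%}")
```

With two labels, an accuracy this far below 50% means the off-diagonal cells hold almost all of the instances, so the second figure is simply the "Incorrectly Classified" percentage from the summary. Whether the labels are in fact inverted somewhere in the pipeline would have to be verified separately.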
> ASF Email Classification Examples don't always produce good results
> -------------------------------------------------------------------
>
> Key: MAHOUT-939
> URL: https://issues.apache.org/jira/browse/MAHOUT-939
> Project: Mahout
> Issue Type: Bug
> Affects Versions: 0.6
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Labels: MAHOUT_INTRO_CONTRIBUTE
> Fix For: 0.7
>
> Attachments: MAHOUT-939.patch, MAHOUT-939.patch, MAHOUT-939.patch,
> strip_reject.patch
>
>
> The classification examples for the ASF email currently produce poor-quality
> results when there are more than a few labels. We also need to determine how
> much memory is required for vectors of cardinality 100K.