[ https://issues.apache.org/jira/browse/MAHOUT-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289373#comment-13289373 ]
Robin Anil commented on MAHOUT-939:
-----------------------------------
Verified that SGD works as well.
Encoding time: 200 sec
Training time: 510 sec
Testing time: 4 sec
Running SGD Training
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/robinanil/mahout-revert/examples/target/mahout-examples-0.7-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/robinanil/mahout-revert/examples/target/dependency/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/robinanil/mahout-revert/examples/target/dependency/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
12/06/05 14:29:01 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
12/06/05 14:29:01 INFO common.AbstractJob: Command line arguments: {--cardinality=[100000], --categories=[2], --endPhase=[2147483647], --input=[/tmp/mahout-asf/classification/sgd/splits/mapRedOut/], --output=[/tmp/mahout-asf/classification/sgd/models], --poolSize=[5], --startPhase=[0], --tempDir=[temp], --threads=[20]}
2012-06-05 14:29:01.949 java[91795:1903] Unable to load realm info from SCDynamicStore
159915 training files
0.00 0.00 0.00 0.00 0.0000000 0.0000000 1 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 2 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 3 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 4 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 6 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 8 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 10 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 12 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 15 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 20 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 25 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 30 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 40 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 50 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 60 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 70 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 80 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 100 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 120 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 140 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 150 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 200 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 250 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 300 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 400 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 500 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 600 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 700 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 800 0.000 0.00 none
0.00 290.00 284.00 0.00 0.079926227 1.0003742e-08 1000 -0.692 93.38 none
0.00 290.00 284.00 0.00 0.079926227 1.0003742e-08 1200 -0.692 93.38 none
0.00 290.00 284.00 0.00 0.079926227 1.0003742e-08 1400 -0.692 93.38 none
0.00 290.00 284.00 0.00 0.079926227 1.0003742e-08 1500 -0.692 93.38 none
0.00 286.00 500.00 0.01 0.079926227 1.0000000e-08 2000 -0.692 94.13 none
0.00 207.00 555.00 0.01 0.079926227 1.0000000e-08 2500 -0.692 94.13 none
0.00 207.00 555.00 0.01 0.079926227 1.0000000e-08 3000 -0.692 94.13 none
0.00 89.00 196.00 0.01 0.079926227 1.0000000e-08 4000 -0.692 94.23 none
0.00 74.00 244.00 0.01 0.079926264 1.0000000e-08 5000 -0.691 94.88 none
0.00 74.00 998.00 0.01 0.079955672 1.0000000e-08 6000 -0.691 93.95 none
0.00 70.00 1127.00 0.01 0.079955672 1.0000000e-08 7000 -0.691 93.80 none
0.00 70.00 2057.00 0.01 0.079955672 1.0000000e-08 8000 -0.691 93.73 none
0.00 70.00 630.00 0.01 0.079955672 1.0000000e-08 10000 -0.691 94.08 none
0.00 70.00 365.00 0.01 0.079955672 1.0000000e-08 12000 -0.691 93.99 none
0.00 66.00 674.00 0.01 0.079955672 1.0000000e-08 14000 -0.691 94.22 none
0.00 66.00 310.00 0.01 0.079955672 1.0000000e-08 15000 -0.691 94.25 none
0.00 66.00 449.00 0.01 0.079955672 1.0000000e-08 20000 -0.691 94.31 none
0.00 65.00 418.00 0.01 0.079955672 1.0000000e-08 25000 -0.691 94.26 none
0.00 63.00 409.00 0.01 0.079955672 1.0000000e-08 30000 -0.691 94.41 none
0.00 61.00 461.00 0.01 0.079955672 1.0000000e-08 40000 -0.691 94.55 none
0.00 61.00 855.00 0.01 0.079955672 1.0000000e-08 50000 -0.691 94.41 none
0.00 59.00 229.00 0.01 0.079955672 1.0000000e-08 60000 -0.691 93.88 none
0.00 59.00 211.00 0.01 0.079955672 1.0000000e-08 70000 -0.691 94.56 none
0.00 58.00 576.00 0.01 0.079955672 1.0000000e-08 80000 -0.691 93.99 none
0.00 55.00 250.00 0.01 0.079955672 1.0000000e-08 100000 -0.691 95.35 none
0.00 55.00 419.00 0.01 0.079955672 1.0000000e-08 120000 -0.691 94.26 none
0.00 55.00 94.00 0.01 0.079955672 1.0000000e-08 140000 -0.691 94.01 none
0.00 55.00 117.00 0.01 0.079955672 1.0000000e-08 150000 -0.691 93.45 none
exiting main, writing model to /tmp/mahout-asf/classification/sgd/models
Word counts
12/06/05 14:37:32 INFO driver.MahoutDriver: Program took 510979 ms (Minutes: 8.516316666666667)
Running Test
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/robinanil/mahout-revert/examples/target/mahout-examples-0.7-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/robinanil/mahout-revert/examples/target/dependency/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/robinanil/mahout-revert/examples/target/dependency/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
12/06/05 14:37:32 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TestASFEmail.props found on classpath, will use command-line arguments only
2012-06-05 14:37:33.401 java[93869:1903] Unable to load realm info from SCDynamicStore
40085 test files
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 38346 95.6617%
Incorrectly Classified Instances : 1739 4.3383%
Total Classified Instances : 40085
=======================================================
Confusion Matrix
-------------------------------------------------------
a b <--Classified as
18683 1478 | 20161 a = cocoon_apache_org
261 19663 | 19924 b = commons_apache_org
Avg. Log-likelihood: -0.6909925745820731 25%-ile: -0.6923558418191567 75%-ile: -0.6915062054522141
12/06/05 14:37:37 INFO driver.MahoutDriver: Program took 4247 ms (Minutes: 0.07078333333333334)
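
For anyone who wants to sanity-check the summary numbers against the confusion matrix, here is a minimal sketch in plain Python. It is not part of Mahout; the function name `summarize` and the dictionary keys are made up for illustration. It assumes that rows are the actual class and columns the predicted class, which is consistent with the row totals (20161 and 19924) printed in the matrix.

```python
# Confusion matrix copied from the test output above.
# Assumption: rows = actual class, columns = predicted class.
labels = ["cocoon_apache_org", "commons_apache_org"]
matrix = [
    [18683, 1478],   # actual cocoon_apache_org
    [261, 19663],    # actual commons_apache_org
]


def summarize(matrix):
    """Return overall accuracy plus per-class recall and precision."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    # Recall: correct predictions for a class / all actual instances (row sum).
    recall = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    # Precision: correct predictions for a class / all predictions (column sum).
    precision = [matrix[i][i] / sum(row[i] for row in matrix) for i in range(n)]
    return {
        "total": total,
        "correct": correct,
        "incorrect": total - correct,
        "accuracy": correct / total,
        "recall": recall,
        "precision": precision,
    }


stats = summarize(matrix)
print("accuracy: %.4f%%" % (100 * stats["accuracy"]))
for name, r, p in zip(labels, stats["recall"], stats["precision"]):
    print("%s: recall=%.4f precision=%.4f" % (name, r, p))
```

The totals come out to 38346 correct and 1739 incorrect out of 40085, matching the Summary block. Most of the errors are cocoon_apache_org messages classified as commons_apache_org (1478 versus 261 in the other direction).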
> ASF Email Classification Examples don't always produce good results
> -------------------------------------------------------------------
>
> Key: MAHOUT-939
> URL: https://issues.apache.org/jira/browse/MAHOUT-939
> Project: Mahout
> Issue Type: Bug
> Affects Versions: 0.6
> Reporter: Grant Ingersoll
> Assignee: Robin Anil
> Labels: MAHOUT_INTRO_CONTRIBUTE
> Fix For: 0.8
>
> Attachments: 939.patch, MAHOUT-939.patch, MAHOUT-939.patch,
> MAHOUT-939.patch, asf_sample_list.txt, bayes.patch, strip_reject.patch
>
>
> The classification examples for the ASF email don't work all that well
> currently in terms of quality when it comes to more than a few labels. Also,
> need to determine how much memory is required for vectors of cardinality size
> 100K.