Whoops! I hadn't checked it for a few days. Thanks for updating - I'm sure
it'll be helpful for others trying to follow the example.
Regards,
Loek
On Feb 9, 2010, at 09:23, Robin Anil wrote:
Yes, it was updated a short while ago. It's here:
http://cwiki.apache.org/MAHOUT/twentynewsgroups.html
On Tue, Feb 9, 2010 at 1:48 PM, Loek Cleophas <[email protected]> wrote:
Hi Robin,
Thank you, that was definitely enough. I ran the PrepareTwentyNewsgroups
task using the mvn exec command you suggested (seems it's time for me to
read up on Maven - useful how it takes care of finding the includes etc.).
For training and testing, I'm using hadoop directly, which works fine.
This is probably already on your/someone's to-do list, but it might be a
good idea to update the wiki page describing the example, so that it deals
with 0.2 or the trunk rather than some pre-0.2 release version (?). I know,
you probably have enough to work on without that.
Regards,
Loek
On Feb 7, 2010, at 13:48, Robin Anil wrote:
Are the mvn exec commands to run the 20-newsgroups example enough? I
haven't used Ant for a while (read: 8 months), and Mahout has shifted to
Maven anyway. So here goes. In the examples directory:
$ tar zxf 20news-18828.tar.gz
$ mkdir 20news-input
$ mvn -e exec:java \
    -Dexec.mainClass=org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
    -Dexec.args="-p 20news-18828 -o 20news-input -a org.apache.lucene.analysis.standard.StandardAnalyzer -c UTF-8"
To Train
$ mvn -e exec:java \
    -Dexec.mainClass=org.apache.mahout.classifier.bayes.TrainClassifier \
    -Dexec.args="-i 20news-input -o 20news-model -type cbayes -ng 1 -source hdfs"
To Test
$ mvn -e exec:java \
    -Dexec.mainClass=org.apache.mahout.classifier.bayes.TestClassifier \
    -Dexec.args="-m 20news-model -d 20news-input -type cbayes -ng 1 -source hdfs -method sequential"
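The prepare, train and test steps above can be collected into one small script. This is only a sketch: it assumes it is run from the examples directory with 20news-18828.tar.gz already downloaded there, and the `run` helper and the `DRY_RUN` switch are made up for illustration (they are not part of Mahout or Maven). By default it only prints the commands; setting DRY_RUN=0 executes them.

```shell
#!/bin/sh
# Sketch: the 20-newsgroups steps from this thread as one script.
# Assumptions (not from the thread): the run helper and DRY_RUN switch are
# illustrative only; the script is started from Mahout's examples directory.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print each command, 0 = execute it
PKG=org.apache.mahout.classifier.bayes

# Print the command (display only, quoting is not preserved) or run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "+ $*"
  else
    "$@"
  fi
}

# Unpack the data set and create the output directory
# (-p so that a rerun does not fail if the directory already exists).
run tar zxf 20news-18828.tar.gz
run mkdir -p 20news-input

# Prepare: collapse the expanded data into the Bayes input format.
run mvn -e exec:java "-Dexec.mainClass=$PKG.PrepareTwentyNewsgroups" \
  "-Dexec.args=-p 20news-18828 -o 20news-input -a org.apache.lucene.analysis.standard.StandardAnalyzer -c UTF-8"

# Train a complementary naive Bayes model.
run mvn -e exec:java "-Dexec.mainClass=$PKG.TrainClassifier" \
  "-Dexec.args=-i 20news-input -o 20news-model -type cbayes -ng 1 -source hdfs"

# Test the model sequentially.
run mvn -e exec:java "-Dexec.mainClass=$PKG.TestClassifier" \
  "-Dexec.args=-m 20news-model -d 20news-input -type cbayes -ng 1 -source hdfs -method sequential"
```

Passing each -D option as a single quoted word keeps the whole exec.args string together, which is what the wrapped lines in the mail above tend to break.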
On Sun, Feb 7, 2010 at 2:26 PM, Loek Cleophas <[email protected]> wrote:
Hi
A few weeks ago, after some toiling, I managed to get the input data for
the 20 newsgroups example into the format used by the Bayes classifiers in
Mahout. I did this on the trunk, and remember that it took some tricks, in
particular to get the PrepareTwentyNewsgroups code to run on the expanded
data and extract/collapse it into the format used by Mahout's Bayes
classifiers.
For some reason now beyond me, I removed that copy of the trunk with the
example data. Now I'm trying to redo the same (albeit this time on release
0.2), but am having trouble. I copied the maven/build.xml into
examples/build.xml according to a September post on the user group
(http://old.nabble.com/20-newsgroups-example-td25235941.html). That post
also suggested modifying the file, i.e. taking out the reference
<classpath refid="maven.test.classpath"/> (which indeed is not recognized
when I run the extract-20news-18828 ant target), and adding the following
lines:
<classpath>
  <path id="lib.path.ref">
    <fileset dir="target" includes="*.jar"/>
  </path>
  <path id="lib.path.ref">
    <fileset dir="lib" includes="*.jar"/>
  </path>
</classpath>
The "target" one makes some sense, but the lib one does not - I don't see
any lib folder in my mahout-0.2 checkout (even after having done the mvn
install of core and mvn compile of examples). Can anyone (Robin?) tell me
what lines to add instead to get the Ant task to work? I know I managed to
get it working before on my own, but can't remember for the life of me how
I did it :-\
Regards,
Loek