Hi,

Just committed a new version of TestForest. If you add "-mr" to the command 
line, it should launch a Hadoop job to classify the data. This is a basic 
implementation that can't yet compute the confusion matrix, so using "-a" has 
no effect. This implementation is also not well tested (it's a work in 
progress), so if you want to try it, select a random subset of your test data 
and classify it with the sequential implementation (without -mr), then 
compare those predictions with the distributed implementation's. The results 
won't be exactly the same (because of the random behavior of the classifier 
when it encounters ties), but about 90% of the predictions should match.
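For the comparison step, something like the following could work: a quick sketch that counts how many line-by-line predictions agree between the two runs. The file names and the stand-in data below are made up for illustration; they are not TestForest's actual output files or format.

```shell
#!/bin/sh
# Hypothetical prediction files from the sequential and the -mr runs
# (names and contents are stand-ins, created here just so the script runs).
seq_file=seq-predictions.txt
mr_file=mr-predictions.txt
printf 'a\nb\na\n' > "$seq_file"
printf 'a\nb\nc\n' > "$mr_file"

# Pair the files line by line and count matching predictions.
same=$(paste "$seq_file" "$mr_file" | awk '$1 == $2 {n++} END {print n+0}')
total=$(awk 'END {print NR}' "$seq_file")

echo "agreement: $same/$total"   # prints "agreement: 2/3" for the stand-in data
```

With real output you'd point `seq_file` and `mr_file` at the two prediction files and expect the agreement to be around 90% or better.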

Let me know what you think of it. I'm working on the confusion matrix, but it 
will take some time to finish.

--- On Fri, 3/26/10, Yang Sun <[email protected]> wrote:

> From: Yang Sun <[email protected]>
> Subject: Question about mahout Describe
> To: [email protected]
> Date: Friday, March 26, 2010, 10:16 PM
> I was testing Mahout recently. It runs great on small test datasets.
> However, when I try to expand the input to a big dataset directory, I get
> the following error message:
> 
> [localhost]$ hjar examples/target/mahout-examples-0.4-SNAPSHOT.job org.apache.mahout.df.mapreduce.TestForest -i /user/fulltestdata/* -ds rf/testdata.info -m rf-testmodel-5-100 -a -o rf/fulltestprediction
> 
> Exception in thread "main" java.io.IOException: Cannot open filename /user/fulltestdata/*
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1474)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1465)
>         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:372)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:351)
>         at org.apache.mahout.df.mapreduce.TestForest.testForest(TestForest.java:190)
>         at org.apache.mahout.df.mapreduce.TestForest.run(TestForest.java:137)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.mahout.df.mapreduce.TestForest.main(TestForest.java:228)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> My question is: can I use Mahout on directories instead of single files,
> and how?
> 
> Thanks,
> 
