I have been testing Mahout recently. It runs well on small test datasets.
However, when I pointed it at a directory containing a larger dataset, I got
the following error message:

[localhost]$ hjar examples/target/mahout-examples-0.4-SNAPSHOT.job \
    org.apache.mahout.df.mapreduce.TestForest -i /user/fulltestdata/* \
    -ds rf/testdata.info -m rf-testmodel-5-100 -a -o rf/fulltestprediction

Exception in thread "main" java.io.IOException: Cannot open filename /user/fulltestdata/*
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1474)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1465)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:372)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:351)
        at org.apache.mahout.df.mapreduce.TestForest.testForest(TestForest.java:190)
        at org.apache.mahout.df.mapreduce.TestForest.run(TestForest.java:137)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.mahout.df.mapreduce.TestForest.main(TestForest.java:228)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

My question is: can I run Mahout on a directory instead of a single file, and
if so, how?

Thanks,
