Yes indeed, the current implementation of TestForest is sequential. I guess I know what I have to do this weekend =D
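To put rough numbers on it, here is a small back-of-the-envelope sketch (plain Python, not Mahout code). The file count and per-file time come from Yang's message quoted below; the function name `estimated_hours`, the `parallel_tasks` parameter, and the figure of 20 concurrent tasks are assumptions made up for illustration, and the model ignores job startup and scheduling overhead.

```python
import math


def estimated_hours(num_files, minutes_per_file, parallel_tasks=1):
    """Rough wall-clock estimate, assuming files are handled in waves
    of `parallel_tasks` files and every file takes the same time."""
    waves = math.ceil(num_files / parallel_tasks)
    return waves * minutes_per_file / 60.0


# Sequential processing: ~300 files at 2 to 3 minutes each.
sequential_low = estimated_hours(300, 2)
sequential_high = estimated_hours(300, 3)

# Hypothetical parallel run with 20 files predicted at a time.
parallel_high = estimated_hours(300, 3, parallel_tasks=20)

print(sequential_low, sequential_high, parallel_high)
```

Under these assumptions the sequential run lands somewhere between 10 and 15 hours, while 20 concurrent tasks would bring the worst case under an hour, which is why a mapreduce version looks worth the effort.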
--- On Mon 29.3.10, Yang Sun <[email protected]> wrote:

> From: Yang Sun <[email protected]>
> Subject: Re: Re : Question about mahout Describe
> To: [email protected]
> Date: Monday, 29 March 2010, 19:05
>
> Hi Deneche,
>
> Thanks for the update. I really appreciate your fast response. I just tested
> the new TestForest class. Yes, it can predict multiple files. However, I
> don't think it takes advantage of Hadoop. It looks like it processes one
> file at a time. My prediction dataset has about 400 million records split in
> ~300 files. One file needs about 2-3 mins for TestForest to predict. The
> whole dataset will take 10 hours, which is too slow. I'll be waiting for your
> next update :)
>
> Thanks,
> Yang
>
> On Sat, Mar 27, 2010 at 12:02 AM, deneche abdelhakim <[email protected]> wrote:
>
> > One important clarification: for now only TestForest can handle directory
> > input paths; BuildForest won't work with input directories.
> >
> > --- On Sat 27.3.10, deneche abdelhakim <[email protected]> wrote:
> >
> > > From: deneche abdelhakim <[email protected]>
> > > Subject: Re : Question about mahout Describe
> > > To: [email protected]
> > > Date: Saturday, 27 March 2010, 7:43
> > >
> > > Wasn't possible, but it is now :)
> > > Just committed a patch that allows the input path to be a
> > > directory. Check out the latest version of mahout and run
> > > TestForest like this:
> > >
> > > [localhost]$ hjar examples/target/mahout-examples-0.4-SNAPSHOT.job org.apache.mahout.df.mapreduce.TestForest -i /user/fulltestdata -ds rf/testdata.info -m rf-testmodel-5-100 -a -o rf/fulltestprediction
> > >
> > > For every file in fulltestdata (e.g. fulltestdata/file1.data) you'll get a
> > > prediction file in fulltestprediction (e.g. fulltestprediction/file1.data.out).
> > >
> > > Hope it helps you
> > >
> > > --- On Fri 26.3.10, Yang Sun <[email protected]> wrote:
> > >
> > > > From: Yang Sun <[email protected]>
> > > > Subject: Question about mahout Describe
> > > > To: [email protected]
> > > > Date: Friday, 26 March 2010, 22:16
> > > >
> > > > I was testing mahout recently. It runs great on small testing datasets.
> > > > However, when I tried to expand the dataset to a big dataset directory, I got
> > > > the following error message:
> > > >
> > > > [localhost]$ hjar examples/target/mahout-examples-0.4-SNAPSHOT.job org.apache.mahout.df.mapreduce.TestForest -i /user/fulltestdata/* -ds rf/testdata.info -m rf-testmodel-5-100 -a -o rf/fulltestprediction
> > > >
> > > > Exception in thread "main" java.io.IOException: Cannot open filename /user/fulltestdata/*
> > > >     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1474)
> > > >     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1465)
> > > >     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:372)
> > > >     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
> > > >     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:351)
> > > >     at org.apache.mahout.df.mapreduce.TestForest.testForest(TestForest.java:190)
> > > >     at org.apache.mahout.df.mapreduce.TestForest.run(TestForest.java:137)
> > > >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > > >     at org.apache.mahout.df.mapreduce.TestForest.main(TestForest.java:228)
> > > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > > >     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > > >
> > > > My question is: can I use mahout on directories instead of single files? And
> > > > how?
> > > >
> > > > Thanks,
