Hi Deneche,

Thanks for the update. I really appreciate your fast response. I just tested
the new TestForest class, and yes, it can predict multiple files. However, I
don't think it takes advantage of Hadoop: it seems to process one file at a
time. My prediction dataset has about 400 million records split across ~300
files, and TestForest needs about 2-3 minutes to predict one file, so the
whole dataset would take 10-15 hours, which is too slow. I'll be waiting for
your next update :)
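A back-of-the-envelope sketch of the estimate above. It assumes every file takes the same time and, for the parallel case, that files are spread evenly over N concurrent tasks with no scheduling overhead; both are simplifying assumptions, not measurements. With one file at a time, 300 files at 2-3 minutes each come to 600-900 minutes (10-15 hours). The class and method names are hypothetical.

```java
public class PredictionTimeEstimate {

    // Wall-clock minutes to process numFiles files when up to `parallelism`
    // files run at once and each file takes minutesPerFile minutes.
    // Assumes identical per-file cost and zero scheduling overhead.
    static long wallClockMinutes(int numFiles, int minutesPerFile, int parallelism) {
        if (numFiles < 0 || minutesPerFile < 0 || parallelism < 1) {
            throw new IllegalArgumentException("invalid arguments");
        }
        // Number of "waves" = ceil(numFiles / parallelism), in integer arithmetic.
        long waves = (numFiles + parallelism - 1L) / parallelism;
        return waves * minutesPerFile;
    }

    public static void main(String[] args) {
        int files = 300; // ~300 input files, as in the message above
        for (int minutes : new int[] {2, 3}) {
            long sequential = wallClockMinutes(files, minutes, 1);
            System.out.printf("%d min/file, one at a time: %d min (%.1f h)%n",
                    minutes, sequential, sequential / 60.0);
        }
        // Hypothetical: 20 files predicted concurrently (e.g. 20 parallel tasks).
        System.out.println("2 min/file, 20 at a time: "
                + wallClockMinutes(files, 2, 20) + " min");
    }
}
```

Under these assumptions the runtime shrinks roughly in proportion to the number of files processed concurrently, which is the gain one would hope to get from running the predictions as a distributed job.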

Thanks,
Yang

On Sat, Mar 27, 2010 at 12:02 AM, deneche abdelhakim <[email protected]> wrote:

> One important clarification: for now, only TestForest can handle directory
> input paths; BuildForest won't work with input directories.
>
> --- On Sat 27.3.10, deneche abdelhakim <[email protected]> wrote:
>
> > From: deneche abdelhakim <[email protected]>
> > Subject: Re: Question about mahout Describe
> > To: [email protected]
> > Date: Saturday, 27 March 2010, 7:43
> > It wasn't possible before, but it is now :)
> > I just committed a patch that allows the input path to be a
> > directory. Check out the latest version of Mahout and run
> > TestForest like this:
> >
> > [localhost]$ hjar examples/target/mahout-examples-0.4-SNAPSHOT.job \
> >     org.apache.mahout.df.mapreduce.TestForest \
> >     -i /user/fulltestdata -ds rf/testdata.info \
> >     -m rf-testmodel-5-100 -a -o rf/fulltestprediction
> >
> > For every file in fulltestdata (e.g. fulltestdata/file1.data),
> > you'll get a prediction file in fulltestprediction
> > (e.g. fulltestprediction/file1.data.out).
> >
> > Hope this helps.
> >
> >
> > --- On Fri 26.3.10, Yang Sun <[email protected]> wrote:
> >
> > > From: Yang Sun <[email protected]>
> > > Subject: Question about mahout Describe
> > > To: [email protected]
> > > Date: Friday, 26 March 2010, 22:16
> > > I have been testing Mahout recently. It runs great on small
> > > test datasets. However, when I tried to scale up to a big
> > > dataset directory, I got the following error message:
> > >
> > > [localhost]$ hjar examples/target/mahout-examples-0.4-SNAPSHOT.job \
> > >     org.apache.mahout.df.mapreduce.TestForest \
> > >     -i /user/fulltestdata/* -ds rf/testdata.info \
> > >     -m rf-testmodel-5-100 -a -o rf/fulltestprediction
> > >
> > > Exception in thread "main" java.io.IOException: Cannot open filename /user/fulltestdata/*
> > >         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1474)
> > >         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1465)
> > >         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:372)
> > >         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
> > >         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:351)
> > >         at org.apache.mahout.df.mapreduce.TestForest.testForest(TestForest.java:190)
> > >         at org.apache.mahout.df.mapreduce.TestForest.run(TestForest.java:137)
> > >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > >         at org.apache.mahout.df.mapreduce.TestForest.main(TestForest.java:228)
> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >         at java.lang.reflect.Method.invoke(Method.java:597)
> > >         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > >
> > > My question is: can I use Mahout on directories instead of
> > > single files, and if so, how?
> > >
> > > Thanks,
> > >
> >
> >
> >
> >
>
>
>
>
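The error in the original question comes from passing /user/fulltestdata/* straight to FileSystem.open (see the stack trace): open takes one path and does not expand wildcards, so it looks for a file literally named `*`. Hadoop's FileSystem class provides listStatus and globStatus for enumerating a directory or a pattern. The sketch below mirrors that idea on the local filesystem using only java.nio.file, so it runs without a cluster: it expands a directory (optionally filtered by a glob such as `*.data`) into its regular files and derives each prediction path with the `<name>.out` convention described in the thread (fulltestdata/file1.data maps to fulltestprediction/file1.data.out). It is an illustration under those assumptions, not Mahout's TestForest code; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical local-filesystem illustration; not part of Mahout or Hadoop.
public class DirectoryInputSketch {

    // Expands a directory into the regular files whose names match `glob`
    // (e.g. "*" or "*.data"), sorted by path for a stable order.
    // Subdirectories are skipped.
    static List<Path> expandInput(Path dir, String glob) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        Collections.sort(files);
        return files;
    }

    // Maps each input file to <outputDir>/<file name>.out,
    // e.g. fulltestdata/file1.data to fulltestprediction/file1.data.out.
    static Map<Path, Path> predictionPaths(List<Path> inputs, Path outputDir) {
        Map<Path, Path> mapping = new LinkedHashMap<>();
        for (Path in : inputs) {
            mapping.put(in, outputDir.resolve(in.getFileName().toString() + ".out"));
        }
        return mapping;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "fulltestdata");
        Path out = Paths.get(args.length > 1 ? args[1] : "fulltestprediction");
        for (Map.Entry<Path, Path> e : predictionPaths(expandInput(dir, "*"), out).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Enumerating the inputs first, rather than opening the pattern as a path, is what avoids the "Cannot open filename" error; on HDFS the enumeration step would go through the Hadoop FileSystem API instead of java.nio.file.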
