Re: Re : Question about mahout Describe

deneche abdelhakim Mon, 29 Mar 2010 13:48:13 -0700

Yes indeed, the current implementation of TestForest is sequential. I guess I 
know what I have to do this weekend =D
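
To make the sequential-versus-parallel point concrete, here is a minimal sketch, assuming each input file can be predicted independently of the others. It uses a plain local thread pool, not Hadoop MapReduce or any Mahout API. The class ParallelPredictSketch, its methods, and the predictOneFile callback are all hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch, not Mahout code: runs one prediction task per input
// file on a fixed-size thread pool instead of looping over files one by one.
final class ParallelPredictSketch {

    // Applies predictOneFile (a stand-in for the per-file prediction step) to
    // every input name concurrently and returns the results in input order.
    static List<String> predictAll(List<String> inputFiles,
                                   Function<String, String> predictOneFile,
                                   int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String file : inputFiles) {
                Callable<String> task = () -> predictOneFile.apply(file);
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get()); // blocks until that file is finished
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("prediction task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    // Rough wall-clock estimate in minutes, assuming files are spread evenly
    // over the workers and each file takes the same time.
    static long estimateMinutes(int files, int minutesPerFile, int workers) {
        long rounds = (files + workers - 1) / workers; // ceiling division
        return rounds * minutesPerFile;
    }
}
```

With the numbers from the thread, estimateMinutes(300, 2, 1) gives 600 minutes (10 hours) for the sequential case, while ten workers would bring the same estimate down to 60 minutes. A real fix on a cluster would distribute the files across map tasks rather than threads.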


--- On Mon 29.3.10, Yang Sun <[email protected]> wrote:

> From: Yang Sun <[email protected]>
> Subject: Re: Re : Question about mahout Describe
> To: [email protected]
> Date: Monday, 29 March 2010, 19:05
> Hi Deneche,
> 
> Thanks for the update. I really appreciate your fast response. I just tested
> the new TestForest class. Yes, it can predict multiple files. However, I
> don't think it takes advantage of Hadoop. It looks like it processes one
> file at a time. My prediction dataset has about 400 million records split
> across ~300 files. One file needs about 2-3 minutes for TestForest to
> predict. The whole dataset will take about 10 hours, which is too slow.
> I'll be waiting for your next update :)
> 
> Thanks,
> Yang
> 
> On Sat, Mar 27, 2010 at 12:02 AM, deneche abdelhakim <[email protected]> wrote:
> 
> > One important clarification: for now only TestForest can handle directory
> > input paths; BuildForest won't work with input directories.
> >
> > --- On Sat 27.3.10, deneche abdelhakim <[email protected]> wrote:
> >
> > > From: deneche abdelhakim <[email protected]>
> > > Subject: Re : Question about mahout Describe
> > > To: [email protected]
> > > Date: Saturday, 27 March 2010, 7:43
> > > It wasn't possible, but it is now :)
> > > I just committed a patch that allows the input path to be a
> > > directory. Check out the latest version of mahout and run
> > > TestForest like this:
> > >
> > > [localhost]$ hjar
> > > examples/target/mahout-examples-0.4-SNAPSHOT.job
> > > org.apache.mahout.df.mapreduce.TestForest -i
> > > /user/fulltestdata -ds rf/testdata.info -m
> > > rf-testmodel-5-100 -a -o rf/fulltestprediction
> > >
> > > For every file in fulltestdata (e.g. fulltestdata/file1.data) you'll get
> > > a prediction file in fulltestprediction (e.g.
> > > fulltestprediction/file1.data.out).
> > >
> > > Hope it helps you
> > >
> > >
> > > --- On Fri 26.3.10, Yang Sun <[email protected]> wrote:
> > >
> > > > From: Yang Sun <[email protected]>
> > > > Subject: Question about mahout Describe
> > > > To: [email protected]
> > > > Date: Friday, 26 March 2010, 22:16
> > > > I was testing mahout recently. It runs great on small testing datasets.
> > > > However, when I tried to move to a big dataset directory, I got
> > > > the following error message:
> > > >
> > > > [localhost]$ hjar
> > > > examples/target/mahout-examples-0.4-SNAPSHOT.job
> > > > org.apache.mahout.df.mapreduce.TestForest -i
> > > > /user/fulltestdata/* -ds rf/testdata.info -m
> > > > rf-testmodel-5-100 -a -o rf/fulltestprediction
> > > >
> > > > Exception in thread "main" java.io.IOException: Cannot open filename /user/fulltestdata/*
> > > >         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1474)
> > > >         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1465)
> > > >         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:372)
> > > >         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
> > > >         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:351)
> > > >         at org.apache.mahout.df.mapreduce.TestForest.testForest(TestForest.java:190)
> > > >         at org.apache.mahout.df.mapreduce.TestForest.run(TestForest.java:137)
> > > >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > > >         at org.apache.mahout.df.mapreduce.TestForest.main(TestForest.java:228)
> > > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > >         at java.lang.reflect.Method.invoke(Method.java:597)
> > > >         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > > >
> > > > My question is: can I use mahout on directories instead of single files?
> > > > And if so, how?
> > > >
> > > > Thanks,
> > > >
> > >
> > >
> > >
> > >
> >
> >
> >
> >
>
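
For the directory case discussed in the thread, the input-to-output naming can be sketched as below. This is only an illustration inferred from the one example given (fulltestdata/file1.data producing fulltestprediction/file1.data.out). The class OutputNamingSketch and its methods are hypothetical, work on plain strings, and do not touch HDFS or reproduce what TestForest does internally.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: pairs each input file with "<outputDir>/<name>.out",
// following the naming shown in the thread's example.
final class OutputNamingSketch {

    // Strips any leading directories from inputFile and appends ".out".
    static String outputFor(String inputFile, String outputDir) {
        String name = inputFile.substring(inputFile.lastIndexOf('/') + 1);
        return outputDir + "/" + name + ".out";
    }

    // Builds the full input -> output plan, preserving input order.
    static Map<String, String> plan(List<String> inputFiles, String outputDir) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String input : inputFiles) {
            result.put(input, outputFor(input, outputDir));
        }
        return result;
    }
}
```

Note that the plan is built from an explicit list of file names. A literal path such as /user/fulltestdata/* is not a file name, which is consistent with the "Cannot open filename" error in the original report.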
