Use the Describe tool; the Partial Implementation wiki page explains how to use it. And yes, the descriptor file must be supplied.

On Sat, Jul 16, 2011 at 5:57 AM, Xiaobo Gu <[email protected]> wrote:

> But if I just use a CSV file, how can I generate the descriptor file?
> Must a descriptor file be supplied for BuildForest and TestForest?
>
> On Sat, Jul 16, 2011 at 5:39 AM, deneche abdelhakim <[email protected]> wrote:
>
> > you don't need to convert the CSV file to ARFF, you can use it right
> > away.
> >
> > you can use a small dataset as long as all values of categorical
> > attributes are available in the dataset
> >
> > On Fri, Jul 15, 2011 at 2:28 PM, Xiaobo Gu <[email protected]> wrote:
> >
> >> Can we make the file descriptor as follows:
> >>
> >> 1. make a small csv file with the same format as the actual dataset,
> >>    say a CSV file with header and only one record,
> >> 2. Use java weka.core.converters.CSVLoader filename.csv > filename.arff
> >>    to convert the small CSV into an ARFF file, see
> >>    http://maya.cs.depaul.edu/classes/ect584/weka/preprocess.html
> >> 3. Use org.apache.mahout.df.tools.Describe to generate a descriptor
> >>
> >> The only concern here is: is the small CSV file with one record
> >> sufficient to generate the ARFF file header, or do we have to
> >> use the whole file to avoid losing information?
> >>
> >> Xiaobo Gu
> >>
> >> On Fri, Jul 15, 2011 at 9:10 PM, Xiaobo Gu <[email protected]> wrote:
> >>
> >> > But if we use CSV files, how can we generate descriptors for
> >> > datasets?
> >> >
> >> > Cheers
> >> >
> >> > Xiaobo Gu
> >> >
> >> > On Thu, Jul 14, 2011 at 1:27 AM, deneche abdelhakim <[email protected]> wrote:
> >> >
> >> >> I guess yes, as long as you don't use quotes or double quotes to
> >> >> embed the fields.
> >> >>
> >> >> On Wed, Jul 13, 2011 at 2:58 PM, Xiaobo Gu <[email protected]> wrote:
> >> >>
> >> >>> So for simple datasets, which only have numeric and character
> >> >>> label (without blank) category columns, can we just use CSV tools
> >> >>> to save it as a standard CSV file without header?
> >> >>>
> >> >>> On Wed, Jul 13, 2011 at 3:53 AM, deneche abdelhakim <[email protected]> wrote:
> >> >>>
> >> >>> > the current implementation doesn't support the ARFF format
> >> >>> > out-of-the-box, as described in the Wiki you need to remove the
> >> >>> > header of the file and leave only the data. Actually, this
> >> >>> > implementation is fully compatible with UCI's datasets which are
> >> >>> > comma separated text files. You'll also need to call the dataset
> >> >>> > description tool (see the wiki) in order to generate a proper
> >> >>> > description file (contains the nature of each attribute:
> >> >>> > Numerical or Categorical).
> >> >>> >
> >> >>> > Yes you can use BuildForest and TestForest to generate and use
> >> >>> > Random forest models from the command line
> >> >>> >
> >> >>> > On Tue, Jul 12, 2011 at 2:19 PM, Xiaobo Gu <[email protected]> wrote:
> >> >>> >
> >> >>> >> Hi,
> >> >>> >>
> >> >>> >> The Random Forest partial implementation in
> >> >>> >> https://cwiki.apache.org/confluence/display/MAHOUT/Partial+Implementation
> >> >>> >> uses the ARFF file format. Is ARFF the only supported file
> >> >>> >> format when using the BuildForest and TestForest programs, and
> >> >>> >> are BuildForest and TestForest the official tools to build
> >> >>> >> Random Forest models from the command line?
> >> >>> >>
> >> >>> >> Regards,
> >> >>> >>
> >> >>> >> Xiaobo Gu
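
For anyone who wants to try the "small dataset" route mentioned in the thread, here is a rough Python sketch (not part of Mahout) that checks whether a sample CSV still contains every value of the categorical attributes found in the full file. The function names, the example data and the choice of categorical column indices are all assumptions made for illustration; the files are expected to be plain comma-separated data without a header and without quoted fields.

```python
import csv
import io


def categorical_values(rows, categorical_columns):
    """Collect the distinct values seen in each categorical column."""
    seen = {col: set() for col in categorical_columns}
    for row in rows:
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        for col in categorical_columns:
            seen[col].add(row[col].strip())
    return seen


def missing_from_sample(full_csv, sample_csv, categorical_columns):
    """Return {column: values present in the full data but absent from the sample}."""
    full = categorical_values(csv.reader(full_csv), categorical_columns)
    sample = categorical_values(csv.reader(sample_csv), categorical_columns)
    return {col: full[col] - sample[col]
            for col in categorical_columns
            if full[col] - sample[col]}


# Hypothetical data: columns 1 and 3 (the label) are assumed categorical.
FULL = "5.1,red,0.2,yes\n4.9,blue,0.4,no\n6.0,green,0.1,yes\n"
SAMPLE = "5.1,red,0.2,yes\n4.9,blue,0.4,no\n"

gaps = missing_from_sample(io.StringIO(FULL), io.StringIO(SAMPLE), [1, 3])
print(gaps)  # columns whose sample is missing at least one categorical value
```

An empty result means the sample covers every categorical value in the full file, so it should be safe to describe the sample instead. With the data above, column 1 is reported because "green" never appears in the sample. For real files, pass open file objects instead of the `io.StringIO` wrappers.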
