But if I just use a CSV file, how can I generate the descriptor file? Must a descriptor file be supplied for BuildForest and TestForest?

On Sat, Jul 16, 2011 at 5:39 AM, deneche abdelhakim <[email protected]> wrote:
> You don't need to convert the CSV file to ARFF; you can use it right away.
>
> You can use a small dataset as long as all values of the categorical
> attributes are present in the dataset.
>
> On Fri, Jul 15, 2011 at 2:28 PM, Xiaobo Gu <[email protected]> wrote:
>
>> Can we make the file descriptor as follows:
>>
>> 1. Make a small CSV file with the same format as the actual dataset,
>> say a CSV file with a header and only one record.
>> 2. Use java weka.core.converters.CSVLoader filename.csv > filename.arff
>> to convert the small CSV into an ARFF file, see
>> http://maya.cs.depaul.edu/classes/ect584/weka/preprocess.html
>> 3. Use org.apache.mahout.df.tools.Describe to generate a descriptor.
>>
>> The only concern here is: is a small CSV file with one record
>> sufficient to generate the ARFF file header, or do we have to
>> use the whole file to avoid losing information?
>>
>> Xiaobo Gu
>>
>> On Fri, Jul 15, 2011 at 9:10 PM, Xiaobo Gu <[email protected]> wrote:
>> > But if we use CSV files, how can we generate descriptors for datasets?
>> >
>> > Cheers
>> >
>> > Xiaobo Gu
>> >
>> > On Thu, Jul 14, 2011 at 1:27 AM, deneche abdelhakim <[email protected]> wrote:
>> >> I guess yes, as long as you don't use quotes or double quotes to
>> >> embed the fields.
>> >>
>> >> On Wed, Jul 13, 2011 at 2:58 PM, Xiaobo Gu <[email protected]> wrote:
>> >>
>> >>> So for simple datasets, which only have numeric columns and
>> >>> category columns with character labels (without blanks), can we
>> >>> just use CSV tools to save them as a standard CSV file without a
>> >>> header?
>> >>>
>> >>> On Wed, Jul 13, 2011 at 3:53 AM, deneche abdelhakim <[email protected]> wrote:
>> >>> > The current implementation doesn't support the ARFF format
>> >>> > out of the box. As described in the Wiki, you need to remove the
>> >>> > header of the file and leave only the data. Actually, this
>> >>> > implementation is fully compatible with UCI's datasets, which are
>> >>> > comma-separated text files. You'll also need to call the dataset
>> >>> > description tool (see the wiki) in order to generate a proper
>> >>> > description file (it contains the nature of each attribute:
>> >>> > Numerical or Categorical).
>> >>> >
>> >>> > Yes, you can use BuildForest and TestForest to generate and use
>> >>> > Random Forest models from the command line.
>> >>> >
>> >>> > On Tue, Jul 12, 2011 at 2:19 PM, Xiaobo Gu <[email protected]> wrote:
>> >>> >
>> >>> >> Hi,
>> >>> >>
>> >>> >> The Random Forest partial implementation in
>> >>> >>
>> >>> >> https://cwiki.apache.org/confluence/display/MAHOUT/Partial+Implementation
>> >>> >>
>> >>> >> uses the ARFF file format. Is ARFF the only supported file format
>> >>> >> when using the BuildForest and TestForest programs, and are
>> >>> >> BuildForest and TestForest the official tools for building Random
>> >>> >> Forest models from the command line?
>> >>> >>
>> >>> >> Regards,
>> >>> >>
>> >>> >> Xiaobo Gu
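
The point made in the thread, that a small sample is enough only if every value of each categorical attribute appears in it, can be illustrated with a minimal Python sketch. This is not Mahout's Describe tool and does not produce its descriptor format; the function name `describe`, the "N"/"C" markers, and the toy data are assumptions made purely for illustration. It assumes a headerless, comma-separated file with no quoted fields, and treats a column as numerical only if every value parses as a number.

```python
import csv
import io


def describe(csv_text):
    """Infer, per column, whether it looks numerical or categorical.

    Returns one entry per column: ("N", None) for a numerical column,
    or ("C", sorted_distinct_values) for a categorical one.
    Hypothetical helper, not part of Mahout.
    """
    columns = None
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        if columns is None:
            columns = [[] for _ in row]
        for values, cell in zip(columns, row):
            values.append(cell.strip())

    description = []
    for values in columns or []:
        try:
            for value in values:
                float(value)  # raises ValueError for non-numeric labels
            description.append(("N", None))
        except ValueError:
            description.append(("C", sorted(set(values))))
    return description


# Toy data (made up): one numeric column and two categorical columns.
full = "5.1,red,yes\n4.9,blue,no\n6.0,green,yes\n"
sample = "5.1,red,yes\n"  # a one-record sample of the same data

full_desc = describe(full)
sample_desc = describe(sample)

print(full_desc)
print(sample_desc)
```

In this sketch the one-record sample still gets the column kinds right, but it only sees "red" and "yes", so a description built from it would be missing "blue", "green", and "no". That is the information loss the one-record approach risks.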
