First of all, thank you for the answers. I did not know about zoo. However,
it seems that none of the proposed approaches does exactly what I want
(please correct me if I am wrong).

This was probably not clear in my original question: the CSV files contain
only the performance values. The other two columns (ASSOC and SIZE) are
derived from the directory names in the tree. So, as far as I can tell,
none of the proposed solutions would work unless every single "data.csv"
file contained all three columns (ASSOC, SIZE and PERF).

In my case, my experimentation framework outputs a CSV with values read
from the processor's performance counters (PMCs). For each cache size and
associativity I run an experiment, which produces a CSV file, and I place
that file in its own directory. I could modify the framework so that it
also writes out the cache size and associativity, but that may not be ideal
in some circumstances, and I also have a significant amount of old results
that I want to keep using without manually fixing the CSV files.

Has anyone else faced such a situation? Any good solutions?

Thank you,
Victor

On Thu, May 3, 2012 at 8:54 PM, Gabor Grothendieck
<ggrothendi...@gmail.com> wrote:

> On Thu, May 3, 2012 at 2:07 PM, victor jimenez <betaband...@gmail.com>
> wrote:
> > Sometimes I have hundreds of CSV files scattered in a directory tree,
> > resulting from experiment runs. For instance, to give an example from my
> > field, I may want to collect the performance of a processor for several
> > design parameters such as "cache size" (possible values: 2, 4, 8 and 16)
> > and "cache associativity" (possible values: direct-mapped, 4-way,
> > fully-associative). The results of all these experiments are stored in a
> > directory tree like:
> >
> > results
> >  |-- direct-mapped
> >  |       |-- 2 -- data.csv
> >  |       |-- 4 -- data.csv
> >  |       |-- 8 -- data.csv
> >  |       |-- 16 -- data.csv
> >  |-- 4-way
> >  |       |-- 2 -- data.csv
> >  |       |-- 4 -- data.csv
> > ...
> >  |-- fully-associative
> >  |       |-- 2 -- data.csv
> >  |       |-- 4 -- data.csv
> > ...
> >
> > I am developing a package that would allow me to gather all those CSV
> > files into a single data frame. Currently, I just need to execute the
> > following statement:
> >
> > dframe <- gather("results/@ASSOC@/@SIZE@/data.csv")
> >
> > and this command returns a data frame containing the columns ASSOC and
> > SIZE plus all the columns inside the CSV files (in my case the processor
> > performance), effectively loading all the CSV files into a single data
> > frame. So, I would get something like:
> >
> > ASSOC,         SIZE, PERF
> > direct-mapped,    2,  1.4
> > direct-mapped,    4,  1.6
> > direct-mapped,    8,  1.7
> > direct-mapped,   16,  1.7
> > 4-way,            2,  1.4
> > 4-way,            4,  1.5
> > ...
> >
> > I would like to ask whether there is any similar functionality already
> > implemented in R. If so, there is no need to reinvent the wheel :)
> > If it is not implemented and the R community believes that this feature
> > would be useful, I would be glad to contribute my code.
> >
>
> If your csv files all have the same columns and represent time series,
> then read.zoo in the zoo package can read multiple csv files at once with
> a single read.zoo call, producing a single zoo object.
>
> library(zoo)
> ?read.zoo
> vignette("zoo-read")
>
> Also see the other zoo vignettes and help files.
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>

