Re: [R] Reading a bunch of csv files into R

Bryan Hanson Mon, 28 May 2012 09:56:39 -0700

Here's what I would do then, to keep it simple.

1.  Put all the relevant csv files into a single directory.
2. setwd() to that directory.
3. Use the approach I suggested before:


    files <- list.files(pattern = "\\.(csv|CSV)$")

    for (i in seq_along(files)) {
        temp <- read.csv(files[i], header = FALSE)
        # ... do whatever you want with the contents of temp ...
    }

Under ...do whatever you want... the contents of each individual file are 
temporarily in the data frame 'temp'.  Use the decoded file names (in files[]) 
to figure out what you need to do with that particular file's contents.  Then 
do it.  Since it sounds like you need to hold each 'temp' for possible 
combination with other 'temp's, you could initialize an empty list of the 
right size (faster than growing it as you go), then store each 'temp' in it 
(which from your note was where you are headed).  That would mean changing the 
above to something like this (in approx/pseudo code):

    files <- list.files(pattern = "\\.(csv|CSV)$")
    myList <- vector("list", length(files))
    names(myList) <- paste("Data", files, sep = ".")

    for (i in seq_along(files)) {
        # Use [[i]] so the whole data frame is stored as one list element;
        # myList[i] <- ... would try to fit its columns into a single slot.
        myList[[i]] <- read.csv(files[i], header = FALSE)
        # I'd stick with numerical indices for lists.
        # Indexing of lists is a pain, but once you get it they rock.
        # See ?"[" and study it carefully.
        # One thing it says which is helpful is:
        # "The most important distinction between [, [[ and $ is that the [ can
        # select more than one element whereas the other two select a single
        # element."
    }
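
To make the [ versus [[ point in the comments above concrete, here is a minimal sketch on a toy list rather than real csv files. The names toyList and df and the data in them are made up for illustration. The warning it triggers appears to be the same one quoted further down in this thread.

```r
# Toy data frame standing in for the result of read.csv() (made-up data)
df <- data.frame(V1 = 1:3, V2 = c("a", "b", "c"))

toyList <- vector("list", 2)
names(toyList) <- c("Data.A", "Data.B")

# [[ ]] stores the whole data frame as a single list element
toyList[[1]] <- df
is.data.frame(toyList[[1]])      # TRUE

# [ ] treats the data frame as a list of columns; with more than one
# column it warns ("number of items to replace is not a multiple of
# replacement length") and keeps only the first column
toyList[2] <- df
is.data.frame(toyList[[2]])      # FALSE

# Once the list has names, all three forms reach the same element
identical(toyList[[1]], toyList[["Data.A"]])   # TRUE
identical(toyList[[1]], toyList$Data.A)        # TRUE
```

Access by name only works if names() has been set on the list itself; a list that merely contains a vector of names as its first element has no element names to look up.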

This gets them all read in, with each csv as a data frame in myList (so myList 
is a list of data frames).  Now you can loop over myList and work on the data 
itself (and edit the file names as you go).  Sounds like you would have to grep 
for phrases in the list element names (names(myList)) to figure out which ones 
you want.  You could grep and subset myList and basically turn it into related 
chunks of the original.
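
A rough sketch of the grep-and-subset idea, assuming file names of the form monitorID_YYYYMMDDhhmmss.csv like the examples quoted below. The temporary directory, the toy file contents and the names demoNames, oneMonitor, monitorID, stamp and combined are all invented for illustration; only the file name pattern comes from the thread.

```r
# Create a few small csv files in a temporary directory (made-up data)
tmp <- file.path(tempdir(), "csv_demo")
dir.create(tmp, showWarnings = FALSE)
demoNames <- c("512180_20120523150757.csv",
               "512180_20120524112111.csv",
               "513687_20120523181947.csv")
for (f in demoNames) {
    write.table(data.frame(1:2, 3:4), file.path(tmp, f),
                sep = ",", row.names = FALSE, col.names = FALSE)
}

# Read everything into a named list of data frames
files <- list.files(tmp, pattern = "\\.(csv|CSV)$")
myList <- lapply(file.path(tmp, files), read.csv, header = FALSE)
names(myList) <- paste("Data", sub("\\.(csv|CSV)$", "", files), sep = ".")

# grep on the element names to pull out all files from one monitor
oneMonitor <- myList[grep("^Data\\.512180_", names(myList))]
length(oneMonitor)               # 2

# Decode the names: monitor ID before the underscore, timestamp after it
parts <- strsplit(sub("^Data\\.", "", names(myList)), "_", fixed = TRUE)
monitorID <- vapply(parts, `[`, character(1), 1)
stamp <- as.POSIXct(vapply(parts, `[`, character(1), 2),
                    format = "%Y%m%d%H%M%S", tz = "UTC")

# Stack the data from that monitor into a single data frame
combined <- do.call(rbind, oneMonitor)
```

Here lapply() replaces the explicit loop, and the ".csv" suffix is stripped before the names are built, so the element names match the "Data.512180_20120523150757" style used elsewhere in the thread. The time zone is an assumption; use whatever the monitors actually record in.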

HTH.  Bryan

On May 28, 2012, at 12:27 PM, HJ YAN wrote:

> Dear Bryan
>  
> Thank you so much for your prompt reply!
>  
> Please see my responses below, under ===== in your reply...
>  
> Many thanks again!
>  
> HJ
> 
> On Mon, May 28, 2012 at 4:45 PM, Bryan Hanson <han...@depauw.edu> wrote:
> OK, a couple of things (I only looked through quickly):
> 
> 1.  R doesn't allow variable names to begin with a number.  Be sure you don't 
> try that.
> ============
>     Yes, I understand this. Some of my csv files' names begin with a number, 
> so I put 'Data' in front of them using 'NAME <- paste("Data", data_names, 
> sep=".")' as shown in my last email.
>  
> 2.  What's the overall goal here?  Read them in, change the name, then write 
> them out?  Let us know and it will be easier to help you.
> =============
>     The overall goal is this: for my current study I receive hundreds of csv 
> files every two weeks, and I need to read them into R for further analysis. 
> The data are recorded at 10-minute intervals and are collected every two 
> weeks from a few hundred monitors. 
>  
>      So I want to know how to do these jobs more efficiently:
>  
> (i) Read them into R; put the data from the same monitors together, check 
> for missing values, and manipulate the data in the way we need, e.g. 
> according to region or monitoring type, which involves aggregating the whole 
> group (or a subgroup) of the data, etc.;
>  
> (ii) Edit the names, because sometimes we want to match names in one format 
> to another, e.g. 512180_20120523150757==London_2012_May_23rd_15:07:57  (e.g. 
> Location name_Year_Month_Day_Hour_Minute_Second)
>  
> (iii) If (i) and (ii) can be done, I would think writing them out as csv 
> would not be too difficult. We mainly do the analysis in R and have not 
> needed output in csv format so far...
>  
>  
>  
> 3.  Regardless of your goal, I think you are "over thinking" the solution.  
> Let us know what you want to accomplish and we can shorten it up I'm sure.
> =====================
>     I am trying to input the data as a list, which might be easier, but I am 
> not sure whether another data type has an advantage over that...
>  
>  
> Data1<-list( NAME)
>  
> [1] NAME
>  "Data.512180_20120523150757" "Data.513687_20120523181947" 
> "Data.513690_20120524112111" "Data.521858_20120524091428" 
> "Data.523215_20120523123419"
>  
> for(i in 1:length(filenames)) {Data1[[i]]<-read.csv(filenames[i])}
>  
> But when I tried to access the components in this list 'Data1', only the 
> first method of the three (shown below) works, and I think the other two are 
> more useful for me. Any ideas?? 
>  
> (1) Data1[[1]]  
>      *** this one works
> (2) Data1[["Data.512180_20120523150757"]]
>      *** this one doesn't work
> (3)  Data1$Data.512180_20120523150757
>       *** this one doesn't work
>  
> Hope I have made myself clear here.
>  
> Thanks!
> HJ
>  
> 
> Bryan
> 
> On May 28, 2012, at 11:20 AM, HJ YAN wrote:
> 
>>  Dear Rui, Kevin, Bryan and Nutter
>>  
>>  
>> Thank you so much for your very helpful hints!
>>  
>> Now I have extracted all the file names and managed to edit them using the 
>> code (1)-(4) below and obtained the name format as I wanted
>>  
>> (1) files<-list.files(path = "myworking directory", pattern = NULL, 
>> all.files = FALSE,
>>            full.names = FALSE, recursive = FALSE,ignore.case = FALSE, 
>> include.dirs = FALSE)
>> 
>> (2) filenames <- files[grep("[.]csv", files)]
>>  
>> [1] "512180_20120523150757.csv"
>> "513687_20120523181947.csv"
>> "513690_20120524112111.csv"
>>  "521858_20120524091428.csv"
>>  "523215_20120523123419.csv"
>> ...(a few hundred more...)
>>  
>>  
>> (3) data_names <- gsub("[.]csv", "", filenames)
>> 
>> (4) NAME<- paste("Data",data_names, sep=".")
>>  
>>  
>> Up to here I got NAME containing all the names I'm going to use..
>>  
>> > NAME
>> [1] "Data.512180_20120523150757"
>> "Data.513687_20120523181947"
>> "Data.513690_20120524112111"
>>  "Data.521858_20120524091428"
>>  "Data.523215_20120523123419"
>> ....
>>  
>>  
>>  But I still haven't successfully read the whole bunch of csv files into R 
>> and named them as expected... e.g. I want to read "512180_20120523150757.csv" 
>> into R and name it "Data.512180_20120523150757" and so on...
>> For a single file we can just write
>>  
>> Data.512180_20120523150757<-read.csv("512180_20120523150757.csv")
>>  
>> If any of the following commands (as you suggested) works, then my question 
>> is sorted out. But I got error messages for every attempt... 
>> (i)
>> > df.list <- lapply(seq_len(filenames), read.csv)
>> 
>> Error in seq_len(filenames) : 
>>   argument must be coercible to non-negative integer
>> In addition: Warning message:
>> In is.vector(X) : NAs introduced by coercion
>> 
>> > filenames
>> [1] "512180_20120523150757.csv" "513687_20120523181947.csv" 
>> "513690_20120524112111.csv" "521858_20120524091428.csv"
>> [5] "523215_20120523123419.csv"...
>>  
>>  
>> (ii) None of the following code works...
>>  
>> myDir="myworking directory"
>>  
>> #for(i in 1:length(filenames)){assign(NAME[i], read.csv(file.path(myDir, 
>> filenames[i])))}
>> #for(i in 1:5){assign(NAME[i], read.csv(file.path=myDir, filenames[i]))}
>>  
>> setwd("myworking directory")
>> #for(i in 1:5){assign(NAME[i], read.csv( filenames[i]))}
>>  
>>  
>>  
>> Warning messages:
>> 1: In N[i] <- read.csv(filenames[i]) :
>>   number of items to replace is not a multiple of replacement length
>> 2: In N[i] <- read.csv(filenames[i]) :
>>   number of items to replace is not a multiple of replacement length
>> 3: In N[i] <- read.csv(filenames[i]) :
>>   number of items to replace is not a multiple of replacement length
>> 4: In N[i] <- read.csv(filenames[i]) :
>>   number of items to replace is not a multiple of replacement length
>> 5: In N[i] <- read.csv(filenames[i]) :
>>   number of items to replace is not a multiple of replacement length
>>  
>>  
>> Seems I am getting there, but could you spot where my code went wrong 
>> please??
>>  
>> Many thanks again!
>>  
>> HJ
>>  
>>  
>> 
>> 
>>  
>> On Fri, May 25, 2012 at 8:36 PM, Rui barradas <rui1...@sapo.pt> wrote:
>> Hello,
>> 
>> Or maybe put the data frames in a list
>> 
>> df.list <- lapply(seq_len(filenames), read.csv, ...) # '...' stands for other 
>> options you might want to pass (like header=TRUE)
>> names(df.list) <- data_names
>> 
>> Now access the data frames by number in the list or by name in data_names.
>> 
>> Hope this helps,
>> 
>> Rui Barradas
>> Em 25-05-2012 20:08, Nutter, Benjamin escreveu:
>> For example:
>> 
>> myDir<- "some file path"
>> filenames<- list.files(myDir)
>> filenames<- filenames[grep("[.]csv", filenames)]
>> 
>> data_names<- gsub("[.]csv", "", filenames)
>> 
>> for(i in 1:length(filenames)) assign(data_names[i], 
>> read.csv(file.path(myDir, filenames[i])))
>> 
>>  
>>  Benjamin Nutter |  Biostatistician     |  Quantitative Health Sciences
>>   Cleveland Clinic    |  9500 Euclid Ave.  |  Cleveland, OH 44195  | (216) 
>> 445-1365
>> 
>> 
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
>> Behalf Of Kevin Wright
>> Sent: Friday, May 25, 2012 2:55 PM
>> To: HJ YAN
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading a bunch of csv files into R
>> 
>> See ?dir
>> 
>> Assign the value to a vector and loop over the elements of the vector.
>> 
>> Kevin
>> 
>> 
>> On Fri, May 25, 2012 at 12:16 PM, HJ YAN<yhj...@googlemail.com>  wrote:
>> Dear R users
>> 
>> 
>> I am struggling with a data importing issue:
>> 
>> I have some hundreds of csv files that need to be read into R for further
>> analysis. All those csv files are named in one of three formats:
>> 
>> (1) strings: e.g. London_Oxford street
>> (2) Integer: e.g. 1234_5678
>> (3) combined: e.g. London_1234
>> 
>> I intend to use read.csv("xxxx_xxx.csv"), but I have only dealt with single
>> documents before, and when there are no more than 20 files I do not
>> bother to search for a more efficient way.
>> 
>> 
>> Is there any clever way so that I do not have to type in all these
>> hundreds of names by hand, maybe using an R package or writing some code in
>> some other language, if it is not too difficult to learn?
>> 
>> Any thoughts/hints please??
>> 
>> Many thanks in advance!
>> 
>> HJ
>> 
>>        [[alternative HTML version deleted]]
>> 
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> 
>> --
>> Kevin Wright
> 
> 

