Re: [R] Writing a single output file

Hadley Wickham Thu, 30 Dec 2010 05:09:00 -0800

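It looks like you have csv files, so use read.csv instead of read.table.
read.table splits fields on whitespace by default, which is why date and
yield_rate end up together in a single date.yield_rate column.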
Hadley
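
A minimal sketch of that change, under some assumptions: the temporary
directory and the two demo files are hypothetical stand-ins for the real
file*.csv files, and the final step uses base R's reshape() as a swapped-in
alternative to the melt()/cast() calls from the reshape package used earlier
in the thread.

```r
# Hypothetical demo data: two small comma-separated files in a temp directory
tmp <- tempfile("csvdemo")
dir.create(tmp)
writeLines(c("date,yield_rate", "12/23/10,5.25", "12/22/10,5.19"),
           file.path(tmp, "file1.csv"))
writeLines(c("date,yield_rate", "12/23/10,4.16", "12/22/10,4.59"),
           file.path(tmp, "file2.csv"))

# Anchored pattern: "file", anything, then a literal ".csv" at the end
fileNames <- list.files(tmp, pattern = "^file.*\\.csv$")

input <- do.call(rbind, lapply(fileNames, function(.name) {
  # read.csv defaults to sep = "," and header = TRUE,
  # so date and yield_rate are read as separate columns
  .data <- read.csv(file.path(tmp, .name), as.is = TRUE)
  .data$file <- .name  # record which file each row came from
  .data
}))

# Base R alternative to melt()/cast(): one yield_rate column per file
wide <- reshape(input, idvar = "date", timevar = "file", direction = "wide")
print(wide)
```

With the columns split correctly, melt(input, measure = 'yield_rate') and
cast(in.melt, date ~ file) from the earlier reply should find the variables
they expect.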


On Thu, Dec 30, 2010 at 12:18 AM, Amy Milano <milano_...@yahoo.com> wrote:
> Dear sir,
>
> At the outset, I sincerely apologize for replying a bit late, as I was out 
> of the office. Thank you for the guidance you gave in response to my 
> earlier mail regarding "Writing a single output file", where I was trying to 
> read multiple output files and create a single output data.frame. However, 
> things are not working, as I describe below:
>
>
> # Your code
>
> setwd('/temp')
> fileNames <- list.files(pattern = "file.*.csv")
>
> input <- do.call(rbind, lapply(fileNames, function(.name)
> {
> .data <- read.table(.name, header = TRUE, as.is = TRUE)
> .data$file <- .name
> .data
> }))
>
>
> # This produces the following output, which contains only two columns; 
> # moreover, date and yield_rate are combined into one column.
>
>
>
>  date.yield_rate      file
> 1   12/23/10,5.25 file1.csv
> 2   12/22/10,5.19 file1.csv
> 3   12/23/10,4.16 file2.csv
> 4   12/22/10,4.59 file2.csv
> 5   12/23/10,6.15 file3.csv
> 6   12/22/10,6.41 file3.csv
> 7   12/23/10,8.15 file4.csv
> 8   12/22/10,8.68 file4.csv
>
>
> # and NOT the kind of output given below, where date and yield_rate are 
> # separate columns.
>
>> input
>         date yield_rate      file
> 1 12/23/2010       5.25 file1.csv
> 2 12/22/2010       5.19 file1.csv
> 3 12/23/2010       5.25 file2.csv
> 4 12/22/2010       5.19 file2.csv
> 5 12/23/2010       5.25 file3.csv
> 6 12/22/2010       5.19 file3.csv
> 7 12/23/2010       5.25 file4.csv
> 8 12/22/2010       5.19 file4.csv
>
> So when I tried following code to produce the required result, it throws me 
> an error.
>
> require(reshape)
>
> in.melt <- melt(input, measure = 'yield_rate')
>> in.melt <- melt(input, measure = 'yield_rate')
> Error: measure variables not found in data: yield_rate
>
> # So I tried
>
> in.melt <- melt(input, measure = 'date.yield_rate')
>
>
> cast(in.melt, date.yield_rate ~ file)
>
>> cast(in.melt, date ~ file)
> Error: Casting formula contains variables not found in molten data: date
>
> # If I try to change it as
>
> cast(in.melt, date.yield_rate ~ file)    # Gives following error.
> Error: Casting formula contains variables not found in molten data: 
> date.yield_rate
>
> Sir, it would be a great help if you could guide me. Once again, I sincerely 
> apologize for replying so late.
>
> Regards
>
> Amy
>
>
> --- On Thu, 12/23/10, jim holtman <jholt...@gmail.com> wrote:
>
> From: jim holtman <jholt...@gmail.com>
> Subject: Re: [R] Writing a single output file
> To: "Amy Milano" <milano_...@yahoo.com>
> Cc: r-help@r-project.org
> Date: Thursday, December 23, 2010, 1:39 PM
>
> This should get you close:
>
>> # get file names
>> setwd('/temp')
>> fileNames <- list.files(pattern = "file.*.csv")
>> fileNames
> [1] "file1.csv" "file2.csv" "file3.csv" "file4.csv"
>> input <- do.call(rbind, lapply(fileNames, function(.name){
> +     .data <- read.table(.name, header = TRUE, as.is = TRUE)
> +     # add file name to the data
> +     .data$file <- .name
> +     .data
> + }))
>> input
>         date yield_rate      file
> 1 12/23/2010       5.25 file1.csv
> 2 12/22/2010       5.19 file1.csv
> 3 12/23/2010       5.25 file2.csv
> 4 12/22/2010       5.19 file2.csv
> 5 12/23/2010       5.25 file3.csv
> 6 12/22/2010       5.19 file3.csv
> 7 12/23/2010       5.25 file4.csv
> 8 12/22/2010       5.19 file4.csv
>> require(reshape)
>> in.melt <- melt(input, measure = 'yield_rate')
>> cast(in.melt, date ~ file)
>         date file1.csv file2.csv file3.csv file4.csv
> 1 12/22/2010      5.19      5.19      5.19      5.19
> 2 12/23/2010      5.25      5.25      5.25      5.25
>>
>
>
> On Thu, Dec 23, 2010 at 8:07 AM, Amy Milano <milano_...@yahoo.com> wrote:
>> Dear R helpers!
>>
>> Let me first wish all of you "Merry Christmas and Very Happy New year 2011"
>>
>> "Christmas day is a day of Joy and Charity,
>> May God make you rich in both" - Phillips Brooks
>>
>> ## ------------------------------------------------------------------------
>>
>> I have a process which generates a number of outputs. The R code for it 
>> is given below.
>>
>> for(i in 1:n)
>> {
>> write.csv(output[i], file = paste("output", i, ".csv", sep = ""), row.names = FALSE)
>> }
>>
>> Depending on value of 'n', I get different output files.
>>
>> Suppose n = 3, that means I am having three output csv files viz. 
>> 'output1.csv', 'output2.csv' and 'output3.csv'
>>
>> output1.csv
>> date               yield_rate
>> 12/23/2010        5.25
>> 12/22/2010        5.19
>> .................................
>> .................................
>>
>>
>> output2.csv
>> date               yield_rate
>> 12/23/2010        4.16
>> 12/22/2010        4.59
>> .................................
>> .................................
>>
>> output3.csv
>> date               yield_rate
>> 12/23/2010        6.15
>> 12/22/2010        6.41
>> .................................
>> .................................
>>
>>
>>
>> Thus all the output files have same column names viz. Date and yield_rate. 
>> Also, I do need these files individually too.
>>
>> My further requirement is to have a single dataframe as given below.
>>
>> Date             yield_rate1      yield_rate2      yield_rate3
>> 12/23/2010       5.25             4.16             6.15
>> 12/22/2010       5.19             4.59             6.41
>> ...............................................................
>> ...............................................................
>>
>> where yield_rate1 = output1$yield_rate and so on.
>>
>> One way is to simply create a dataframe as
>>
>> df = data.frame(Date = read.csv('output1.csv')$Date, yield_rate1 =  
>> read.csv('output1.csv')$yield_rate,   yield_rate2 = 
>> read.csv('output2.csv')$yield_rate,
>> yield_rate3 = read.csv('output3.csv')$yield_rate)
>>
>> However, the problem arises when I am not aware how many output files are 
>> there as n can be 5 or even 100.
>>
>> So is it possible to write a loop or function that reads the 'n' files 
>> individually and then, keeping "Date" common, picks up only the yield_rate 
>> column from each output file?
>>
>> Thanking in advance for any guidance.
>>
>> Regards
>>
>> Amy
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
>



-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

