Re: [R] read multiple large files into one dataframe

2009-05-13 Thread baptiste auguie

I'd first try plyr and see if it's efficient enough,


library(plyr)

listOfFiles <- list.files(pattern = "\\.txt$")  # pattern is a regular expression

d <- ldply(listOfFiles, read.table)
str(d)



alternatively,


d <- do.call(rbind, lapply(listOfFiles, read.table))
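
A minimal sketch of the same do.call(rbind, lapply(...)) idea with the column types declared up front, so that read.table() does not have to guess them. The temporary directory, the fake 3-column files and the names tmp, classes and readOne are made up for illustration; the real files have 25 columns, so colClasses would need 25 entries.

```r
# Illustration only: write three small headerless files to a temporary directory.
tmp <- file.path(tempdir(), "daily_files")
dir.create(tmp, showWarnings = FALSE)
for (i in 1:3) {
  day <- data.frame(a = i + 0:3, b = runif(4), c = letters[1:4])
  write.table(day, file.path(tmp, sprintf("day%02d.txt", i)),
              row.names = FALSE, col.names = FALSE, quote = FALSE)
}

# Full paths of the files to read; the pattern is a regular expression.
listOfFiles <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)

# Declaring the column types spares read.table() from guessing them.
classes <- c("numeric", "numeric", "character")
readOne <- function(f) read.table(f, header = FALSE, colClasses = classes)

# Read every file into a list, then stack the pieces in a single rbind call.
d <- do.call(rbind, lapply(listOfFiles, readOne))
str(d)
```

Without a header row the columns come back named V1, V2, and so on; they can be renamed afterwards with names(d) <- ...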



HTH,

baptiste


On 13 May 2009, at 12:45, SYKES, Jennifer wrote:


Hello

Apologies if this is a simple question; I have searched the help and
have not managed to work out a solution.

Does anybody know an efficient method for reading many text files of the
same format into one table/dataframe?

I have around 90 files that contain continuous data over 3 months but
that are split into individual days' data, and I need the whole 3 months
in one file for analysis.  Each day's file contains a large amount of
data (approx 30MB each), so I need a memory-efficient method to merge
all of the files into one dataframe object.  From what I have read, I
will probably want to avoid using for loops etc.?  All files are in the
same directory, none have a header row, and each contains around 180,000
rows and the same 25 columns/variables.  Any suggested packages/routines
would be very useful.

Thanks

Jennifer







-
***If
you are not the intended recipient, please notify our Help Desk at
Email postmas...@nats.co.uk immediately. You should not copy or use
this email or attachment(s) for any purpose nor disclose their
contents to any other person. NATS computer systems may be
monitored and communications carried on them recorded, to secure
the effective operation of the system and for other lawful
purposes. Please note that neither NATS nor the sender accepts any
responsibility for viruses or any losses caused as a result of
viruses and it is your responsibility to scan or otherwise check
this email and any attachments. NATS means NATS (En Route) plc
(company number: 4129273), NATS (Services) Ltd (company number
4129270), NATSNAV Ltd (company number: 4164590) or NATS Ltd
(company number 3155567) or NATS Holdings Ltd (company number
4138218). All companies are registered in England and their
registered office is at 5th Floor, Brettenham House South,
Lancaster Place, London, WC2E 7EN.
**

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


_

Baptiste Auguié

School of Physics
University of Exeter
Stocker Road,
Exeter, Devon,
EX4 4QL, UK

Phone: +44 1392 264187

http://newton.ex.ac.uk/research/emag



Re: [R] read multiple large files into one dataframe

2009-05-13 Thread Mike Lawrence
What types of data are in each file? All numbers, or a mix of numbers
and characters? Any missing data or special NA values?




-- 
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University

Looking to arrange a meeting? Check my public calendar:
http://tr.im/mikes_public_calendar

~ Certainty is folly... I think. ~



Re: [R] read multiple large files into one dataframe

2009-05-13 Thread Simon Pickett

Can you provide reproducible code, please?

Even a fake example would help.

I would

1) set up a loop to read in each file from the directory;
2) inside the loop, chop up or aggregate the data from each file in turn and
write each new aggregated file out to a directory using write.table(). This
reduces the memory needed, because only the information you want is kept.
Make sure each file is a data frame with the same column names;
3) set up a second loop to read in each new small file and rbind them all
together to make your new "master file".
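
The three steps above could be sketched roughly as follows. Everything specific here is an assumption made for illustration: the fake two-column files, the directory names inDir and outDir, the column names sensor and value, and the choice of a per-sensor mean as the aggregation. The small pieces are collected in a list and bound once at the end, which is a variation on calling rbind inside the loop.

```r
# Illustration only: three fake daily files with no header row.
inDir  <- file.path(tempdir(), "raw_days")
outDir <- file.path(tempdir(), "small_days")
dir.create(inDir,  showWarnings = FALSE)
dir.create(outDir, showWarnings = FALSE)
for (i in 1:3) {
  fake <- data.frame(sensor = rep(c("A", "B"), each = 5), value = i * (1:10))
  write.table(fake, file.path(inDir, sprintf("day%02d.txt", i)),
              row.names = FALSE, col.names = FALSE, quote = FALSE)
}

# Steps 1 and 2: read each file in turn, reduce it, and write the reduced version out.
rawFiles <- list.files(inDir, pattern = "\\.txt$", full.names = TRUE)
for (f in rawFiles) {
  day <- read.table(f, header = FALSE, col.names = c("sensor", "value"),
                    colClasses = c("character", "numeric"))
  small <- aggregate(value ~ sensor, data = day, FUN = mean)  # one row per sensor
  small$file <- basename(f)                                   # remember the source file
  write.table(small, file.path(outDir, basename(f)), row.names = FALSE)
}

# Step 3: read the small files back and bind them into the master data frame.
smallFiles <- list.files(outDir, pattern = "\\.txt$", full.names = TRUE)
pieces <- vector("list", length(smallFiles))
for (i in seq_along(smallFiles)) {
  pieces[[i]] <- read.table(smallFiles[i], header = TRUE, stringsAsFactors = FALSE)
}
master <- do.call(rbind, pieces)
master
```

Only one raw file is held in memory at a time, so the peak memory use depends on the size of a single day plus the reduced pieces.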


The R gurus may have a more parsimonious solution.

HTH

Simon.











Re: [R] read multiple large files into one dataframe

2009-05-13 Thread Liaw, Andy
A few points to consider:

- If all the data are numeric, then use matrices instead of data frames.

- With either data frames or matrices, there is no way (that I'm aware
of anyway) in R to stack them without making at least one copy in
memory.

- Since none of the files has a header row, I would concatenate them
into one file outside R (e.g., on *nix, cat * > all.txt) and then read
that in.  You can also try it inside R with something like
read.table(pipe()).  You will want to make use of the colClasses
argument in read.table() to specify the column types, though, to ensure
that read.table() only goes through the input once.

- You're probably better off getting the data into a database (even
something like SQLite) and using an R interface to that database.

- 30MB x 90 = 2.7GB.  Unless you're on a 64-bit machine with lots of
RAM, you're not likely to have much fun with the data even when you
manage to get it into R in one piece.
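
A rough sketch of the concatenate-then-read route from the points above, kept inside R so it runs anywhere. The fake 4-column numeric files and the names dayDir, allFile and nCols are assumptions for illustration, and file.append() is used here as a portable stand-in for cat on *nix. The last two lines estimate the in-memory size under the assumption that all 25 columns are numeric (8 bytes per value), which comes to a little over 3 GiB.

```r
# Illustration only: three fake all-numeric daily files with no header row.
dayDir <- file.path(tempdir(), "numeric_days")
dir.create(dayDir, showWarnings = FALSE)
for (i in 1:3) {
  write.table(matrix(i * (1:8), ncol = 4),
              file.path(dayDir, sprintf("day%02d.txt", i)),
              row.names = FALSE, col.names = FALSE)
}
dayFiles <- list.files(dayDir, pattern = "\\.txt$", full.names = TRUE)

# Concatenate the files on disk, outside the directory being listed.
allFile <- file.path(tempdir(), "all.txt")
file.create(allFile)
for (f in dayFiles) file.append(allFile, f)

# Read the combined file with every column declared numeric up front.
nCols <- 4
d <- read.table(allFile, header = FALSE, colClasses = rep("numeric", nCols))
m <- as.matrix(d)   # all-numeric data can be kept as a matrix
dim(m)

# Rough in-memory size for the real data: rows x columns x 8 bytes x files.
bytes <- 180000 * 25 * 8 * 90
bytes / 2^30        # size in GiB
```

Note that as.matrix() makes a further copy of the data, in line with the point above that stacking or converting always costs at least one copy.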

Andy
