Re: [R] efficient equivalent to read.csv / write.csv

2010-09-28 Thread statquant2

Hi, after testing 
R) system.time(read.csv(myfile.csv))
   user  system elapsed
  1.126   0.038   1.177

R) system.time(read.csv.sql(myfile.csv))
   user  system elapsed
  1.405   0.025   1.439
Warning messages:
1: closing unused connection 4 ()
2: closing unused connection 3 ()

It seems that the function is less efficient than the base one ... so ...
-- 
View this message in context: 
http://r.789695.n4.nabble.com/efficient-equivalent-to-read-csv-write-csv-tp2714325p2717585.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] efficient equivalent to read.csv / write.csv

2010-09-28 Thread Henrik Bengtsson
To speed things up, you certainly want to give R more clues about your
data files by being more explicit with many of the arguments (cf.
help(read.table)); specifying the 'colClasses' argument in particular makes a
big difference.
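
For example, something along these lines (the column count, types, and row
count here are hypothetical, just to illustrate the idea):

```r
# Telling read.csv the column types up front skips its type-guessing pass;
# nrows gives an allocation hint, and comment.char = "" disables comment
# scanning. Adjust colClasses to match your actual file.
dat <- read.csv("myfile.csv",
                colClasses = c("numeric", "numeric", "character"),
                nrows = 1e5,
                comment.char = "")
```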

/Henrik

On Tue, Sep 28, 2010 at 10:24 AM, statquant2 statqu...@gmail.com wrote:

 Hi, after testing
 R) system.time(read.csv(myfile.csv))
   user  system elapsed
  1.126   0.038   1.177

 R) system.time(read.csv.sql(myfile.csv))
   user  system elapsed
  1.405   0.025   1.439
 Warning messages:
 1: closing unused connection 4 ()
 2: closing unused connection 3 ()

 It seems that the function is less efficient than the base one ... so ...
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/efficient-equivalent-to-read-csv-write-csv-tp2714325p2717585.html
 Sent from the R help mailing list archive at Nabble.com.





Re: [R] efficient equivalent to read.csv / write.csv

2010-09-28 Thread David Scott

On 29/09/2010 6:24 a.m., statquant2 wrote:


Hi, after testing
R) system.time(read.csv(myfile.csv))
user  system elapsed
   1.126   0.038   1.177

R) system.time(read.csv.sql(myfile.csv))
user  system elapsed
   1.405   0.025   1.439
Warning messages:
1: closing unused connection 4 ()
2: closing unused connection 3 ()

It seems that the function is less efficient than the base one ... so ...


I presume you have had a good look at the R Data Import/Export manual?

It does warn there of inefficiency with read.table (hence also read.csv) 
and suggests more direct use of scan, which in your case might be via 
connections with readLines and writeLines.
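
A rough sketch of the scan() route, assuming a hypothetical file with one
header line and three numeric columns:

```r
# scan() with an explicit 'what' list avoids read.table's type-guessing.
# The column layout (id, x, y, all numeric) is hypothetical.
dat <- scan("myfile.csv", sep = ",", skip = 1,      # skip the header line
            what = list(id = 0, x = 0, y = 0))      # three numeric columns
dat <- as.data.frame(dat)
```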


If that doesn't work, why not go to a database? Use RODBC or some such 
to read and write tables in the database. There are many options for 
databases to use (MySQL works for me). You can easily read data into and 
out of the database in .csv format. If the .csv files are similar, there 
shouldn't be too much overhead in defining table formats for the database.



David Scott

--
_
David Scott Department of Statistics
The University of Auckland, PB 92019
Auckland 1142, NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email:  d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018

Director of Consulting, Department of Statistics



Re: [R] efficient equivalent to read.csv / write.csv

2010-09-28 Thread Gabor Grothendieck
On Tue, Sep 28, 2010 at 1:24 PM, statquant2 statqu...@gmail.com wrote:

 Hi, after testing
 R) system.time(read.csv(myfile.csv))
   user  system elapsed
  1.126   0.038   1.177

 R) system.time(read.csv.sql(myfile.csv))
   user  system elapsed
  1.405   0.025   1.439
 Warning messages:
 1: closing unused connection 4 ()
 2: closing unused connection 3 ()

 It seems that the function is less efficient than the base one ... so ...

The benefit comes with larger files.  With small files there is not
much point in speeding it up, since the absolute time is already small.

I suggest you look at the benchmarks on the sqldf home page, where a
couple of users benchmarked larger files.  Since sqldf was intended
for convenience and not really performance, I was as surprised as anyone
when several users independently noticed that sqldf ran several times
faster than unoptimized R code.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] efficient equivalent to read.csv / write.csv

2010-09-28 Thread statquant2

Hello all,
the test I provided was just to show that, for loading a big csv file once,
read.csv was quicker than read.csv.sql... I have already
optimized my calls to read.csv for my particular problem, but if a simple
call to read.csv is quicker than read.csv.sql, I doubt that specifying args
would reverse the result much...

Maybe I should outline my problem:

I am working on a powerful machine with 32 GB or 64 GB of RAM, so loading files
and keeping them in memory is not really an issue.
Those files (let's say 100) are shared by many people and are flat csv files
(which is to say that modifying them is out of the question).
Those files have lots of rows and between 10 and 20 columns, string and
numeric...

I basically need to be able to load these files as quickly as possible, and then
I will keep those data frames in memory...
So:
Should I write my own C++ function and call it from R?
Or is there an R way of improving read.csv drastically?

Thanks a lot
-- 
View this message in context: 
http://r.789695.n4.nabble.com/efficient-equivalent-to-read-csv-write-csv-tp2714325p2717937.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] efficient equivalent to read.csv / write.csv

2010-09-28 Thread Gabor Grothendieck
On Tue, Sep 28, 2010 at 5:02 PM, statquant2 statqu...@gmail.com wrote:

 Hello all,
 the test I provided was just to show that, for loading a big csv file once,

A file that can be read in under 2 seconds is not big.

 read.csv was quicker than read.csv.sql... I have already
 optimized my calls to read.csv for my particular problem, but if a simple
 call to read.csv is quicker than read.csv.sql, I doubt that specifying args
 would reverse the result much...

 Maybe I should outline my problem:

 I am working on a powerful machine with 32 GB or 64 GB of RAM, so loading
 files and keeping them in memory is not really an issue.
 Those files (let's say 100) are shared by many people and are flat csv files
 (which is to say that modifying them is out of the question).
 Those files have lots of rows and between 10 and 20 columns, string and
 numeric...

 I basically need to be able to load these files as quickly as possible, and
 then I will keep those data frames in memory...
 So:
 Should I write my own C++ function and call it from R?
 Or is there an R way of improving read.csv drastically?

 Thanks a lot

So you have a bunch of small files and want to read them fast.  Are
they always the same, or are they changing, or a combination of the two?
If they are the same, or if many of them are the same, then read those
once, save() them as RData files, and load() them when you want them.
The load will be very fast.
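
That caching scheme might be sketched as follows (the file-naming convention
and helper name are hypothetical):

```r
# Read each csv once, cache it as an RData file next to it, and reuse
# the cache on subsequent loads.
load_cached <- function(csvfile) {
  cache <- sub("\\.csv$", ".RData", csvfile)
  if (file.exists(cache)) {
    load(cache)                  # restores 'dat' into this environment
  } else {
    dat <- read.csv(csvfile)
    save(dat, file = cache)      # one-time cost
  }
  dat
}
```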

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] efficient equivalent to read.csv / write.csv

2010-09-27 Thread statquant2

thank you very much for this sql package; the thing is that those tables I
read are loaded into memory once and for all, and then we work with the
data.frames...
Do you think then that this is going to be quicker (as I would have thought
that building the SQL DB from the flat file would already be a long
process...)?
Using RData files is not possible, as those files are shared, so converting
them to RData is not an option; on the write side, the files written are read
by other apps, so the csv format can't be changed.

Looking forward to hearing from you
Thanks
-- 
View this message in context: 
http://r.789695.n4.nabble.com/efficient-equivalent-to-read-csv-write-csv-tp2714325p2715275.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] efficient equivalent to read.csv / write.csv

2010-09-27 Thread Gabor Grothendieck
On Mon, Sep 27, 2010 at 7:49 AM, statquant2 statqu...@gmail.com wrote:

 thank you very much for this sql package; the thing is that those tables I
 read are loaded into memory once and for all, and then we work with the
 data.frames...
 Do you think then that this is going to be quicker (as I would have thought
 that building the SQL DB from the flat file would already be a long
 process...)?

Even including that step, read.csv.sql is typically several times faster
than unoptimized read.csv for large files.  See the introductory
remarks on the sqldf home page, which specifically address that:
http://sqldf.googlecode.com

In fact, just try it and see whether it's ok for you.  It's just one
line of code to read in a file.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] efficient equivalent to read.csv / write.csv

2010-09-26 Thread Gabor Grothendieck
On Sun, Sep 26, 2010 at 8:38 AM, statquant2 statqu...@gmail.com wrote:

 Hello everyone,
 I currently run R code that has to read 100 or more large csv files (>= 100
 MB), and usually writes csv too.
 My colleagues and I like R very much but are a little bit astonished by how
 slow those functions are. We have looked at every argument of those
 functions, and while specifying some parameters helps a bit, this is still too
 slow.
 I am sure a lot of people have the same problem, so I thought one of you
 would know a trick or a package that would help speed this up a lot.

 (we work on Linux Red Hat, R 2.10.0, but I guess this is of no use for this
 problem)

 Thanks for reading this.
 Have a nice week end

You could try read.csv.sql in the sqldf package:

http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql

See ?read.csv.sql in sqldf.  It uses RSQLite and SQLite to read the
file into an SQLite database (which it sets up for you), completely
bypassing R's readers, and from there grabs the data into R, removing
the database it created at the end.
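
In outline (assuming sqldf and its dependencies are installed):

```r
library(sqldf)
# Reads myfile.csv into a temporary SQLite database, pulls it into R as a
# data frame (the default query is "select * from file"), then drops the
# database automatically.
dat <- read.csv.sql("myfile.csv")
```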

There are also CSVREAD and CSVWRITE SQL functions in the H2 database,
which is also supported by sqldf, although I have never checked their
speed:
http://code.google.com/p/sqldf/#10.__What_are_some_of_the_differences_between_using_SQLite_and_H

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] efficient equivalent to read.csv / write.csv

2010-09-26 Thread Uwe Ligges



On 26.09.2010 14:38, statquant2 wrote:


Hello everyone,
I currently run R code that has to read 100 or more large csv files (>= 100
MB), and usually writes csv too.
My colleagues and I like R very much but are a little bit astonished by how
slow those functions are. We have looked at every argument of those
functions, and while specifying some parameters helps a bit, this is still too
slow.
I am sure a lot of people have the same problem, so I thought one of you
would know a trick or a package that would help speed this up a lot.

(we work on Linux Red Hat, R 2.10.0, but I guess this is of no use for this
problem)

Thanks for reading this.
Have a nice week end



Most of us read the csv file and write an RData file at once (see 
?save). Then we can read in the data much more quickly after they have 
been imported once with read.csv and friends.
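
Concretely, the one-time conversion looks like (file names hypothetical):

```r
dat <- read.csv("myfile.csv")      # slow, done only once
save(dat, file = "myfile.RData")
# ... in later sessions:
load("myfile.RData")               # fast; restores 'dat'
```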


Uwe Ligges
