Re: [R] efficient equivalent to read.csv / write.csv
On Tue, Sep 28, 2010 at 5:02 PM, statquant2 wrote:
>
> Hello all,
> the test I provided was just to pinpoint that for loading once a big csv

A file that can be read in under 2 seconds is not big.

> file with read.csv was quicker than read.csv.sql... I have already
> "optimized" my calls to read.csv for my particular problem, but if a simple
> call to read.csv was quicker than read.csv.sql I doubt that specifying args
> would reverse the result by much...
>
> Maybe I should outline my problem:
>
> I am working on a powerful machine with 32 GB or 64 GB of RAM, so loading
> files and keeping them in memory is not really an issue.
> Those files (let's say 100) are shared by many people and are flat csv files
> (which is to say that modifying them is out of the question).
> Those files have lots of rows and between 10 and 20 columns, string and
> numeric...
>
> I basically need to be able to load these files as quickly as possible, and
> then I will keep those data frames in memory...
> So:
> Should I write my own C++ function and call it from R?
> Or is there an R way of drastically improving read.csv?
>
> Thanks a lot

So you have a bunch of small files and want to read them fast. Are they
always the same, or are they changing, or a combination of the two? If they
are the same, or if many of them are the same, then read those once, save()
them as RData files and load() them when you want them. The load will be
very fast.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
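[The save()/load() caching idea above can be sketched as below; the helper
name and file paths are made-up examples, not anything from base R:]

```r
## Cache a csv as RData: read it once with read.csv, save() the data
## frame next to the csv, and load() the cached copy on later calls.
## 'read_cached' is a hypothetical helper name for illustration.
read_cached <- function(csv_file) {
  rdata_file <- sub("\\.csv$", ".RData", csv_file)
  if (file.exists(rdata_file)) {
    load(rdata_file)            # restores an object named 'dat'
  } else {
    dat <- read.csv(csv_file)
    save(dat, file = rdata_file)
  }
  dat
}

## Small self-contained demonstration with a temporary file:
csv <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), csv, row.names = FALSE)
d1 <- read_cached(csv)   # first call: reads csv, writes the RData cache
d2 <- read_cached(csv)   # second call: fast load() from the cache
```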
Re: [R] efficient equivalent to read.csv / write.csv
Hello all,
the test I provided was just to pinpoint that for loading once a big csv
file with read.csv was quicker than read.csv.sql... I have already
"optimized" my calls to read.csv for my particular problem, but if a simple
call to read.csv was quicker than read.csv.sql I doubt that specifying args
would reverse the result by much...

Maybe I should outline my problem:

I am working on a powerful machine with 32 GB or 64 GB of RAM, so loading
files and keeping them in memory is not really an issue.
Those files (let's say 100) are shared by many people and are flat csv files
(which is to say that modifying them is out of the question).
Those files have lots of rows and between 10 and 20 columns, string and
numeric...

I basically need to be able to load these files as quickly as possible, and
then I will keep those data frames in memory...
So:
Should I write my own C++ function and call it from R?
Or is there an R way of drastically improving read.csv?

Thanks a lot

--
View this message in context: http://r.789695.n4.nabble.com/efficient-equivalent-to-read-csv-write-csv-tp2714325p2717937.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] efficient equivalent to read.csv / write.csv
On Tue, Sep 28, 2010 at 1:24 PM, statquant2 wrote:
>
> Hi, after testing
> R) system.time(read.csv("myfile.csv"))
>    user  system elapsed
>   1.126   0.038   1.177
>
> R) system.time(read.csv.sql("myfile.csv"))
>    user  system elapsed
>   1.405   0.025   1.439
> Warning messages:
> 1: closing unused connection 4 ()
> 2: closing unused connection 3 ()
>
> It seems that the function is less efficient than the base one ... so ...

The benefit comes with larger files. With small files there is not much
point in speeding it up since the absolute time is already small. Suggest
you look at the benchmarks on the sqldf home page, where a couple of users
benchmarked larger files. Since sqldf was intended for convenience, and not
really performance, I was as surprised as anyone when several users
independently noticed that sqldf ran several times faster than unoptimized
R code.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] efficient equivalent to read.csv / write.csv
On 29/09/2010 6:24 a.m., statquant2 wrote:
> Hi, after testing
> R) system.time(read.csv("myfile.csv"))
>    user  system elapsed
>   1.126   0.038   1.177
>
> R) system.time(read.csv.sql("myfile.csv"))
>    user  system elapsed
>   1.405   0.025   1.439
> Warning messages:
> 1: closing unused connection 4 ()
> 2: closing unused connection 3 ()
>
> It seems that the function is less efficient than the base one ... so ...

I presume you have had a good look at the R Data Import/Export manual? It
warns there of inefficiency with read.table (hence also read.csv) and
suggests more direct use of scan, which in your case might be via
connections and readLines and writeLines.

If that doesn't work, why not go to a database? Use RODBC or some such to
read and write tables in the database. There are many options for databases
to use (MySQL works for me). You can easily read data in and out of the
database in .csv format. If the .csv files are similar, there shouldn't be
too much overhead in defining table formats for the database.

David Scott

--
_
David Scott    Department of Statistics
               The University of Auckland, PB 92019
               Auckland 1142, NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018
Director of Consulting, Department of Statistics
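[The scan() route suggested above might look like the sketch below. The
file contents and column layout (one character column, one integer, one
numeric) are made-up examples:]

```r
## scan() skips read.table's type-guessing pass: the 'what' list gives
## the column types up front, and skip = 1 jumps over the header line.
## We first create a small example file so the snippet is self-contained.
csv <- tempfile(fileext = ".csv")
writeLines(c("id,x,y", "a,1,2.5", "b,3,4.5"), csv)

cols <- scan(csv,
             what = list(id = character(), x = integer(), y = numeric()),
             sep = ",", skip = 1, quiet = TRUE)

## scan() returns a list of column vectors; assemble a data frame:
df <- as.data.frame(cols, stringsAsFactors = FALSE)
```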
Re: [R] efficient equivalent to read.csv / write.csv
To speed things up, you certainly want to give R more clues about your data
files by being more explicit with many of the arguments (cf.
help(read.table)); especially, specifying argument 'colClasses' makes a big
difference.

/Henrik

On Tue, Sep 28, 2010 at 10:24 AM, statquant2 wrote:
>
> Hi, after testing
> R) system.time(read.csv("myfile.csv"))
>    user  system elapsed
>   1.126   0.038   1.177
>
> R) system.time(read.csv.sql("myfile.csv"))
>    user  system elapsed
>   1.405   0.025   1.439
> Warning messages:
> 1: closing unused connection 4 ()
> 2: closing unused connection 3 ()
>
> It seems that the function is less efficient than the base one ... so ...
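[A minimal sketch of the 'colClasses' advice above; the column types and
the nrows value are examples for the demonstration file, not a recipe for
any particular data set:]

```r
## Explicit colClasses lets read.csv skip type guessing, and nrows lets
## it pre-allocate the result instead of growing it. A small example
## file is created here so the snippet is self-contained.
csv <- tempfile(fileext = ".csv")
writeLines(c("id,x,y", "a,1,2.5", "b,3,4.5"), csv)

df <- read.csv(csv,
               colClasses = c("character", "integer", "numeric"),
               nrows = 2)   # an overestimate also helps pre-allocation
```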
Re: [R] efficient equivalent to read.csv / write.csv
Hi, after testing

R) system.time(read.csv("myfile.csv"))
   user  system elapsed
  1.126   0.038   1.177

R) system.time(read.csv.sql("myfile.csv"))
   user  system elapsed
  1.405   0.025   1.439
Warning messages:
1: closing unused connection 4 ()
2: closing unused connection 3 ()

It seems that the function is less efficient than the base one ... so ...

--
View this message in context: http://r.789695.n4.nabble.com/efficient-equivalent-to-read-csv-write-csv-tp2714325p2717585.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] efficient equivalent to read.csv / write.csv
On Mon, Sep 27, 2010 at 7:49 AM, statquant2 wrote:
>
> thank you very much for this sql package, the thing is that those tables I
> read are loaded into memory once and for all, and then we work with the
> data.frames...
> Do you think then that this is going to be quicker (as I would have thought
> that building the SQL DB from the flat file would already be a long
> process...)?

Even including that, read.csv.sql is typically several times faster than
unoptimized read.csv for large files. See the introductory remarks on the
sqldf home page, which specifically address that:
http://sqldf.googlecode.com

In fact, just try it and see whether it's OK for you. It's just one line of
code to read in a file.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] efficient equivalent to read.csv / write.csv
thank you very much for this sql package, the thing is that those tables I
read are loaded into memory once and for all, and then we work with the
data.frames...
Do you think then that this is going to be quicker (as I would have thought
that building the SQL DB from the flat file would already be a long
process...)?
Using RData files is not possible, as those files are shared, so converting
them into RData is out of the question; on the write side, the files written
are read by other apps, so the csv format can't be changed.
Looking forward to reading from you.
Thanks

--
View this message in context: http://r.789695.n4.nabble.com/efficient-equivalent-to-read-csv-write-csv-tp2714325p2715275.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] efficient equivalent to read.csv / write.csv
On 26.09.2010 14:38, statquant2 wrote:
> Hello everyone,
> I currently run R code that has to read 100 or more large csv files (>=
> 100 MB), and usually write csv too.
> My colleagues and I like R very much but are a little bit astonished by
> how slow those functions are. We have looked at every argument of those
> functions, and while specifying some parameters helps a bit, this is
> still too slow.
> I am sure a lot of people have the same problem, so I thought one of you
> would know a trick or a package that would help speed this up a lot.
>
> (we work on LINUX Red Hat, R 2.10.0, but I guess this is of no use for
> this problem)
>
> Thanks for reading this.
> Have a nice week end

Most of us read the csv file and write an RData file at once (see ?save).
Then we can read in the data much quicker after it has been imported once
with read.csv and friends.

Uwe Ligges
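[Applied to a whole directory of csv files, the import-once approach above
might look like the sketch below; the directory and file names are made-up
examples:]

```r
## One-off conversion: read.csv each file once, save() the data frame
## next to it, and use the much cheaper load() in later sessions.
## A demonstration directory with two small csv files is created here.
dir <- file.path(tempdir(), "csv_cache_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(a = 1:5), file.path(dir, "one.csv"), row.names = FALSE)
write.csv(data.frame(b = 6:9), file.path(dir, "two.csv"), row.names = FALSE)

for (f in list.files(dir, pattern = "\\.csv$", full.names = TRUE)) {
  dat <- read.csv(f)
  save(dat, file = sub("\\.csv$", ".RData", f))
}

## Later sessions only pay the load() cost:
load(file.path(dir, "one.RData"))   # restores the data frame as 'dat'
```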
Re: [R] efficient equivalent to read.csv / write.csv
On Sun, Sep 26, 2010 at 8:38 AM, statquant2 wrote:
>
> Hello everyone,
> I currently run R code that has to read 100 or more large csv files (>=
> 100 MB), and usually write csv too.
> My colleagues and I like R very much but are a little bit astonished by
> how slow those functions are. We have looked at every argument of those
> functions, and while specifying some parameters helps a bit, this is
> still too slow.
> I am sure a lot of people have the same problem, so I thought one of you
> would know a trick or a package that would help speed this up a lot.
>
> (we work on LINUX Red Hat, R 2.10.0, but I guess this is of no use for
> this problem)
>
> Thanks for reading this.
> Have a nice week end

You could try read.csv.sql in the sqldf package:
http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql

See ?read.csv.sql in sqldf. It uses RSQLite and SQLite to read the file
into an SQLite database (which it sets up for you), completely bypassing R,
and from there grabs it into R, removing the database it created at the
end.

There are also CSVREAD and CSVWRITE sql functions in the H2 database, which
is also supported by sqldf, although I have never checked their speed:
http://code.google.com/p/sqldf/#10.__What_are_some_of_the_differences_between_using_SQLite_and_H

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
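[A minimal read.csv.sql sketch, assuming the sqldf package (and RSQLite)
is installed; the example file contents are made up:]

```r
## read.csv.sql: one call that routes the file through a temporary
## SQLite database and returns a data frame. Requires the sqldf package.
library(sqldf)

csv <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,a", "2,b", "3,c", "4,d"), csv)

df <- read.csv.sql(csv)   # whole file; table is named 'file' in the SQL
## A filter can also be pushed down into SQLite, e.g.:
## read.csv.sql(csv, sql = "select * from file where x > 2")
```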
[R] efficient equivalent to read.csv / write.csv
Hello everyone,
I currently run R code that has to read 100 or more large csv files (>= 100
MB), and usually write csv too.
My colleagues and I like R very much but are a little bit astonished by how
slow those functions are. We have looked at every argument of those
functions, and while specifying some parameters helps a bit, this is still
too slow.
I am sure a lot of people have the same problem, so I thought one of you
would know a trick or a package that would help speed this up a lot.

(we work on LINUX Red Hat, R 2.10.0, but I guess this is of no use for this
problem)

Thanks for reading this.
Have a nice week end

--
View this message in context: http://r.789695.n4.nabble.com/efficient-equivalent-to-read-csv-write-csv-tp2714325p2714325.html
Sent from the R help mailing list archive at Nabble.com.