Re: [R] efficient equivalent to read.csv / write.csv
On Tue, Sep 28, 2010 at 5:02 PM, statquant2 wrote:
>
> Hello all,
> the test I provided was just to pinpoint that for loading once a big csv

A file that can be read in under 2 seconds is not big.

> file with read.csv was quicker than read.csv.sql... I have already
> "optimized" my calls to read.csv for my particular problem, but if a simple
> call to read.csv was quicker than read.csv.sql I doubt that specifying args
> would reverse the result by much...
>
> Maybe I should outline my problem:
>
> I am working on a powerful machine with 32 GB or 64 GB of RAM, so loading
> files and keeping them in memory is not really an issue.
> Those files (let's say 100) are shared by many people and are flat csv files
> (which is to say that modifying them is out of the question).
> Those files have lots of rows and between 10 and 20 columns, string and
> numeric...
>
> I basically need to be able to load these files as quickly as possible, and
> then I will keep those data frames in memory...
> So:
> Should I write my own C++ function and call it from R?
> Or is there an R way of drastically improving read.csv?
>
> Thanks a lot

So you have a bunch of small files and want to read them fast. Are they
always the same, or are they changing, or a combination of the two? If they
are the same, or if many of them are the same, then read those once, save()
them as RData files and load() them when you want them. The load will be
very fast.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
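[The save()/load() caching idea above can be sketched as below; the helper
name and file paths are made-up examples, not anything from base R:]

```r
## Cache a csv as RData: read it once with read.csv, save() the data
## frame next to the csv, and load() the cached copy on later calls.
## 'read_cached' is a hypothetical helper name for illustration.
read_cached <- function(csv_file) {
  rdata_file <- sub("\\.csv$", ".RData", csv_file)
  if (file.exists(rdata_file)) {
    load(rdata_file)            # restores an object named 'dat'
  } else {
    dat <- read.csv(csv_file)
    save(dat, file = rdata_file)
  }
  dat
}

## Small self-contained demonstration with a temporary file:
csv <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), csv, row.names = FALSE)
d1 <- read_cached(csv)   # first call: reads csv, writes the RData cache
d2 <- read_cached(csv)   # second call: fast load() from the cache
```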
Re: [R] efficient equivalent to read.csv / write.csv
Hello all,
the test I provided was just to pinpoint that for loading once a big csv
file with read.csv was quicker than read.csv.sql... I have already
"optimized" my calls to read.csv for my particular problem, but if a simple
call to read.csv was quicker than read.csv.sql I doubt that specifying args
would reverse the result by much...

Maybe I should outline my problem:

I am working on a powerful machine with 32 GB or 64 GB of RAM, so loading
files and keeping them in memory is not really an issue.
Those files (let's say 100) are shared by many people and are flat csv files
(which is to say that modifying them is out of the question).
Those files have lots of rows and between 10 and 20 columns, string and
numeric...

I basically need to be able to load these files as quickly as possible, and
then I will keep those data frames in memory...
So:
Should I write my own C++ function and call it from R?
Or is there an R way of drastically improving read.csv?

Thanks a lot

--
View this message in context: http://r.789695.n4.nabble.com/efficient-equivalent-to-read-csv-write-csv-tp2714325p2717937.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] efficient equivalent to read.csv / write.csv
On Tue, Sep 28, 2010 at 1:24 PM, statquant2 wrote:
>
> Hi, after testing
> R) system.time(read.csv("myfile.csv"))
>    user  system elapsed
>   1.126   0.038   1.177
>
> R) system.time(read.csv.sql("myfile.csv"))
>    user  system elapsed
>   1.405   0.025   1.439
> Warning messages:
> 1: closing unused connection 4 ()
> 2: closing unused connection 3 ()
>
> It seems that the function is less efficient than the base one ... so ...

The benefit comes with larger files. With small files there is not much
point in speeding it up since the absolute time is already small. Suggest
you look at the benchmarks on the sqldf home page, where a couple of users
benchmarked larger files. Since sqldf was intended for convenience, and not
really performance, I was as surprised as anyone when several users
independently noticed that sqldf ran several times faster than unoptimized
R code.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] efficient equivalent to read.csv / write.csv
On 29/09/2010 6:24 a.m., statquant2 wrote:
> Hi, after testing
> R) system.time(read.csv("myfile.csv"))
>    user  system elapsed
>   1.126   0.038   1.177
>
> R) system.time(read.csv.sql("myfile.csv"))
>    user  system elapsed
>   1.405   0.025   1.439
> Warning messages:
> 1: closing unused connection 4 ()
> 2: closing unused connection 3 ()
>
> It seems that the function is less efficient than the base one ... so ...

I presume you have had a good look at the R Data Import/Export manual? It
warns there of inefficiency with read.table (hence also read.csv) and
suggests more direct use of scan, which in your case might be via
connections and readLines and writeLines.

If that doesn't work, why not go to a database? Use RODBC or some such to
read and write tables in the database. There are many options for databases
to use (MySQL works for me). You can easily read data in and out of the
database in .csv format. If the .csv files are similar, there shouldn't be
too much overhead in defining table formats for the database.

David Scott

--
_
David Scott    Department of Statistics
               The University of Auckland, PB 92019
               Auckland 1142, NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018
Director of Consulting, Department of Statistics
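[The scan() route suggested above might look like the sketch below. The
file contents and column layout (one character column, one integer, one
numeric) are made-up examples:]

```r
## scan() skips read.table's type-guessing pass: the 'what' list gives
## the column types up front, and skip = 1 jumps over the header line.
## We first create a small example file so the snippet is self-contained.
csv <- tempfile(fileext = ".csv")
writeLines(c("id,x,y", "a,1,2.5", "b,3,4.5"), csv)

cols <- scan(csv,
             what = list(id = character(), x = integer(), y = numeric()),
             sep = ",", skip = 1, quiet = TRUE)

## scan() returns a list of column vectors; assemble a data frame:
df <- as.data.frame(cols, stringsAsFactors = FALSE)
```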
Re: [R] efficient equivalent to read.csv / write.csv
To speed things up, you certainly want to give R more clues about your data
files by being more explicit with many of the arguments (cf.
help(read.table)); especially, specifying argument 'colClasses' makes a big
difference.

/Henrik

On Tue, Sep 28, 2010 at 10:24 AM, statquant2 wrote:
>
> Hi, after testing
> R) system.time(read.csv("myfile.csv"))
>    user  system elapsed
>   1.126   0.038   1.177
>
> R) system.time(read.csv.sql("myfile.csv"))
>    user  system elapsed
>   1.405   0.025   1.439
> Warning messages:
> 1: closing unused connection 4 ()
> 2: closing unused connection 3 ()
>
> It seems that the function is less efficient than the base one ... so ...
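[A minimal sketch of the 'colClasses' advice above; the column types and
the nrows value are examples for the demonstration file, not a recipe for
any particular data set:]

```r
## Explicit colClasses lets read.csv skip type guessing, and nrows lets
## it pre-allocate the result instead of growing it. A small example
## file is created here so the snippet is self-contained.
csv <- tempfile(fileext = ".csv")
writeLines(c("id,x,y", "a,1,2.5", "b,3,4.5"), csv)

df <- read.csv(csv,
               colClasses = c("character", "integer", "numeric"),
               nrows = 2)   # an overestimate also helps pre-allocation
```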
Re: [R] efficient equivalent to read.csv / write.csv
Hi, after testing

R) system.time(read.csv("myfile.csv"))
   user  system elapsed
  1.126   0.038   1.177

R) system.time(read.csv.sql("myfile.csv"))
   user  system elapsed
  1.405   0.025   1.439
Warning messages:
1: closing unused connection 4 ()
2: closing unused connection 3 ()

It seems that the function is less efficient than the base one ... so ...

--
View this message in context: http://r.789695.n4.nabble.com/efficient-equivalent-to-read-csv-write-csv-tp2714325p2717585.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] efficient equivalent to read.csv / write.csv
On Mon, Sep 27, 2010 at 7:49 AM, statquant2 wrote:
>
> thank you very much for this sql package, the thing is that those tables I
> read are loaded into memory once and for all, and then we work with the
> data.frames...
> Do you think then that this is going to be quicker (as I would have thought
> that building the SQL DB from the flat file would already be a long
> process...)?

Even including that, read.csv.sql is typically several times faster than
unoptimized read.csv for large files. See the introductory remarks on the
sqldf home page, which specifically address that:
http://sqldf.googlecode.com

In fact, just try it and see whether it's OK for you. It's just one line of
code to read in a file.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] efficient equivalent to read.csv / write.csv
thank you very much for this sql package, the thing is that those tables I
read are loaded into memory once and for all, and then we work with the
data.frames...
Do you think then that this is going to be quicker (as I would have thought
that building the SQL DB from the flat file would already be a long
process...)?
Using RData files is not possible, as those files are shared, so converting
them into RData is out of the question; on the write side, the files written
are read by other apps, so the csv format can't be changed.
Looking forward to reading from you.
Thanks

--
View this message in context: http://r.789695.n4.nabble.com/efficient-equivalent-to-read-csv-write-csv-tp2714325p2715275.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] efficient equivalent to read.csv / write.csv
On 26.09.2010 14:38, statquant2 wrote:
> Hello everyone,
> I currently run R code that has to read 100 or more large csv files (>=
> 100 MB), and usually write csv too.
> My colleagues and I like R very much but are a little bit astonished by
> how slow those functions are. We have looked at every argument of those
> functions, and while specifying some parameters helps a bit, this is
> still too slow.
> I am sure a lot of people have the same problem, so I thought one of you
> would know a trick or a package that would help speed this up a lot.
>
> (we work on LINUX Red Hat, R 2.10.0, but I guess this is of no use for
> this problem)
>
> Thanks for reading this.
> Have a nice week end

Most of us read the csv file and write an RData file at once (see ?save).
Then we can read in the data much quicker after it has been imported once
with read.csv and friends.

Uwe Ligges
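[Applied to a whole directory of csv files, the import-once approach above
might look like the sketch below; the directory and file names are made-up
examples:]

```r
## One-off conversion: read.csv each file once, save() the data frame
## next to it, and use the much cheaper load() in later sessions.
## A demonstration directory with two small csv files is created here.
dir <- file.path(tempdir(), "csv_cache_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(a = 1:5), file.path(dir, "one.csv"), row.names = FALSE)
write.csv(data.frame(b = 6:9), file.path(dir, "two.csv"), row.names = FALSE)

for (f in list.files(dir, pattern = "\\.csv$", full.names = TRUE)) {
  dat <- read.csv(f)
  save(dat, file = sub("\\.csv$", ".RData", f))
}

## Later sessions only pay the load() cost:
load(file.path(dir, "one.RData"))   # restores the data frame as 'dat'
```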
Re: [R] efficient equivalent to read.csv / write.csv
On Sun, Sep 26, 2010 at 8:38 AM, statquant2 wrote:
>
> Hello everyone,
> I currently run R code that has to read 100 or more large csv files (>=
> 100 MB), and usually write csv too.
> My colleagues and I like R very much but are a little bit astonished by
> how slow those functions are. We have looked at every argument of those
> functions, and while specifying some parameters helps a bit, this is
> still too slow.
> I am sure a lot of people have the same problem, so I thought one of you
> would know a trick or a package that would help speed this up a lot.
>
> (we work on LINUX Red Hat, R 2.10.0, but I guess this is of no use for
> this problem)
>
> Thanks for reading this.
> Have a nice week end

You could try read.csv.sql in the sqldf package:
http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql

See ?read.csv.sql in sqldf. It uses RSQLite and SQLite to read the file
into an SQLite database (which it sets up for you), completely bypassing R,
and from there grabs it into R, removing the database it created at the
end.

There are also CSVREAD and CSVWRITE sql functions in the H2 database, which
is also supported by sqldf, although I have never checked their speed:
http://code.google.com/p/sqldf/#10.__What_are_some_of_the_differences_between_using_SQLite_and_H

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
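[A minimal read.csv.sql sketch, assuming the sqldf package (and RSQLite)
is installed; the example file contents are made up:]

```r
## read.csv.sql: one call that routes the file through a temporary
## SQLite database and returns a data frame. Requires the sqldf package.
library(sqldf)

csv <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,a", "2,b", "3,c", "4,d"), csv)

df <- read.csv.sql(csv)   # whole file; table is named 'file' in the SQL
## A filter can also be pushed down into SQLite, e.g.:
## read.csv.sql(csv, sql = "select * from file where x > 2")
```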
[R] efficient equivalent to read.csv / write.csv
Hello everyone,
I currently run R code that has to read 100 or more large csv files (>= 100
MB), and usually write csv too.
My colleagues and I like R very much but are a little bit astonished by how
slow those functions are. We have looked at every argument of those
functions, and while specifying some parameters helps a bit, this is still
too slow.
I am sure a lot of people have the same problem, so I thought one of you
would know a trick or a package that would help speed this up a lot.

(we work on LINUX Red Hat, R 2.10.0, but I guess this is of no use for this
problem)

Thanks for reading this.
Have a nice week end

--
View this message in context: http://r.789695.n4.nabble.com/efficient-equivalent-to-read-csv-write-csv-tp2714325p2714325.html
Sent from the R help mailing list archive at Nabble.com.