"Sean" == Sean Davis <[EMAIL PROTECTED]> writes: Sean> Have you tried just grabbing the whole column using dbGetQuery? Sean> Try doing this: Sean> Sean> spams <- dbGetQuery(con,"select unixtime from email limit Sean> 1000000") Sean> Sean> Then increase from 1,000,000 to 1.5 million, to 2 million, etc. Sean> until you break something (run out of memory), if you do at all.
Yes, you are right. For the example problem that I posed, R can indeed process the entire query result in memory. (The R process grows to 240MB, though!) Sean> However, the BETTER way to do this, if you already have the data Sean> in the database is to allow the database to do the histogram for Sean> you. For example, to get a count of spams by day, in MySQL do Sean> something like: [...] Yes, again you are right --- the particular problem that I posed is probably better handled by formulating a more sophisticated SQL query. But really, my goal isn't to solve the the example problem that I posed --- rather, it is to understand how people use R to process very large data sets. The research project that I'm working on will eventually need to deal with query results that cannot fit in main memory, and for which the built-in statistical facilities of most DBMSs will be insufficient. Some of my colleagues have previously written their analyses "by hand," using various scripting languages to read and process records from a DB in chunks. Writing things in this way, however, can be tedious and error-prone. Instead of taking this approach, I would like to be able to use existing statistics packages that have the ability to deal with large datasets in good ways. So, I seek to understand the ways that people deal with these sorts of situations in R. Your advice is very helpful --- one should solve problems in the simplest ways available! --- but I would still like to understand the harder cases, and how one can use "general" R functions in combination with DBI's `dbApply' and `fetch' interfaces, which divide results into chunks. Thanks! Eric. -- ------------------------------------------------------------------------------- Eric Eide <[EMAIL PROTECTED]> . University of Utah School of Computing http://www.cs.utah.edu/~eeide/ . +1 (801) 585-5512 voice, +1 (801) 581-5843 FAX ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html