i wish it were that simple. unfortunately, the logic i have to run on each transaction is substantially more complicated, and it reads the existing values in the users table under a number of conditions.
any other thoughts on how to get better-than-linear lookup time? is there a recommended binary searching/sorting (e.g. B-tree) module that I could use to maintain my own index?

thanks,
mike

Peter Dalgaard wrote:
> mfrumin wrote:
>> Hey all; I'm a beginner++ user of R, trying to use it to do some processing
>> of data sets of over 1M rows, and running into a snafu. imagine that my
>> input is a huge table of transactions, each linked to a specific user id. as
>> I run through the transactions, I need to update a separate table for the
>> users, but I am finding that the traditional ways of doing a table lookup
>> are way too slow to support this kind of operation.
>>
>> i.e:
>>
>> for(i in 1:1000000) {
>>   userid = transactions$userid[i];
>>   amt = transactions$amounts[i];
>>   users[users$id == userid,'amt'] = users[users$id == userid,'amt'] + amt;
>> }
>>
>> I assume this is a linear lookup through the users table (in which there are
>> 10's of thousands of rows), when really what I need is O(constant time), or
>> at worst O(log(# users)).
>>
>> is there any way to manage a list of ID's (be they numeric, string, etc) and
>> have them efficiently mapped to some other table index?
>>
>> I see the CRAN package for SQLite hashes, but that seems to be going a bit
>> too far.
>>
> Sometimes you need a bit of lateral thinking. I suspect that you could
> do it like this:
>
> tbl <- with(transactions, tapply(amount, userid, sum))
> users$amt <- users$amt + tbl[users$id]
>
> one catch is that there could be users with no transactions, in which
> case you may need to replace userid by factor(userid,
> levels=users$id). None of this is tested, of course.

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
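
On the follow-up question about mapping IDs to table rows in better-than-linear time, here is a minimal base-R sketch, offered as an illustration and not as anything posted in the thread. The toy data and the names row_of, index and lookup are assumptions; the column names (id, amt, userid, amounts) follow the loop quoted above. It shows two options: resolving every transaction's row once with match(), and using an environment as a hash table keyed by ID.

```r
# Hypothetical toy data; column names follow the loop quoted in the thread.
users <- data.frame(id = c(101, 205, 309), amt = c(10, 0, 5))
transactions <- data.frame(userid  = c(205, 101, 205, 101),
                           amounts = c(1.5, 2, 3, 4))

# Option 1: resolve every transaction's row in the users table once.
# match() builds a hash table on users$id, so this is a single pass
# instead of one linear scan per transaction.
row_of <- match(transactions$userid, users$id)
stopifnot(!anyNA(row_of))  # assumes every transaction refers to a known user

# Work on a plain vector; assigning into a data.frame cell on each
# iteration is much slower than assigning into a vector element.
amt <- users$amt
for (i in seq_len(nrow(transactions))) {
  r <- row_of[i]
  # Per-transaction logic that depends on the current value can read amt[r] here.
  amt[r] <- amt[r] + transactions$amounts[i]
}
users$amt <- amt

# Option 2: an environment used as a hash table, keyed by the ID as a string.
# Useful when IDs have to be looked up one at a time.
index <- new.env(hash = TRUE)
for (r in seq_along(users$id)) {
  assign(as.character(users$id[r]), r, envir = index)
}
lookup <- function(id) get(as.character(id), envir = index, inherits = FALSE)
```

With either option the per-transaction lookup no longer scans the users table, so the loop body can stay as complicated as it needs to be. lookup() as written raises an error for an unknown ID; whether that or a default value is preferable depends on the data.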