Re: [R] data manipulation and summaries with few million rows

2011-08-27 Thread jim holtman
Factors are your friend here:

myData
      mydate gender mygroup id mygrp.f
1 2012-03-25      F       A  1       1
2 2005-05-23      F       B  2       2
3 2005-09-08      F       B  2       2
4 2005-12-07      F       B  2       2
5 2006-02-26      F       C  2       3
6 2006-05-13      F

[R] data manipulation and summaries with few million rows

2011-08-24 Thread Juliet Hannah
I have a data set with about 6 million rows and 50 columns. It is a mixture of dates, factors, and numerics. What I am trying to accomplish can be seen with the following simplified data, which is given as dput output below.

head(myData)
      mydate gender mygroup id
1 2012-03-25      F

Re: [R] data manipulation and summaries with few million rows

2011-08-24 Thread Dennis Murphy
Hi Juliet:

Here's a Q&D solution:

# (1) plyr
library(plyr)
# number of distinct groups per id, minus one
f <- function(d) length(unique(d$mygroup)) - 1
ddply(myData, .(id), f)
  id V1
1  1  0
2  2  2
3  3  1
4  4  0

# (2) data.table
library(data.table)
myDT <- data.table(myData, key = 'id')
myDT[, list(nswitch = length(unique(mygroup)) - 1), by = 'id']

If one can switch

Re: [R] data manipulation and summaries with few million rows

2011-08-24 Thread Juliet Hannah
Thanks Dennis! I'll check this out. Just to clarify, I need the total number of switches/changes, regardless of whether that state had occurred in the past. So A-A-B-A would have 2 changes: A to B and B to A. Thanks again.

On Wed, Aug 24, 2011 at 1:28 PM, Dennis Murphy djmu...@gmail.com wrote: Hi
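
For the clarified requirement (count every change between consecutive states, so A-A-B-A gives 2), one possible base R sketch is to compare each value with the one before it. This is an illustration only, not code from any of the posters: the helper name count_switches and the toy data frame are hypothetical, and it assumes the rows are already in date order within each id and that mygroup has no missing values.

```r
# Hypothetical helper: number of changes between consecutive values of x.
# Assumes x is already in time order and contains no NA.
count_switches <- function(x) {
  x <- as.character(x)          # compare labels, not factor codes
  n <- length(x)
  if (n < 2) return(0L)         # a single observation cannot switch
  sum(x[-1] != x[-n])           # TRUE wherever a value differs from its predecessor
}

# Toy data (made up for illustration): id 1 follows A-A-B-A
toy <- data.frame(
  id      = c(1, 1, 1, 1, 2, 2, 3),
  mygroup = c("A", "A", "B", "A", "B", "B", "C"),
  stringsAsFactors = FALSE
)

# One count per id; the result is a vector named by id
switches <- tapply(toy$mygroup, toy$id, count_switches)
print(switches)
```

If the data really are unsorted, ordering by id and mydate first would be needed. The same helper could probably replace length(unique(mygroup)) - 1 in the data.table call quoted earlier in the thread, which would likely scale better to a few million rows than tapply.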