Since you have sorted the data.frame by 'subid', breaking ties with 'year',
doesn't the following do the same thing as the other solutions?

  f4 <- function(df) df[ c(TRUE,diff(df$var1)!=0) & c(FALSE,diff(df$subid)==0), ]

It gives the same answer for your df2 and is quicker than the others.
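A minimal check of f4 on the small df1 from the original post. One assumption: the value column is named var1 here rather than var, so that f4 can be used exactly as posted. The toy data are already sorted by subid and then year, which f4 relies on.

```r
# Toy data from the original post, with the value column named var1 to match f4
df1 <- data.frame(subid = rep(c(36, 47), each = 5),
                  year  = rep(seq(1999, 2007, 2), 2),
                  var1  = c(1, 1, 3, 3, 3, 1, 3, 3, 1, 3))

# f4 as posted: keep a row if var1 differs from the previous row AND the
# previous row has the same subid (requires sorting by subid, then year)
f4 <- function(df) df[ c(TRUE,diff(df$var1)!=0) & c(FALSE,diff(df$subid)==0), ]

res <- f4(df1)
print(res)
```

The first element of the mask is always FALSE (TRUE & FALSE), and the first row of every later subject is dropped by the subid test, so no comparison crosses a subject boundary.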
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of arun
> Sent: Tuesday, June 04, 2013 10:19 AM
> To: R help
> Subject: Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data
>
> Hi,
>
> By comparing some of the solutions:
>
> library(plyr)  # needed for ddply()
> set.seed(25)
> subid<- sample(30:50,22e5,replace=TRUE)
> set.seed(27)
> year<- sample(1990:2012,22e5,replace=TRUE)
> set.seed(35)
> var1<- sample(c(1,3,5,7),22e5,replace=TRUE)
> df2<- data.frame(subid,year,var1)
> df2<- df2[order(df2$subid,df2$year),]
>
> system.time(res<-subset(ddply(df2,.(subid),mutate,delta=c(FALSE,var1[-1]!=var1[-length(var1)])),delta)[,-4])
> #   user  system elapsed
> #  8.036   0.132   8.188
>
> system.time(res2<-df2[ as.logical( ave( df2$var1, df2$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ) ), ])
> #   user  system elapsed
> #  1.220   0.000   1.222
>
> system.time(res3<-df2[with(df2,unlist(tapply(var1,list(subid),FUN=function(x) c(FALSE,diff(x)!=0)),use.names=FALSE)),])
> #   user  system elapsed
> #  1.729   0.000   1.730
>
> identical(res2,res3)
> #[1] TRUE
>
> row.names(res)<-1:nrow(res)
> row.names(res2)<-1:nrow(res)
> identical(res,res2)
> #[1] TRUE
>
> Given the above numbers, I found half an hour a bit too extreme.
>
> A.K.
>
>
> David:
>
> 6 47 1999 1
>
> should not be included in the output list because we are trying
> to detect changes within the subids. 1999 was the first year for
> subject 47, and changes have to be detected after that year; hence we
> were using ddply to group. Your solution ran very fast, as expected.
>
> AK: I have a large dataset and your solution is taking too long.
> As a matter of fact, I had to kill it after 1/2 hr on a 22K-row dataset.
>
> Thanks for the suggestions.
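The same comparison can be sketched at a smaller scale, adding f4 alongside the ave() solution. The row count n = 2e4 is an assumption chosen only so the sketch runs quickly; the seeds and sampling follow arun's code. Because df2 is sorted by subid, each subject is a contiguous run of rows, so the two approaches should select exactly the same rows.

```r
n <- 2e4  # assumption: scaled down from 22e5 so the sketch runs quickly

set.seed(25); subid <- sample(30:50, n, replace = TRUE)
set.seed(27); year  <- sample(1990:2012, n, replace = TRUE)
set.seed(35); var1  <- sample(c(1, 3, 5, 7), n, replace = TRUE)
df2 <- data.frame(subid, year, var1)
df2 <- df2[order(df2$subid, df2$year), ]

# Grouped approach from the thread: ave() per subid, converted back to logical
res2 <- df2[as.logical(ave(df2$var1, df2$subid,
                           FUN = function(x) c(FALSE, x[-1] != x[-length(x)]))), ]

# f4: one vectorised pass over the sorted data, with no grouping step
f4 <- function(df) df[c(TRUE, diff(df$var1) != 0) & c(FALSE, diff(df$subid) == 0), ]
res4 <- f4(df2)

print(identical(res2, res4))
print(system.time(f4(df2)))
```

The speed difference comes from f4 avoiding any split/apply step. It does, however, depend on the sort order being correct.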
>
> -ST
>
>
> ----- Original Message -----
> From: David Winsemius <dwinsem...@comcast.net>
> To: arun <smartpink...@yahoo.com>
> Cc: R help <r-help@r-project.org>
> Sent: Tuesday, June 4, 2013 11:13 AM
> Subject: Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data
>
>
> On Jun 3, 2013, at 9:51 PM, arun wrote:
>
> > If it is grouped by "subid" (that would be the difference in the number of
> > changes)
> >
> > subset(ddply(df1,.(subid),mutate,delta=c(FALSE,var[-1]!=var[-length(var)])),delta)[,-4]
> > #   subid year var
> > #3     36 2003   3
> > #7     47 2001   3
> > #9     47 2005   1
> > #10    47 2007   3
> > A.K.
>
> I'm not sure why the first version returns integer values from the ave() call,
> but the second version works:
>
> > df1[ ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ), ]
>     subid year var
> 1      36 1999   1
> 1.1    36 1999   1
> 1.2    36 1999   1
> 1.3    36 1999   1
> > ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]))
>  [1] 0 0 1 0 0 0 1 0 1 1
>
> Perhaps one of the single-item groups sabotaged my simple function.
>
> > df1[ as.logical( ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ) ), ]
>    subid year var
> 3     36 2003   3
> 7     47 2001   3
> 9     47 2005   1
> 10    47 2007   3
>
> --
> David.
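The numeric result from the first ave() call probably has a different cause than a single-item group. df1 has no such groups, since each subid has five rows. As far as I can tell from ave()'s source, it writes FUN's results back into a copy of x (via `split<-`). Because x (df1$var) is numeric, TRUE/FALSE get coerced to 1/0. Indexing rows with 0/1 then drops every 0 and selects row 1 once for each 1, which would explain the four copies of row 1. A small sketch reproducing this follows; `flag_change` is just a local name for the anonymous function used in the thread.

```r
df1 <- data.frame(subid = rep(c(36, 47), each = 5),
                  year  = rep(seq(1999, 2007, 2), 2),
                  var   = c(1, 1, 3, 3, 3, 1, 3, 3, 1, 3))

flag_change <- function(x) c(FALSE, x[-1] != x[-length(x)])

# ave() stores FUN's results in a copy of x, so the logical flags come back
# coerced to x's numeric type: FALSE -> 0, TRUE -> 1
idx <- ave(df1$var, df1$subid, FUN = flag_change)
print(idx)
print(is.numeric(idx))

# Numeric row indexing: 0 selects nothing and each 1 selects row 1,
# which reproduces the four copies of row 1
print(df1[idx, ])

# Converting back to logical gives the intended row filter
print(df1[as.logical(idx), ])
```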
>
> >
> > ----- Original Message -----
> > From: David Winsemius <dwinsem...@comcast.net>
> > To: arun <smartpink...@yahoo.com>
> > Cc: R help <r-help@r-project.org>
> > Sent: Tuesday, June 4, 2013 12:37 AM
> > Subject: Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data
> >
> >
> > On Jun 3, 2013, at 7:10 PM, arun wrote:
> >
> >> Hi,
> >> Maybe this helps:
> >> res1<-df1[with(df1,unlist(tapply(var,list(subid),FUN=function(x) c(FALSE,diff(x)!=0)),use.names=FALSE)),]
> >> res1
> >> #   subid year var
> >> #3     36 2003   3
> >> #7     47 2001   3
> >> #9     47 2005   1
> >> #10    47 2007   3
> >> #or
> >> library(plyr)
> >> subset(ddply(df1,.(subid),mutate,delta=c(FALSE,diff(var)!=0)),delta)[,-4]
> >> #   subid year var
> >> #3     36 2003   3
> >> #7     47 2001   3
> >> #9     47 2005   1
> >> #10    47 2007   3
> >> A.K.
> >>
> > It's pretty simple with logical indexing:
> >
> >> df1[ c(FALSE, df1$var[-1]!=df1$var[-length(df1$var)]), ]
> >    subid year var
> > 3     36 2003   3
> > 6     47 1999   1
> > 7     47 2001   3
> > 9     47 2005   1
> > 10    47 2007   3
> >
> > When I count the number of changes in the value of var, it gives me 5. Not sure
> > why you are both leaving out row 6.
> >
> > --
> > David.
> >>
> >>
> >> I need to output a dataframe whenever var changes a value.
> >>
> >> df1 <- data.frame(subid=rep(c(36,47),each=5),year=rep(seq(1999,2007,2),2),var=c(1,1,3,3,3,1,3,3,1,3))
> >>    subid year var
> >> 1     36 1999   1
> >> 2     36 2001   1
> >> 3     36 2003   3
> >> 4     36 2005   3
> >> 5     36 2007   3
> >> 6     47 1999   1
> >> 7     47 2001   3
> >> 8     47 2003   3
> >> 9     47 2005   1
> >> 10    47 2007   3
> >>
> >> I need:
> >> 36 2003 3
> >> 47 2001 3
> >> 47 2005 1
> >> 47 2007 3
> >>
> >> I am trying to use ddply over subid and use the diff function, but it is
> >> not working quite right.
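Row 6 is the crux of the disagreement. David's one-liner compares every row with the row above it and ignores subid. Row 6 (subject 47's first year, var 1) is therefore flagged because row 5 (subject 36's last year) has var 3. Also requiring the previous row to have the same subid, which is the idea behind Bill's f4, drops that row. A sketch follows, assuming df1 is sorted by subid and then year.

```r
df1 <- data.frame(subid = rep(c(36, 47), each = 5),
                  year  = rep(seq(1999, 2007, 2), 2),
                  var   = c(1, 1, 3, 3, 3, 1, 3, 3, 1, 3))

# Ungrouped: each row against the previous row, regardless of subid
ungrouped <- c(FALSE, df1$var[-1] != df1$var[-length(df1$var)])

# Grouped: a change only counts if the previous row is the same subject
same_subid <- c(FALSE, df1$subid[-1] == df1$subid[-length(df1$subid)])
grouped <- ungrouped & same_subid

print(which(ungrouped))  # includes row 6 (36's last var vs 47's first var)
print(which(grouped))    # within-subject changes only
```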
> >>
> >>> dd <- ddply(df1,.(subid),summarize,delta=diff(var) != 0)
> >>> dd
> >>   subid delta
> >> 1    36 FALSE
> >> 2    36  TRUE
> >> 3    36 FALSE
> >> 4    36 FALSE
> >> 5    47  TRUE
> >> 6    47 FALSE
> >> 7    47  TRUE
> >> 8    47  TRUE
> >>
> >> I would appreciate any help on this.
> >> Thank You!
> >> -ST
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > David Winsemius
> > Alameda, CA, USA
>
> David Winsemius
> Alameda, CA, USA
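The ddply(..., summarize, ...) attempt returns 8 rows for 10 input rows. The reason is that diff() returns one value fewer than its input, so each five-year subject yields only four comparisons. Prepending FALSE for each subject's first year, as arun's mutate() version does, restores one flag per row. The base-R sketch below assumes df1 is sorted by subid, so that unlist() over tapply() lines up with the rows.

```r
df1 <- data.frame(subid = rep(c(36, 47), each = 5),
                  year  = rep(seq(1999, 2007, 2), 2),
                  var   = c(1, 1, 3, 3, 3, 1, 3, 3, 1, 3))

# diff() returns one value fewer than its input: 4 flags per 5-row subject
per_subject <- tapply(df1$var, df1$subid, function(x) diff(x) != 0)
print(lengths(per_subject))

# Prepend FALSE for each subject's first year so there is one flag per row;
# unlist() lines up with the rows only because df1 is sorted by subid
flags <- unlist(tapply(df1$var, df1$subid, function(x) c(FALSE, diff(x) != 0)),
                use.names = FALSE)
print(df1[flags, ])
```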