Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data
On Jun 3, 2013, at 9:51 PM, arun wrote: If it is grouped by subid (that would be the difference in the number of changes) subset(ddply(df1,.(subid),mutate,delta=c(FALSE,var[-1]!=var[-length(var)])),delta)[,-4] # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #1047 2007 3 A.K. Ah. I see. Then this looks simpler to my eyes: df1[ ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ) , ] -- David. - Original Message - From: David Winsemius dwinsem...@comcast.net To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Tuesday, June 4, 2013 12:37 AM Subject: Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data On Jun 3, 2013, at 7:10 PM, arun wrote: Hi, May be this helps: res1-df1[with(df1,unlist(tapply(var,list(subid),FUN=function(x) c(FALSE,diff(x)!=0)),use.names=FALSE)),] res1 # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #1047 2007 3 #or library(plyr) subset(ddply(df1,.(subid),mutate,delta=c(FALSE,diff(var)!=0)),delta)[,-4] # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #1047 2007 3 A.K. It's pretty simple with logical indexing: df1[ c(FALSE, df1$var[-1]!=df1$var[-length(df1$var)]), ] subid year var 3 36 2003 3 6 47 1999 1 7 47 2001 3 9 47 2005 1 1047 2007 3 When I count the number of changes in value of var is give me 5. Not sure why you are both leaving out row 6. -- David. I need to output a dataframe whenever var changes a value. df1 - data.frame(subid=rep(c(36,47),each=5),year=rep(seq(1999,2007,2),2),var=c(1,1,3,3,3,1,3,3,1,3)) subid year var 1 36 1999 1 2 36 2001 1 3 36 2003 3 4 36 2005 3 5 36 2007 3 6 47 1999 1 7 47 2001 3 8 47 2003 3 9 47 2005 1 1047 2007 3 I need: 36 2003 3 47 2001 3 47 2005 1 47 2007 3 I am trying to use ddply over subid and use the diff function, but it is not working quiet right. dd - ddply(df1,.(subid),summarize,delta=diff(var) != 0) dd subid delta 136 FALSE 236 TRUE 336 FALSE 436 FALSE 547 TRUE 647 FALSE 747 TRUE 847 TRUE I would appreciate any help on this. Thank You! -ST __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius Alameda, CA, USA David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data
On Jun 3, 2013, at 9:51 PM, arun wrote: If it is grouped by subid (that would be the difference in the number of changes) subset(ddply(df1,.(subid),mutate,delta=c(FALSE,var[-1]!=var[-length(var)])),delta)[,-4] # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #1047 2007 3 A.K. I'm not sure why the first one retruns integer values from the ave() call but the second version works: df1[ ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ), ] subid year var 1 36 1999 1 1.136 1999 1 1.236 1999 1 1.336 1999 1 ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)])) [1] 0 0 1 0 0 0 1 0 1 1 Perhaps one of the single item groups sabotaged my simple function. df1[ as.logical( ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ) ), ] subid year var 3 36 2003 3 7 47 2001 3 9 47 2005 1 1047 2007 3 -- David. - Original Message - From: David Winsemius dwinsem...@comcast.net To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Tuesday, June 4, 2013 12:37 AM Subject: Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data On Jun 3, 2013, at 7:10 PM, arun wrote: Hi, May be this helps: res1-df1[with(df1,unlist(tapply(var,list(subid),FUN=function(x) c(FALSE,diff(x)!=0)),use.names=FALSE)),] res1 # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #1047 2007 3 #or library(plyr) subset(ddply(df1,.(subid),mutate,delta=c(FALSE,diff(var)!=0)),delta)[,-4] # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #1047 2007 3 A.K. It's pretty simple with logical indexing: df1[ c(FALSE, df1$var[-1]!=df1$var[-length(df1$var)]), ] subid year var 3 36 2003 3 6 47 1999 1 7 47 2001 3 9 47 2005 1 1047 2007 3 When I count the number of changes in value of var is give me 5. Not sure why you are both leaving out row 6. -- David. I need to output a dataframe whenever var changes a value. df1 - data.frame(subid=rep(c(36,47),each=5),year=rep(seq(1999,2007,2),2),var=c(1,1,3,3,3,1,3,3,1,3)) subid year var 1 36 1999 1 2 36 2001 1 3 36 2003 3 4 36 2005 3 5 36 2007 3 6 47 1999 1 7 47 2001 3 8 47 2003 3 9 47 2005 1 1047 2007 3 I need: 36 2003 3 47 2001 3 47 2005 1 47 2007 3 I am trying to use ddply over subid and use the diff function, but it is not working quiet right. dd - ddply(df1,.(subid),summarize,delta=diff(var) != 0) dd subid delta 136 FALSE 236 TRUE 336 FALSE 436 FALSE 547 TRUE 647 FALSE 747 TRUE 847 TRUE I would appreciate any help on this. Thank You! -ST __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius Alameda, CA, USA David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data
Hi, By comparing some of the solutions: set.seed(25) subid- sample(30:50,22e5,replace=TRUE) set.seed(27) year- sample(1990:2012,22e5,replace=TRUE) set.seed(35) var1- sample(c(1,3,5,7),22e5,replace=TRUE) df2- data.frame(subid,year,var1) df2- df2[order(df2$subid,df2$year),] system.time(res-subset(ddply(df2,.(subid),mutate,delta=c(FALSE,var1[-1]!=var1[-length(var1)])),delta)[,-4]) # user system elapsed # 8.036 0.132 8.188 system.time(res2-df2[ as.logical( ave( df2$var1, df2$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ) ), ]) # user system elapsed # 1.220 0.000 1.222 system.time(res3-df2[with(df2,unlist(tapply(var1,list(subid),FUN=function(x) c(FALSE,diff(x)!=0)),use.names=FALSE)),]) # user system elapsed # 1.729 0.000 1.730 identical(res2,res3) #[1] TRUE row.names(res)-1:nrow(res) row.names(res2)-1:nrow(res) identical(res,res2) #[1] TRUE I found half an hour a bit too extreme by comparing the above numbers. A.K. David: 6 47 1999 1 should not be included in the output list because, we are trying to detect changes within the subid's. 1999 was the first year for subject 47 and changes have to be detected after that year - hence we were using ddply to group. Your solution ran very fast as expected. AK- I have a large dataset and your solution is taking too long - as a matter of fact i had to kill it afte 1/2 hr on a 22K row dataset. Thanks for the suggestions. -ST - Original Message - From: David Winsemius dwinsem...@comcast.net To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Tuesday, June 4, 2013 11:13 AM Subject: Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data On Jun 3, 2013, at 9:51 PM, arun wrote: If it is grouped by subid (that would be the difference in the number of changes) subset(ddply(df1,.(subid),mutate,delta=c(FALSE,var[-1]!=var[-length(var)])),delta)[,-4] # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #10 47 2007 3 A.K. I'm not sure why the first one retruns integer values from the ave() call but the second version works: df1[ ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ), ] subid year var 1 36 1999 1 1.1 36 1999 1 1.2 36 1999 1 1.3 36 1999 1 ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)])) [1] 0 0 1 0 0 0 1 0 1 1 Perhaps one of the single item groups sabotaged my simple function. df1[ as.logical( ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ) ), ] subid year var 3 36 2003 3 7 47 2001 3 9 47 2005 1 10 47 2007 3 -- David. - Original Message - From: David Winsemius dwinsem...@comcast.net To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Tuesday, June 4, 2013 12:37 AM Subject: Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data On Jun 3, 2013, at 7:10 PM, arun wrote: Hi, May be this helps: res1-df1[with(df1,unlist(tapply(var,list(subid),FUN=function(x) c(FALSE,diff(x)!=0)),use.names=FALSE)),] res1 # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #10 47 2007 3 #or library(plyr) subset(ddply(df1,.(subid),mutate,delta=c(FALSE,diff(var)!=0)),delta)[,-4] # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #10 47 2007 3 A.K. It's pretty simple with logical indexing: df1[ c(FALSE, df1$var[-1]!=df1$var[-length(df1$var)]), ] subid year var 3 36 2003 3 6 47 1999 1 7 47 2001 3 9 47 2005 1 10 47 2007 3 When I count the number of changes in value of var is give me 5. Not sure why you are both leaving out row 6. -- David. I need to output a dataframe whenever var changes a value. df1 - data.frame(subid=rep(c(36,47),each=5),year=rep(seq(1999,2007,2),2),var=c(1,1,3,3,3,1,3,3,1,3)) subid year var 1 36 1999 1 2 36 2001 1 3 36 2003 3 4 36 2005 3 5 36 2007 3 6 47 1999 1 7 47 2001 3 8 47 2003 3 9 47 2005 1 10 47 2007 3 I need: 36 2003 3 47 2001 3 47 2005 1 47 2007 3 I am trying to use ddply over subid and use the diff function, but it is not working quiet right. dd - ddply(df1,.(subid),summarize,delta=diff(var) != 0) dd subid delta 1 36 FALSE 2 36 TRUE 3 36 FALSE 4 36 FALSE 5 47 TRUE 6 47 FALSE 7 47 TRUE 8 47 TRUE I would appreciate any help on this. Thank You! -ST __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius Alameda, CA, USA David Winsemius Alameda, CA, USA
Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data
HI ST, In case, you wanted to further decrease the time: library(data.table) dt1- data.table(df2) #using the same example as below system.time({ dt1-dt1[,indx:=c(FALSE,diff(var1)!=0),by=subid] res3-subset(dt1,indx,select=1:3) }) # user system elapsed # 0.32 0.00 0.32 head(res3) # subid year var1 #1: 30 1990 7 #2: 30 1990 1 #3: 30 1990 5 #4: 30 1990 7 #5: 30 1990 5 #6: 30 1990 7 head(res2) # subid year var1 #1 30 1990 7 #2 30 1990 1 #3 30 1990 5 #4 30 1990 7 #5 30 1990 5 #6 30 1990 7 Since you mentioned this half-hour running time, it would be good to check your data. ?str() A.K. - Original Message - From: arun smartpink...@yahoo.com To: R help r-help@r-project.org Cc: David Winsemius dwinsem...@comcast.net Sent: Tuesday, June 4, 2013 1:18 PM Subject: Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data Hi, By comparing some of the solutions: set.seed(25) subid- sample(30:50,22e5,replace=TRUE) set.seed(27) year- sample(1990:2012,22e5,replace=TRUE) set.seed(35) var1- sample(c(1,3,5,7),22e5,replace=TRUE) df2- data.frame(subid,year,var1) df2- df2[order(df2$subid,df2$year),] system.time(res-subset(ddply(df2,.(subid),mutate,delta=c(FALSE,var1[-1]!=var1[-length(var1)])),delta)[,-4]) # user system elapsed # 8.036 0.132 8.188 system.time(res2-df2[ as.logical( ave( df2$var1, df2$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ) ), ]) # user system elapsed # 1.220 0.000 1.222 system.time(res3-df2[with(df2,unlist(tapply(var1,list(subid),FUN=function(x) c(FALSE,diff(x)!=0)),use.names=FALSE)),]) # user system elapsed # 1.729 0.000 1.730 identical(res2,res3) #[1] TRUE row.names(res)-1:nrow(res) row.names(res2)-1:nrow(res) identical(res,res2) #[1] TRUE I found half an hour a bit too extreme by comparing the above numbers. A.K. David: 6 47 1999 1 should not be included in the output list because, we are trying to detect changes within the subid's. 1999 was the first year for subject 47 and changes have to be detected after that year - hence we were using ddply to group. Your solution ran very fast as expected. AK- I have a large dataset and your solution is taking too long - as a matter of fact i had to kill it afte 1/2 hr on a 22K row dataset. Thanks for the suggestions. -ST - Original Message - From: David Winsemius dwinsem...@comcast.net To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Tuesday, June 4, 2013 11:13 AM Subject: Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data On Jun 3, 2013, at 9:51 PM, arun wrote: If it is grouped by subid (that would be the difference in the number of changes) subset(ddply(df1,.(subid),mutate,delta=c(FALSE,var[-1]!=var[-length(var)])),delta)[,-4] # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #10 47 2007 3 A.K. I'm not sure why the first one retruns integer values from the ave() call but the second version works: df1[ ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ), ] subid year var 1 36 1999 1 1.1 36 1999 1 1.2 36 1999 1 1.3 36 1999 1 ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)])) [1] 0 0 1 0 0 0 1 0 1 1 Perhaps one of the single item groups sabotaged my simple function. df1[ as.logical( ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ) ), ] subid year var 3 36 2003 3 7 47 2001 3 9 47 2005 1 10 47 2007 3 -- David. - Original Message - From: David Winsemius dwinsem...@comcast.net To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Tuesday, June 4, 2013 12:37 AM Subject: Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data On Jun 3, 2013, at 7:10 PM, arun wrote: Hi, May be this helps: res1-df1[with(df1,unlist(tapply(var,list(subid),FUN=function(x) c(FALSE,diff(x)!=0)),use.names=FALSE)),] res1 # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #10 47 2007 3 #or library(plyr) subset(ddply(df1,.(subid),mutate,delta=c(FALSE,diff(var)!=0)),delta)[,-4] # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #10 47 2007 3 A.K. It's pretty simple with logical indexing: df1[ c(FALSE, df1$var[-1]!=df1$var[-length(df1$var)]), ] subid year var 3 36 2003 3 6 47 1999 1 7 47 2001 3 9 47 2005 1 10 47 2007 3 When I count the number of changes in value of var is give me 5. Not sure why you are both leaving out row 6. -- David. I need to output a dataframe whenever var changes a value. df1 - data.frame(subid=rep(c(36,47),each=5),year=rep(seq(1999,2007,2),2),var=c(1,1,3,3,3,1,3,3,1,3)) subid year var 1 36 1999 1 2 36 2001 1 3 36 2003 3 4 36
Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data
Since you have sorted the data.frame by 'subid', breaking ties with 'year', doesn't the following do the same thing as the other solutions. f4 - function(df) df[ c(TRUE,diff(df$var1)!=0) c(FALSE,diff(df$subid)==0), ] It gives the same answer for your df2 and is quicker than the others. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of arun Sent: Tuesday, June 04, 2013 10:19 AM To: R help Subject: Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data Hi, By comparing some of the solutions: set.seed(25) subid- sample(30:50,22e5,replace=TRUE) set.seed(27) year- sample(1990:2012,22e5,replace=TRUE) set.seed(35) var1- sample(c(1,3,5,7),22e5,replace=TRUE) df2- data.frame(subid,year,var1) df2- df2[order(df2$subid,df2$year),] system.time(res-subset(ddply(df2,.(subid),mutate,delta=c(FALSE,var1[-1]!=var1[- length(var1)])),delta)[,-4]) # user system elapsed # 8.036 0.132 8.188 system.time(res2-df2[ as.logical( ave( df2$var1, df2$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ) ), ]) # user system elapsed # 1.220 0.000 1.222 system.time(res3-df2[with(df2,unlist(tapply(var1,list(subid),FUN=function(x) c(FALSE,diff(x)!=0)),use.names=FALSE)),]) # user system elapsed # 1.729 0.000 1.730 identical(res2,res3) #[1] TRUE row.names(res)-1:nrow(res) row.names(res2)-1:nrow(res) identical(res,res2) #[1] TRUE I found half an hour a bit too extreme by comparing the above numbers. A.K. David: 6 47 1999 1 should not be included in the output list because, we are trying to detect changes within the subid's. 1999 was the first year for subject 47 and changes have to be detected after that year - hence we were using ddply to group. Your solution ran very fast as expected. AK- I have a large dataset and your solution is taking too long - as a matter of fact i had to kill it afte 1/2 hr on a 22K row dataset. Thanks for the suggestions. -ST - Original Message - From: David Winsemius dwinsem...@comcast.net To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Tuesday, June 4, 2013 11:13 AM Subject: Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data On Jun 3, 2013, at 9:51 PM, arun wrote: If it is grouped by subid (that would be the difference in the number of changes) subset(ddply(df1,.(subid),mutate,delta=c(FALSE,var[-1]!=var[-length(var)])),delta)[,-4] # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #10 47 2007 3 A.K. I'm not sure why the first one retruns integer values from the ave() call but the second version works: df1[ ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ), ] subid year var 1 36 1999 1 1.1 36 1999 1 1.2 36 1999 1 1.3 36 1999 1 ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)])) [1] 0 0 1 0 0 0 1 0 1 1 Perhaps one of the single item groups sabotaged my simple function. df1[ as.logical( ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ) ), ] subid year var 3 36 2003 3 7 47 2001 3 9 47 2005 1 10 47 2007 3 -- David. - Original Message - From: David Winsemius dwinsem...@comcast.net To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Tuesday, June 4, 2013 12:37 AM Subject: Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data On Jun 3, 2013, at 7:10 PM, arun wrote: Hi, May be this helps: res1-df1[with(df1,unlist(tapply(var,list(subid),FUN=function(x) c(FALSE,diff(x)!=0)),use.names=FALSE)),] res1 # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #10 47 2007 3 #or library(plyr) subset(ddply(df1,.(subid),mutate,delta=c(FALSE,diff(var)!=0)),delta)[,-4] # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #10 47 2007 3 A.K. It's pretty simple with logical indexing: df1[ c(FALSE, df1$var[-1]!=df1$var[-length(df1$var)]), ] subid year var 3 36 2003 3 6 47 1999 1 7 47 2001 3 9 47 2005 1 10 47 2007 3 When I count the number of changes in value of var is give me 5. Not sure why you are both leaving out row 6. -- David. I need to output a dataframe whenever var changes a value. df1 - data.frame(subid=rep(c(36,47),each=5),year=rep(seq(1999,2007,2),2),var=c(1,1,3,3,3,1,3 ,3,1,3)) subid year var 1 36 1999 1 2 36 2001 1 3 36 2003 3 4 36 2005 3 5 36 2007 3 6 47 1999 1 7 47 2001 3 8 47 2003 3 9 47 2005 1 10 47 2007 3 I need: 36 2003 3 47 2001 3 47 2005 1 47 2007 3 I
Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data
Hi, May be this helps: res1-df1[with(df1,unlist(tapply(var,list(subid),FUN=function(x) c(FALSE,diff(x)!=0)),use.names=FALSE)),] res1 # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #10 47 2007 3 #or library(plyr) subset(ddply(df1,.(subid),mutate,delta=c(FALSE,diff(var)!=0)),delta)[,-4] # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #10 47 2007 3 A.K. I need to output a dataframe whenever var changes a value. df1 - data.frame(subid=rep(c(36,47),each=5),year=rep(seq(1999,2007,2),2),var=c(1,1,3,3,3,1,3,3,1,3)) subid year var 1 36 1999 1 2 36 2001 1 3 36 2003 3 4 36 2005 3 5 36 2007 3 6 47 1999 1 7 47 2001 3 8 47 2003 3 9 47 2005 1 10 47 2007 3 I need: 36 2003 3 47 2001 3 47 2005 1 47 2007 3 I am trying to use ddply over subid and use the diff function, but it is not working quiet right. dd - ddply(df1,.(subid),summarize,delta=diff(var) != 0) dd subid delta 1 36 FALSE 2 36 TRUE 3 36 FALSE 4 36 FALSE 5 47 TRUE 6 47 FALSE 7 47 TRUE 8 47 TRUE I would appreciate any help on this. Thank You! -ST __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data
On Jun 3, 2013, at 7:10 PM, arun wrote: Hi, May be this helps: res1-df1[with(df1,unlist(tapply(var,list(subid),FUN=function(x) c(FALSE,diff(x)!=0)),use.names=FALSE)),] res1 # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #1047 2007 3 #or library(plyr) subset(ddply(df1,.(subid),mutate,delta=c(FALSE,diff(var)!=0)),delta)[,-4] # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #1047 2007 3 A.K. It's pretty simple with logical indexing: df1[ c(FALSE, df1$var[-1]!=df1$var[-length(df1$var)]), ] subid year var 3 36 2003 3 6 47 1999 1 7 47 2001 3 9 47 2005 1 1047 2007 3 When I count the number of changes in value of var is give me 5. Not sure why you are both leaving out row 6. -- David. I need to output a dataframe whenever var changes a value. df1 - data.frame(subid=rep(c(36,47),each=5),year=rep(seq(1999,2007,2),2),var=c(1,1,3,3,3,1,3,3,1,3)) subid year var 1 36 1999 1 2 36 2001 1 3 36 2003 3 4 36 2005 3 5 36 2007 3 6 47 1999 1 7 47 2001 3 8 47 2003 3 9 47 2005 1 1047 2007 3 I need: 36 2003 3 47 2001 3 47 2005 1 47 2007 3 I am trying to use ddply over subid and use the diff function, but it is not working quiet right. dd - ddply(df1,.(subid),summarize,delta=diff(var) != 0) dd subid delta 136 FALSE 236 TRUE 336 FALSE 436 FALSE 547 TRUE 647 FALSE 747 TRUE 847 TRUE I would appreciate any help on this. Thank You! -ST __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data
If it is grouped by subid (that would be the difference in the number of changes) subset(ddply(df1,.(subid),mutate,delta=c(FALSE,var[-1]!=var[-length(var)])),delta)[,-4] # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #10 47 2007 3 A.K. - Original Message - From: David Winsemius dwinsem...@comcast.net To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Tuesday, June 4, 2013 12:37 AM Subject: Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data On Jun 3, 2013, at 7:10 PM, arun wrote: Hi, May be this helps: res1-df1[with(df1,unlist(tapply(var,list(subid),FUN=function(x) c(FALSE,diff(x)!=0)),use.names=FALSE)),] res1 # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #10 47 2007 3 #or library(plyr) subset(ddply(df1,.(subid),mutate,delta=c(FALSE,diff(var)!=0)),delta)[,-4] # subid year var #3 36 2003 3 #7 47 2001 3 #9 47 2005 1 #10 47 2007 3 A.K. It's pretty simple with logical indexing: df1[ c(FALSE, df1$var[-1]!=df1$var[-length(df1$var)]), ] subid year var 3 36 2003 3 6 47 1999 1 7 47 2001 3 9 47 2005 1 10 47 2007 3 When I count the number of changes in value of var is give me 5. Not sure why you are both leaving out row 6. -- David. I need to output a dataframe whenever var changes a value. df1 - data.frame(subid=rep(c(36,47),each=5),year=rep(seq(1999,2007,2),2),var=c(1,1,3,3,3,1,3,3,1,3)) subid year var 1 36 1999 1 2 36 2001 1 3 36 2003 3 4 36 2005 3 5 36 2007 3 6 47 1999 1 7 47 2001 3 8 47 2003 3 9 47 2005 1 10 47 2007 3 I need: 36 2003 3 47 2001 3 47 2005 1 47 2007 3 I am trying to use ddply over subid and use the diff function, but it is not working quiet right. dd - ddply(df1,.(subid),summarize,delta=diff(var) != 0) dd subid delta 1 36 FALSE 2 36 TRUE 3 36 FALSE 4 36 FALSE 5 47 TRUE 6 47 FALSE 7 47 TRUE 8 47 TRUE I would appreciate any help on this. Thank You! -ST __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.