Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data

2013-06-04 Thread David Winsemius

On Jun 3, 2013, at 9:51 PM, arun wrote:

> If it is grouped by subid (that would be the difference in the number of 
> changes)
> 
> subset(ddply(df1,.(subid),mutate,delta=c(FALSE,var[-1]!=var[-length(var)])),delta)[,-4]
> #   subid year var
> #3     36 2003   3
> #7     47 2001   3
> #9     47 2005   1
> #10    47 2007   3
> A.K.
 

Ah. I see. Then this looks simpler to my eyes:

df1[ ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) 
) , ]

-- 
David.


 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data

2013-06-04 Thread David Winsemius

On Jun 3, 2013, at 9:51 PM, arun wrote:

> If it is grouped by subid (that would be the difference in the number of 
> changes)
> 
> subset(ddply(df1,.(subid),mutate,delta=c(FALSE,var[-1]!=var[-length(var)])),delta)[,-4]
> #   subid year var
> #3     36 2003   3
> #7     47 2001   3
> #9     47 2005   1
> #10    47 2007   3
> A.K.

I'm not sure why the first one returns integer values from the ave() call, but 
the second version works:

 df1[ ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ), ]
    subid year var
1      36 1999   1
1.1    36 1999   1
1.2    36 1999   1
1.3    36 1999   1
1.336 1999   1

ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]))
 [1] 0 0 1 0 0 0 1 0 1 1

Perhaps one of the single item groups sabotaged my simple function.


 df1[ as.logical( ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ) ), ]
   subid year var
3     36 2003   3
7     47 2001   3
9     47 2005   1
10    47 2007   3
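
A likely explanation for the numeric result, offered as a sketch rather than a definitive diagnosis: ave() writes whatever FUN returns back into a copy of its first argument, so with a numeric df1$var the logical flags end up stored as 0/1. Indexing a data frame with those numbers drops the zeros and repeats row 1 once for every 1, which matches the 1, 1.1, 1.2, 1.3 rows shown above. The helper name `changed` is introduced here for illustration only.

```r
# Toy data from the original post
df1 <- data.frame(subid = rep(c(36, 47), each = 5),
                  year  = rep(seq(1999, 2007, 2), 2),
                  var   = c(1, 1, 3, 3, 3, 1, 3, 3, 1, 3))

# Hypothetical helper: TRUE where a value differs from the previous one
changed <- function(x) c(FALSE, x[-1] != x[-length(x)])

# ave() keeps the storage type of its first argument (numeric here),
# so the logical flags come back as 0/1
flag <- ave(df1$var, df1$subid, FUN = changed)
storage_type <- typeof(flag)

# Numeric indexing: 0 selects nothing, every 1 selects row 1 again
wrong <- df1[flag, ]

# Logical indexing after explicit coercion picks the intended rows
right <- df1[as.logical(flag), ]
```

Note that `changed` also copes with a group of length one (it returns a single FALSE), so single-item groups would not by themselves cause the problem.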

-- 
David.
 
 

David Winsemius
Alameda, CA, USA



Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data

2013-06-04 Thread arun


Hi,

Comparing some of the solutions:
library(plyr)
set.seed(25)
subid <- sample(30:50,22e5,replace=TRUE)
set.seed(27)
year <- sample(1990:2012,22e5,replace=TRUE)
set.seed(35)
var1 <- sample(c(1,3,5,7),22e5,replace=TRUE)
df2 <- data.frame(subid,year,var1)
df2 <- df2[order(df2$subid,df2$year),]
system.time(res <- subset(ddply(df2,.(subid),mutate,delta=c(FALSE,var1[-1]!=var1[-length(var1)])),delta)[,-4])
#  user  system elapsed 
# 8.036   0.132   8.188 

system.time(res2 <- df2[ as.logical( ave( df2$var1, df2$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ) ), ])
#  user  system elapsed 
# 1.220   0.000   1.222 
system.time(res3 <- df2[with(df2,unlist(tapply(var1,list(subid),FUN=function(x) c(FALSE,diff(x)!=0)),use.names=FALSE)),])
#  user  system elapsed 
# 1.729   0.000   1.730 
identical(res2,res3)
#[1] TRUE

row.names(res) <- 1:nrow(res)
row.names(res2) <- 1:nrow(res)
identical(res,res2)
#[1] TRUE

Compared with the numbers above, half an hour seems far too long.


A.K.


David: 

6     47 1999   1 

should not be included in the output because we are trying
to detect changes within each subid. 1999 was the first year for 
subject 47, and changes have to be detected after that year; hence we 
were using ddply to group. Your solution ran very fast, as expected. 

AK - I have a large dataset and your solution is taking too long;
as a matter of fact, I had to kill it after 1/2 hr on a 22K-row dataset. 

Thanks for the suggestions. 

-ST 



Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data

2013-06-04 Thread arun
Hi ST,

In case you want to decrease the time further:
library(data.table)
dt1 <- data.table(df2) #using df2 from my previous message
system.time({
 dt1 <- dt1[,indx:=c(FALSE,diff(var1)!=0),by=subid]
 res3 <- subset(dt1,indx,select=1:3)
})
# user  system elapsed 
#   0.32    0.00    0.32 
 head(res3)
#   subid year var1
#1:    30 1990    7
#2:    30 1990    1
#3:    30 1990    5
#4:    30 1990    7
#5:    30 1990    5
#6:    30 1990    7
 head(res2)
#  subid year var1
#1    30 1990    7
#2    30 1990    1
#3    30 1990    5
#4    30 1990    7
#5    30 1990    5
#6    30 1990    7


Since you mentioned a half-hour running time, it would be good to check 
your data.

?str()
A.K.




Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data

2013-06-04 Thread William Dunlap
Since you have sorted the data.frame by 'subid', breaking ties with 'year',
doesn't the following do the same thing as the other solutions?
  f4 <- function(df) df[ c(TRUE,diff(df$var1)!=0) & c(FALSE,diff(df$subid)==0), ]
It gives the same answer for your df2 and is quicker than the others.
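
The same idea can be tried on the small df1 from the original post. The sketch below is an illustration, not part of f4: the function name `first_changes` and its `value` and `group` arguments are hypothetical, added only so the column names can be passed in. Like f4, it assumes the rows of each subject are contiguous (the data are sorted by subid).

```r
# Hypothetical generalisation of f4: column names are passed as strings.
# Assumes rows are already sorted so that each group is contiguous.
first_changes <- function(df, value = "var1", group = "subid") {
  v <- df[[value]]
  g <- df[[group]]
  value_changed <- c(TRUE,  v[-1] != v[-length(v)])   # value differs from previous row
  same_group    <- c(FALSE, g[-1] == g[-length(g)])   # previous row is the same subject
  df[value_changed & same_group, ]
}

# Toy data from the original post
df1 <- data.frame(subid = rep(c(36, 47), each = 5),
                  year  = rep(seq(1999, 2007, 2), 2),
                  var   = c(1, 1, 3, 3, 3, 1, 3, 3, 1, 3))

res <- first_changes(df1, value = "var")
```

Because both conditions are plain vectorised comparisons over the whole column, no per-group function call is needed, which is presumably where the speed comes from.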

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data

2013-06-03 Thread arun
Hi,
Maybe this helps:
res1 <- df1[with(df1,unlist(tapply(var,list(subid),FUN=function(x) c(FALSE,diff(x)!=0)),use.names=FALSE)),]
res1
#   subid year var
#3     36 2003   3
#7     47 2001   3
#9     47 2005   1
#10    47 2007   3
#or
library(plyr)
subset(ddply(df1,.(subid),mutate,delta=c(FALSE,diff(var)!=0)),delta)[,-4]
#   subid year var
#3     36 2003   3
#7     47 2001   3
#9     47 2005   1
#10    47 2007   3
A.K.



I need to output a dataframe whenever var changes a value. 

df1 <- data.frame(subid=rep(c(36,47),each=5),year=rep(seq(1999,2007,2),2),var=c(1,1,3,3,3,1,3,3,1,3))
 
   subid year var 
1     36 1999   1 
2     36 2001   1 
3     36 2003   3 
4     36 2005   3 
5     36 2007   3 
6     47 1999   1 
7     47 2001   3 
8     47 2003   3 
9     47 2005   1 
10    47 2007   3 
 

I need: 
36 2003   3 
47 2001   3 
47 2005   1 
47 2007   3 

I am trying to use ddply over subid with the diff function, but it is not 
working quite right. 

 dd <- ddply(df1,.(subid),summarize,delta=diff(var) != 0) 
 dd 
  subid delta 
1    36 FALSE 
2    36  TRUE 
3    36 FALSE 
4    36 FALSE 
5    47  TRUE 
6    47 FALSE 
7    47  TRUE 
8    47  TRUE 

I would appreciate any help on this. 
Thank You! 
-ST
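
The eight-row result above follows from diff() itself: it returns one value fewer than its input, so each subject contributes four flags instead of five and the flags no longer line up with the rows. A minimal base-R sketch of the length mismatch and of the usual fix (padding each group with a leading FALSE) is shown here; the variable names are illustrative only.

```r
# Toy data from the post
df1 <- data.frame(subid = rep(c(36, 47), each = 5),
                  year  = rep(seq(1999, 2007, 2), 2),
                  var   = c(1, 1, 3, 3, 3, 1, 3, 3, 1, 3))

# Rows per subject versus flags produced by diff() per subject
lengths_in  <- tapply(df1$var, df1$subid, length)
lengths_out <- tapply(df1$var, df1$subid, function(x) length(diff(x) != 0))

# Padding with FALSE gives one flag per row; the first row of a
# subject can never count as a change
flags <- unlist(tapply(df1$var, df1$subid,
                       function(x) c(FALSE, diff(x) != 0)),
                use.names = FALSE)

wanted <- df1[flags, ]
```

This relies on the rows being sorted by subid, since tapply() returns the groups in sorted order.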



Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data

2013-06-03 Thread David Winsemius

On Jun 3, 2013, at 7:10 PM, arun wrote:

> Hi,
> Maybe this helps:
> res1 <- df1[with(df1,unlist(tapply(var,list(subid),FUN=function(x) 
> c(FALSE,diff(x)!=0)),use.names=FALSE)),]
> res1
> #   subid year var
> #3     36 2003   3
> #7     47 2001   3
> #9     47 2005   1
> #10    47 2007   3
> #or
> library(plyr)
> subset(ddply(df1,.(subid),mutate,delta=c(FALSE,diff(var)!=0)),delta)[,-4]
> #   subid year var
> #3     36 2003   3
> #7     47 2001   3
> #9     47 2005   1
> #10    47 2007   3
> A.K.
 
It's pretty simple with logical indexing:

 df1[ c(FALSE, df1$var[-1]!=df1$var[-length(df1$var)]), ]
   subid year var
3     36 2003   3
6     47 1999   1
7     47 2001   3
9     47 2005   1
10    47 2007   3


When I count the number of changes in the value of var, it gives me 5. Not sure why 
you are both leaving out row 6.
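
One way to see where the fifth row comes from is to separate changes that happen inside a subject from changes that coincide with the start of a new subject. The sketch below uses the df1 from the original post; the names `value_changed`, `new_subject`, `boundary_rows` and `within_rows` are illustrative, not from the thread.

```r
# Toy data from the original post
df1 <- data.frame(subid = rep(c(36, 47), each = 5),
                  year  = rep(seq(1999, 2007, 2), 2),
                  var   = c(1, 1, 3, 3, 3, 1, 3, 3, 1, 3))

# TRUE where var differs from the row above, ignoring subject
value_changed <- c(FALSE, df1$var[-1] != df1$var[-length(df1$var)])

# TRUE on the first row of each subject
new_subject <- c(TRUE, df1$subid[-1] != df1$subid[-length(df1$subid)])

# Changes that compare the last row of one subject with the first of the next
boundary_rows <- which(value_changed & new_subject)

# Changes that compare two rows of the same subject
within_rows <- which(value_changed & !new_subject)
```

Row 6 is the only boundary row: its var is compared with row 5, which belongs to subject 36. Whether such a row counts as a change depends on whether the comparison is meant to be grouped by subid.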

-- 
David.
 
 

David Winsemius
Alameda, CA, USA



Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data

2013-06-03 Thread arun
If it is grouped by subid (that would be the difference in the number of 
changes)

subset(ddply(df1,.(subid),mutate,delta=c(FALSE,var[-1]!=var[-length(var)])),delta)[,-4]
#   subid year var
#3     36 2003   3
#7     47 2001   3
#9     47 2005   1
#10    47 2007   3
A.K.

