Hi,

One way would be:

df <- data.frame(case, obsdate = as.Date(obsdate, format = "%d/%m/%Y"), score,
                 stringsAsFactors = FALSE)  # using as.data.frame(cbind(... should be avoided
df$date_end <- as.Date(unlist(lapply(with(df, tapply(obsdate, case, FUN = function(x) x - 1)),
                                     function(x) c(x[-1], as.Date("31/03/2011", format = "%d/%m/%Y")))),
                       origin = "1970-01-01")
df1 <- df[, c(1, 2, 4, 3)]
df1
#  case    obsdate   date_end score
#1    a 2001-04-01 2007-05-19    60
#2    a 2007-05-20 2010-10-07    72
#3    a 2010-10-08 2011-03-31    85
#4    b 2001-04-01 2005-11-09    72
#5    b 2005-11-10 2011-03-31    79
#6    c 2001-04-01 2011-03-31    65

A.K.

----- Original Message -----
From: Gavin Rudge <g.ru...@bham.ac.uk>
To: "'r-help@r-project.org'" <r-help@r-project.org>
Cc:
Sent: Thursday, August 15, 2013 9:22 AM
Subject: [R] Create new records based on event dates in a data frame

One of those simple tasks, but I can't get to first base with it.

I've got a data set of observations of subjects over a 10 year period beginning on 1st April 2001 and ending on 31st March 2011. One of my variables is a score based on an intervention on a given date. Before the intervention there is a baseline score; on the day of intervention the score changes as a result. If there is no observation the baseline score remains constant over the entire period.

Now I have a data.frame with subject ids, a baseline date of 01/04/2001, a baseline score, and the dates of any subsequent observations and the scores resulting from them. Here is a rough approximation with just three subjects, one with two interventions, one with one, and one with none. My actual data set has about 30,000 observations, most of them with one or two interventions.

case=c("a","a","a","b","b","c")
obsdate<-c("01/04/2001","20/05/2007","08/10/2010","01/04/2001","10/11/2005","01/04/2001")
score=c(60,72,85,72,79,65)
df<-as.data.frame(cbind(case,obsdate,score))
df$obsdate<-as.Date(df$obsdate,format="%d/%m/%Y")
df

Now the data set I am trying to obtain for my analysis will consist of exposure periods for each subject, with the start and end date and the score during the period of exposure.
So each subject will have at least one exposure period beginning on the start date and a score. In those cases where there has been an intervention (most of them) the next exposure period will start on the day of intervention, and the earlier period will end the day before. If there are no subsequent interventions between the start of one and the end of the study period, 31/03/2011, then the last exposure period is censored at this date. Where there is no intervention at all (case 'c' is an example), the exposure period is the duration of the study, from 01/04/2001 to 31/03/2011.

So for the above example my resulting data frame should look like this:

exp_case=c("a","a","a","b","b","c")
date_begin=c("01/04/2001","20/05/2007","08/10/2010","01/04/2001","10/11/2005","01/04/2001")
date_end=c("19/05/2007","07/10/2010","31/03/2011","09/11/2005","31/03/2011","31/03/2011")
exp_score=c(60,72,85,72,79,65)
expdata<-as.data.frame(cbind(exp_case,date_begin,date_end,exp_score))
expdata$date_begin<-as.Date(expdata$date_begin,format="%d/%m/%Y")
expdata$date_end<-as.Date(expdata$date_end,format="%d/%m/%Y")
expdata

Sorry about the clunky way I've handled the dates; this is the only way I know how to do this.

All assistance gratefully received.

GavinR

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
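
For comparison with the tapply()/lapply() reply in this thread, the same rule (each period ends the day before the subject's next observation, and the last period is censored at 31/03/2011) can probably also be written without a per-group loop. The following is only a minimal sketch under some assumptions: the example vectors are copied from the question, the names study_end, next_start and same_case are made up for illustration, and the rows are sorted by case and date first because the shift-by-one comparison relies on that ordering.

```r
# Example data copied from the question in this thread.
case    <- c("a", "a", "a", "b", "b", "c")
obsdate <- as.Date(c("01/04/2001", "20/05/2007", "08/10/2010",
                     "01/04/2001", "10/11/2005", "01/04/2001"),
                   format = "%d/%m/%Y")
score   <- c(60, 72, 85, 72, 79, 65)

# Hypothetical name: the censoring date at the end of the study period.
study_end <- as.Date("31/03/2011", format = "%d/%m/%Y")

df <- data.frame(case, obsdate, score, stringsAsFactors = FALSE)

# Assumption: rows must be in date order within each case.
df <- df[order(df$case, df$obsdate), ]
n  <- nrow(df)

# Start date of the following row (NA for the last row).
next_start <- c(df$obsdate[-1], as.Date(NA))

# TRUE where the following row belongs to the same case.
same_case <- c(df$case[-1] == df$case[-n], FALSE)

# Default: period is censored at the end of the study.
df$date_end <- study_end

# Otherwise the period ends the day before the next observation.
df$date_end[same_case] <- next_start[same_case] - 1

df1 <- df[, c("case", "obsdate", "date_end", "score")]
df1
```

On the six example rows this should give the same date_end column as the reply's df1. Because score is passed to data.frame() directly rather than through cbind(), it also stays numeric.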