Hi,One way would be:
df<- 
data.frame(case,obsdate=as.Date(obsdate,format="%d/%m/%Y"),score,stringsAsFactors=FALSE)
  #using as.data.frame(cbind(...  should be avoided

df$date_end<-as.Date(unlist(lapply(with(df,tapply(obsdate,case,FUN=function(x) 
x-1)),function(x) 
c(x[-1],as.Date("31/03/2011",format="%d/%m/%Y")))),origin="1970-01-01")
 df1<- df[,c(1,2,4,3)]
 df1
#  case    obsdate   date_end score
#1    a 2001-04-01 2007-05-19    60
#2    a 2007-05-20 2010-10-07    72
#3    a 2010-10-08 2011-03-31    85
#4    b 2001-04-01 2005-11-09    72
#5    b 2005-11-10 2011-03-31    79
#6    c 2001-04-01 2011-03-31    65
A.K.



----- Original Message -----
From: Gavin Rudge <g.ru...@bham.ac.uk>
To: "'r-help@r-project.org'" <r-help@r-project.org>
Cc: 
Sent: Thursday, August 15, 2013 9:22 AM
Subject: [R] Create new records based on event dates in a data frame

One of those simple tasks, but I can't get to first base with it.  I've got a 
data set of observations of subjects over a 10 year period beginning on 1st 
April 2001 and ending on 31st March 2011.  One of may variables is a score 
based on an intervention on a given date. Before the intervention there is 
baseline score, on the day of intervention the score changes as a result.  If 
there is no observation the baseline score remains constant over the entire 
period.

Now I have a data.frame with subject ids, the baseline score with a baseline 
date of 01/04/2001, a baseline score , and the date of any subsequent 
observations and the scores resulting from them.

Here is a rough approximation with just three subjects, one with two 
interventions one with one, and one with none.  My actual data set has about 
30,000 observations most of them with one or two interventions.

case=c("a","a","a","b","b","c")
obsdate<-c("01/04/2001","20/05/2007","08/10/2010","01/04/2001","10/11/2005","01/04/2001")
score=c(60,72,85,72,79,65)
df<-as.data.frame(cbind(case,obsdate,score))
df$obsdate<-as.Date(df$obsdate,format="%d/%m/%Y")
df

Now the data set I am trying to obtain for my analysis will consist of exposure 
periods for each subject, with the start and end date and the score during the 
period of exposure. So each subject will have at least one exposure period 
beginning on the start date and a score.  In those cases where there has been 
an intervention (most of them) the next exposure period will start on the day 
of intervention, and the earlier period will end the day before. If there are 
no subsequent interventions between the start of one and the end of the study 
period, 31/03/2011, then the last exposure period is censored at this date. 
Where there is no intervention at all, (case 'c' is an example) the exposure 
period is the duration of the study, from 01/04/2001 to 31/03/2011.

So for the above example my resulting data frame should look like this:

exp_case=c("a","a","a","b","b","c")
date_begin=c("01/04/2001","20/05/2007","08/10/2010","01/04/2001","10/11/2005","01/04/2001")
date_end=c("19/05/2001","07/10/2010","31/03/2011","09/11/2005","31/03/2011","31/03/2011")
exp_score=c(60,72,85,72,79,65)
expdata<-as.data.frame(cbind(exp_case,date_begin,date_end,exp_score))
expdata$date_begin<-as.Date(expdata$date_begin,format="%d/%m/%Y")
expdata$date_end<-as.Date(expdata$date_end,format="%d/%m/%Y")
expdata

Sorry about the clunky way I've handled the dates, this is the only way I know 
how to do this. 

All assistance gratefully received

GavinR 

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to