Hello R-ers, I just wanted to update this post. I've made some progress on this but am still not quite where I need to be. I feel like I am close so I just wanted to share my work so far.
Thanks in advance!

Sam

On Mon, Mar 19, 2012 at 1:10 PM, Sam Albers <tonightstheni...@gmail.com> wrote:
> Hello all,
>
> I need to figure out a way to lag a variable by a number of days
> without using the zoo package. I need to use a remote R connection
> that doesn't have the zoo package installed, and installing it there
> is not an option. That is, I want a function where I can specify the
> number of days to lag a variable against a Date-formatted column.
> That is relatively easy to do. The problem arises when I don't have
> consecutive dates. I can't seem to figure out a way to insert an NA
> when there is a non-consecutive date. So for example:
>
> ## A dataframe with non-consecutive dates
> set.seed(32)
> df1 <- data.frame(
>   Date=seq(as.Date("1967-06-05","%Y-%m-%d"), by="day", length=5),
>   Dis1=rnorm(5, 1, 10)
> )
> df2 <- data.frame(
>   Date=seq(as.Date("1967-06-15","%Y-%m-%d"), by="day", length=5),
>   Dis1=rnorm(5, 1, 10)
> )
>
> df <- rbind(df1, df2); df
>
> ## A function to lag the variable by a specified number of days
> lag.day <- function (lag.by, data) {
>   c(rep(NA, lag.by), head(data$Dis1, -lag.by))
> }
>
> ## Using the function
> df$lag1 <- lag.day(lag.by=1, data=df); df
> ## returns this data frame
>
>          Date      Dis1      lag1
> 1  1967-06-05  1.146405        NA
> 2  1967-06-06  9.732887  1.146405
> 3  1967-06-07 -9.279462  9.732887
> 4  1967-06-08  7.856646 -9.279462
> 5  1967-06-09  5.494370  7.856646
> 6  1967-06-15  5.070176  5.494370
> 7  1967-06-16  3.847314  5.070176
> 8  1967-06-17 -5.243094  3.847314
> 9  1967-06-18  9.396560 -5.243094
> 10 1967-06-19  4.112792  9.396560
>
> ## When really what I would like is something like this:
>
>          Date      Dis1      lag1
> 1  1967-06-05  1.146405        NA
> 2  1967-06-06  9.732887  1.146405
> 3  1967-06-07 -9.279462  9.732887
> 4  1967-06-08  7.856646 -9.279462
> 5  1967-06-09  5.494370  7.856646
> 6  1967-06-15  5.070176        NA
> 7  1967-06-16  3.847314  5.070176
> 8  1967-06-17 -5.243094  3.847314
> 9  1967-06-18  9.396560 -5.243094
> 10 1967-06-19  4.112792  9.396560

I've now gotten this far but have realized that my approach is flawed: if I increase the lag.by value to anything greater than 1, the NA is no longer entered in the correct position. Here is my updated effort:

lag.by <- function (data, lag.by) {
  tmp <- data.frame(
    ## Difference in days between dates
    diff=c(diff(data$Date), NA),
    ## Values shifted down by lag.by rows
    lag.tmp=c(rep(NA, lag.by), head(data$Dis1, -lag.by))
  )
  ## diff() calculates the difference to the next row, so all the
  ## difference values need to be lagged
  ifelse(c(rep(NA, lag.by), head(tmp$diff, -lag.by)) <= 1, tmp$lag.tmp, NA)
}

df$lag <- lag.by(df, lag.by=1)
df$lag2 <- lag.by(df, lag.by=2); df

         Date      Dis1       lag      lag2
1  1967-06-05  1.146405        NA        NA
2  1967-06-06  9.732887  1.146405        NA
3  1967-06-07 -9.279462  9.732887  1.146405
4  1967-06-08  7.856646 -9.279462  9.732887
5  1967-06-09  5.494370  7.856646 -9.279462
6  1967-06-15  5.070176        NA  7.856646  <- Need this to be an NA
7  1967-06-16  3.847314  5.070176        NA
8  1967-06-17 -5.243094  3.847314  5.070176
9  1967-06-18  9.396560 -5.243094  3.847314
10 1967-06-19  4.112792  9.396560 -5.243094

So the lag2 column should have NAs at both rows 6 and 7, but only row 7 gets one. The function checks a single date difference lagged by lag.by rows, so it misses a gap that falls anywhere else inside the lag window. Any help or thoughts would be much appreciated here.

> So can anyone recommend a way (either using my function or any other
> approaches) that I might be able to consistently lag values based on a
> lag.by value and consecutive dates?
>
> Thanks so much in advance!
>
> Sam
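
One direction that might work in base R, offered only as a sketch: instead of shifting by row position and then patching the gaps, look up the value whose date is exactly lag.by days earlier. The helper name lag_by_date is hypothetical (it does not appear in the post), and the sketch assumes the Date column holds unique dates and that the value column is called Dis1, as in the example above. The data frame is rebuilt from the values printed in the post so the block runs on its own.

```r
## Sketch only: `lag_by_date` is a hypothetical helper name.
## Assumes data$Date is of class Date with no duplicated dates.
lag_by_date <- function(data, lag.by) {
  ## For each row, find the row dated exactly `lag.by` days earlier.
  ## match() returns NA when that date is absent, and indexing with NA
  ## yields NA, so gaps in the dates produce NA automatically.
  data$Dis1[match(data$Date - lag.by, data$Date)]
}

## Example data copied from the output shown in the post
df <- data.frame(
  Date = c(seq(as.Date("1967-06-05"), by = "day", length.out = 5),
           seq(as.Date("1967-06-15"), by = "day", length.out = 5)),
  Dis1 = c(1.146405, 9.732887, -9.279462, 7.856646, 5.494370,
           5.070176, 3.847314, -5.243094, 9.396560, 4.112792)
)

df$lag  <- lag_by_date(df, lag.by = 1)
df$lag2 <- lag_by_date(df, lag.by = 2)
df
```

With this data, lag is NA at rows 1 and 6, and lag2 is NA at rows 1, 2, 6 and 7, which is the pattern asked for. Because the lookup is by date rather than by position, the result should not depend on row order, though duplicated dates would need separate handling.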