I don't know if this is any faster, but it has no explicit for loop. There are improvements that can be made if it is still too slow. Try it on your data:

> x <- data.frame(id=c("001","001","001","001","002","002","002","002","002"),
+     year=c(2000,2001,2002,2003,1996,1997,1998,1999,2000),
+     variable=c(0,0,1,0,0,0,1,0,0))
> # will assume that the years are contiguous within each id; exercise to reader if not
> # partition by 'id', find where 'variable' is 1, and mark that row plus the next four
> x.new <- lapply(split(x, x$id), function(person){
+     change <- which(person$variable == 1)
+     # row positions covered by a five-row window starting at each event
+     mark <- unique(unlist(lapply(change, seq, length.out=5)))
+     # drop positions that run past the last row for this person
+     mark <- mark[mark <= nrow(person)]
+     person$v2 <- 0  # initialize to zero
+     person$v2[mark] <- 1  # set to 1 for the event year and the four rows after it
+     person  # return new data
+ })
> do.call('rbind', x.new)
       id year variable v2
001.1 001 2000        0  0
001.2 001 2001        0  0
001.3 001 2002        1  1
001.4 001 2003        0  1
002.5 002 1996        0  0
002.6 002 1997        0  0
002.7 002 1998        1  1
002.8 002 1999        0  1
002.9 002 2000        0  1

On 10/16/07, Julien Barnier <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I currently work on a survey which contains biographical data stored
> in a chronological way, ie something like :
>
> id  year variable
> 001 2000 0
> 001 2001 0
> 001 2002 1
> 001 2003 0
> 002 1996 0
> 002 1997 0
> 002 1998 1
> 002 1999 0
> 002 2000 0
>
> where id is a person identifier, year the year of observation and
> variable the variable value at given year. In this case, the variable
> says if a particular event happened during the given year or not.
>
> What I want to do is generate a new variable which would say if the
> event happened at least one time during the five years preceding the
> current one. So if I call this new variable v2, I'd like to obtain :
>
> id  year variable v2
> 001 2000 0        0
> 001 2001 0        0
> 001 2002 1        1
> 001 2003 0        1
> 002 1996 0        0
> 002 1997 0        0
> 002 1998 1        1
> 002 1999 0        1
> 002 2000 0        1
>
> Currently I manage to achieve this with two nested for loops, but it
> is *very* slow and inefficient. So I wondered if there is a better way
> to do this.
>
> Thanks in advance for any help.
>
> PS : here is the code to reproduce the first sample data :
>
> data.frame(id=c("001","001","001","001","002","002","002","002","002"),
>            year=c(2000,2001,2002,2003,1996,1997,1998,1999,2000),
>            variable=c(0,0,1,0,0,0,1,0,0))
>
> --
> Julien Barnier
> Groupe de recherche sur la socialisation
> ENS-LSH - Lyon, France
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
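
A follow-up sketch for the case the code comment above leaves as an exercise: years that are not contiguous within an id. It is only one possible approach, not part of the original reply. Instead of counting row positions, it compares year values, so a gap in the observations does not stretch the window. The helper name `flag_window` and its `window` argument are hypothetical, and the window is assumed to cover the event year plus the following four, which is what the desired output in the question shows.

```r
# Hypothetical helper (not from the original post): flag each row whose year
# lies in a `window`-year span starting at any event year for the same id.
flag_window <- function(df, window = 5) {
  pieces <- lapply(split(df, df$id), function(person) {
    # years in which the event happened for this person
    events <- person$year[person$variable == 1]
    # a row is flagged if some event year e satisfies e <= year <= e + window - 1
    person$v2 <- vapply(person$year, function(y) {
      as.numeric(any(y >= events & y <= events + window - 1))
    }, numeric(1))
    person
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL  # drop the "001.1"-style row names
  out
}

x <- data.frame(id = c("001","001","001","001","002","002","002","002","002"),
                year = c(2000,2001,2002,2003,1996,1997,1998,1999,2000),
                variable = c(0,0,1,0,0,0,1,0,0))
res <- flag_window(x)
res
```

On the sample data this should give the same v2 column as the position-based version. The two differ only when years are missing: with observations in 2000, 2001 and 2008 and an event in 2000, the year-based test leaves 2008 unflagged, whereas counting rows would mark it.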