Marc Schwartz wrote:
> OK, here is one possible solution, though perhaps with a bit more time,
> there may be more optimal approaches.
>
> Using your example data above, but first noting that you do not want to
> use:
>
> df <- data.frame(cbind(subject,year,event.of.interest))
>
> Using cbind() first creates a matrix and causes all columns to be
> coerced to a common data type, obviating the benefit of data frames to
> be able to handle multiple data types.
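To illustrate the coercion described above, here is a minimal sketch. The vectors are reconstructed from the data frame printed further down in this thread (rows in their original row-name order); the name `m` and the checks are illustrative only.

```r
# Example vectors reconstructed from the printed data frame in this thread
subject <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4)
year <- c(1980, 1982, 1996, 1985, 1987, 1990, 1991, 1992, 1999, 1972, 1983)
event.of.interest <- c(FALSE, TRUE, TRUE, FALSE, FALSE,
                       FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)

# cbind() builds a single matrix, so every column shares one type:
# the logical column is silently coerced to numeric 0/1
m <- cbind(subject, year, event.of.interest)
class(m[, "event.of.interest"])   # "numeric"

# Passing the vectors straight to data.frame() keeps each column's own type
df <- data.frame(subject, year, event.of.interest)
sapply(df, class)                 # numeric, numeric, logical
```

With a character column in the mix, cbind() would instead turn every column into character.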
Yes, quite right, the cbind() was unnecessary. I'm not making my real data
frame that way, however.

> So, now on to the solution:
>
> # First, order the data frame by increasing order of
> # subject number and decreasing order for event.of.interest
> # This ensures that these columns are properly sorted
> # to facilitate the subsequent code.
>
> df <- df[order(df$subject, -df$event.of.interest), ]
>
>
> So, 'df' will look like:
>
>> df
>    subject year event.of.interest
> 2        1 1982              TRUE
> 3        1 1996              TRUE
> 1        1 1980             FALSE
> 4        2 1985             FALSE
> 5        2 1987             FALSE
> 7        3 1991              TRUE
> 9        3 1999              TRUE
> 6        3 1990             FALSE
> 8        3 1992             FALSE
> 10       4 1972              TRUE
> 11       4 1983             FALSE
>
>
> # Now use the combinations of sapply(), rle(), seq() and unlist() to
> # generate per subject sequences. Note that rle() returns:
> #
> # > rle(df$subject)
> # Run Length Encoding
> #   lengths: int [1:4] 3 2 4 2
> #   values : num [1:4] 1 2 3 4
> #
> # See ?rle, ?seq, ?sapply and ?unlist
>
> df$subject.seq <- unlist(sapply(rle(df$subject)$lengths,
>                                 function(x) seq(x)))
>
>
> So, 'df' now looks like:
>
>> df
>    subject year event.of.interest subject.seq
> 2        1 1982              TRUE           1
> 3        1 1996              TRUE           2
> 1        1 1980             FALSE           3
> 4        2 1985             FALSE           1
> 5        2 1987             FALSE           2
> 7        3 1991              TRUE           1
> 9        3 1999              TRUE           2
> 6        3 1990             FALSE           3
> 8        3 1992             FALSE           4
> 10       4 1972              TRUE           1
> 11       4 1983             FALSE           2
>
>
> # Now set event.seq to all 0's
>
> df$event.seq <- 0
>
>
> So, 'df' now looks like:
>
>> df
>    subject year event.of.interest subject.seq event.seq
> 2        1 1982              TRUE           1         0
> 3        1 1996              TRUE           2         0
> 1        1 1980             FALSE           3         0
> 4        2 1985             FALSE           1         0
> 5        2 1987             FALSE           2         0
> 7        3 1991              TRUE           1         0
> 9        3 1999              TRUE           2         0
> 6        3 1990             FALSE           3         0
> 8        3 1992             FALSE           4         0
> 10       4 1972              TRUE           1         0
> 11       4 1983             FALSE           2         0
>
>
> # Get the unique subject id's
> # See ?unique
>
> subj.id <- unique(df$subject)
>
>
> # Now get the indices for each subject where event.of.interest
> # is TRUE.
> # See ?which
>
> events <- sapply(subj.id,
>                  function(x) which(df$subject == x & df$event.of.interest))
>
>
> So, 'events' looks like:
>
>> events
> [[1]]
> [1] 1 2
>
> [[2]]
> integer(0)
>
> [[3]]
> [1] 6 7
>
> [[4]]
> [1] 10
>
>
> # Now use sapply() on the above list to create
> # individual sequences per list element:
>
> seq <- sapply(events, function(x) seq(along = x))
>
>
> So 'seq' looks like:
>
>> seq
> [[1]]
> [1] 1 2
>
> [[2]]
> integer(0)
>
> [[3]]
> [1] 1 2
>
> [[4]]
> [1] 1
>
>
> # So, for the final step, assign the event sequence values in 'seq' to
> # the row indices in 'events':
>
> df$event.seq[unlist(events)] <- unlist(seq)
>
>
> So, 'df' now looks like this:
>
>> df
>    subject year event.of.interest subject.seq event.seq
> 2        1 1982              TRUE           1         1
> 3        1 1996              TRUE           2         2
> 1        1 1980             FALSE           3         0
> 4        2 1985             FALSE           1         0
> 5        2 1987             FALSE           2         0
> 7        3 1991              TRUE           1         1
> 9        3 1999              TRUE           2         2
> 6        3 1990             FALSE           3         0
> 8        3 1992             FALSE           4         0
> 10       4 1972              TRUE           1         1
> 11       4 1983             FALSE           2         0
>
>
> HTH,
>
> Marc Schwartz

Wow, that's very trick, or tricky. It works, but it is a bit slower and more
complex than the Holtzman/Nielsen approach. But there are some interesting
ideas there, which I shall bear in mind.

Tim C

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
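For reference, here is a minimal, self-contained sketch of the subject/event sequencing pipeline Marc walks through earlier in this thread, run on the example data reconstructed from the printed output. Two tweaks are my own assumptions, not something proposed in the thread: sequence() replaces unlist(sapply(..., seq)), and lapply() replaces sapply(), because sapply() may simplify its result to a matrix when every subject has the same number of rows or events. The final ave()/cumsum() line is one compact base-R alternative; it is not necessarily the Holtzman/Nielsen approach, which is not reproduced here.

```r
# Example data reconstructed from the printed output in this thread,
# rows in their original (row-name) order
df <- data.frame(
  subject = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4),
  year = c(1980, 1982, 1996, 1985, 1987, 1990, 1991, 1992, 1999, 1972, 1983),
  event.of.interest = c(FALSE, TRUE, TRUE, FALSE, FALSE,
                        FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Step 1: sort by subject, with TRUE events first within each subject
df <- df[order(df$subject, -df$event.of.interest), ]

# Step 2: running row count within each subject. sequence() expands each
# rle() run length n into 1:n and always returns a plain integer vector.
df$subject.seq <- sequence(rle(df$subject)$lengths)

# Step 3: running event count within each subject, 0 for non-event rows.
# lapply() always returns a list, unlike sapply(), which may simplify.
df$event.seq <- 0L
events <- lapply(unique(df$subject),
                 function(x) which(df$subject == x & df$event.of.interest))
df$event.seq[unlist(events)] <- unlist(lapply(events, seq_along))

# Alternative (an assumption, not from the thread): cumulative sum of the
# event flag within each subject, zeroed out on the non-event rows
alt <- ave(as.integer(df$event.of.interest), df$subject, FUN = cumsum) *
  df$event.of.interest

df
all(alt == df$event.seq)   # TRUE
```

Both versions give event.seq values of 1 2 0 0 0 1 2 0 0 1 0 in the sorted row order. The ave() variant does not depend on TRUE rows being sorted first, since non-event rows are multiplied by zero.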