gallon li wrote:
> Suppose I have a long format for a longitudinal data
>
> id time x
> 1 1 10
> 1 2 11
> 1 3 23
> 1 4 23
> 2 2 12
> 2 3 13
> 2 4 14
> 3 1 11
> 3 3 15
> 3 4 18
> 3 5 21
> 4 2 22
> 4 3 27
> 4 6 29
>
> I want to select the x values for each ID when time is equal to 3. When that
> observation is not observed, then I want to replace it with the obervation
> at time equal to 4. otherwise just use NA.
>

with this dummy data:

    data = read.table(header=TRUE, textConnection(open='r', '
        id time x
        2 2 2
        2 3 3
        2 4 4
        2 5 5
        3 3 3
        3 4 4
        3 5 5
        4 4 4
        4 5 5
        5 5 5'))

you seem to expect the result to be like

    # id time  x
    #  2    3  3
    #  3    3  3
    #  4    4  4
    #  5   NA NA

one way to hack this is:

    # the time points you'd like to use, in order of preference
    times = 3:4

    # split the data by id,
    # for each subset, find values of x for the first time found, or use NA
    # (id[1] keeps this to one NA row even if the id has several rows)
    # combine the subsets back into a single data frame
    do.call(rbind, by(data, data$id, function(data) with(data, {
        rows = (time == times[which(times %in% time)[1]])
        if (is.na(rows[1])) data.frame(id=id[1], time=NA, x=NA)
        else data[rows,]
    })))

    #   id time  x
    # 2  2    3  3
    # 3  3    3  3
    # 4  4    4  4
    # 5  5   NA NA

with your original data:

    data = read.table(header=TRUE, textConnection(open='r', '
        id time x
        1 1 10
        1 2 11
        1 3 23
        1 4 23
        2 2 12
        2 3 13
        2 4 14
        3 1 11
        3 3 15
        3 4 18
        3 5 21
        4 2 22
        4 3 27
        4 6 29'))

    times = 3:4

    do.call(rbind, by(data, data$id, function(data) with(data, {
        rows = (time == times[which(times %in% time)[1]])
        if (is.na(rows[1])) data.frame(id=id[1], time=NA, x=NA)
        else data[rows,]
    })))

    #   id time  x
    # 1  1    3 23
    # 2  2    3 13
    # 3  3    3 15
    # 4  4    3 27

is this what you wanted?

vQ
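
for comparison, here is a sketch of the same selection written as a small helper that avoids by() and rbind. the name pick_first_time and the dummy data frame are made up for illustration, and it assumes each (id, time) pair occurs at most once in the data:

```r
# hypothetical helper: for each id, take x at the first time in `times`
# (in order of preference) that is present for that id, or NA if none is
pick_first_time <- function(data, times) {
  ids <- unique(data$id)
  # first preferred time observed for each id, NA when none is observed
  chosen <- sapply(ids, function(i) {
    hit <- times[times %in% data$time[data$id == i]]
    if (length(hit)) hit[1] else NA
  })
  # locate the matching row; ids with no preferred time get NA
  idx <- match(paste(ids, chosen), paste(data$id, data$time))
  data.frame(id = ids, time = chosen, x = data$x[idx])
}

dummy <- read.table(header = TRUE, text = "
id time x
2 2 2
2 3 3
2 4 4
2 5 5
3 3 3
3 4 4
3 5 5
4 4 4
4 5 5
5 5 5")

result <- pick_first_time(dummy, 3:4)
result$x   # 3 3 4 NA
```

the preference order is just the order of times, so pick_first_time(dummy, 4:3) would prefer time 4 over time 3.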