Dear R-listers, I am a relatively inexperienced R-user currently migrating from Stata. I am deeply frustrated by this data manipulation question: I know how I could do it in Stata, but I cannot make it work in R.
I have a data frame of hospitalization data where each row represents an admission. I need to know when patients were first discharged, but the problem is that patients were sometimes transferred between hospital departments. In my data a transfer looks like a new admission, except that it has a 'start' date equal to the previous admission's 'stop' date.

Here is an example:

id <- c(rep("a",4), rep("b",2), rep("c",5), rep("d",1))
start <- c(c(0,6,17,20), c(0,1), c(0,5,10,11,50), c(0))
stop <- c(c(6,12,20,30), c(1,10), c(3,10,11,30,55), c(6))
# data.frame() keeps start and stop numeric; as.data.frame(cbind(id,start,stop))
# would coerce every column to character
data <- data.frame(id, start, stop)
data
#    id start stop
# 1   a     0    6
# 2   a     6   12
# 3   a    17   20
# 4   a    20   30
# 5   b     0    1
# 6   b     1   10
# 7   c     0    3
# 8   c     5   10
# 9   c    10   11
# 10  c    11   30
# 11  c    50   55
# 12  d     0    6

So, what I want to end up with is this:

id start stop
a  0     12   # This patient was transferred at time 6 and discharged at time 12. The admission starting at time 17 is therefore irrelevant.
b  0     10
c  0     3
d  0     6

I have tried tons of variations on lapply, sapply, split, for etc., all to no avail.

Thank you in advance for any assistance.

Best regards,
Peter Jepsen, MD.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
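
For reference, here is one possible base-R sketch of the collapsing step described above, not a definitive solution. It assumes that 'start' and 'stop' are numeric, that a transfer is identified by exact equality between a row's 'start' and the previous row's 'stop', and that each patient's rows can be ordered by 'start'. The names dat and first_stay are hypothetical and introduced only for this illustration.

```r
# Example data from the question, with start/stop kept numeric
dat <- data.frame(
  id    = c(rep("a", 4), rep("b", 2), rep("c", 5), rep("d", 1)),
  start = c(0, 6, 17, 20, 0, 1, 0, 5, 10, 11, 50, 0),
  stop  = c(6, 12, 20, 30, 1, 10, 3, 10, 11, 30, 55, 6)
)

# first_stay() is a hypothetical helper: it takes one patient's rows and
# returns the first admission, extended through any chained transfers.
first_stay <- function(d) {
  d <- d[order(d$start), ]
  # TRUE where a row starts exactly when the previous row stopped
  is_transfer <- c(FALSE, d$start[-1] == d$stop[-nrow(d)])
  # Episode counter: goes up by one at every row that is not a transfer
  episode <- cumsum(!is_transfer)
  first <- d[episode == 1, ]
  data.frame(id    = first$id[1],
             start = first$start[1],
             stop  = first$stop[nrow(first)])
}

# Apply the helper per patient and stack the one-row results
result <- do.call(rbind, lapply(split(dat, dat$id), first_stay))
rownames(result) <- NULL
result
```

On the example data this should give one row per patient: a from 0 to 12, b from 0 to 10, c from 0 to 3, and d from 0 to 6. The cumsum() over non-transfer rows labels each run of chained admissions as one episode, so keeping episode 1 discards later readmissions such as patient a's stay starting at time 17.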