On Wed, Mar 4, 2009 at 12:09 AM, Vedula, Satyanarayana <sved...@jhsph.edu> wrote:
> Hi,
>
> Could someone help with coding this in R?
>
> I need to select one row per patient i in clinic j. The data is organized
> similar to that shown below.
>
> Two columns - patient i in column j identify each unique patient. There are
> two columns on outcome. Some patients have multiple rows with each row
> representing one visit, coded for in the column, visit. Some patients have
> just one row indicating data from a single visit.
>
> I need to select one row per patient i in clinic j using the following
> algorithm:
>
> If patient has outcome recorded at visit 2, then outcome = outcome columns at
> visit 2
> If patient does not have visit 2, then outcome = outcome at visit 5
> If patient does not have visit 2 and visit 5, then outcome = outcome at visit
> 4
> If patient does not have visits 2, 5, and 4, then outcome = outcome at visit 3
> If patient does not have visits 2, 5, 4, and 3, then outcome = outcome at
> visit 1
> If patient does not have any of the visits, outcome = missing
>
>
> Patient    Clinic    Visit    Outcome_left  Outcome_right
> patient 1  clinic 1  visit 2  22            21
> patient 1  clinic 3  visit 1  21            21
> patient 1  clinic 3  visit 2  21            22
> patient 1  clinic 3  visit 3  20            22
> patient 3  clinic 5  visit 1  24            21
> patient 3  clinic 5  visit 3  21            22
> patient 3  clinic 5  visit 4  22            23
> patient 3  clinic 5  visit 5  22            22
>
> I need to select just the first row for patient 1/clinic 1; the second row
> (visit 2) for patient 1/clinic 3; and the fourth row (visit 5) for patient
> 3/clinic 5.

I'd approach this problem in the following way:

# ddply() and .() come from the plyr package
library(plyr)

df <- read.csv(textConnection("
Patient,Clinic,Visit,Outcome_left,Outcome_right
patient 1,clinic 1,visit 2,22,21
patient 1,clinic 3,visit 1,21,21
patient 1,clinic 3,visit 2,21,22
patient 1,clinic 3,visit 3,20,22
patient 3,clinic 5,visit 1,24,21
patient 3,clinic 5,visit 3,21,22
patient 3,clinic 5,visit 4,22,23
patient 3,clinic 5,visit 5,22,22
"), header = TRUE)
closeAllConnections()

# With a single patient it's pretty easy to find the preferred visit.
# Visits are listed in order of preference: 2, then 5, 4, 3, 1.
preferred_visit <- paste("visit", c(2, 5, 4, 3, 1))

one <- subset(df, Patient == "patient 3" & Clinic == "clinic 5")
# match() gives the row of each preferred visit (NA if absent);
# the first non-missing entry is the best available visit
best_visit <- na.omit(match(preferred_visit, one$Visit))[1]
one[best_visit, ]

# We then turn this into a function
find_best_visit <- function(one) {
  best_visit <- na.omit(match(preferred_visit, one$Visit))[1]
  one[best_visit, ]
}

# Then apply it to every combination of patient and clinic with plyr
ddply(df, .(Patient, Clinic), find_best_visit)

# You can learn more about plyr at http://had.co.nz/plyr

Hadley

-- 
http://had.co.nz/
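
P.S. If plyr isn't available, the same split-apply-combine idea can be sketched in base R. This is only a rough sketch under stated assumptions: the names visits, visit_priority and pick_visit are hypothetical and not part of the code above, and the data frame is retyped by hand from the example in the question.

```r
# Example data from the question, rebuilt as a data frame.
visits <- data.frame(
  Patient = c("patient 1", "patient 1", "patient 1", "patient 1",
              "patient 3", "patient 3", "patient 3", "patient 3"),
  Clinic = c("clinic 1", "clinic 3", "clinic 3", "clinic 3",
             "clinic 5", "clinic 5", "clinic 5", "clinic 5"),
  Visit = paste("visit", c(2, 1, 2, 3, 1, 3, 4, 5)),
  Outcome_left = c(22, 21, 21, 20, 24, 21, 22, 22),
  Outcome_right = c(21, 21, 22, 22, 21, 22, 23, 22),
  stringsAsFactors = FALSE
)

# Visits in order of preference: 2, then 5, 4, 3, 1.
visit_priority <- paste("visit", c(2, 5, 4, 3, 1))

# Hypothetical helper: return the row of the most preferred visit
# that is present in one patient/clinic group.
pick_visit <- function(group) {
  # Row index of each preferred visit within the group, NA if absent
  rows <- match(visit_priority, group$Visit)
  # First non-missing index, i.e. the best available visit
  best <- rows[!is.na(rows)][1]
  group[best, , drop = FALSE]
}

# Split by patient and clinic, pick one row per group, and recombine.
groups <- split(visits, list(visits$Patient, visits$Clinic), drop = TRUE)
selected <- do.call(rbind, lapply(groups, pick_visit))
rownames(selected) <- NULL

print(selected)
```

On the example data this keeps visit 2 for patient 1/clinic 1, visit 2 for patient 1/clinic 3, and visit 5 for patient 3/clinic 5, which are the rows asked for in the question.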