Re: [R] Selecting one row or multiple rows per ID

hadley wickham Wed, 04 Mar 2009 06:56:41 -0800

On Wed, Mar 4, 2009 at 12:09 AM, Vedula, Satyanarayana
<sved...@jhsph.edu> wrote:
> Hi,
>
> Could someone help with coding this in R?
>
> I need to select one row per patient i in clinic j. The data is organized 
> similar to that shown below.
>
> Two columns - patient i and clinic j - identify each unique patient. There are 
> two columns on outcome. Some patients have multiple rows with each row 
> representing one visit, coded for in the column, visit. Some patients have 
> just one row indicating data from a single visit.
>
> I need to select one row per patient i in clinic j using the following 
> algorithm:
>
> If patient has outcome recorded at visit 2, then outcome = outcome columns at visit 2
> If patient does not have visit 2, then outcome = outcome at visit 5
> If patient does not have visit 2 and visit 5, then outcome = outcome at visit 4
> If patient does not have visits 2, 5, and 4, then outcome = outcome at visit 3
> If patient does not have visits 2, 5, 4, and 3, then outcome = outcome at visit 1
> If patient does not have any of the visits, outcome = missing
>
>
> Patient    Clinic    Visit    Outcome_left  Outcome_right
> patient 1  clinic 1  visit 2  22            21
> patient 1  clinic 3  visit 1  21            21
> patient 1  clinic 3  visit 2  21            22
> patient 1  clinic 3  visit 3  20            22
> patient 3  clinic 5  visit 1  24            21
> patient 3  clinic 5  visit 3  21            22
> patient 3  clinic 5  visit 4  22            23
> patient 3  clinic 5  visit 5  22            22
>
> I need to select just the first row for patient 1/clinic 1; the second row 
> (visit 2) for patient 1/clinic 3; and the fourth row (visit 5) for patient 
> 3/clinic 5.


I'd approach this problem in the following way:

library(plyr)  # provides ddply() and .(), used below

# Read the example data from an inline string; the text argument
# avoids leaving an open connection that has to be closed afterwards
df <- read.csv(text = "
Patient,Clinic,Visit,Outcome_left,Outcome_right
patient 1,clinic 1,visit 2,22,21
patient 1,clinic 3,visit 1,21,21
patient 1,clinic 3,visit 2,21,22
patient 1,clinic 3,visit 3,20,22
patient 3,clinic 5,visit 1,24,21
patient 3,clinic 5,visit 3,21,22
patient 3,clinic 5,visit 4,22,23
patient 3,clinic 5,visit 5,22,22
", header = TRUE)


# With a single patient it's pretty easy to find the preferred visit
preferred_visit <- paste("visit", c(2, 5, 4, 3, 1))

one <- subset(df, Patient == "patient 3" & Clinic == "clinic 5")
best_visit <- na.omit(match(preferred_visit, one$Visit))[1]
one[best_visit, ]
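
To see why this works, it may help to look at the intermediate values for
patient 3/clinic 5. The snippet below is only an illustration; the vector
visits is a stand-in for one$Visit, and pos and best are names made up here.

```r
preferred_visit <- paste("visit", c(2, 5, 4, 3, 1))

# Stand-in for one$Visit: the visits recorded for patient 3 / clinic 5
visits <- c("visit 1", "visit 3", "visit 4", "visit 5")

# For each preferred visit, in preference order, its position in visits;
# NA where that visit was not recorded
pos <- match(preferred_visit, visits)
pos
# NA  4  3  2  1

# Drop the NAs and keep the first hit: the most preferred visit present
best <- na.omit(pos)[1]
visits[best]
# "visit 5"
```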

# We then turn this into a function
find_best_visit <- function(one) {
  best_visit <- na.omit(match(preferred_visit, one$Visit))[1]
  one[best_visit, ]
}

# Then apply it to every combination of patient and clinic with plyr
ddply(df, .(Patient, Clinic), find_best_visit)

# You can learn more about plyr at http://had.co.nz/plyr
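
If plyr is not available, roughly the same result can be sketched in base R.
This is a different technique from the one above (sort by preference, then
keep the first row per group), not a translation of it. The column name rank
and the objects ordered and best are made up for this sketch. Note one
difference: a patient/clinic group with none of the preferred visits would
still keep one row here, so that case would need separate handling.

```r
df <- data.frame(
  Patient = c("patient 1", "patient 1", "patient 1", "patient 1",
              "patient 3", "patient 3", "patient 3", "patient 3"),
  Clinic = c("clinic 1", "clinic 3", "clinic 3", "clinic 3",
             "clinic 5", "clinic 5", "clinic 5", "clinic 5"),
  Visit = c("visit 2", "visit 1", "visit 2", "visit 3",
            "visit 1", "visit 3", "visit 4", "visit 5"),
  Outcome_left = c(22, 21, 21, 20, 24, 21, 22, 22),
  Outcome_right = c(21, 21, 22, 22, 21, 22, 23, 22),
  stringsAsFactors = FALSE
)

preferred_visit <- paste("visit", c(2, 5, 4, 3, 1))

# Rank each row's visit by preference (1 = most preferred, NA = not listed)
df$rank <- match(df$Visit, preferred_visit)

# Sort by patient, clinic, then preference rank; NA ranks sort last
ordered <- df[order(df$Patient, df$Clinic, df$rank), ]

# Keep only the first row of each patient/clinic combination
best <- ordered[!duplicated(ordered[c("Patient", "Clinic")]), ]
best$rank <- NULL
best
```

For the example data this keeps visit 2 for patient 1/clinic 1, visit 2 for
patient 1/clinic 3, and visit 5 for patient 3/clinic 5.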


Hadley

-- 
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
