Hi all,

I am trying to convert a data set of physician death codings (each individual's 
cause of death is coded by multiple physicians) from long to wide format, but 
the "reshape" function doesn't seem to work because it requires a "time" 
variable to identify the sequence among the repeated observations within 
individuals.  My data have no such ordering, and the number of physicians coding 
each death varies (up to 23).  The data set is also quite large, so for-loops are 
very slow, and I'll need to repeat the procedure multiple times.  So I'm looking 
for a computationally efficient way to replicate "reshape" without a time variable.

Thanks in advance for any help you can provide.  A worked example and some code 
I've tried are below.  I'm working with R v2.8.1 on Windows XP Professional.

Cheers,
Alan Cohen

Here's what my data look like now:

> id <- rep(1:5,2)
> COD <- c("A01","A02","A03","A04","A05","B01","A02","B03","B04","A05")
> MDid <- c(1:6,3,5,7,2)
> data <- data.frame(id,COD,MDid)  # data.frame() keeps id and MDid numeric; cbind() would coerce every column to character
> data
   id COD MDid
1   1 A01    1
2   2 A02    2
3   3 A03    3
4   4 A04    4
5   5 A05    5
6   1 B01    6
7   2 A02    3
8   3 B03    5
9   4 B04    7
10  5 A05    2

And here's what I'd like them to look like:

> id2 <- 1:5
> COD.1 <- c("A01","A02","A03","A04","A05")
> COD.2 <- c("B01","A02","B03","B04","A05")
> MDid.1 <- 1:5
> MDid.2 <-c(6,3,5,7,2)
> data.wide <- data.frame(id2,COD.1,COD.2,MDid.1,MDid.2)  # avoids cbind() coercing every column to character
> data.wide
  id2 COD.1 COD.2 MDid.1 MDid.2
1   1   A01   B01      1      6
2   2   A02   A02      2      3
3   3   A03   B03      3      5
4   4   A04   B04      4      7
5   5   A05   A05      5      2
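
For reference, here is a minimal sketch of one way the target layout might be reached. Since reshape() only needs some sequence number within each id, an arbitrary one (order of appearance) can be generated with ave() and passed as the time variable. The names "long", "wide" and "seq" are made up for illustration, the example data are rebuilt so the sketch runs on its own, and I have not timed it on a large data set.

```r
# Rebuild the example data (data.frame() keeps id and MDid numeric)
id   <- rep(1:5, 2)
COD  <- c("A01","A02","A03","A04","A05","B01","A02","B03","B04","A05")
MDid <- c(1:6, 3, 5, 7, 2)
long <- data.frame(id, COD, MDid, stringsAsFactors = FALSE)

# Arbitrary within-id counter: 1, 2, ... in order of appearance.
# "seq" is a hypothetical column name, not part of the original data.
long$seq <- ave(seq_along(long$id), long$id, FUN = seq_along)

# reshape() can now use the counter as its "time" variable
wide <- reshape(long, idvar = "id", timevar = "seq",
                v.names = c("COD", "MDid"), direction = "wide")
wide
```

Ids coded by fewer physicians than the maximum should simply get NA in the unused COD.k and MDid.k columns.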

Here's the for-loop that's very slow (with or without the if-clauses activated):

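ids<-unique(data$id)
ct<-length(ids)
codes<-matrix(NA,ct,11)   # NA rather than 0, so ids with fewer coders are not padded with zeros
colnames(codes)<-c("ID","ICD1","Coder1","ICD2","Coder2","ICD3","Coder3","ICD4","Coder4","ICD5","Coder5")
j<-0   # largest number of coders seen for any one id
for (i in 1:ct){
  kkk <- ids[i] 
  rpt<-data[data$id==kkk,]   # scans the whole data frame once per id
  j<-max(j,nrow(rpt))
  # Columns are named as in the worked example above (COD, MDid);
  # as.character() stores the values themselves rather than factor level numbers
  codes[i,1]<-as.character(kkk)
  codes[i,2]<-as.character(rpt$COD[1])
  codes[i,3]<-as.character(rpt$MDid[1])
  #if (nrow(rpt)>=2){
   codes[i,4]<-as.character(rpt$COD[2])
   codes[i,5]<-as.character(rpt$MDid[2])
    #if (nrow(rpt)>=3) {
     codes[i,6]<-as.character(rpt$COD[3])
     codes[i,7]<-as.character(rpt$MDid[3])
      #if (nrow(rpt)>=4) {
       codes[i,8]<-as.character(rpt$COD[4])
       codes[i,9]<-as.character(rpt$MDid[4])
          #if (nrow(rpt)>=5) {
           codes[i,10]<-as.character(rpt$COD[5])
           codes[i,11]<-as.character(rpt$MDid[5])
#}}}}
}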
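
Much of the loop's cost probably comes from re-scanning the whole data frame for every id. As a rough sketch of a loop-free alternative, the row (which id) and column (which position within that id) of every record can be computed up front, and the matrix filled in a single indexed assignment. The function spread_by_id() is a hypothetical helper written for this illustration, not something from a package, and the position within an id is again just order of appearance.

```r
# Hypothetical helper: spread one column into an (ids x max coders)
# character matrix without looping over ids.
spread_by_id <- function(id, value) {
  ids <- unique(id)
  row <- match(id, ids)                            # matrix row for each record
  col <- ave(seq_along(id), id, FUN = seq_along)   # arbitrary position within id
  out <- matrix(NA_character_, nrow = length(ids), ncol = max(col),
                dimnames = list(ids, NULL))
  out[cbind(row, col)] <- as.character(value)      # one vectorised assignment
  out
}

# Example data from above
id   <- rep(1:5, 2)
COD  <- c("A01","A02","A03","A04","A05","B01","A02","B03","B04","A05")
MDid <- c(1:6, 3, 5, 7, 2)

icd    <- spread_by_id(id, COD)    # one row per id, one column per coding
coders <- spread_by_id(id, MDid)   # matching physician ids, stored as character
```

The two matrices share the same row order (order of first appearance of each id), so they could be combined with cbind() if a single table is wanted; unused slots stay NA.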

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
