Just a quick update on this thread.

The version of expand.dft() that I posted earlier has a bug in it.

The bug comes from the use of lapply() and the way the additional
arguments passed to type.convert() are evaluated.

I noted this when testing the function on the UCBAdmissions data set,
which is a multi-way table used in some help file examples such
as ?as.data.frame.table.

Here is a corrected version:

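expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
{
  # Repeat each row of x as many times as its Freq count
  DF <- sapply(seq_len(nrow(x)),
               function(i) x[rep(i, each = x$Freq[i]), ],
               simplify = FALSE)

  # Stack the pieces and drop the Freq column
  DF <- subset(do.call("rbind", DF), select = -Freq)

  # Re-convert each column in an explicit loop, passing na.strings,
  # as.is and dec directly to type.convert()
  for (i in seq_len(ncol(DF)))
  {
    DF[[i]] <- type.convert(as.character(DF[[i]]),
                            na.strings = na.strings,
                            as.is = as.is, dec = dec)
  }

  DF
}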
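
As an aside, the row expansion can also be done in one step by
repeating row indices. This is only a sketch: expand.dft2 is a
hypothetical name of mine and not part of the earlier posts, and it
assumes the counts are in a column called Freq, as above.

```r
# Hypothetical variant (name and details are assumptions, not the
# function posted above): expand the rows with a single rep() call.
expand.dft2 <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
{
  # Repeat each row index Freq times; rows with Freq == 0 drop out
  idx <- rep(seq_len(nrow(x)), times = x$Freq)

  # Keep every column except Freq and reset the row names
  DF <- x[idx, names(x) != "Freq", drop = FALSE]
  rownames(DF) <- NULL

  # Re-convert each column, passing the arguments on explicitly
  for (j in seq_along(DF))
  {
    DF[[j]] <- type.convert(as.character(DF[[j]]),
                            na.strings = na.strings,
                            as.is = as.is, dec = dec)
  }

  DF
}
```

It is called the same way, for example expand.dft2(FCT) or
expand.dft2(FCT, as.is = TRUE).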


Thus if we now take the UCBAdmissions multi-way table data and convert
it to a flat contingency table:

FCT <- as.data.frame(UCBAdmissions)

> FCT
      Admit Gender Dept Freq
1  Admitted   Male    A  512
2  Rejected   Male    A  313
3  Admitted Female    A   89
4  Rejected Female    A   19
5  Admitted   Male    B  353
6  Rejected   Male    B  207
7  Admitted Female    B   17
8  Rejected Female    B    8
9  Admitted   Male    C  120
10 Rejected   Male    C  205
11 Admitted Female    C  202
12 Rejected Female    C  391
13 Admitted   Male    D  138
14 Rejected   Male    D  279
15 Admitted Female    D  131
16 Rejected Female    D  244
17 Admitted   Male    E   53
18 Rejected   Male    E  138
19 Admitted Female    E   94
20 Rejected Female    E  299
21 Admitted   Male    F   22
22 Rejected   Male    F  351
23 Admitted Female    F   24
24 Rejected Female    F  317


Thus, there should be:

> sum(FCT$Freq)
[1] 4526

rows in the final 'raw' data frame.


> DF <- expand.dft(FCT)

> str(DF)
'data.frame':   4526 obs. of  3 variables:
 $ Admit : Factor w/ 2 levels "Admitted","Rejected": 1 1 1 1 1 1 1 1 1 1 ...
 $ Gender: Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
 $ Dept  : Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
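
One way to check the expansion is to cross-tabulate the raw rows and
compare the counts with the original table. The sketch below rebuilds
the raw rows inline so that it runs on its own; the result of
expand.dft(FCT) could be substituted for DF. The cells are indexed by
name because the factor level order after type.convert() may differ
from the order in UCBAdmissions.

```r
# Sketch only: rebuild the raw rows inline so the snippet is standalone
FCT <- as.data.frame(UCBAdmissions)
DF  <- FCT[rep(seq_len(nrow(FCT)), times = FCT$Freq),
           c("Admit", "Gender", "Dept")]

# Cross-tabulate the raw rows into a 3-way table of counts
tab <- table(DF)

# Index by name so the comparison does not depend on level order
dn   <- dimnames(UCBAdmissions)
same <- all(tab[dn$Admit, dn$Gender, dn$Dept] == UCBAdmissions)

nrow(DF) == sum(FCT$Freq)  # one row per counted observation
same                       # counts agree with the original table
```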


Note that the three columns are converted back to factors. This is
because as.is = FALSE is the default here, in which case type.convert()
turns character vectors into factors.

If we now use:

> DF <- expand.dft(FCT, as.is = TRUE)

> str(DF)
'data.frame':   4526 obs. of  3 variables:
 $ Admit : chr  "Admitted" "Admitted" "Admitted" "Admitted" ...
 $ Gender: chr  "Male" "Male" "Male" "Male" ...
 $ Dept  : chr  "A" "A" "A" "A" ...


The three columns stay as character vectors. It was this behavior that
did not work properly in the first version.
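
The effect of as.is can be seen on type.convert() directly. This is a
small illustration with made-up vectors, not data from the thread:

```r
x <- c("A", "B", "A")

f  <- type.convert(x, as.is = FALSE)  # character input becomes a factor
ch <- type.convert(x, as.is = TRUE)   # character input stays character

# Strings that look like whole numbers are converted either way
n <- type.convert(c("1", "2", "3"), as.is = TRUE)

str(f)
str(ch)
str(n)
```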

HTH,

Marc Schwartz

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
