On Fri, 2006-08-18 at 10:41 -0400, Tom Boonen wrote:
> Dear List,
>
> why does as.data.frame(cbind()) transform numeric variables to
> factors, once one of the other variables used is a character vector?
>
> #
> x.1 <- rnorm(10)
> x.2 <- c(rep("Test",10))
> Foo <- as.data.frame(cbind(x.1))
> is.factor(Foo$x.1)
>
> Foo <- as.data.frame(cbind(x.1,x.2))
> is.factor(Foo$x.1)
> #
>
> I assume there is a good reason for this, can somebody explain? Thanks.
>
> Best,
> Tom

See the Note section of ?cbind, which states:

  The method dispatching is not done via UseMethod(), but by
  C-internal dispatching. Therefore, there is no need for, e.g.,
  rbind.default.

  The dispatch algorithm is described in the source file
  ('.../src/main/bind.c') as

  1. For each argument we get the list of possible class
     memberships from the class attribute.

  2. We inspect each class in turn to see if there is an
     applicable method.

  3. If we find an applicable method we make sure that it is
     identical to any method determined for prior arguments. If it
     is identical, we proceed, otherwise we immediately drop through
     to the default code.

  If you want to combine other objects with data frames, it may be
  necessary to coerce them to data frames first. (Note that this
  algorithm can result in calling the data frame method if the
  arguments are all either data frames or vectors, and this will
  result in the coercion of character vectors to factors.)

Thus, note the result of:

> str(cbind(x.1, x.2))
 chr [1:10, 1:2] "-0.265756038510064" "2.13220714034528" ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "x.1" "x.2"

Since a matrix can only contain a single data type, the numeric vector
is coerced to character. as.data.frame() then converts each column of
the character matrix to a factor, which is the default behavior.

If you want to create a data frame, do it this way:

> str(data.frame(x.1, x.2))
`data.frame':   10 obs. of  2 variables:
 $ x.1: num  -0.266 2.132 2.096 -0.128 -0.466 ...
 $ x.2: Factor w/ 1 level "Test": 1 1 1 1 1 1 1 1 1 1

or if you want to retain the character vector, use I():

> str(data.frame(x.1, I(x.2)))
`data.frame':   10 obs. of  2 variables:
 $ x.1: num  -0.266 2.132 2.096 -0.128 -0.466 ...
 $ x.2:Class 'AsIs'  chr [1:10] "Test" "Test" "Test" "Test" ...

See ?data.frame for more information.

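For anyone who wants to watch each coercion happen separately, here is a minimal sketch of the two steps described above. The set.seed() call and the names m and Bar are only for illustration and are not part of the original post. stringsAsFactors is passed explicitly in both calls because the default changed in R 4.0.0, which no longer converts strings to factors unless asked; with the argument spelled out, the sketch should behave the same on either side of that change.

```r
# Illustrative data, same shape as in the question.
set.seed(1)                     # assumption: fixed seed, only for repeatability
x.1 <- rnorm(10)
x.2 <- rep("Test", 10)

# Step 1: cbind() on two plain vectors builds a matrix. A matrix holds
# a single type, so the numeric values are converted to character.
m <- cbind(x.1, x.2)
is.matrix(m)                    # TRUE
typeof(m)                       # "character"

# Step 2: converting the character matrix to a data frame. With
# stringsAsFactors = TRUE every column becomes a factor, x.1 included.
Foo <- as.data.frame(m, stringsAsFactors = TRUE)
sapply(Foo, class)              # both columns: "factor"

# Building the data frame directly from the vectors skips the matrix
# step, so x.1 stays numeric; x.2 stays character here because
# stringsAsFactors = FALSE is requested.
Bar <- data.frame(x.1, x.2, stringsAsFactors = FALSE)
sapply(Bar, class)              # x.1 "numeric", x.2 "character"
```

The point of splitting it up is that the damage is already done in step 1: once the numbers sit in a character matrix, as.data.frame() has no way of knowing they were ever numeric.
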
HTH,

Marc Schwartz

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.