Dear List members,
I have tried several times to find an explanation for a side effect of how
invalid factor levels are handled, but did not find an answer. Various answers
on Stack Exchange etc. produce the behaviour that puzzles me without even
mentioning it.
So I am asking the list (apologies if this has been covered in the past).
If you add a row containing an invalid factor level to a data frame, this has
the side effect of turning a numeric column into a character column.
Here is a simple example:
> df <- data.frame(
+   P = factor(c("mittels", "mit", "mittels", "ueber", "mit", "mit")),
+   ANSWER = factor(c(rep("PP>OBJ", 4), rep("OBJ>PP", 2))),
+   RT = round(runif(6, 7000, 16000), 0))
> str(df)
'data.frame': 6 obs. of 3 variables:
$ P : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1
$ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1
$ RT : num 11157 13719 14388 14527 14686 ...
> df <- rbind(df, c("in", "V>N", round(runif(1, 7000, 16000), 0)))
> str(df)
'data.frame': 7 obs. of 3 variables:
$ P : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1 NA
$ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1 NA
$ RT : chr "11478" "15819" "8305" "8852" ...
You see that RT has changed from _num_ to _chr_ as a side effect of adding the
row whose invalid factor levels were turned into NA. I would appreciate an
explanation of the purpose of this type coercion.
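A minimal sketch that may help narrow this down, assuming the coercion comes
from building the new row with c() rather than from the factor handling itself.
The names new_row and df2 and the fixed RT values are hypothetical and only
serve the illustration; they are not part of the session above.

```r
# c() returns an atomic vector, so mixed strings and numbers are all
# coerced to a single type (character) before rbind() ever sees them.
new_row <- c("in", "V>N", 9000)
class(new_row)   # "character"

# Small stand-in for the data frame above, with fixed RT values.
df <- data.frame(
  P = factor(c("mittels", "mit")),
  ANSWER = factor(c("PP>OBJ", "OBJ>PP")),
  RT = c(11157, 13719))

# Passing the new row as a one-row data frame keeps one type per column,
# so RT is still numeric after the rbind().
df2 <- rbind(df, data.frame(P = "in", ANSWER = "V>N", RT = 9000))
class(df2$RT)    # "numeric"
```

If this assumption holds, the change in RT would be independent of the invalid
factor levels, and it should also appear when the new row uses only existing
levels.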
Thanks in advance
Tibor