Hi, 

this is a misunderstanding of my question. I wasn’t worried about invalid 
factor levels that produce NA. My question was why a column changes its class, 
which I took to be a side effect. If you build a vector with c() and it 
contains even one character string, the whole vector becomes _chr_. After such 
a vector has been added as a row, we get two NAs in the two factor columns, 
plus a character string in RT, and that string is what turns the numeric 
column into a character column (see ?c, where you find: "The output type is 
determined from the highest type of the components in the hierarchy NULL < raw 
< logical < integer < double < complex < character < list < expression.").
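
A minimal sketch of the coercion I mean, using a made-up value (12345) rather 
than my original data. It only illustrates how c() promotes its elements and 
how a list avoids that; it is not meant as the definitive explanation of what 
rbind() does internally.

```r
# c() returns an atomic vector, so all elements must share one type.
# The highest type in the hierarchy wins: here, character.
new_row <- c("in", "V>N", 12345)
class(new_row)                 # "character"
new_row[3]                     # "12345" (a string, no longer a number)

# The same promotion happens step by step along the hierarchy.
class(c(TRUE, 1L))             # "integer"
class(c(1L, 2.5))              # "numeric"
class(c(2.5, "a"))             # "character"

# A list keeps each element's own type, so nothing is promoted.
new_row_list <- list("in", "V>N", 12345)
sapply(new_row_list, class)    # "character" "character" "numeric"
```

So by the time rbind() sees the new row, the RT value is already a string.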

 
Best


Tibor



> On 19.09.2022 at 13:59, Ebert,Timothy Aaron <teb...@ufl.edu> wrote:
> 
> In your example code, the variable remains of class factor, and all entries 
> are valid. The variables will behave as expected given the factor levels in 
> the original dataframe.
> 
> (At least on my system, R 4.2 in RStudio on Windows) R returns a couple of 
> warning messages telling me that I was bad.
> What you get is NA, for "not available", i.e. a missing value. You gave the 
> system an invalid factor level, so it was entered as missing. If you get 
> data that has a new factor level, you need to tell R to expect the new 
> factor level first.
> 
> levels(f1) <- c(levels(f1),"New Level")
> levels(f1) <- c(levels(f1),c("NL1","NL2"))
> 
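
A small sketch of how this could be combined with appending a row. The data 
frame and the value 9000 are made up for illustration. The idea is to register 
the level first and then pass the new row as a one-row data frame rather than 
as a c() vector, so that RT is not turned into a string on the way in.

```r
df <- data.frame(
  P  = factor(c("mittels", "mit", "ueber")),
  RT = c(11157, 13719, 14388)
)

# Register the new level first, as suggested above.
levels(df$P) <- c(levels(df$P), "in")

# A one-row data frame keeps each column's own type (unlike c()).
df <- rbind(df, data.frame(P = "in", RT = 9000))

str(df)
class(df$RT)    # "numeric"
levels(df$P)    # "mit" "mittels" "ueber" "in"
```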
> 
> Tim
> -----Original Message-----
> From: R-help <r-help-boun...@r-project.org> On Behalf Of Tibor Kiss via R-help
> Sent: Monday, September 19, 2022 6:11 AM
> To: r-help@r-project.org
> Subject: [R] Question concerning side effects of treating invalid factor 
> levels
> 
> Dear List members,
> 
> I have tried several times now to find out about a side effect of 
> treating invalid factor levels, but did not find an answer. Various answers 
> on stackexchange etc. reproduce the behaviour that irritates me without even 
> mentioning it.
> So I am asking the list (apologies if this has been treated in the past).
> 
> If you add an invalid factor level to a column in a data frame, this has the 
> side effect of turning a numerical column into a column with character 
> strings. Here is a simple example:
> 
>> df <- data.frame(
>        P = factor(c("mittels", "mit", "mittels", "ueber", "mit", "mit")),
>        ANSWER = factor(c(rep("PP>OBJ", 4), rep("OBJ>PP", 2))),
>        RT = round(runif(6, 7000, 16000), 0))
> 
>> str(df)
> 'data.frame':   6 obs. of  3 variables:
> $ P     : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1
> $ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1
> $ RT    : num  11157 13719 14388 14527 14686 ..
> 
>> df <- rbind(df, c("in", "V>N", round(runif(1, 7000, 16000), 0)))
> 
>> str(df)
> 'data.frame':   7 obs. of  3 variables:
> $ P     : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1 NA
> $ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1 NA
> $ RT    : chr  "11478" "15819" "8305" "8852" ...
> 
> You see that RT has changed from _num_ to _chr_ as a side effect of adding 
> the invalid factor level as NA. I would appreciate understanding what the 
> purpose of the type coercion is.
> 
> Thanks in advance
> 
> 
> Tibor
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.r-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



