Hi

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org]
> On Behalf Of Rui Barradas
> Sent: Monday, August 20, 2012 2:03 PM
> To: S Ellison
> Cc: r-help
> Subject: Re: [R] Opinion: Why I find factors convenient to use
>
> Hello,
>
> On 20-08-2012 12:30, S Ellison wrote:
> >
> >> -----Original Message-----
> >> Over the years, many people -- including some who I would consider
> >> real expeRts -- have criticized factors and advocated the use
> >> (sometimes exclusively) of character vectors instead.
> >
> > Exclusive use of character vectors is not going to do the job.
> >
> > The concept of a factor is fundamental to a lot of statistics; a
> > programming environment that does not implement factors and their
> > associated special behaviour is probably not a statistical programming
> > language.
> >
> > Special behaviours I have in mind include:
> > - Level order can be arbitrarily specified for display purposes
> > - A control level can be intentionally chosen for contrasts
> > - the option of "ordered" factors (for example, for polr and the like)
> >
> > So I think the language does and will require a 'factor' type in one
> > form or another.
> >
> > _When_ you decide to convert a character input to a factor is, of
> > course, up to the user, and for cleanup it's very often better to stick
> > with character early and convert to factor a bit later. But personally,
> > I think that there is sufficient control over the coding of data to
> > allow user discretion, and on the whole, it seems to me that character
> > input gets used as factor data so much of the time when it is used at
> > all that the default stringsAsFactors=TRUE setting seems the more
> > sensible default.
>
> I disagree with this last point. Just think of the number of questions
> to this list about, say, dates. When read from file using one of the
> forms of read.table, they usually cause problems. Unless the user is an

Hm. I may be wrong, but I think most of the confusion comes from questions
like this one:

"My numbers are not read as numbers, and when I try to convert them with
as.numeric they are changed and scrambled to integers. What can I do?"

Personally I do not find factors too confusing; they behave almost the same
as character vectors.

ch <- sample(letters[1:4], 20, replace=TRUE)
ff <- factor(ch)

ch[ch=="b"]
[1] "b" "b" "b" "b" "b" "b" "b"
ff[ff=="b"]
[1] b b b b b b b
Levels: a b c d

paste(ch, 1:5)
 [1] "b 1" "d 2" "d 3" "c 4" "d 5" "c 1" "b 2" "b 3" "c 4" "d 5" "b 1" "c 2"
[13] "b 3" "c 4" "b 5" "c 1" "c 2" "c 3" "b 4" "a 5"
paste(ff, 1:5)
 [1] "b 1" "d 2" "d 3" "c 4" "d 5" "c 1" "b 2" "b 3" "c 4" "d 5" "b 1" "c 2"
[13] "b 3" "c 4" "b 5" "c 1" "c 2" "c 3" "b 4" "a 5"

ddch <- c("2000-05-05", "2001-05-05")
ddf <- as.factor(ddch)
str(as.Date(ddch))
 Date[1:2], format: "2000-05-05" "2001-05-05"
str(as.Date(ddf))
 Date[1:2], format: "2000-05-05" "2001-05-05"

The only problem is when you want to add some values to a factor, or to
concatenate with c(some factor, some values). Then you need to convert to
character first, like this:

my.c <- function(x, ...) {
  # work on the labels, not on the underlying integer codes
  x.f <- as.character(x)
  # a factor goes in, a factor comes out; anything else is passed to c() as is
  if (is.factor(x)) res <- as.factor(c(x.f, ...)) else res <- c(x, ...)
  res
}

But e.g. merge works fine:

ffx <- factor("x")
str(merge(data.frame(ff), data.frame(ffx), by.x="ff", by.y="ffx", all=TRUE))
'data.frame': 21 obs. of 1 variable:
 $ ff: Factor w/ 5 levels "a","b","c","d",..: 1 1 2 2 2 2 2 2 3 3 ...

So for me personally the read.table default stringsAsFactors=TRUE is better,
as I have some code that works with factors without checking.

> experienced one, in which case he/she might not have a question to ask.
> Besides, the default TRUE is contradictory with "stick with character
> early and convert to factor a bit later". With both "early" and
> "later".
> A different thing is to have a very used function's default behavior
> change from one version of R to the next one. What about all the code
> in use? Maybe it's better to leave it be.
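
Whatever the default is, a script can avoid depending on it by passing
stringsAsFactors explicitly on every call. Below is a minimal sketch of that,
together with the "numbers scrambled to integers" pitfall. The input text and
the object names (txt, as_chr, as_fac, id_f, num_f) are made up for
illustration and come from nobody's real code. It uses the text= argument of
read.table, so it assumes a reasonably recent R.

```r
# Hypothetical two-column input; the column names are invented for this sketch.
txt <- "id value
a 10
b 2.5
c 10"

# State the choice explicitly instead of relying on the default.
as_chr <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
as_fac <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

class(as_chr$id)   # "character"
class(as_fac$id)   # "factor"

# "Character early, factor later": convert when ready, with a chosen level order.
id_f <- factor(as_chr$id, levels = c("c", "b", "a"))
levels(id_f)       # "c" "b" "a"

# The pitfall: numbers that ended up stored as a factor.
num_f <- factor(c("10", "2.5", "10"))
as.numeric(num_f)                 # 1 2 1  (the internal codes, not the values)
as.numeric(as.character(num_f))   # 10.0 2.5 10.0  (go through the labels)
```

Going through as.character first is the same trick as in my.c above: work on
the labels, not on the integer codes behind them.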
>
> Rui Barradas
> >
> > S Ellison

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.