On Apr 26, 2011, at 18:52, Petr Savicky wrote:

> On Tue, Apr 26, 2011 at 10:51:33AM +0200, Petr PIKAL wrote:
>> Hi
>>
>>
>> d<-data.frame(matrix(c("ww","ww","xx","yy","ww","yy","xx","yy","NA"),
>> ncol=3, byrow=TRUE))
>>
>> Change character value "NA" to missing value <NA>
>> d[d[,3]=="NA",3]<-NA
>>
>> If you want drop any unused levels of a factor just use
>>
>> factor(d[,3])
>> [1] xx yy <NA>
>> Levels: xx yy
>
> An explicit NA is a good idea. If the NA is introduced before
> creating the data frame, then also the data frame will not
> contain the unwanted level.
>
> a<-matrix(c("ww","ww","xx","yy","ww","yy","xx","yy","NA"),
> ncol=3, byrow=TRUE)
> a[a[,3]=="NA",3]<-NA
> d<-data.frame(a)
> d[,3]
>
> [1] xx yy <NA>
> Levels: xx yy
>
> If the replacement should be done in the whole matrix, then
>
> a[a=="NA"]<-NA
>
> may be used.
>
> Petr Savicky.

I think there's a buglet in here. According to the docs, "If exclude is used it should also be a factor with the same level set as x or a set of codes for the levels to be excluded". However, that plainly doesn't work:

> cc <- c("x","y","NA")
> ff <- factor(cc)
> factor(ff,exclude=1)
[1] x y NA
Levels: NA x y
> factor(ff,exclude=ff[3])
[1] x y NA
Levels: NA x y
> factor(ff,exclude=ff[2])
[1] x y NA
Levels: NA x y

In these cases, the internal logic converts exclude to integer, and then uses match(levels, exclude) where levels is unique(x), i.e., a factor. This won't work because match() matches on the _character_ representation of x.

The cleanest version that I can think of for the original problem is

> factor(ff, levels=setdiff(levels(ff), "NA"))
[1] x y <NA>
Levels: x y

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
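
For anyone who wants to check the match() point for themselves, here is a minimal sketch using the same cc and ff as in the message. The names m, clean and clean2 are made up for illustration. The last step, going through as.character(), is an alternative that is not in the message, and it assumes the factor contains no genuine missing values (passing exclude="NA" replaces the default exclude=NA).

```r
# Reproduce the factor from the message; "NA" here is an ordinary string,
# so it becomes a regular level rather than a missing value.
cc <- c("x", "y", "NA")
ff <- factor(cc)

# match() converts a factor argument to character before comparing,
# so an integer level code never matches any of the level labels.
m <- match(unique(ff), as.integer(ff[3]))
print(m)  # every entry is NA

# Workaround from the message: rebuild the factor without the "NA" level.
clean <- factor(ff, levels = setdiff(levels(ff), "NA"))
print(levels(clean))
print(is.na(clean))

# Alternative (an assumption, not from the message): work on the character
# values and exclude the literal string "NA" when building the factor.
clean2 <- factor(as.character(ff), exclude = "NA")
print(identical(clean, clean2))
```

The setdiff() form has the advantage of working on the factor directly, without relying on how exclude is interpreted for factor input.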