Dear all,

I have some CSV files (exported from Excel) that contain empty cells. My example file has four variables of different classes, each with some empty cells in the original CSV file:

> test <- read.csv2("test.csv", dec=".")
> test
  id id2  x   y
1  a      1  NA
2  b   e NA 2.2
3      f  3 3.3
4  c   g  4 4.4
> class(test$id)
[1] "factor"
> class(test$id2)
[1] "factor"
> class(test$x)
[1] "integer"
> class(test$y)
[1] "numeric"

The help text of read.csv2 says 'Blank fields are also considered to be missing values in logical, integer, numeric and complex fields.'. Thus, empty cells in a factor (and, I assume, in a character vector) are not considered missing values but form a level of their own:

> is.na(test$id)
[1] FALSE FALSE FALSE FALSE
> levels(test$id)
[1] ""  "a" "b" "c"

When I work with my real (larger) dataset I would like to use functions like 'is.na' and '!is.na' on factors. Is there an R alternative to Excel's 'search (for empty cells) and replace (with NA)'?

I have tried a modification of Uwe Ligges' suggestion on missing values, posted 2 Aug:

> is.na(test[test==""]) <- TRUE

...but it did not work on the whole data frame:

Error in "[<-.data.frame"(`*tmp*`, test == "", value = c(NA, NA, NA, NA :
        rhs is the wrong length for indexing by a logical matrix

However, it worked fine when applied to a single vector:

> is.na(test$id[test$id==""]) <- TRUE
> test$id
[1] a b <NA> c
Levels: a b c
> is.na(test$id)
[1] FALSE FALSE  TRUE FALSE

Is there a more efficient way in R to replace the empty cells with NA in all my factors, or should I just do it in advance in Excel with 'search and replace'?

Thanks in advance!

-- 
************************
Henrik Pärn
Department of Biology
NTNU
7491 Trondheim
Norway

+47 735 96282 (office)
+47 909 89 255 (mobile)
+47 735 96100 (fax)

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
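
One possible way to handle the question above, offered as a sketch rather than a definitive answer: read.csv2 passes extra arguments on to read.table, whose documented na.strings argument can include the empty string, so blank factor cells become NA at read time. For a data frame that is already loaded, the "" level can be set to NA in every factor column. The inline csv_text below is only an assumed reconstruction of test.csv so the example runs without the file, and stringsAsFactors = TRUE is spelled out because recent R versions no longer create factors by default.

```r
# Assumed contents of test.csv, reconstructed from the printed data frame.
csv_text <- "id;id2;x;y
a;;1;
b;e;;2.2
;f;3;3.3
c;g;4;4.4"

# Option 1: treat the empty string as a missing-value marker while reading.
test <- read.csv2(text = csv_text, dec = ".",
                  na.strings = c("NA", ""),
                  stringsAsFactors = TRUE)
is.na(test$id)    # FALSE FALSE TRUE FALSE
levels(test$id)   # "a" "b" "c"

# Option 2: repair a data frame that was read without na.strings.
test2 <- read.csv2(text = csv_text, dec = ".", stringsAsFactors = TRUE)

# Find the factor columns, then turn their "" level into NA.
is_fac <- vapply(test2, is.factor, logical(1))
test2[is_fac] <- lapply(test2[is_fac], function(f) {
  # Assigning NA to a level removes that level; its cells become NA.
  levels(f)[levels(f) == ""] <- NA
  f
})
is.na(test2$id2)  # TRUE FALSE FALSE FALSE
```

With a real file, the first option would presumably be read.csv2("test.csv", dec = ".", na.strings = c("NA", "")). The second option leaves the integer and numeric columns untouched, since their blank fields are already read as NA.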