Re: [R] Namibia becoming NA
Hi Suresh,

I think you will need to use read.table() rather than the read.csv() wrapper for it. Try:

    input <- read.table(file = "padded.csv", sep = ",", header = TRUE, na.strings = NULL)

HTH,

Josh

On Sat, Jul 17, 2010 at 10:47 PM, Suresh Singh <singh@osu.edu> wrote:
> I have a data file in which one of the columns is country code and NA is
> the code for Namibia.
> When I read the data file using read.csv, NA for Namibia is being treated
> as null or NA.
> How can I prevent this from happening?
> I tried the following but it didn't work:
>
>     input <- read.csv("padded.csv", header = TRUE, as.is = c("code2"))
>
> thanks,
> Suresh

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Namibia becoming NA
On 18-Jul-10 05:47:03, Suresh Singh wrote:
> I have a data file in which one of the columns is country code and NA is
> the code for Namibia.
> When I read the data file using read.csv, NA for Namibia is being treated
> as null or NA.
> How can I prevent this from happening?
> I tried the following but it didn't work:
>
>     input <- read.csv("padded.csv", header = TRUE, as.is = c("code2"))
>
> thanks,
> Suresh

I suppose this was bound to happen, and in my view it represents a bit of a mess!

With a test file temp.csv:

    Code,Country
    DE,Germany
    IT,Italy
    NA,Namibia
    FR,France

    X <- read.csv("temp.csv")
    X
    #   Code Country
    # 1   DE Germany
    # 2   IT   Italy
    # 3 <NA> Namibia
    # 4   FR  France
    which(is.na(X))
    # [1] 3

exactly as Suresh describes. It does not help to surround the NA in temp.csv with quotes:

    Code,Country
    DE,Germany
    IT,Italy
    "NA",Namibia
    FR,France

leads to exactly the same result. And I have tried every variation of as.is and colClasses I can think of, still with exactly the same result!

Conclusion: If an entry in a data file is intended to become the character value "NA", there seems to be no way of reading it in directly. This should not be so: it should be preventable!

As a cure, assuming that no other value in the Country Code is actually missing (and so should be NA), then (with Suresh's naming) I would suggest, subsequent to reading in the file, something like the following. The complication is that the variable code2 is now a factor, and you cannot simply assign a character value "NA" to its NA value -- you will get an error message. Hence:

    ix <- which(is.na(input$code2))
    Y <- as.character(input$code2)
    Y[ix] <- "NA"
    input$code2 <- factor(Y)

The corresponding code for my test example is:

    ix <- which(is.na(X$Code))
    Y <- as.character(X$Code)
    Y[ix] <- "NA"
    X$Code <- factor(Y)
    X
    #   Code Country
    # 1   DE Germany
    # 2   IT   Italy
    # 3   NA Namibia
    # 4   FR  France
    which(is.na(X))
    # integer(0)

So that works. There ought to be an option in read.csv() and friends which suppresses the conversion of a string "NA" found in input into an NA value. Maybe there is -- but, if so, it is not visible in the documentation!

Ted.

E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 18-Jul-10  Time: 09:25:05
-- XFMail --
Re: [R] Namibia becoming NA
G'day Ted,

On Sun, 18 Jul 2010 09:25:09 +0100 (BST)
(Ted Harding) <ted.hard...@manchester.ac.uk> wrote:

> On 18-Jul-10 05:47:03, Suresh Singh wrote:
>> I have a data file in which one of the columns is country code and NA is
>> the code for Namibia.
>> When I read the data file using read.csv, NA for Namibia is being treated
>> as null or NA.
>> How can I prevent this from happening?
>> I tried the following but it didn't work:
>>
>>     input <- read.csv("padded.csv", header = TRUE, as.is = c("code2"))
>>
>> thanks,
>> Suresh
>
> I suppose this was bound to happen, and in my view it represents a bit of
> a mess!
>
> With a test file temp.csv:
>
>     Code,Country
>     DE,Germany
>     IT,Italy
>     NA,Namibia
>     FR,France

Thanks for providing an example.

> leads to exactly the same result. And I have tried every variation of
> as.is and colClasses I can think of, still with exactly the same result!

Did you think of trying some variations of na.strings? ;-)

IMO, the simplest way of coding missing values in CSV files is to have two consecutive commas; not some code (whether NA, 99, 999, -1, ...) between them.

> Conclusion: If an entry in a data file is intended to become the character
> value "NA", there seems to be no way of reading it in directly. This should
> not be so: it should be preventable!

It is, through simple use of the na.strings argument:

    R> X <- read.csv("temp.csv", na.strings="")
    R> X
      Code Country
    1   DE Germany
    2   IT   Italy
    3   NA Namibia
    4   FR  France
    R> which(is.na(X))
    integer(0)

HTH.

Cheers,

    Berwin

== Full address
Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)            +61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway
Crawley WA 6009                       e-mail: ber...@maths.uwa.edu.au
Australia                             http://www.maths.uwa.edu.au/~berwin
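
For anyone who wants to reproduce the na.strings behaviour discussed above without creating temp.csv by hand, here is a minimal sketch. It writes the test data to a temporary file; the variable names (csv_path, default_read, fixed_read) and the extra last row with an empty Code field are assumptions added for illustration and are not part of the original example.

```r
# Recreate the temp.csv example in a temporary file. The last row, with an
# empty Code field, is a hypothetical addition to show how a genuinely
# missing value can be encoded as two consecutive delimiters.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("Code,Country",
    "DE,Germany",
    "IT,Italy",
    "NA,Namibia",
    "FR,France",
    ",Unknown"),
  csv_path
)

# Default behaviour: the string "NA" is converted to a missing value.
default_read <- read.csv(csv_path, stringsAsFactors = FALSE)

# With na.strings = "", only empty fields are treated as missing, so the
# Namibia code is kept as the literal string "NA".
fixed_read <- read.csv(csv_path, na.strings = "", stringsAsFactors = FALSE)

print(is.na(default_read$Code))
print(is.na(fixed_read$Code))

# Remove the temporary file.
unlink(csv_path)
```

stringsAsFactors = FALSE is set explicitly so the sketch behaves the same on R versions where the default differs; with factors, the same comparison applies after as.character().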
Re: [R] Namibia becoming NA
Berwin A Turlach wrote:
> Did you think of trying some variations of na.strings? ;-)
>
> IMO, the simplest way of coding missing values in CSV files is to have two
> consecutive commas; not some code (whether NA, 99, 999, -1, ...) between
> them.

Yes. Arguably, na.strings=NULL should be the default (and na="" for write.csv) since delimited formats are (mainly) for communicating with external programs, which are not likely to use the NA code (unless it is for Namibia, North America, Noradrenalin, Niels Andersen, etc.)

However, back compatibility (including that with files written with R's own write.csv) probably precludes changing anything at this point.

Notice that read.csv and friends do pass ... to read.table, so it is easy to override the default setting.

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
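
The pairing of na="" on the writing side with na.strings="" on the reading side can be sketched as a round trip. The data frame and the names countries, csv_path and round_trip are hypothetical and chosen only for illustration; the sketch assumes a column in which "NA" is a real code and one value is genuinely missing.

```r
# Hypothetical data: "NA" is a real country code, while the third Code
# is genuinely missing.
countries <- data.frame(
  Code = c("DE", "NA", NA),
  Country = c("Germany", "Namibia", "Unknown"),
  stringsAsFactors = FALSE
)

csv_path <- tempfile(fileext = ".csv")

# Write missing values as empty fields instead of the default "NA".
write.csv(countries, csv_path, na = "", row.names = FALSE)

# read.csv passes na.strings on to read.table: empty fields become NA,
# while the string "NA" is kept as data.
round_trip <- read.csv(csv_path, na.strings = "", stringsAsFactors = FALSE)

print(readLines(csv_path))
print(round_trip)

# Remove the temporary file.
unlink(csv_path)
```

Used together, the two settings keep the code "NA" and a true missing value distinguishable in the file, which the defaults on both sides do not.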
[R] Namibia becoming NA
I have a data file in which one of the columns is country code and NA is the code for Namibia.
When I read the data file using read.csv, NA for Namibia is being treated as null or NA.
How can I prevent this from happening?
I tried the following but it didn't work:

    input <- read.csv("padded.csv", header = TRUE, as.is = c("code2"))

thanks,
Suresh