Re: [R] Namibia becoming NA

2010-07-18 Thread Joshua Wiley
Hi Suresh,

I think you will need to use read.table() rather than the read.csv()
wrapper for it.  Try:

input <- read.table(file = "padded.csv", sep = ",", header = TRUE,
na.strings = NULL)

HTH,

Josh

On Sat, Jul 17, 2010 at 10:47 PM, Suresh Singh singh@osu.edu wrote:
 I have a data file in which one of the columns is country code and NA is the
 code for Namibia.
 When I read the data file using read.csv, NA for Namibia is being treated as
 null or NA

 How can I prevent this from happening?

 I tried the following but it didn't work
 input <- read.csv("padded.csv", header = TRUE, as.is = c("code2"))

 thanks,
 Suresh

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



Re: [R] Namibia becoming NA

2010-07-18 Thread Ted Harding
On 18-Jul-10 05:47:03, Suresh Singh wrote:
 I have a data file in which one of the columns is country code and NA
 is the
 code for Namibia.
 When I read the data file using read.csv, NA for Namibia is being
 treated as
 null or NA
 
 How can I prevent this from happening?
 
 I tried the following but it didn't work
 input <- read.csv("padded.csv", header = TRUE, as.is = c("code2"))
 
 thanks,
 Suresh

I suppose this was bound to happen, and in my view it represents
a bit of a mess! With a test file temp.csv:

  Code,Country
  DE,Germany
  IT,Italy
  NA,Namibia
  FR,France

  X <- read.csv("temp.csv")
  X
  #   Code Country
  # 1   DE Germany
  # 2   IT   Italy
  # 3 <NA> Namibia
  # 4   FR  France
  which(is.na(X))
  # [1] 3

exactly as Suresh describes. It does not help to surround the NA
in temp.csv with quotes:

  Code,Country
  DE,Germany
  IT,Italy
  "NA",Namibia
  FR,France

leads to exactly the same result. And I have tried every variation
I can think of of as.is and colClasses, still with exactly the
same result!

Conclusion: If an entry in a data file is intended to become the
character value "NA", there seems to be no way of reading it in
directly. This should not be so: it should be preventable!

As a cure, assuming that no other value in the Country Code is
actually missing (and so should be NA), then (with Suresh's
naming) I would suggest, subsequent to reading in the file,
something like the following. The complication is that the variable
code2 is now a factor, and you cannot simply assign a character
value "NA" to its NA value -- you will get an error message.
Hence:

  ix <- which(is.na(input$code2))
  Y  <- as.character(input$code2)
  Y[ix] <- "NA"
  input$code2 <- factor(Y)

The corresponding code for my test example is:

  ix <- which(is.na(X$Code))
  Y  <- as.character(X$Code)
  Y[ix] <- "NA"
  X$Code <- factor(Y)

  X
  #   Code Country
  # 1   DE Germany
  # 2   IT   Italy
  # 3   NA Namibia
  # 4   FR  France
  which(is.na(X))
  # integer(0)

So that works.
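
A self-contained sketch of the same workaround, reading the test data through the text= argument of read.csv() so that no file is needed. The name csv_text is made up for illustration. It sets stringsAsFactors = TRUE explicitly, on the assumption that the factor case above is the one to reproduce; from R 4.0.0 on, read.csv() leaves strings as character by default, and then the factor() step is not needed.

```r
# Same rows as temp.csv, held in a string (csv_text is a hypothetical name).
csv_text <- "Code,Country\nDE,Germany\nIT,Italy\nNA,Namibia\nFR,France"

# Force the factor behaviour described in the message above.
X <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Locate the entries that were read as missing.
ix <- which(is.na(X$Code))

# Go via character: a new level cannot be assigned into a factor directly.
Y <- as.character(X$Code)
Y[ix] <- "NA"
X$Code <- factor(Y)

levels(X$Code)   # the string "NA" is now an ordinary level
anyNA(X$Code)
```

As noted above, this is only safe when no other value in the column is genuinely missing, since every NA is turned into the string "NA".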

There ought to be an option in read.csv() and friends which suppresses
the conversion of a string "NA" found in input into an NA value.
Maybe there is -- but, if so, it is not visible in the documentation!

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 18-Jul-10   Time: 09:25:05
-- XFMail --



Re: [R] Namibia becoming NA

2010-07-18 Thread Berwin A Turlach
G'day Ted,

On Sun, 18 Jul 2010 09:25:09 +0100 (BST)
(Ted Harding) ted.hard...@manchester.ac.uk wrote:

 On 18-Jul-10 05:47:03, Suresh Singh wrote:
  I have a data file in which one of the columns is country code and
  NA is the
  code for Namibia.
  When I read the data file using read.csv, NA for Namibia is being
  treated as
  null or NA
  
  How can I prevent this from happening?
  
  I tried the following but it didn't work
  input <- read.csv("padded.csv", header = TRUE, as.is = c("code2"))
  
  thanks,
  Suresh
 
 I suppose this was bound to happen, and in my view it represents
 a bit of a mess! With a test file temp.csv:
 
   Code,Country
   DE,Germany
   IT,Italy
   NA,Namibia
   FR,France

Thanks for providing an example.

 leads to exactly the same result. And I have tried every variation
 I can think of of as.is and colClasses, still with exactly the
 same result!

Did you think of trying some variations of na.strings? ;-) 

IMO, the simplest way of coding missing values in CSV files is to have
two consecutive commas; not some code (whether NA, 99, 999, -1, ...)
between them.

 Conclusion: If an entry in a data file is intended to become the
 character value "NA", there seems to be no way of reading it in
 directly. This should not be so: it should be preventable!

It is, through simple use of the na.strings argument:

R> X <- read.csv("temp.csv", na.strings="")
R> X
  Code Country
1   DE Germany
2   IT   Italy
3   NA Namibia
4   FR  France
R> which(is.na(X))
integer(0)
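
A file-free variant of the session above, offered as a sketch. It passes the data through the text= argument of read.csv() and adds a made-up sixth line with an empty Code field (not part of Ted's temp.csv). The point is that, with na.strings="", the literal NA is kept as a string while an empty field is still read as missing. The names csv_text, X_default and X_fixed are hypothetical.

```r
# Ted's test data plus one hypothetical row whose Code field is empty.
csv_text <- "Code,Country
DE,Germany
IT,Italy
NA,Namibia
FR,France
,Unknown"

# Default na.strings = "NA": Namibia's code is read as missing.
X_default <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# na.strings = "": only empty fields are treated as missing.
X_fixed <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = FALSE)

is.na(X_default$Code)
is.na(X_fixed$Code)
```

This matches the convention of marking missing values in a CSV file by two consecutive commas rather than by a code.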

HTH.

Cheers,

Berwin

== Full address ==
Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)            +61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway
Crawley WA 6009                     e-mail: ber...@maths.uwa.edu.au
Australia                           http://www.maths.uwa.edu.au/~berwin



Re: [R] Namibia becoming NA

2010-07-18 Thread Peter Dalgaard
Berwin A Turlach wrote:

 Did you think of trying some variations of na.strings? ;-) 
 
 IMO, the simplest way of coding missing values in CSV files is to have
 two consecutive commas; not some code (whether NA, 99, 999, -1, ...)
 between them.

Yes. Arguably, na.strings=NULL should be the default (and na="" for
write.csv) since delimited formats are (mainly) for communicating with
external programs, which are not likely to use the NA code (unless it is
for Namibia, North America, Noradrenalin, Niels Andersen, etc.)

However, back compatibility (including that with files written with R's
own write.csv) probably precludes changing anything at this point.
Notice that read.csv and friends do pass ... to read.table, so it is
easy to override the default setting.
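
A round-trip sketch of that override, under stated assumptions: the data frame df, the temporary file tmp and the "Unknown" row are invented for illustration. It writes missing values as empty fields with na = "" in write.csv(), then reads the file back with na.strings = "" passed through read.csv() to read.table().

```r
# Hypothetical data: one real country code "NA" and one genuinely missing code.
df <- data.frame(Code = c("DE", "NA", NA),
                 Country = c("Germany", "Namibia", "Unknown"),
                 stringsAsFactors = FALSE)

tmp <- tempfile(fileext = ".csv")

# Write missing values as empty fields instead of the default NA.
write.csv(df, tmp, row.names = FALSE, na = "")

# read.csv() hands na.strings on to read.table(); only empty fields become NA.
back <- read.csv(tmp, na.strings = "", stringsAsFactors = FALSE)
unlink(tmp)

back$Code
is.na(back$Code)
```

With both settings in place the string "NA" and a true missing value stay distinct across the write and the read.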

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



[R] Namibia becoming NA

2010-07-17 Thread Suresh Singh
I have a data file in which one of the columns is country code and NA is the
code for Namibia.
When I read the data file using read.csv, NA for Namibia is being treated as
null or NA

How can I prevent this from happening?

I tried the following but it didn't work
input <- read.csv("padded.csv", header = TRUE, as.is = c("code2"))

thanks,
Suresh
