On Sep 22, 2015, at 3:00 PM, Therneau, Terry M., Ph.D. wrote:

> I have a csv file from an automatic process (so this will happen thousands of 
> times), for which the first row is a vector of variable names and the second 
> row often starts something like this:
> 
> 5724550,"000202075214",2005.02.17,2005.02.17,"F", .....
> 
> Notice the second variable which is
>      a character string (note the quotation marks)
>      a sequence of numeric digits
>      leading zeros are significant
> 
> The read.csv function insists on turning this into a numeric.  Is there any 
> simple set of options that
> will turn this behavior off?  I'm looking for a way to tell it to "obey the 
> bloody quotes" -- I still want the first, third, etc columns to become 
> numeric.  There can be more than one variable like this, and not always in 
> the second position.

Not knowing in advance which columns are affected might force you to read 
everything in as character, but if you are able to pass a colClasses 
argument, this might help:

> read.csv(text='5724550,"000202075214",2005.02.17,2005.02.17,"F"', 
+ stringsAsFactors=FALSE, header=FALSE, colClasses=c("numeric", 
+ rep("character", 4)))
       V1           V2         V3         V4 V5
1 5724550 000202075214 2005.02.17 2005.02.17  F

Or you can create a class with a coercion method defined via setAs:

> setClass('myChar')
> setAs('character', 'myChar', def=function(from, to ) to <- I(from))
> read.csv(text='5724550,"000202075214",2005.02.17,2005.02.17,"F"', 
+ stringsAsFactors=FALSE, header=FALSE, colClasses=c("numeric", 
+ rep('myChar',4)) )
       V1           V2         V3         V4 V5
1 5724550 000202075214 2005.02.17 2005.02.17  F

(Neither the third nor the fourth column makes sense as a numeric, so here 
is an illustration of coercion to Date.)

> setClass('dotDate')
> setAs('character', 'dotDate', def=function(from, to ) to <- as.Date(from, 
+ "%Y.%m.%d")  )

> read.csv(text='5724550,"000202075214",2005.02.17,2005.02.17,"F"', 
+ stringsAsFactors=FALSE, header=FALSE, colClasses=c("numeric", "character", 
+ rep('dotDate',2), "character") )
       V1           V2         V3         V4 V5
1 5724550 000202075214 2005-02-17 2005-02-17  F

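If the quoted columns really cannot be known ahead of time, one possibility 
is a two-pass read: first read a single data row with quoting switched off 
so the quote characters survive, then build colClasses from whichever fields 
began with a quote. This is only a sketch. It assumes you can read the file 
once before the real call, that the first data row is representative, and 
that no quoted field contains an embedded comma. The header names and the 
txt variable below are made up for illustration.

```r
# Hypothetical sample standing in for the real file (header names invented)
txt <- 'id,code,date1,date2,sex\n5724550,"000202075214",2005.02.17,2005.02.17,"F"'

# Pass 1: read one data row with quote="" so that the quote characters are
# kept as part of each field rather than being stripped
raw <- read.csv(text = txt, header = TRUE, quote = "", nrows = 1,
                colClasses = "character", check.names = FALSE)

# A field that began with a double quote should stay character
quoted <- vapply(raw, function(x) identical(substr(x, 1, 1), '"'), logical(1))

# "character" for quoted columns, NA (let read.csv decide) for the rest
cls <- unname(ifelse(quoted, "character", NA))

# Pass 2: the real read, now obeying the quotes
dat <- read.csv(text = txt, header = TRUE, colClasses = cls,
                stringsAsFactors = FALSE)
str(dat)
```

Unquoted columns still go through the usual type conversion, so the first 
column comes back numeric while the quoted second column keeps its leading 
zeros.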

> 
> This happens deep inside the httr library; there is an easy way for me to add 
> more options to the read.csv call but it is not so easy to replace it with 
> something else.
> 
> Terry T
> 
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
