Re: [R] retaining characters in a csv file

Daniel Nordlund Wed, 23 Sep 2015 12:08:16 -0700

On 9/23/2015 5:57 AM, Therneau, Terry M., Ph.D. wrote:

Thanks to all for the comments, I hadn't intended to start a war.
My summary:
1. Most important: I wasn't missing something obvious. This is always my first suspicion when I submit something to R-help, and it's true more often than not.
2. Obviously (at least it is now), the CSV standard does not specify that quotes should force a character result. R is not "wrong". With regard to using what Excel does as a litmus test, I consider that to be totally uninformative about standards: neither pro (like Duncan) nor anti (like Rolf), but simply irrelevant. (Like many MS choices.)
3. I'll have to code in my own solution, either pre-scan the first few lines to create a colClasses, or use read_csv from the readr library (if there are leading zeros it keeps the string as character, which may suffice for my needs), or something else.
4. The source of the data is a "text/csv" field coming from an http POST request. This is an internal service on an internal Mayo server and coded by our own IT department; this will not be the first case where I have found that their definition of "csv" is not quite standard.
Terry T.
On 23/09/15 10:00, Therneau, Terry M., Ph.D. wrote:
I have a csv file from an automatic process (so this will happen
thousands of times), for which the first row is a vector of variable
names and the second row often starts something like this:

5724550,"000202075214",2005.02.17,2005.02.17,"F", .....

Notice the second variable which is
       a character string (note the quotation marks)
       a sequence of numeric digits
       leading zeros are significant

The read.csv function insists on turning this into a numeric. Is there
any simple set of options that
will turn this behavior off?  I'm looking for a way to tell it to "obey
the bloody quotes" -- I still want the first, third, etc columns to
become numeric.  There can be more than one variable like this, and not
always in the second position.

This happens deep inside the httr library; there is an easy way for me
to add more options to the read.csv call but it is not so easy to
replace it with something else.
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

A fairly simple workaround is to add two lines of code to the process, and then add the colClasses parameter as you suggested in item 3 above.


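# Pass 1: quote='' treats quote marks as data, so quoted fields read as character
want <- read.csv('yourfile', quote='', stringsAsFactors= FALSE, nrows=1)
# unname() so colClasses is applied by position, not matched by column name
classes <- unname(sapply(want, class))
# Pass 2: normal quote handling, with the column classes fixed from pass 1
want <- read.csv('yourfile', stringsAsFactors= FALSE, colClasses=classes)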

I don't know if you want your final file to convert strings to factors, so you can modify as needed. In addition, if your files aren't as regular as I inferred, you can increase nrows in the first read.csv call so that more rows are used to work out the classes.
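
For anyone who wants to try this without the real data, here is a minimal, self-contained sketch of the same two-pass idea. The file contents, the column names (id, code, start, end, sex) and the second data row are made up for illustration; only the first data row is modelled on the one quoted in the thread. It assumes the header row is unquoted and that the probed rows are representative of the whole file.

```r
# Hypothetical sample file; column names and the second row are invented.
csv_lines <- c(
  'id,code,start,end,sex',
  '5724550,"000202075214",2005.02.17,2005.02.17,"F"',
  '5724551,"000000000042",2005.03.01,2005.03.02,"M"'
)
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

# Default behaviour: the quoted digit string is converted to a number,
# so the leading zeros are lost.
naive <- read.csv(path, stringsAsFactors = FALSE)

# Pass 1: with quote = "" the quote marks are ordinary characters, so a
# quoted field cannot be parsed as a number and is classed as character.
probe <- read.csv(path, quote = "", stringsAsFactors = FALSE, nrows = 2)
classes <- unname(vapply(probe, function(x) class(x)[1], character(1)))

# Pass 2: normal quote handling, with the classes from pass 1 applied
# by position.
fixed <- read.csv(path, stringsAsFactors = FALSE, colClasses = classes)

print(class(naive$code))  # numeric type: zeros already gone
print(fixed$code)         # character: leading zeros retained
```

The probe here reads two rows rather than one; on real files a larger nrows costs little and guards against an unrepresentative first row.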



Hope this is helpful,

Dan

--
Daniel Nordlund
Bothell, WA  USA
