Re: [R] read.table as integer

Gabor Grothendieck Fri, 13 Jan 2012 06:22:39 -0800

On Fri, Jan 13, 2012 at 7:02 AM, Francisco <franciscororol...@google.com> wrote:
> Hello,
> I have a csv file with many variables, both characters and integers.
> I would like to load it on R and do some operations on integer variables,
> the problem is that R loads the entire dataset considering all variables as
> characters, instead I would like that R makes the distinction between the
> two types, because there are too many variables to do:
> x1<-as.integer(x1)
> x2<-as.integer(x2)
> x3<-as.integer(x3)
> ...
>
> I tried to specify read.table(... stringsAsFactors=FALSE) but it doesn't
> work.


There must be non-integers in some of the columns that are supposed to
be integer: read.table reads a column as character as soon as a single
entry in it cannot be parsed as a number.  Let's assume that the first
row has no such garbage.  Then we can get the desired classes from that
row and apply them to the entire data frame.  In this example the
second column has such garbage:

# test data
Lines <- "a,b,c
D,2,3
a,b,9
C,5,6"

# read in just row 1 and read in all rows
DF1 <- read.csv(text = Lines, nrows = 1, as.is = TRUE)
DF <- DF0 <- read.csv(text = Lines, as.is = TRUE)

# there will be a warning as it's converting garbage to NAs
to.int <- function(v, v1) if (inherits(v1, "integer")) as.integer(v) else v
DF <- mapply(to.int, DF0, DF1, SIMPLIFY = FALSE)
DF <- as.data.frame(DF)

As we see here, the second column becomes integer despite the garbage
in it, which is converted to NA:

> str(DF0) # as read in
'data.frame':   3 obs. of  3 variables:
 $ a: chr  "D" "a" "C"
 $ b: chr  "2" "b" "5"
 $ c: int  3 9 6
> str(DF) # as converted
'data.frame':   3 obs. of  3 variables:
 $ a: Factor w/ 3 levels "a","C","D": 3 1 2
 $ b: int  2 NA 5
 $ c: int  3 9 6
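
If it is not known in advance where the garbage is, a variation on the
same idea can list it before it is turned into NA.  This is only a
sketch under the same assumption that row 1 is clean; int.cols,
bad.rows, garbage and DF2 are names made up for this illustration, and
the regular expression only recognizes optionally signed whole numbers:

```r
# same test data as above
Lines <- "a,b,c
D,2,3
a,b,9
C,5,6"

DF1 <- read.csv(text = Lines, nrows = 1, as.is = TRUE)  # classes from row 1
DF0 <- read.csv(text = Lines, as.is = TRUE)             # all rows

# names of the columns that row 1 says should be integer
int.cols <- names(DF1)[vapply(DF1, is.integer, logical(1))]

# hypothetical helper: positions of entries that are not whole numbers
bad.rows <- function(v) {
  s <- trimws(as.character(v))
  which(!is.na(s) & !grepl("^[-+]?[0-9]+$", s))
}

# keep only the columns that actually contain garbage
garbage <- Filter(length, lapply(DF0[int.cols], bad.rows))

# convert just those columns in place; the warnings are expected here
DF2 <- DF0
DF2[int.cols] <- lapply(DF0[int.cols],
                        function(v) suppressWarnings(as.integer(v)))
```

For the test data, garbage has the single component b with value 2,
i.e. the bad entry is in row 2 of column b.  Because the columns are
replaced in place rather than rebuilt with as.data.frame, column a
stays character regardless of the stringsAsFactors default, which
changed from TRUE to FALSE in R 4.0.0.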
-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com