On Fri, Oct 4, 2013 at 4:55 AM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
> On 13-10-04 7:31 AM, Joshua Ulrich wrote:
>>
>> On Tue, Oct 1, 2013 at 11:29 AM, David Winsemius <dwinsem...@comcast.net>
>> wrote:
>>>
>>>
>>> On Sep 30, 2013, at 6:38 AM, Joshua Ulrich wrote:
>>>
>>>> On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimi...@club.fr>
>>>> wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>>
>>>>> It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
>>>>> quoted integers as an acceptable value for columns for which
>>>>> colClasses="integer". But when colClasses is omitted, these columns are
>>>>> read as integer anyway.
>>>>>
>>>>> For example, let's consider a file named file.dat, containing:
>>>>> "1"
>>>>> "2"
>>>>>
>>>>>> read.table("file.dat", colClasses="integer")
>>>>>
>>>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
>>>>> na.strings, :
>>>>> scan() expected 'an integer' and got '"1"'
>>>>>
>>>>> But:
>>>>>>
>>>>>> str(read.table("file.dat"))
>>>>>
>>>>> 'data.frame': 2 obs. of 1 variable:
>>>>> $ V1: int 1 2
>>>>>
>>>>> The latter result is indeed documented in ?read.table:
>>>>> Unless ‘colClasses’ is specified, all columns are read as
>>>>> character columns and then converted using ‘type.convert’ to
>>>>> logical, integer, numeric, complex or (depending on ‘as.is’)
>>>>> factor as appropriate. Quotes are (by default) interpreted in all
>>>>> fields, so a column of values like ‘"42"’ will result in an
>>>>> integer column.
>>>>>
>>>>>
>>>>> Should the former behavior be considered a bug?
>>>>>
>>>> No. If you tell read.table the column is integer and it's actually
>>>> character on disk, it should be an error.
>>>
>>>
>>> My reading of the `read.table` help page is that one should expect that
>>> when there is an 'integer'-class and an `as.integer` function and
>>> "integer" is the argument to colClasses, that `as.integer` will be
>>> applied to the values in the column. Should I be reading elsewhere?
>>>
>> I assume you're referring to the paragraph below.
>>
>> Possible values are ‘NA’ (the default, when ‘type.convert’ is
>> used), ‘"NULL"’ (when the column is skipped), one of the
>> atomic vector classes (logical, integer, numeric, complex,
>> character, raw), or ‘"factor"’, ‘"Date"’ or ‘"POSIXct"’.
>> Otherwise there needs to be an ‘as’ method (from package
>> ‘methods’) for conversion from ‘"character"’ to the specified
>> formal class.
>>
>> I read that as meaning that an "as" method is required for classes not
>> already listed in the prior sentence. It doesn't say an "as" method
>> will be applied if colClasses is one of the atomic, factor, Date, or
>> POSIXct classes; but I can see how you might assume that, since all
>> the atomic, factor, Date, and POSIXct classes already have "as"
>> methods...
>
>
> And this does suggest a workaround for ffdf: instead of declaring the class
> to be "integer", declare a class "ffdf_integer", and write a conversion
> method. Or simply read everything as character and call as.integer()
> explicitly.

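For readers following along, the conversion-method workaround quoted above could look roughly like the sketch below. It assumes that read.table() reads a column with a non-built-in colClasses value as character (so quotes are stripped) and then converts it through the registered "as" method, as the quoted help text describes. The class name "ffdf_integer" is the one suggested in the quoted message; the temporary file stands in for file.dat.

```r
library(methods)

# "ffdf_integer" is only a label for colClasses; the name comes from the
# suggestion in the thread and is not a class defined by any package.
setClass("ffdf_integer")

# Conversion method from character to the label class. read.table() is
# assumed to call this after reading the column as character.
setAs("character", "ffdf_integer", function(from) as.integer(from))

# Recreate the two-line file of quoted integers from the original report.
path <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), path)

df <- read.table(path, colClasses = "ffdf_integer")
str(df)
```

Note that this conversion method inherits the behaviour of as.integer(), so it does not reject malformed values on its own.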
Just a note of concern, since several people proposed it: reading with
colClasses="character" and then calling as.integer() or strtoi() skips
the validation, e.g. "foo" will be turned into NA_integer_. Reading the
column as integer with read.table() or scan() gives an error instead.

/Henrik

>
> Duncan Murdoch
>
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
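The validation gap described in the note above can be illustrated with a small sketch. The file contents and the helper as_integer_strict() are hypothetical and exist only for this illustration; the helper is one possible way to restore an error when reading as character, not something read.table() provides.

```r
# A file with one valid and one invalid integer field.
path <- tempfile(fileext = ".dat")
writeLines(c("1", "foo"), path)

# Route 1: read as character, then coerce. The bad value becomes NA;
# as.integer() only warns, and the warning is suppressed here.
chr <- read.table(path, colClasses = "character")
loose <- suppressWarnings(as.integer(chr$V1))

# Route 2: declare the column as integer, so scan() validates each field
# and the read fails. The error is captured rather than raised.
strict <- tryCatch(read.table(path, colClasses = "integer"),
                   error = function(e) e)

# Hypothetical helper: coerce character to integer, but stop on any
# non-missing value that is not an optionally signed run of digits.
as_integer_strict <- function(x) {
  ok <- is.na(x) | grepl("^[+-]?[0-9]+$", x)
  if (!all(ok)) {
    stop("not an integer: ", paste(unique(x[!ok]), collapse = ", "))
  }
  as.integer(x)
}

checked <- tryCatch(as_integer_strict(chr$V1), error = function(e) e)
```

Here loose contains an NA in place of "foo", while both strict and checked hold error conditions, which is the difference the note is warning about.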