On Fri, Oct 4, 2013 at 9:15 AM, peter dalgaard <pda...@gmail.com> wrote:
>
> On Oct 4, 2013, at 17:10 , Henrik Bengtsson wrote:
>
>> On Fri, Oct 4, 2013 at 4:55 AM, Duncan Murdoch <murdoch.dun...@gmail.com>
>> wrote:
>>> On 13-10-04 7:31 AM, Joshua Ulrich wrote:
>>>>
>>>> On Tue, Oct 1, 2013 at 11:29 AM, David Winsemius <dwinsem...@comcast.net>
>>>> wrote:
>>>>>
>>>>>
>>>>> On Sep 30, 2013, at 6:38 AM, Joshua Ulrich wrote:
>>>>>
>>>>>> On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimi...@club.fr>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi!
>>>>>>>
>>>>>>>
>>>>>>> It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
>>>>>>> quoted integers as an acceptable value for columns for which
>>>>>>> colClasses="integer". But when colClasses is omitted, these columns are
>>>>>>> read as integer anyway.
>>>>>>>
>>>>>>> For example, let's consider a file named file.dat, containing:
>>>>>>> "1"
>>>>>>> "2"
>>>>>>>
>>>>>>>> read.table("file.dat", colClasses="integer")
>>>>>>>
>>>>>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
>>>>>>> na.strings, :
>>>>>>> scan() expected 'an integer' and got '"1"'
>>>>>>>
>>>>>>> But:
>>>>>>>>
>>>>>>>> str(read.table("file.dat"))
>>>>>>>
>>>>>>> 'data.frame': 2 obs. of 1 variable:
>>>>>>> $ V1: int 1 2
>>>>>>>
>>>>>>> The latter result is indeed documented in ?read.table:
>>>>>>> Unless ‘colClasses’ is specified, all columns are read as
>>>>>>> character columns and then converted using ‘type.convert’ to
>>>>>>> logical, integer, numeric, complex or (depending on ‘as.is’)
>>>>>>> factor as appropriate. Quotes are (by default) interpreted in all
>>>>>>> fields, so a column of values like ‘"42"’ will result in an
>>>>>>> integer column.
>>>>>>>
>>>>>>>
>>>>>>> Should the former behavior be considered a bug?
>>>>>>>
>>>>>> No. If you tell read.table the column is integer and it's actually
>>>>>> character on disk, it should be an error.
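For anyone who wants to reproduce Milan's report without creating file.dat, here is a minimal R sketch. It assumes that read.table(text = ...) behaves like reading the same two lines from a file; the variable names are illustrative only. Whether the colClasses = "integer" call still fails depends on the R version in use, so the sketch captures the outcome rather than asserting it.

```r
# The two lines of file.dat from the report, supplied via `text` instead of a file.
lines <- c('"1"', '"2"')

# Default colClasses = NA: fields are read as character (quotes removed)
# and type.convert() then turns the column into integer.
default_read <- read.table(text = lines)
str(default_read)

# colClasses = "integer": scan() is asked for integers directly. In R 3.0.1
# this failed with "scan() expected 'an integer' and got '"1"'"; the result
# (data frame or error message) is captured instead of assumed here.
typed_read <- tryCatch(read.table(text = lines, colClasses = "integer"),
                       error = function(e) conditionMessage(e))
print(typed_read)
```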
>>>>>
>>>>>
>>>>> My reading of the `read.table` help page is that one should expect that
>>>>> when there is an 'integer'-class and an `as.integer` function and
>>>>> "integer" is the argument to colClasses, that `as.integer` will be
>>>>> applied to the values in the column. Should I be reading elsewhere?
>>>>>
>>>> I assume you're referring to the paragraph below.
>>>>
>>>> Possible values are ‘NA’ (the default, when ‘type.convert’ is
>>>> used), ‘"NULL"’ (when the column is skipped), one of the
>>>> atomic vector classes (logical, integer, numeric, complex,
>>>> character, raw), or ‘"factor"’, ‘"Date"’ or ‘"POSIXct"’.
>>>> Otherwise there needs to be an ‘as’ method (from package
>>>> ‘methods’) for conversion from ‘"character"’ to the specified
>>>> formal class.
>>>>
>>>> I read that as meaning that an "as" method is required for classes not
>>>> already listed in the prior sentence. It doesn't say an "as" method
>>>> will be applied if colClasses is one of the atomic, factor, Date, or
>>>> POSIXct classes; but I can see how you might assume that, since all
>>>> the atomic, factor, Date, and POSIXct classes already have "as"
>>>> methods...
>>>
>>>
>>> And this does suggest a workaround for ffdf: instead of declaring the class
>>> to be "integer", declare a class "ffdf_integer", and write a conversion
>>> method. Or simply read everything as character and call as.integer()
>>> explicitly.
>>
>> Just a note of concert since several proposed it:
>
> concerN?
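Duncan's "ffdf_integer" workaround could look roughly like the sketch below. It is not code from the thread: it relies on the help text quoted above (a colClasses entry outside the built-in list needs an 'as' method from package methods for conversion from "character"), and it validates values with scan(), as Peter suggests later in the thread, so that a token such as "foo" raises an error instead of silently becoming NA. The blank-field handling is an assumption of this sketch.

```r
library(methods)

# "ffdf_integer" is the class name Duncan suggests; it is declared as a
# virtual class only so that setAs() has a target class to register against.
setClass("ffdf_integer")

# read.table() reads a column with a non-built-in colClasses entry as
# character (so quotes are stripped) and then calls methods::as() on it.
setAs("character", "ffdf_integer", function(from) {
  out <- rep(NA_integer_, length(from))
  # Assumption: missing or blank fields become NA; everything else must parse.
  keep <- !is.na(from) & nzchar(trimws(from))
  # scan() with what = integer() stops on tokens such as "foo" or "1.5".
  vals <- scan(text = from[keep], what = integer(), quiet = TRUE)
  if (length(vals) != sum(keep))
    stop("each field must contain exactly one integer")
  out[keep] <- vals
  out
})

df <- read.table(text = c('"1"', '"2"'), colClasses = "ffdf_integer")
str(df)
```

Because the coercion method returns a plain integer vector, the resulting column is an ordinary integer column rather than an S4 object.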
Ah, yet again, that beautiful music I always hear in my head when I read R-devel.

>
>> colClasses="character") followed by as.integer() or strtoi() misses
>> the validation, e.g. "foo" will be turned into NA_integer_. Using
>> read.table() or scan() gives an error.
>
> The obvious fix for that would seem to be to use scan() on the character
> vector:
>
>> y <- c("1","2",3,4,5)
>> y
> [1] "1" "2" "3" "4" "5"
>> scan(text=y)
> Read 5 items
> [1] 1 2 3 4 5
>> y <- c("1","2",3,4,"NA")
>> scan(text=y)
> Read 5 items
> [1] 1 2 3 4 NA
>> y <- c("1","2",3,4,"foo")
>> scan(text=y)
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>   scan() expected 'a real', got 'foo'

Yep, that's also what I proposed above, though it could have been more
explicit. See also an earlier reply of mine where I point to the
readDataFrame code for TabularTextFile
[https://r-forge.r-project.org/scm/viewvc.php/pkg/R.filesets/R/TabularTextFile.R?view=markup&root=r-dots],
which does this (as an illustration for the OP).

/H

>
>>
>> /Henrik
>>
>>>
>>> Duncan Murdoch
>>>
>>>
>>> ______________________________________________
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd....@cbs.dk  Priv: pda...@gmail.com
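To make Henrik's validation point concrete, here is a small comparison of as.integer() and scan() on the same character vector. It is a sketch assuming a reasonably recent R. Passing what = integer() is an addition to Peter's example, which used scan()'s default double type; it makes scan() return integers and reject non-integer tokens.

```r
y <- c("1", "2", "3", "4", "foo")

# as.integer() turns "foo" into NA_integer_ and only warns
# ("NAs introduced by coercion"), so the bad value slips through.
lenient <- suppressWarnings(as.integer(y))
print(lenient)

# scan() with what = integer() validates every token and stops on "foo".
# With the default what = double() it would complain about 'a real' instead.
strict <- tryCatch(scan(text = y, what = integer(), quiet = TRUE),
                   error = function(e) conditionMessage(e))
print(strict)

# Valid input comes back as a plain integer vector; "NA" is still accepted
# as a missing value through scan()'s default na.strings = "NA".
ok <- scan(text = c("1", "2", "NA"), what = integer(), quiet = TRUE)
str(ok)
```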