On Sep 30, 2013, at 6:38 AM, Joshua Ulrich wrote:

> On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimi...@club.fr> wrote:
>> Hi!
>>
>> It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
>> quoted integers as an acceptable value for columns for which
>> colClasses="integer". But when colClasses is omitted, these columns are
>> read as integer anyway.
>>
>> For example, let's consider a file named file.dat, containing:
>> "1"
>> "2"
>>
>> > read.table("file.dat", colClasses="integer")
>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>   scan() expected 'an integer' and got '"1"'
>>
>> But:
>> > str(read.table("file.dat"))
>> 'data.frame': 2 obs. of 1 variable:
>>  $ V1: int 1 2
>>
>> The latter result is indeed documented in ?read.table:
>>     Unless ‘colClasses’ is specified, all columns are read as
>>     character columns and then converted using ‘type.convert’ to
>>     logical, integer, numeric, complex or (depending on ‘as.is’)
>>     factor as appropriate. Quotes are (by default) interpreted in all
>>     fields, so a column of values like ‘"42"’ will result in an
>>     integer column.
>>
>> Should the former behavior be considered a bug?
>>
> No. If you tell read.table the column is integer and it's actually
> character on disk, it should be an error.
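
For reference, here is a minimal sketch that reproduces the two behaviors quoted above and adds one possible workaround: read the column as character, then apply as.integer explicitly. The temporary file and the variable names (path, inferred, strict, as_char) are illustrative assumptions, not part of the original report.

```r
# Recreate the two-line file from the report at a temporary path
# (the report used "file.dat"; tempfile() is an assumption for portability).
path <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), path)

# Case 1: no colClasses. Columns are read as character (quotes are
# interpreted) and then passed through type.convert().
inferred <- read.table(path)
str(inferred)

# Case 2: colClasses = "integer". In the R 3.0.1 session quoted above this
# raised the scan() error; tryCatch() records whichever outcome occurs
# instead of stopping the script.
strict <- tryCatch(
  read.table(path, colClasses = "integer"),
  error = function(e) conditionMessage(e)
)
print(strict)

# Possible workaround: request character, then convert explicitly.
as_char <- read.table(path, colClasses = "character")
as_char$V1 <- as.integer(as_char$V1)
str(as_char)
```

The same read-as-character-then-convert step could in principle be applied to each chunk when a file is read piecewise, at the cost of one extra conversion per column.
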

My reading of the `read.table` help page is that, when "integer" is the argument to colClasses and there is both an 'integer' class and an `as.integer` function, one should expect `as.integer` to be applied to the values in the column. Should I be reading elsewhere?

--
David.

>
>> This creates problems when combined with read.table.ffdf from package
>> ff, since this function tries to guess the column classes by reading the
>> first rows of the file, and then passes colClasses to read.table to read
>> the remaining rows by chunks. A column of quoted integers is correctly
>> detected as integer in the first read, but read.table() fails in
>> subsequent reads.
>>
> This sounds like an issue with read.table.ffdf. The column of quoted
> integers is *incorrectly* detected as integer because they're actually
> character on disk. read.table.ffdf should rely on how the data are
> actually stored on disk (via as.is=TRUE), not how read.table might
> convert them once they're read into R.
>
>>
>> Regards
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Joshua Ulrich | about.me/joshuaulrich
> FOSS Trading | www.fosstrading.com

David Winsemius
Alameda, CA, USA