On Monday, September 30, 2013 at 08:38 -0500, Joshua Ulrich wrote:
> On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimi...@club.fr>
> wrote:
> > Hi!
> >
> >
> > It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
> > quoted integers as an acceptable value for columns for which
> > colClasses="integer". But when colClasses is omitted, these columns are
> > read as integer anyway.
> >
> > For example, let's consider a file named file.dat, containing:
> > "1"
> > "2"
> >
> >> read.table("file.dat", colClasses="integer")
> > Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
> >   scan() expected 'an integer' and got '"1"'
> >
> > But:
> >> str(read.table("file.dat"))
> > 'data.frame': 2 obs. of 1 variable:
> >  $ V1: int 1 2
> >
> > The latter result is indeed documented in ?read.table:
> >      Unless ‘colClasses’ is specified, all columns are read as
> >      character columns and then converted using ‘type.convert’ to
> >      logical, integer, numeric, complex or (depending on ‘as.is’)
> >      factor as appropriate. Quotes are (by default) interpreted in all
> >      fields, so a column of values like ‘"42"’ will result in an
> >      integer column.
> >
> >
> > Should the former behavior be considered a bug?
> >
> No. If you tell read.table the column is integer and it's actually
> character on disk, it should be an error.

All values in a text file such as a CSV file are stored as characters on disk, regardless of whether they are surrounded by quotes. The value 1 is saved as 00110001 (ASCII character #49), not as 00000001, nor as 00000000 00000000 00000000 00000001 (as, for example, 32-bit integer storage would imply).
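The byte-level claim can be checked from R itself. This is a minimal sketch, assuming a temporary file that mimics file.dat from the example; it uses only base R functions (writeLines(), readBin(), charToRaw(), rawToBits(), writeBin()). Newline bytes differ between platforms, so only the first three bytes of the file are inspected.

```r
# Sketch: compare the bytes of a text file with a binary integer encoding.
# Assumption: a temporary file stands in for file.dat from the example.
path <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), path)

# Every byte in the file is an ASCII character code: '"' is 0x22, '1' is 0x31.
on_disk <- readBin(path, what = "raw", n = file.size(path))
print(on_disk[1:3])

# The digit 1 on its own: character code 49, bit pattern 00110001.
digit_one <- charToRaw("1")
code_point <- as.integer(digit_one)
# rawToBits() returns the least significant bit first, so reverse it.
bit_pattern <- paste(rev(as.integer(rawToBits(digit_one))), collapse = "")
print(code_point)
print(bit_pattern)

# A 32-bit big-endian binary integer 1, for comparison: 00 00 00 01.
binary_one <- writeBin(1L, raw(), size = 4, endian = "big")
print(binary_one)
```

Reading the same file with read.table() and no colClasses still gives an integer column, because type.convert() converts the characters after they have been read: the integer type exists only in memory, not on disk.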
So, with all due respect, please refrain from making such blatantly erroneous statements.

Regards

> > This creates problems when combined with read.table.ffdf from package
> > ff, since this function tries to guess the column classes by reading the
> > first rows of the file, and then passes colClasses to read.table to read
> > the remaining rows by chunks. A column of quoted integers is correctly
> > detected as integer in the first read, but read.table() fails in
> > subsequent reads.
> >
> This sounds like a issue with read.table.ffdf. The column of quoted
> integers is *incorrectly* detected as integer because they're actually
> character on disk. read.table.ffdf should rely on how the data are
> actually stored on disk (via as.is=TRUE), not how read.table might
> convert them once they're read into R.
>
> >
> > Regards
> >
> > ______________________________________________
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Joshua Ulrich | about.me/joshuaulrich
> FOSS Trading | www.fosstrading.com
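For the read.table.ffdf use case, one possible workaround is to read each chunk with every column as character (scan() strips the quotes in character fields) and then coerce each column to the class detected on the first rows. This is only a sketch: read_chunk() is a hypothetical helper, not part of ff or base R, and it only handles integer, numeric and logical columns; a temporary file again stands in for file.dat.

```r
# Assumption: a temporary file stands in for file.dat from the example.
path <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), path)

# Hypothetical helper (not part of ff): read a chunk as character, then
# coerce each column to the class detected earlier.
read_chunk <- function(file, classes, skip = 0, nrows = -1) {
  chunk <- read.table(file, skip = skip, nrows = nrows,
                      colClasses = "character")
  for (j in seq_along(classes)) {
    chunk[[j]] <- switch(classes[[j]],
                         integer = as.integer(chunk[[j]]),
                         numeric = as.numeric(chunk[[j]]),
                         logical = as.logical(chunk[[j]]),
                         chunk[[j]])  # other classes stay character
  }
  chunk
}

# Guess the column classes from the first row, in the spirit of read.table.ffdf.
first <- read.table(path, nrows = 1)
classes <- vapply(first, function(col) class(col)[1], character(1))

# Read the remaining rows; colClasses = "character" avoids the scan() error.
rest <- read_chunk(path, classes, skip = 1)
str(rest)
```

Coercing after the read keeps the classes consistent across chunks, since they come from the first read rather than from a fresh type.convert() guess on each chunk.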