Regardless of whether "stored as character" is interpreted the R way or
the ASCII way, Joshua's point is valid, mainly because read.table has an
argument quote whose default value is "\"'". This means that, at least
according to R, everything between a pair of " or ' characters should be
seen as of type character and not integer.
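
To illustrate the effect of the quote argument, here is a minimal sketch. It assumes a temporary file standing in for the file.dat of the original report; the object names (tmp, guessed, as_char, literal, strict) are mine and purely illustrative. The last call is wrapped in tryCatch() because the error is only confirmed by this thread for R 3.0.1.

```r
# Minimal sketch; tmp stands in for file.dat from the original report.
tmp <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), tmp)

# Default quote = "\"'": the quotes are stripped, the column is read as
# character and then handed to type.convert().
guessed <- read.table(tmp)
str(guessed)

# Same quote handling, but no conversion: the values stay character.
as_char <- read.table(tmp, colClasses = "character")
str(as_char)

# quote = "" disables quote handling, so the quote characters
# become part of each value.
literal <- read.table(tmp, quote = "", stringsAsFactors = FALSE)
str(literal)

# The call from the report, wrapped so the script keeps running whether
# or not the installed R version raises the error.
strict <- tryCatch(read.table(tmp, colClasses = "integer"),
                   error = function(e) conditionMessage(e))
print(strict)

unlink(tmp)
```

Comparing guessed with literal shows that the quotes only disappear because quote handling is switched on by default; with quote = "" they are ordinary characters in the data.
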
The only way these quotes can end up in a .csv file is when the program
that wrote the file (often Excel) itself treats these integers as
"character". So the person who created the file did not treat them as
integers, and R won't treat them as integers either. Note that read.table
does read the quoted integers as characters and only afterwards converts
them. So yes, this is an issue with read.table.ffdf more than with R
itself. And the problem is indeed how integers are treated *the moment
they are stored*, that is, whether or not the quote character is present.

Regards
Joris

On Mon, Sep 30, 2013 at 4:45 PM, Milan Bouchet-Valat <nalimi...@club.fr> wrote:

> On Monday, 30 September 2013 at 08:38 -0500, Joshua Ulrich wrote:
> > On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimi...@club.fr> wrote:
> > > Hi!
> > >
> > >
> > > It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
> > > quoted integers as an acceptable value for columns for which
> > > colClasses="integer". But when colClasses is omitted, these columns are
> > > read as integer anyway.
> > >
> > > For example, let's consider a file named file.dat, containing:
> > > "1"
> > > "2"
> > >
> > >> read.table("file.dat", colClasses="integer")
> > > Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
> > >   scan() expected 'an integer' and got '"1"'
> > >
> > > But:
> > >> str(read.table("file.dat"))
> > > 'data.frame': 2 obs. of 1 variable:
> > >  $ V1: int 1 2
> > >
> > > The latter result is indeed documented in ?read.table:
> > >      Unless colClasses is specified, all columns are read as
> > >      character columns and then converted using type.convert to
> > >      logical, integer, numeric, complex or (depending on as.is)
> > >      factor as appropriate. Quotes are (by default) interpreted in all
> > >      fields, so a column of values like "42" will result in an
> > >      integer column.
> > >
> > >
> > > Should the former behavior be considered a bug?
> > >
> > No. If you tell read.table the column is integer and it's actually
> > character on disk, it should be an error.
> All values in a CSV file are stored as characters on disk, disregarding
> the fact that they are surrounded by quotes or not. 1 is saved as
> 00110001 (ASCII character #49), not 00000001, nor 00000000 00000000
> 00000000 00000001 (as would for example imply a 32 bit storage of
> integers).
>
> So, with all due respect, please refrain from formulating such blatantly
> erroneous statements.
>
>
> Regards
>
>
> > > This creates problems when combined with read.table.ffdf from package
> > > ff, since this function tries to guess the column classes by reading the
> > > first rows of the file, and then passes colClasses to read.table to read
> > > the remaining rows by chunks. A column of quoted integers is correctly
> > > detected as integer in the first read, but read.table() fails in
> > > subsequent reads.
> > >
> > This sounds like a issue with read.table.ffdf. The column of quoted
> > integers is *incorrectly* detected as integer because they're actually
> > character on disk. read.table.ffdf should rely on how the data are
> > actually stored on disk (via as.is=TRUE), not how read.table might
> > convert them once they're read into R.
> >
> > >
> > > Regards
> > >
> > > ______________________________________________
> > > R-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> > --
> > Joshua Ulrich  |  about.me/joshuaulrich
> > FOSS Trading  |  www.fosstrading.com
>

--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
joris.m...@ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php