On Mon, Sep 30, 2013 at 5:33 AM, Milan Bouchet-Valat <nalimi...@club.fr> wrote:
> Hi!
>
>
> It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
> quoted integers as an acceptable value for columns for which
> colClasses="integer". But when colClasses is omitted, these columns are
> read as integer anyway.
>
> For example, let's consider a file named file.dat, containing:
> "1"
> "2"
>
>> read.table("file.dat", colClasses="integer")
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>   scan() expected 'an integer' and got '"1"'
>
> But:
>> str(read.table("file.dat"))
> 'data.frame': 2 obs. of 1 variable:
>  $ V1: int 1 2
>
> The latter result is indeed documented in ?read.table:
>      Unless ‘colClasses’ is specified, all columns are read as
>      character columns and then converted using ‘type.convert’ to
>      logical, integer, numeric, complex or (depending on ‘as.is’)
>      factor as appropriate. Quotes are (by default) interpreted in all
>      fields, so a column of values like ‘"42"’ will result in an
>      integer column.
>
>
> Should the former behavior be considered a bug?
>
> This creates problems when combined with read.table.ffdf from package
> ff, since this function tries to guess the column classes by reading the
> first rows of the file, and then passes colClasses to read.table to read
> the remaining rows by chunks. A column of quoted integers is correctly
> detected as integer in the first read, but read.table() fails in
> subsequent reads.
The readDataFrame() method of the R.filesets package provides the argument
'trimQuotes' for exactly this reason, i.e. to trim the quotes from columns
for which 'colClasses' specifies a numeric type before the data are passed
on to read.table(). Feel free to borrow from its source code for a patch to
ff::read.table.ffdf().

The workaround is in readDataFrame() for TabularTextFile
[https://r-forge.r-project.org/scm/viewvc.php/pkg/R.filesets/R/TabularTextFile.R?view=markup&root=r-dots];
look for the part that starts with:

  # SPECIAL CASE/WORKAROUND: read.table()/scan() will give an error
  # if a numeric value is quoted and 'colClasses' specifies it as
  # a numeric value. In order to read such values, we need to remove
  # the quotes first. /HB 2011-07-13

/Henrik
(author of R.filesets)

>
>
> Regards
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
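The quote-trimming workaround described in the reply can be sketched in base R as below. This is a minimal illustration, not the R.filesets code: read_quoted_numeric() is a hypothetical helper, and it assumes that quotes only need to be removed from fields consisting entirely of a number (digits, signs, '.', 'e' or 'E').

```r
# Hypothetical helper (not part of R.filesets or ff): strip the quotes
# around purely numeric fields so that read.table() accepts them even
# when 'colClasses' declares those columns as numeric.
read_quoted_numeric <- function(file, colClasses, ...) {
  lines <- readLines(file)
  # Turn e.g. "42" or "-1.5e3" into 42 or -1.5e3; quoted fields containing
  # any other character (letters, spaces, ...) are left untouched.
  lines <- gsub('"([-+0-9.eE]+)"', "\\1", lines)
  # Parse the cleaned lines; remaining quotes are handled as usual.
  read.table(text = lines, colClasses = colClasses, ...)
}

# Usage with the two-line example file from the original report
tf <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), tf)
str(read_quoted_numeric(tf, colClasses = "integer"))
```

Because the quotes are removed before parsing, a quoted number in a column declared as "character" is still read as a character string, just without its quotes.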