On 13-10-04 7:31 AM, Joshua Ulrich wrote:
On Tue, Oct 1, 2013 at 11:29 AM, David Winsemius <dwinsem...@comcast.net> wrote:

On Sep 30, 2013, at 6:38 AM, Joshua Ulrich wrote:

On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimi...@club.fr> wrote:
Hi!


It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
quoted integers as an acceptable value for columns for which
colClasses="integer". But when colClasses is omitted, these columns are
read as integer anyway.

For example, let's consider a file named file.dat, containing:
"1"
"2"

read.table("file.dat", colClasses="integer")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  scan() expected 'an integer' and got '"1"'

But:
str(read.table("file.dat"))
'data.frame':   2 obs. of  1 variable:
$ V1: int  1 2

The latter result is indeed documented in ?read.table:
     Unless ‘colClasses’ is specified, all columns are read as
     character columns and then converted using ‘type.convert’ to
     logical, integer, numeric, complex or (depending on ‘as.is’)
     factor as appropriate.  Quotes are (by default) interpreted in all
     fields, so a column of values like ‘"42"’ will result in an
     integer column.


Should the former behavior be considered a bug?

No. If you tell read.table the column is integer and it's actually
character on disk, it should be an error.
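
The error itself comes from scan(), and ?read.table notes that quoting is only considered for columns read as character. A minimal sketch of that difference, using scan(text = ...) on inline data rather than file.dat (the inline strings and variable names are assumptions made for illustration):

```r
# Integer template: quotes are not interpreted, so scan() signals an error.
# tryCatch() captures the error message as a character string.
res_int <- tryCatch(
  scan(text = '"1"\n"2"', what = integer(), quiet = TRUE),
  error = function(e) conditionMessage(e)
)
print(res_int)

# Character template: quotes are interpreted and stripped,
# so the resulting strings convert cleanly afterwards.
res_chr <- scan(text = '"1"\n"2"', what = character(), quiet = TRUE)
print(as.integer(res_chr))
```

This mirrors the two read.table() calls in the original report: with colClasses="integer" the integer template is used, while with colClasses omitted every column is first read as character.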

My reading of the `read.table` help page is that, when there is an 'integer'
class and an `as.integer` function, and "integer" is the argument to
colClasses, one should expect `as.integer` to be applied to the values in the
column. Should I be reading elsewhere?

I assume you're referring to the paragraph below.

   Possible values are ‘NA’ (the default, when ‘type.convert’ is
   used), ‘"NULL"’ (when the column is skipped), one of the
   atomic vector classes (logical, integer, numeric, complex,
   character, raw), or ‘"factor"’, ‘"Date"’ or ‘"POSIXct"’.
   Otherwise there needs to be an ‘as’ method (from package
   ‘methods’) for conversion from ‘"character"’ to the specified
   formal class.

I read that as meaning that an "as" method is required for classes not
already listed in the prior sentence.  It doesn't say an "as" method
will be applied if colClasses is one of the atomic, factor, Date, or
POSIXct classes; but I can see how you might assume that, since all
the atomic, factor, Date, and POSIXct classes already have "as"
methods...

And this does suggest a workaround for ffdf: instead of declaring the class to be "integer", declare a class "ffdf_integer", and write a conversion method. Or simply read everything as character and call as.integer() explicitly.
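
Both workarounds can be sketched roughly as follows. The class name "quoted_integer" and the temporary file are hypothetical choices for this illustration (the message above suggests "ffdf_integer" for the ffdf case), and the sketch assumes the default quote setting of read.table():

```r
library(methods)  # provides setClass() and setAs()

# Recreate the two-line example file from the original report
path <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), path)

# Workaround 1: declare a placeholder class and a character -> class
# conversion. read.table() reads the column as character (stripping
# quotes) and then calls methods::as() with this class name.
# "quoted_integer" is a hypothetical name chosen for this sketch.
setClass("quoted_integer")
setAs("character", "quoted_integer", function(from) as.integer(from))
df1 <- read.table(path, colClasses = "quoted_integer")

# Workaround 2: read the column as character, then convert explicitly
df2 <- read.table(path, colClasses = "character")
df2$V1 <- as.integer(df2$V1)

str(df1)
str(df2)
```

The second form avoids defining a class at all, at the cost of a separate conversion step for each column.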

Duncan Murdoch

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
