Regardless of whether "stored as character" is interpreted the R way or the
ASCII way, the point Joshua makes is valid, mainly because read.table has an
argument quote with default value "\"'". This means that, at least according
to R, everything between either " or ' should be treated as type character
and not integer.
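
To make the role of the quote argument concrete, here is a minimal sketch. It assumes a temporary file holding the two quoted values from Milan's example; the names tmp, default_read and raw_read are only illustrative.

```r
# Scratch file with the same content as file.dat in the original report.
tmp <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), tmp)

# Default quote = "\"'": the quotes are interpreted and stripped, and with
# colClasses omitted the column is then converted by type.convert().
default_read <- read.table(tmp)
str(default_read)

# quote = "": quoting is disabled, so the quote characters stay part of
# each field and the column can no longer be converted to integer.
raw_read <- read.table(tmp, quote = "")
str(raw_read)

unlink(tmp)
```

The second call shows what is literally in the file: once quotes are no longer interpreted, the values are not integers as far as R is concerned.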

The only way these quotes can end up in a .csv file is when the program
that wrote the file (often Excel) treats these integers as "character"
internally as well. So the person who created the file did not treat them
as integers, and R won't treat them as integers either. Note that when
colClasses is omitted, read.table does read the quoted integers as
character first, and only afterwards converts them.
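
The two stages can be separated by hand. The following is only a sketch of the idea, not a description of what read.table does internally; it again assumes a temporary file with the two quoted values, and the names as_char and converted are illustrative.

```r
# Scratch file with the two quoted values.
tmp <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), tmp)

# Stage 1: read the column as character; the quotes are stripped on input.
as_char <- read.table(tmp, colClasses = "character")
str(as_char)

# Stage 2: convert afterwards, as described in ?read.table.
converted <- type.convert(as_char$V1, as.is = TRUE)
str(converted)

unlink(tmp)
```

Reading as character and converting in a second step avoids asking scan() for an integer where the file contains a quoted field.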

So yes, this is an issue with read.table.ffdf more than with R itself. And
the problem is indeed how integers are treated *the moment they are stored*,
that is, whether or not they are written with the quote character.

Regards
Joris


On Mon, Sep 30, 2013 at 4:45 PM, Milan Bouchet-Valat <nalimi...@club.fr> wrote:

> On Monday, 30 September 2013 at 08:38 -0500, Joshua Ulrich wrote:
> > On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimi...@club.fr> wrote:
> > > Hi!
> > >
> > >
> > > It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
> > > quoted integers as an acceptable value for columns for which
> > > colClasses="integer". But when colClasses is omitted, these columns are
> > > read as integer anyway.
> > >
> > > For example, let's consider a file named file.dat, containing:
> > > "1"
> > > "2"
> > >
> > >> read.table("file.dat", colClasses="integer")
> > > Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
> > >   scan() expected 'an integer' and got '"1"'
> > >
> > > But:
> > >> str(read.table("file.dat"))
> > > 'data.frame':   2 obs. of  1 variable:
> > >  $ V1: int  1 2
> > >
> > > The latter result is indeed documented in ?read.table:
> > >      Unless ‘colClasses’ is specified, all columns are read as
> > >      character columns and then converted using ‘type.convert’ to
> > >      logical, integer, numeric, complex or (depending on ‘as.is’)
> > >      factor as appropriate.  Quotes are (by default) interpreted in all
> > >      fields, so a column of values like ‘"42"’ will result in an
> > >      integer column.
> > >
> > >
> > > Should the former behavior be considered a bug?
> > >
> > No. If you tell read.table the column is integer and it's actually
> > character on disk, it should be an error.
> All values in a CSV file are stored as characters on disk, disregarding
> the fact that they are surrounded by quotes or not. 1 is saved as
> 00110001 (ASCII character #49), not 00000001, nor 00000000 00000000
> 00000000 00000001 (as would for example imply a 32 bit storage of
> integers).
>
> So, with all due respect, please refrain from formulating such blatantly
> erroneous statements.
>
>
> Regards
>
>
> > > This creates problems when combined with read.table.ffdf from package
> > > ff, since this function tries to guess the column classes by reading the
> > > first rows of the file, and then passes colClasses to read.table to read
> > > the remaining rows by chunks. A column of quoted integers is correctly
> > > detected as integer in the first read, but read.table() fails in
> > > subsequent reads.
> > >
> > This sounds like an issue with read.table.ffdf.  The column of quoted
> > integers is *incorrectly* detected as integer because they're actually
> > character on disk.  read.table.ffdf should rely on how the data are
> > actually stored on disk (via as.is=TRUE), not how read.table might
> > convert them once they're read into R.
> >
> > >
> > > Regards
> > >
> > > ______________________________________________
> > > R-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> > --
> > Joshua Ulrich  |  about.me/joshuaulrich
> > FOSS Trading  |  www.fosstrading.com
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
joris.m...@ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


