Re: [R] Reading in csv data with ff package

2013-11-19 Thread Jan van der Laan

The following seems to work:

data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
  next.rows = 1005,sep=",",colClasses = c("integer","factor","logical"))


'character' doesn't work because ff does not support character
vectors. Character vectors need to be stored as factors. The
disadvantage of that is that the levels are stored in memory, so if
the number of levels is very large (e.g. with unique strings) you
might still run into memory problems.
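
To see how the number of levels drives this, here is a small base R
sketch (it does not use ff, and the variable names are made up for
illustration). It compares the level table of a factor with few
distinct values against one where every string is unique:

```r
# Base R only: shows how the level table grows with the number of
# distinct strings. Names (few, many, n) are illustrative.
set.seed(1)
n <- 2000

# Few distinct values: at most 26 levels, however many rows there are.
few <- factor(sample(LETTERS, n, replace = TRUE))

# Every value distinct: one level per row.
many <- factor(sprintf("id%06d", seq_len(n)))

nlevels(few)   # at most 26
nlevels(many)  # equal to n

# Size of the level tables; per the explanation above, this is the
# part that stays in memory when the column is stored as a factor.
object.size(levels(few))
object.size(levels(many))
```

With a column of unique identifiers the level table is as long as the
data itself, so storing it as a factor saves little.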


'integer' for the second column doesn't work because read.csv.ffdf
passes the colClasses on to read.table, which then tries to convert
your second column to integer, which it can't because that column
contains letters.
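
The same failure can be reproduced without ff, since read.csv is
itself a wrapper around read.table. A minimal sketch, assuming a
temporary file (csv.file is a made-up name) instead of data.csv:

```r
# Base R only: read.csv hands colClasses to read.table as well.
set.seed(1)
size <- 2000
fake.data <- data.frame(
  Integer   = round(10 * runif(size)),
  Character = sample(LETTERS, size, replace = TRUE),
  Logical   = sample(c(TRUE, FALSE), size, replace = TRUE)
)
csv.file <- tempfile(fileext = ".csv")  # illustrative temporary file
write.csv(fake.data, csv.file, row.names = FALSE)

# Declaring the letter column as integer fails inside scan();
# the error message is captured as a string instead of stopping.
res <- tryCatch(
  read.csv(csv.file, colClasses = c("integer", "integer", "logical")),
  error = function(e) conditionMessage(e)
)
print(res)

# Declaring it as factor reads the file without complaint.
ok <- read.csv(csv.file, colClasses = c("integer", "factor", "logical"))
str(ok)
```

So the error comes from the mismatch between the declared class and
the file contents, not from ff.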


Jan



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Reading in csv data with ff package

2013-11-18 Thread Nick McClure
I've spent some time trying to wrap my head around reading in large csv
files with the ff-package.  I think I know how to do it, but am bumping
into some problems.  I've tried to recreate the issues as best as I can
with a smaller example and maybe someone can help explain the problems.

The following code just creates a csv file with an integer column,
character column and logical column.
-
library(ff)
# Create data
size = 2000
fake.data = data.frame(
  Integer = round(10*runif(size)),
  Character = sample(LETTERS, size, replace=TRUE),
  Logical = sample(c(TRUE, FALSE), size, replace=TRUE)
)

# Write to csv
write.csv(fake.data, "data.csv", row.names=FALSE)
-

Now to read it in as a 'ffdf' class, I can do the following:

-
data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
  next.rows = 1005,sep=",")
-

That works.  But with my current large data set, read.csv.ffdf is debating
with me about the classes it's importing. I was also messing around with
the first.rows/next.rows, but that's a question for another time. So I'll
try to load the data in, specifying the column types (same exact command,
except with specifying colClasses):

-

data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
  next.rows = 1005,sep=",",colClasses = c("integer","integer","logical"))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'an integer', got 'J'

data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
  next.rows = 1005,sep=",",colClasses = c("integer","character","logical"))
Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered,  :
  vmode 'character' not implemented

data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
  next.rows = 1005,sep=",",colClasses = rep("character",3))
Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered,  :
  vmode 'character' not implemented

data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
  next.rows = 1005,sep=",",colClasses = rep("raw",3))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'a raw', got '8601'

-
I just can't find a combination of classes that lets this file be read
in.  I really don't understand why the class 'character' won't work for
all of the columns.  Any thoughts as to why?  I appreciate the help and time.
