Correcting a typo (400 MB, not GB; thanks to David Winsemius for reporting it). Spencer

###############


Thanks to all who replied. For the record, I will summarize here what I tried and what I learned:


Mike Harwood suggested the ff package. David Winsemius suggested data.table and colbycol. Peter Langfelder suggested sqldf.


sqldf::read.csv.sql let me write an SQL command to read a column or a subset of the rows of a 400 MB tab-delimited file in roughly a minute on a 2.3 GHz dual-core machine running Windows 7 with 8 GB RAM. It also read a column of a 1.3 GB file in 4 minutes. The documentation was sufficient for me to get what I wanted with minimal effort.
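For reference, a minimal sketch of the kind of call I used (the file and column names below are made up):

library(sqldf)
## Read only two columns, and only the rows matching the WHERE clause,
## from a tab-delimited file; inside sql= the file is referred to as "file".
dat <- read.csv.sql("big_file.txt",
                    sql = "select id, value from file where value > 0",
                    sep = "\t", header = TRUE)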


If I needed to work with these data regularly, I might experiment with colbycol and ff: their documentation suggests that, after some preprocessing, they could answer routine queries more quickly. Of course, I could also do that preprocessing manually with sqldf.


      Thanks again.
      Spencer


On 8/6/2014 9:39 AM, Mike Harwood wrote:
The read.table.ffdf function in the ff package can read in delimited files
and store them to disk as individual columns.  The ffbase package provides
additional data management and analytic functionality.  I have used these
packages on 15 GB files with 18 million rows and 250 columns.
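For what it's worth, a minimal sketch of such a call (the file name and chunk sizes are arbitrary; read.table.ffdf passes header=, sep=, etc. through to read.table):

library(ff)
library(ffbase)
## Read a tab-delimited file in chunks of 100,000 rows; each column is
## stored on disk as an ff vector, so the table need not fit in RAM.
big <- read.table.ffdf(file = "big_file.txt", header = TRUE, sep = "\t",
                       first.rows = 100000, next.rows = 100000)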


On Tuesday, August 5, 2014 1:39:03 PM UTC-5, David Winsemius wrote:

On Aug 5, 2014, at 10:20 AM, Spencer Graves wrote:

      What tools do you like for working with tab-delimited text files up
to 1.5 GB (under Windows 7 with 8 GB RAM)?

?data.table::fread
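For example, a hedged sketch (the file and column names are hypothetical); fread's select= argument reads only the named columns, which matters at this file size:

library(data.table)
## fread() detects the tab separator automatically; select= limits the
## read to the named columns.
DT <- fread("big_file.txt", select = c("id", "value"))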

      Standard tools for smaller data sometimes grab all the available
RAM, after which CPU usage drops to 3% ;-)

      The "bigmemory" project won the 2010 John Chambers Award but "is
not available (for R version 3.1.0)".

      findFn("big data", 999) from the sos package downloaded 961 links in
437 packages. Those include tools for PostgreSQL and other data formats, but
I couldn't find anything specifically for large tab-delimited text files.

      Absent a better idea, I plan to write a function getField to extract
a specific field from the data, then use that field to split the data into 4
smaller files, each of which should be small enough for what I want to do.
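A minimal sketch of that plan (the file name, the field position, and the assumption that the chosen field takes only a few distinct values are all hypothetical):

getField <- function(lines, col = 3, sep = "\t") {
  ## pull one field out of raw tab-delimited lines
  vapply(strsplit(lines, sep, fixed = TRUE), `[`, character(1), col)
}

con <- file("big_file.txt", open = "r")
header <- readLines(con, n = 1)
repeat {
  chunk <- readLines(con, n = 100000)     # stream 100,000 rows at a time
  if (length(chunk) == 0) break
  field <- getField(chunk)
  for (v in unique(field)) {              # append rows to one file per value
    out <- paste0("part_", v, ".txt")
    if (!file.exists(out)) write(header, out)
    write(chunk[field == v], out, append = TRUE)
  }
}
close(con)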

There is the colbycol package, with which I have no experience, but I
understand it is designed to partition data into column-sized objects.
#--- from its help file-----
cbc.get.col {colbycol}        R Documentation
Reads a single column from the original file into memory

Description

Function cbc.read.table reads a file, stores it column by column in a disk
file, and creates a colbycol object. Function cbc.get.col queries this object
and returns a single column.
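Based on that help text, usage presumably looks something like this (untested; any arguments beyond the file name and the column are guesses on my part):

library(colbycol)
## store the file column by column on disk, then pull one column back
obj <- cbc.read.table("big_file.txt", sep = "\t", header = TRUE)
x   <- cbc.get.col(obj, "value")   # just this column is read into memory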

      Thanks,
      Spencer

David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
