Re: [R] Reading large files with R

2019-09-01 Thread Martin Møller Skarbiniks Pedersen
On Sun, 1 Sep 2019 at 21:53, Duncan Murdoch 
wrote:

> On 01/09/2019 3:06 p.m., Martin Møller Skarbiniks Pedersen wrote:
> > Hi,
> >
> > I am trying to read a YAML file which is not so large (7 GB), and I have
> > plenty of memory.
>
>

> Individual elements in character vectors have a size limit of 2^31-1.
> The read_yaml() function is putting the whole file into one element, and
> that's failing.
>
>
Oh, I didn't know that. But OK, why would anyone create
a single character vector that big ...

> You probably have a couple of choices:
>
>   - Rewrite read_yaml() so it doesn't try to do that.  This is likely
> hard, because most of the work is being done by a C routine, but it's
> conceivable you could use the stringi::stri_read_raw function to do the
> reading, and convince the C routine to handle the raw value instead of a
> character value.
>

I actually might do that in the future.

>   - Find a way to split up your file into smaller pieces.
>

Yes, that will be my first solution. Most YAML is easier to parse without
pasting all the lines together first (crazy!).
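
Something like this rough, untested sketch is what I have in mind. It assumes
the file is one big top-level sequence in which every entry starts at column 1
with "- " (so chunks are only cut in front of such a line); the chunk size is
just a placeholder:

library(yaml)

con    <- file("/data/gpg/gpg-keys.yaml", open = "r")
keys   <- list()
buffer <- character()
parse_chunk <- function(lines) yaml.load(paste(lines, collapse = "\n"))

repeat {
  lines <- readLines(con, n = 1e6)     # each chunk stays far below 2^31-1 bytes
  if (length(lines) == 0L) break
  buffer <- c(buffer, lines)
  starts <- grep("^- ", buffer)        # lines where a new top-level entry begins
  if (length(starts) > 1L) {
    cut    <- starts[length(starts)]   # keep the last, possibly incomplete, entry
    keys   <- c(keys, parse_chunk(buffer[seq_len(cut - 1L)]))
    buffer <- buffer[cut:length(buffer)]
  }
}
keys <- c(keys, parse_chunk(buffer))   # the remainder is one complete entry
close(con)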


> Duncan Murdoch
>

Thanks for pointing me in the right direction.

/Martin


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files with R

2019-09-01 Thread Duncan Murdoch

On 01/09/2019 3:06 p.m., Martin Møller Skarbiniks Pedersen wrote:

Hi,

   I am trying to read a YAML file which is not so large (7 GB), and I have
plenty of memory.
However I get this error:

$  R --version
R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

library(yaml)
keys <- read_yaml("/data/gpg/gpg-keys.yaml")

Error in paste(readLines(file), collapse = "\n") :
   result would exceed 2^31-1 bytes

2^31-1 bytes is only about 2 GB.

Please advise,

Regards
Martin


Individual elements in character vectors have a size limit of 2^31-1. 
The read_yaml() function is putting the whole file into one element, and 
that's failing.


You probably have a couple of choices:

 - Rewrite read_yaml() so it doesn't try to do that.  This is likely 
hard, because most of the work is being done by a C routine, but it's 
conceivable you could use the stringi::stri_read_raw function to do the 
reading, and convince the C routine to handle the raw value instead of a 
character value.


 - Find a way to split up your file into smaller pieces.
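
A rough, untested sketch of the first idea, just to show the reading part
(the hard part -- changing the yaml C routine to accept a raw vector -- is
not shown):

library(stringi)
bytes <- stri_read_raw("/data/gpg/gpg-keys.yaml")
# length(bytes) may exceed 2^31-1; rawToChar(bytes) would fail for the same
# underlying reason read_yaml() does (a single string cannot hold that many
# bytes), so the parsing routine would have to work from 'bytes' directly.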

Duncan Murdoch



[R] Reading large files with R

2019-09-01 Thread Martin Møller Skarbiniks Pedersen
Hi,

  I am trying to read a YAML file which is not so large (7 GB), and I have
plenty of memory.
However I get this error:

$  R --version
R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

library(yaml)
keys <- read_yaml("/data/gpg/gpg-keys.yaml")

Error in paste(readLines(file), collapse = "\n") :
  result would exceed 2^31-1 bytes

2^31-1 bytes is only about 2 GB.

Please advise,

Regards
Martin




Re: [R] Reading large files

2010-02-06 Thread Saptarshi Guha
Hello,
Do you need /all/ the data in memory at one time? Is your goal to
divide the data (e.g. according to some factor /or/ some function of
the columns of the data set) and then analyze the divisions? And then,
possibly, combine the results?
If so, you might consider using Rhipe. We have analyzed (e.g. computed
regression parameters, applied algorithms) across subsets of data where
the subsets are created according to some condition.
Using this approach (and a cluster of 8 machines, 72 cores) we have
successfully analyzed data sets ranging from 14 GB to ~140 GB.
This all assumes that your divisions are suitably small - I notice
you mention that each region is 10-20 GB and you want to compute on
/all/ of it, i.e. you need all of it in memory. If so, Rhipe cannot help you.
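
To be clear, the snippet below is not Rhipe's API -- it is just the same
divide/apply/recombine pattern written in plain R on a built-in data set, to
show the kind of per-subset analysis I mean; Rhipe distributes this idea over
a Hadoop cluster so the pieces never need to fit into one R session at once:

pieces  <- split(mtcars, mtcars$cyl)                          # divide by a factor
fits    <- lapply(pieces, function(d) lm(mpg ~ wt, data = d)) # analyze each division
results <- do.call(rbind, lapply(fits, coef))                 # recombine the results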


Regards
Saptarshi



On Thu, Feb 4, 2010 at 8:27 PM, Vadlamani, Satish {FLNA}
 wrote:
> Folks:
> I am trying to read in a large file. Definition of large is:
> Number of lines: 333,250
> Size: 850 MB
>
> The machine is a dual-core Intel with 4 GB RAM and nothing else running on 
> it. I read the previous threads on read.fwf and did not see any conclusive 
> statements on how to read fast. Example record and R code given below. I was 
> hoping to purchase a better machine and do analysis with larger datasets - 
> but these preliminary results do not look good.
>
> Does anyone have any experience with large files (> 1GB) and using them with 
> Revolution-R?
>
>
> Thanks.
>
> Satish
>
> Example Code
> key_vec <- c(1,3,3,4,2,8,8,2,2,3,2,2,1,3,3,3,3,9)
> key_names <- c("allgeo","area1","zone","dist","ccust1","whse","bindc","ccust2",
>                "account","area2","ccust3","customer","allprod","cat","bu",
>                "class","size","bdc")
> key_info <- data.frame(key_vec, key_names)
> col_names <- c(key_names, sas_time$week)
> num_buckets <- rep(12, 209)
> width_vec <- c(key_vec, num_buckets)
> col_classes <- c(rep("factor", 18), rep("numeric", 209))
> # threewkoutstat <- read.fwf(file="3wkoutstatfcst_file02.dat", widths=width_vec,
> #                            header=FALSE, colClasses=col_classes, n=100)
> threewkoutstat <- read.fwf(file="3wkoutstatfcst_file02.dat", widths=width_vec,
>                            header=FALSE, colClasses=col_classes)
> names(threewkoutstat) <- col_names
>
> Example record (only one record pasted below)
> A00400100379949254925004A001002002015002015009 ... followed by 209
> twelve-character numeric fields, nearly all 0.00 apart from a short run of
> values (0.60, 0.60, 0.60, 0.70); the single fixed-width line was wrapped and
> truncated by the mail client in the original post.
>

Re: [R] Reading large files

2010-02-06 Thread Gabor Grothendieck
By the way, if you use the H2 database with sqldf then there is a
second way to read files in using sqldf.

# 1. run your perl program outside of R to create myfile.csv, say.

# 2. install java from http://java.sun.com
# and then install the RH2 package from CRAN
install.packages("RH2")

# 3. load sqldf and RH2
# sqldf automatically uses H2 database if RH2 is loaded
library(RH2)
library(sqldf)

# 4. read file using sqldf making use of the CSVREAD function in H2
DF <- sqldf("select * from CSVREAD('myfile.csv')")


On Sat, Feb 6, 2010 at 8:37 PM, Gabor Grothendieck
 wrote:
> file= is the input data file. filter= is just a command string that
> specifies a program to run (not a data file).
>
> 1. If Filename.tmp is the name of a temporary file (that it creates)
> it runs a batch command similar to this:
>      paste("cmd /c", filter, "<", file, ">", Filename.tmp)
>
> 2. Then it reads Filename.tmp into the database (which it creates for
> you) and does this without involving R and
>
> 3. finally it reads the table in the database that was created into R,
> as an R dataframe, and destroys the database.
>
>
> On Sat, Feb 6, 2010 at 7:53 PM, Vadlamani, Satish {FLNA}
>  wrote:
>> Gabor:
>> It did suppress the message now and I was able to load the data. Question.
>>
>> 1. test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl 
>> parse_3wkout.pl")
>>
>> In the statement above, should the filename in file= and the file name that 
>> the perl script uses through the filter= command be the same? I would think 
>> not.  I would say that if filter= is passed to the statement, then the 
>> filename should be ignored. Is this how it works?
>>
>> Thanks.
>> Satish
>>
>>
>> -Original Message-
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Saturday, February 06, 2010 4:58 PM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> I have uploaded another version which suppresses display of the error
>> message but otherwise works the same.  Omitting the redundant
>> arguments we have:
>>
>> library(sqldf)
>> # next line is only needed once per session to read in devel version
>> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
>>
>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl
>> parse_3wkout.pl")
>>
>>
>> On Sat, Feb 6, 2010 at 5:48 PM, Vadlamani, Satish {FLNA}
>>  wrote:
>>> Gabor:
>>> Please see the results below. Sourcing your new R script worked (although 
>>> with the same error message). If I put eol="\n" option, it is adding a "\r" 
>>> to the last column. I took out the eol option below. This is just some more 
>>> feedback to you.
>>>
>>> I am thinking that I will just do an inline edit in Perl (that is create 
>>> the csv file through Perl by overwriting the current file) and then use 
>>> read.csv.sql without the filter= option. This seems to be more tried and 
>>> tested. If you have any suggestions, please let me know. Thanks.
>>> Satish
>>>
>>>
>>> BEFORE SOURCING YOUR NEW R SCRIPT
>>>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>>>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
>>> Error in readRegistry(key, maxdepth = 3) :
>>>  Registry key 'SOFTWARE\R-core' not found
>>>> test_df
>>> Error: object 'test_df' not found
>>>
>>> AFTER SOURCING YOUR NEW R SCRIPT
>>>> source("f:/dp_modeling_team/downloads/R/sqldf.R")
>>>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>>>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
>>> Error in readRegistry(key, maxdepth = 3) :
>>>  Registry key 'SOFTWARE\R-core' not found
>>> In addition: Warning messages:
>>> 1: closing unused connection 5 (3wkoutstatfcst_small.dat)
>>> 2: closing unused connection 4 (3wkoutstatfcst_small.dat)
>>> 3: closing unused connection 3 (3wkoutstatfcst_small.dat)
>>>> test_df
>>>   allgeo area1 zone dist ccust1 whse bindc ccust2 account area2 ccust3
>>> 1       A     4    1   37     99 4925  4925     99      99     4     99
>>> 2       A     4    1   37  

Re: [R] Reading large files

2010-02-06 Thread Gabor Grothendieck
file= is the input data file. filter= is just a command string that
specifies a program to run (not a data file).

1. If Filename.tmp is the name of a temporary file (that it creates)
it runs a batch command similar to this:
  paste("cmd /c", filter, "<", file, ">", Filename.tmp)

2. Then it reads Filename.tmp into the database (which it creates for
you) and does this without involving R and

3. finally it reads the table in the database that was created into R,
as an R dataframe, and destroys the database.
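
For example, with the values from your call the command string built in
step 1 would look roughly like this (illustration only -- the real temporary
file name is generated internally):

filter <- "perl parse_3wkout.pl"
file   <- "3wkoutstatfcst_small.dat"
Filename.tmp <- tempfile()
paste("cmd /c", filter, "<", file, ">", Filename.tmp)
# e.g. "cmd /c perl parse_3wkout.pl < 3wkoutstatfcst_small.dat > C:/.../fileXXXX"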


On Sat, Feb 6, 2010 at 7:53 PM, Vadlamani, Satish {FLNA}
 wrote:
> Gabor:
> It did suppress the message now and I was able to load the data. Question.
>
> 1. test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl 
> parse_3wkout.pl")
>
> In the statement above, should the filename in file= and the file name that 
> the perl script uses through the filter= command be the same? I would think 
> not.  I would say that if filter= is passed to the statement, then the 
> filename should be ignored. Is this how it works?
>
> Thanks.
> Satish
>
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 4:58 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> I have uploaded another version which suppresses display of the error
> message but otherwise works the same.  Omitting the redundant
> arguments we have:
>
> library(sqldf)
> # next line is only needed once per session to read in devel version
> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
>
> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl
> parse_3wkout.pl")
>
>
> On Sat, Feb 6, 2010 at 5:48 PM, Vadlamani, Satish {FLNA}
>  wrote:
>> Gabor:
>> Please see the results below. Sourcing your new R script worked (although 
>> with the same error message). If I put eol="\n" option, it is adding a "\r" 
>> to the last column. I took out the eol option below. This is just some more 
>> feedback to you.
>>
>> I am thinking that I will just do an inline edit in Perl (that is create the 
>> csv file through Perl by overwriting the current file) and then use 
>> read.csv.sql without the filter= option. This seems to be more tried and 
>> tested. If you have any suggestions, please let me know. Thanks.
>> Satish
>>
>>
>> BEFORE SOURCING YOUR NEW R SCRIPT
>>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
>> Error in readRegistry(key, maxdepth = 3) :
>>  Registry key 'SOFTWARE\R-core' not found
>>> test_df
>> Error: object 'test_df' not found
>>
>> AFTER SOURCING YOUR NEW R SCRIPT
>>> source("f:/dp_modeling_team/downloads/R/sqldf.R")
>>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
>> Error in readRegistry(key, maxdepth = 3) :
>>  Registry key 'SOFTWARE\R-core' not found
>> In addition: Warning messages:
>> 1: closing unused connection 5 (3wkoutstatfcst_small.dat)
>> 2: closing unused connection 4 (3wkoutstatfcst_small.dat)
>> 3: closing unused connection 3 (3wkoutstatfcst_small.dat)
>>> test_df
>>   allgeo area1 zone dist ccust1 whse bindc ccust2 account area2 ccust3
>> 1       A     4    1   37     99 4925  4925     99      99     4     99
>> 2       A     4    1   37     99 4925  4925     99      99     4     99
>>
>> -Original Message-
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Saturday, February 06, 2010 4:28 PM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> The software attempts to read the registry and temporarily augment the
>> path in case you have Rtools installed so that the filter can access
>> all the tools that Rtools provides.  I am not sure why its failing on
>> your system but there is evidently some differences between systems
>> here and I have added some code to trap and bypass that portion in
>> case it fails.  I have added the new version to the svn repository so
>> try this:
>>
>> library(sqldf)
>> # overwrite with development version
>> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R";)

Re: [R] Reading large files

2010-02-06 Thread Vadlamani, Satish {FLNA}
Gabor:
It did suppress the message now and I was able to load the data. Question.

1. test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl 
parse_3wkout.pl") 

In the statement above, should the filename in file= and the file name that the 
perl script uses through the filter= command be the same? I would think not.  I 
would say that if filter= is passed to the statement, then the filename should 
be ignored. Is this how it works?

Thanks.
Satish


-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Saturday, February 06, 2010 4:58 PM
To: Vadlamani, Satish {FLNA}
Cc: r-help@r-project.org
Subject: Re: [R] Reading large files

I have uploaded another version which suppresses display of the error
message but otherwise works the same.  Omitting the redundant
arguments we have:

library(sqldf)
# next line is only needed once per session to read in devel version
source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")

test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl
parse_3wkout.pl")


On Sat, Feb 6, 2010 at 5:48 PM, Vadlamani, Satish {FLNA}
 wrote:
> Gabor:
> Please see the results below. Sourcing your new R script worked (although 
> with the same error message). If I put eol="\n" option, it is adding a "\r" 
> to the last column. I took out the eol option below. This is just some more 
> feedback to you.
>
> I am thinking that I will just do an inline edit in Perl (that is create the 
> csv file through Perl by overwriting the current file) and then use 
> read.csv.sql without the filter= option. This seems to be more tried and 
> tested. If you have any suggestions, please let me know. Thanks.
> Satish
>
>
> BEFORE SOURCING YOUR NEW R SCRIPT
>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
>> test_df
> Error: object 'test_df' not found
>
> AFTER SOURCING YOUR NEW R SCRIPT
>> source("f:/dp_modeling_team/downloads/R/sqldf.R")
>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
> In addition: Warning messages:
> 1: closing unused connection 5 (3wkoutstatfcst_small.dat)
> 2: closing unused connection 4 (3wkoutstatfcst_small.dat)
> 3: closing unused connection 3 (3wkoutstatfcst_small.dat)
>> test_df
>   allgeo area1 zone dist ccust1 whse bindc ccust2 account area2 ccust3
> 1       A     4    1   37     99 4925  4925     99      99     4     99
> 2       A     4    1   37     99 4925  4925     99      99     4     99
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 4:28 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> The software attempts to read the registry and temporarily augment the
> path in case you have Rtools installed so that the filter can access
> all the tools that Rtools provides.  I am not sure why its failing on
> your system but there is evidently some differences between systems
> here and I have added some code to trap and bypass that portion in
> case it fails.  I have added the new version to the svn repository so
> try this:
>
> library(sqldf)
> # overwrite with development version
> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R";)
> # your code to call read.csv.sql
>
>
> On Sat, Feb 6, 2010 at 5:18 PM, Vadlamani, Satish {FLNA}
>  wrote:
>>
>> Gabor:
>> Here is the update. As you can see, I got the same error as below in 1.
>>
>> 1. Error
>>  test_df <- read.csv.sql(file="out_small.txt", sql = "select * from file", 
>> header = TRUE, sep = ",", filter="perl parse_3wkout.pl", eol="\n")
>> Error in readRegistry(key, maxdepth = 3) :
>>  Registry key 'SOFTWARE\R-core' not found
>>
>> 2. But the loading of the bigger file was successful as you can see below. 
>> 857 MB, 333,250 rows, 227 columns. This is good.
>>
>> I will have to just do an inline edit in Perl and change the file to csv 
>> from within R and then call the read.csv.sql.
>>
>> If you have any suggestions to fix 1, I would like to try them.
>>
>>  system.time(test_df &

Re: [R] Reading large files

2010-02-06 Thread Gabor Grothendieck
I have uploaded another version which suppresses display of the error
message but otherwise works the same.  Omitting the redundant
arguments we have:

library(sqldf)
# next line is only needed once per session to read in devel version
source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")

test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl
parse_3wkout.pl")


On Sat, Feb 6, 2010 at 5:48 PM, Vadlamani, Satish {FLNA}
 wrote:
> Gabor:
> Please see the results below. Sourcing your new R script worked (although 
> with the same error message). If I put eol="\n" option, it is adding a "\r" 
> to the last column. I took out the eol option below. This is just some more 
> feedback to you.
>
> I am thinking that I will just do an inline edit in Perl (that is create the 
> csv file through Perl by overwriting the current file) and then use 
> read.csv.sql without the filter= option. This seems to be more tried and 
> tested. If you have any suggestions, please let me know. Thanks.
> Satish
>
>
> BEFORE SOURCING YOUR NEW R SCRIPT
>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
>> test_df
> Error: object 'test_df' not found
>
> AFTER SOURCING YOUR NEW R SCRIPT
>> source("f:/dp_modeling_team/downloads/R/sqldf.R")
>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
> In addition: Warning messages:
> 1: closing unused connection 5 (3wkoutstatfcst_small.dat)
> 2: closing unused connection 4 (3wkoutstatfcst_small.dat)
> 3: closing unused connection 3 (3wkoutstatfcst_small.dat)
>> test_df
>   allgeo area1 zone dist ccust1 whse bindc ccust2 account area2 ccust3
> 1       A     4    1   37     99 4925  4925     99      99     4     99
> 2       A     4    1   37     99 4925  4925     99      99     4     99
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 4:28 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> The software attempts to read the registry and temporarily augment the
> path in case you have Rtools installed so that the filter can access
> all the tools that Rtools provides.  I am not sure why its failing on
> your system but there is evidently some differences between systems
> here and I have added some code to trap and bypass that portion in
> case it fails.  I have added the new version to the svn repository so
> try this:
>
> library(sqldf)
> # overwrite with development version
> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R";)
> # your code to call read.csv.sql
>
>
> On Sat, Feb 6, 2010 at 5:18 PM, Vadlamani, Satish {FLNA}
>  wrote:
>>
>> Gabor:
>> Here is the update. As you can see, I got the same error as below in 1.
>>
>> 1. Error
>>  test_df <- read.csv.sql(file="out_small.txt", sql = "select * from file", 
>> header = TRUE, sep = ",", filter="perl parse_3wkout.pl", eol="\n")
>> Error in readRegistry(key, maxdepth = 3) :
>>  Registry key 'SOFTWARE\R-core' not found
>>
>> 2. But the loading of the bigger file was successful as you can see below. 
>> 857 MB, 333,250 rows, 227 columns. This is good.
>>
>> I will have to just do an inline edit in Perl and change the file to csv 
>> from within R and then call the read.csv.sql.
>>
>> If you have any suggestions to fix 1, I would like to try them.
>>
>>  system.time(test_df <- read.csv.sql(file="out.txt"))
>>   user  system elapsed
>>  192.53   15.50  213.68
>> Warning message:
>> closing unused connection 3 (out.txt)
>>
>> Thanks again.
>>
>> Satish
>>
>> -Original Message-
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Saturday, February 06, 2010 3:02 PM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> Note that you can shorten #1 to read.csv.sql("out.txt") since your
>> other arguments are the default values.
>>
>> For the second one, use read.c

Re: [R] Reading large files

2010-02-06 Thread Vadlamani, Satish {FLNA}
Gabor:
Please see the results below. Sourcing your new R script worked (although with
the same error message). If I put in the eol="\n" option, it adds a "\r" to the
last column. I took out the eol option below. This is just some more feedback
to you.

I am thinking that I will just do an inline edit in Perl (that is create the 
csv file through Perl by overwriting the current file) and then use 
read.csv.sql without the filter= option. This seems to be more tried and 
tested. If you have any suggestions, please let me know. Thanks.
Satish


BEFORE SOURCING YOUR NEW R SCRIPT
> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * from 
> file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
Error in readRegistry(key, maxdepth = 3) : 
  Registry key 'SOFTWARE\R-core' not found
> test_df
Error: object 'test_df' not found

AFTER SOURCING YOUR NEW R SCRIPT
> source("f:/dp_modeling_team/downloads/R/sqldf.R")
> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * from 
> file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
Error in readRegistry(key, maxdepth = 3) : 
  Registry key 'SOFTWARE\R-core' not found
In addition: Warning messages:
1: closing unused connection 5 (3wkoutstatfcst_small.dat) 
2: closing unused connection 4 (3wkoutstatfcst_small.dat) 
3: closing unused connection 3 (3wkoutstatfcst_small.dat) 
> test_df
   allgeo area1 zone dist ccust1 whse bindc ccust2 account area2 ccust3
1       A     4    1   37     99 4925  4925     99      99     4     99
2       A     4    1   37     99 4925  4925     99      99     4     99

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Saturday, February 06, 2010 4:28 PM
To: Vadlamani, Satish {FLNA}
Cc: r-help@r-project.org
Subject: Re: [R] Reading large files

The software attempts to read the registry and temporarily augment the
path in case you have Rtools installed so that the filter can access
all the tools that Rtools provides.  I am not sure why it's failing on
your system but there are evidently some differences between systems
here and I have added some code to trap and bypass that portion in
case it fails.  I have added the new version to the svn repository so
try this:

library(sqldf)
# overwrite with development version
source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R";)
# your code to call read.csv.sql


On Sat, Feb 6, 2010 at 5:18 PM, Vadlamani, Satish {FLNA}
 wrote:
>
> Gabor:
> Here is the update. As you can see, I got the same error as below in 1.
>
> 1. Error
>  test_df <- read.csv.sql(file="out_small.txt", sql = "select * from file", 
> header = TRUE, sep = ",", filter="perl parse_3wkout.pl", eol="\n")
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
>
> 2. But the loading of the bigger file was successful as you can see below. 
> 857 MB, 333,250 rows, 227 columns. This is good.
>
> I will have to just do an inline edit in Perl and change the file to csv from 
> within R and then call the read.csv.sql.
>
> If you have any suggestions to fix 1, I would like to try them.
>
>  system.time(test_df <- read.csv.sql(file="out.txt"))
>   user  system elapsed
>  192.53   15.50  213.68
> Warning message:
> closing unused connection 3 (out.txt)
>
> Thanks again.
>
> Satish
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 3:02 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> Note that you can shorten #1 to read.csv.sql("out.txt") since your
> other arguments are the default values.
>
> For the second one, use read.csv.sql, eliminate the arguments that are
> defaults anyways (should not cause a problem but its error prone) and
> add an explicit eol= argument since SQLite can have problems with end
> of line in some cases.  Also test out your perl script separately from
> R first to ensure that it works:
>
> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl
> parse_3wkout.pl", eol = "\n")
>
> SQLite has some known problems with end of line so try it with and
> without the eol= argument just in case.  When I just made up the
> following gawk example I noticed that I did need to specify the eol=
> argument.
>
> Also I have added a complete example using gawk as Example 13c on the
> home page just now:
> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>
>
> On Sat, Feb 6, 2

Re: [R] Reading large files

2010-02-06 Thread Gabor Grothendieck
The software attempts to read the registry and temporarily augment the
path in case you have Rtools installed so that the filter can access
all the tools that Rtools provides.  I am not sure why it's failing on
your system but there are evidently some differences between systems
here and I have added some code to trap and bypass that portion in
case it fails.  I have added the new version to the svn repository so
try this:

library(sqldf)
# overwrite with development version
source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R";)
# your code to call read.csv.sql


On Sat, Feb 6, 2010 at 5:18 PM, Vadlamani, Satish {FLNA}
 wrote:
>
> Gabor:
> Here is the update. As you can see, I got the same error as below in 1.
>
> 1. Error
>  test_df <- read.csv.sql(file="out_small.txt", sql = "select * from file", 
> header = TRUE, sep = ",", filter="perl parse_3wkout.pl", eol="\n")
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
>
> 2. But the loading of the bigger file was successful as you can see below. 
> 857 MB, 333,250 rows, 227 columns. This is good.
>
> I will have to just do an inline edit in Perl and change the file to csv from 
> within R and then call the read.csv.sql.
>
> If you have any suggestions to fix 1, I would like to try them.
>
>  system.time(test_df <- read.csv.sql(file="out.txt"))
>   user  system elapsed
>  192.53   15.50  213.68
> Warning message:
> closing unused connection 3 (out.txt)
>
> Thanks again.
>
> Satish
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 3:02 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> Note that you can shorten #1 to read.csv.sql("out.txt") since your
> other arguments are the default values.
>
> For the second one, use read.csv.sql, eliminate the arguments that are
> defaults anyways (should not cause a problem but its error prone) and
> add an explicit eol= argument since SQLite can have problems with end
> of line in some cases.  Also test out your perl script separately from
> R first to ensure that it works:
>
> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl
> parse_3wkout.pl", eol = "\n")
>
> SQLite has some known problems with end of line so try it with and
> without the eol= argument just in case.  When I just made up the
> following gawk example I noticed that I did need to specify the eol=
> argument.
>
> Also I have added a complete example using gawk as Example 13c on the
> home page just now:
> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>
>
> On Sat, Feb 6, 2010 at 3:52 PM, Vadlamani, Satish {FLNA}
>  wrote:
>> Gabor:
>>
>> I had success with the following.
>> 1. I created a csv file with a perl script called "out.txt". Then ran the 
>> following successfully
>> library("sqldf")
>> test_df <- read.csv.sql(file="out.txt", sql = "select * from file", header = 
>> TRUE, sep = ",", dbname = tempfile())
>>
>> 2. I did not have success with the following. Could you tell me what I may 
>> be doing wrong? I could paste the perl script if necessary. From the perl 
>> script, I am reading the file, creating the csv record and printing each 
>> record one by one and then exiting.
>>
>> Thanks.
>>
>> Not had success with below..
>> #test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname 
>> = tempfile())
>> test_df
>>
>> Error message below:
>> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname 
>> = tempfile())
>> Error in readRegistry(key, maxdepth = 3) :
>>  Registry key 'SOFTWARE\R-core' not found
>> In addition: Warning messages:
>> 1: closing unused connection 14 (3wkoutstatfcst_small.dat)
>> 2: closing unused connection 13 (3wkoutstatfcst_small.dat)
>> 3: closing unused connection 11 (3wkoutstatfcst_small.dat)
>> 4: closing unused connection 9 (3wkoutstatfcst_small.dat)
>> 5: closing unused connection 3 (3wkoutstatfcst_small.dat)
>>> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>>> from file", header = TRUE, sep = 

Re: [R] Reading large files

2010-02-06 Thread Vadlamani, Satish {FLNA}

Gabor:
Here is the update. As you can see, I got the same error as below in 1.

1. Error
 test_df <- read.csv.sql(file="out_small.txt", sql = "select * from file", 
header = TRUE, sep = ",", filter="perl parse_3wkout.pl", eol="\n")
Error in readRegistry(key, maxdepth = 3) : 
  Registry key 'SOFTWARE\R-core' not found 

2. But the loading of the bigger file was successful as you can see below. 857 
MB, 333,250 rows, 227 columns. This is good.

I will have to just do an inline edit in Perl and change the file to csv from 
within R and then call the read.csv.sql. 

If you have any suggestions to fix 1, I would like to try them.

 system.time(test_df <- read.csv.sql(file="out.txt"))
   user  system elapsed 
 192.53   15.50  213.68 
Warning message:
closing unused connection 3 (out.txt) 

Thanks again.

Satish

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Saturday, February 06, 2010 3:02 PM
To: Vadlamani, Satish {FLNA}
Cc: r-help@r-project.org
Subject: Re: [R] Reading large files

Note that you can shorten #1 to read.csv.sql("out.txt") since your
other arguments are the default values.

For the second one, use read.csv.sql, eliminate the arguments that are
defaults anyway (should not cause a problem but it's error-prone) and
add an explicit eol= argument since SQLite can have problems with end
of line in some cases.  Also test out your perl script separately from
R first to ensure that it works:

test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl
parse_3wkout.pl", eol = "\n")

SQLite has some known problems with end of line so try it with and
without the eol= argument just in case.  When I just made up the
following gawk example I noticed that I did need to specify the eol=
argument.

Also I have added a complete example using gawk as Example 13c on the
home page just now:
http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql


On Sat, Feb 6, 2010 at 3:52 PM, Vadlamani, Satish {FLNA}
 wrote:
> Gabor:
>
> I had success with the following.
> 1. I created a csv file with a perl script called "out.txt". Then ran the 
> following successfully
> library("sqldf")
> test_df <- read.csv.sql(file="out.txt", sql = "select * from file", header = 
> TRUE, sep = ",", dbname = tempfile())
>
> 2. I did not have success with the following. Could you tell me what I may be 
> doing wrong? I could paste the perl script if necessary. From the perl 
> script, I am reading the file, creating the csv record and printing each 
> record one by one and then exiting.
>
> Thanks.
>
> Not had success with below..
> #test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = 
> tempfile())
> test_df
>
> Error message below:
> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = 
> tempfile())
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
> In addition: Warning messages:
> 1: closing unused connection 14 (3wkoutstatfcst_small.dat)
> 2: closing unused connection 13 (3wkoutstatfcst_small.dat)
> 3: closing unused connection 11 (3wkoutstatfcst_small.dat)
> 4: closing unused connection 9 (3wkoutstatfcst_small.dat)
> 5: closing unused connection 3 (3wkoutstatfcst_small.dat)
>> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname 
>> = tempfile())
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 12:14 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> No.
>
> On Sat, Feb 6, 2010 at 1:01 PM, Vadlamani, Satish {FLNA}
>  wrote:
>> Gabor:
>> Can I pass colClasses as a vector to read.csv.sql? Thanks.
>> Satish
>>
>>
>> -Original Message-
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Saturday, February 06, 2010 9:41 AM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> Its just any Windows batch command string that filters stdin to
>> stdout.  What the command consist

Re: [R] Reading large files

2010-02-06 Thread Gabor Grothendieck
Note that you can shorten #1 to read.csv.sql("out.txt") since your
other arguments are the default values.

For the second one, use read.csv.sql, eliminate the arguments that are
defaults anyway (should not cause a problem but it's error-prone) and
add an explicit eol= argument since SQLite can have problems with end
of line in some cases.  Also test out your perl script separately from
R first to ensure that it works:

test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl
parse_3wkout.pl", eol = "\n")

SQLite has some known problems with end of line so try it with and
without the eol= argument just in case.  When I just made up the
following gawk example I noticed that I did need to specify the eol=
argument.

Also I have added a complete example using gawk as Example 13c on the
home page just now:
http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql


On Sat, Feb 6, 2010 at 3:52 PM, Vadlamani, Satish {FLNA}
 wrote:
> Gabor:
>
> I had success with the following.
> 1. I created a csv file with a perl script called "out.txt". Then ran the 
> following successfully
> library("sqldf")
> test_df <- read.csv.sql(file="out.txt", sql = "select * from file", header = 
> TRUE, sep = ",", dbname = tempfile())
>
> 2. I did not have success with the following. Could you tell me what I may be 
> doing wrong? I could paste the perl script if necessary. From the perl 
> script, I am reading the file, creating the csv record and printing each 
> record one by one and then exiting.
>
> Thanks.
>
> Not had success with below..
> #test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = 
> tempfile())
> test_df
>
> Error message below:
> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = 
> tempfile())
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
> In addition: Warning messages:
> 1: closing unused connection 14 (3wkoutstatfcst_small.dat)
> 2: closing unused connection 13 (3wkoutstatfcst_small.dat)
> 3: closing unused connection 11 (3wkoutstatfcst_small.dat)
> 4: closing unused connection 9 (3wkoutstatfcst_small.dat)
> 5: closing unused connection 3 (3wkoutstatfcst_small.dat)
>> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname 
>> = tempfile())
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 12:14 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> No.
>
> On Sat, Feb 6, 2010 at 1:01 PM, Vadlamani, Satish {FLNA}
>  wrote:
>> Gabor:
>> Can I pass colClasses as a vector to read.csv.sql? Thanks.
>> Satish
>>
>>
>> -Original Message-
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Saturday, February 06, 2010 9:41 AM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> Its just any Windows batch command string that filters stdin to
>> stdout.  What the command consists of should not be important.   An
>> invocation of perl that runs a perl script that filters stdin to
>> stdout might look like this:
>>  read.csv.sql("myfile.dat", filter = "perl myprog.pl")
>>
>> For an actual example see the source of read.csv2.sql which defaults
>> to using a Windows vbscript program as a filter.
>>
>> On Sat, Feb 6, 2010 at 10:16 AM, Vadlamani, Satish {FLNA}
>>  wrote:
>>> Jim, Gabor:
>>> Thanks so much for the suggestions where I can use read.csv.sql and embed 
>>> Perl (or gawk). I just want to mention that I am running on Windows. I am 
>>> going to read the documentation the filter argument and see if it can take 
>>> a decent sized Perl script and then use its output as input.
>>>
>>> Suppose that I write a Perl script that parses this fwf file and creates a 
>>> CSV file. Can I embed this within the read.csv.sql call? Or, can it only be 
>>> a statement or something? If you know the answer, please let me know. 
>>> O

Re: [R] Reading large files

2010-02-06 Thread Vadlamani, Satish {FLNA}
Gabor:

I had success with the following.
1. I created a csv file called "out.txt" with a perl script. Then I ran the
following successfully:
library("sqldf")
test_df <- read.csv.sql(file="out.txt", sql = "select * from file", header = 
TRUE, sep = ",", dbname = tempfile())

2. I did not have success with the following. Could you tell me what I may be 
doing wrong? I could paste the perl script if necessary. From the perl script, 
I am reading the file, creating the csv record and printing each record one by 
one and then exiting.

Thanks.

I have not had success with the following:
#test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * from 
file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = 
tempfile())
test_df 

Error message below:
test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * from 
file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = 
tempfile())
Error in readRegistry(key, maxdepth = 3) : 
  Registry key 'SOFTWARE\R-core' not found
In addition: Warning messages:
1: closing unused connection 14 (3wkoutstatfcst_small.dat) 
2: closing unused connection 13 (3wkoutstatfcst_small.dat) 
3: closing unused connection 11 (3wkoutstatfcst_small.dat) 
4: closing unused connection 9 (3wkoutstatfcst_small.dat) 
5: closing unused connection 3 (3wkoutstatfcst_small.dat) 
> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = 
> tempfile())
Error in readRegistry(key, maxdepth = 3) : 
  Registry key 'SOFTWARE\R-core' not found

-----Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Saturday, February 06, 2010 12:14 PM
To: Vadlamani, Satish {FLNA}
Cc: r-help@r-project.org
Subject: Re: [R] Reading large files

No.

On Sat, Feb 6, 2010 at 1:01 PM, Vadlamani, Satish {FLNA}
 wrote:
> Gabor:
> Can I pass colClasses as a vector to read.csv.sql? Thanks.
> Satish
>
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 9:41 AM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> Its just any Windows batch command string that filters stdin to
> stdout.  What the command consists of should not be important.   An
> invocation of perl that runs a perl script that filters stdin to
> stdout might look like this:
>  read.csv.sql("myfile.dat", filter = "perl myprog.pl")
>
> For an actual example see the source of read.csv2.sql which defaults
> to using a Windows vbscript program as a filter.
>
> On Sat, Feb 6, 2010 at 10:16 AM, Vadlamani, Satish {FLNA}
>  wrote:
>> Jim, Gabor:
>> Thanks so much for the suggestions where I can use read.csv.sql and embed 
>> Perl (or gawk). I just want to mention that I am running on Windows. I am 
>> going to read the documentation the filter argument and see if it can take a 
>> decent sized Perl script and then use its output as input.
>>
>> Suppose that I write a Perl script that parses this fwf file and creates a 
>> CSV file. Can I embed this within the read.csv.sql call? Or, can it only be 
>> a statement or something? If you know the answer, please let me know. 
>> Otherwise, I will try a few things and report back the results.
>>
>> Thanks again.
>> Saitsh
>>
>>
>> -Original Message-
>> From: jim holtman [mailto:jholt...@gmail.com]
>> Sent: Saturday, February 06, 2010 6:16 AM
>> To: Gabor Grothendieck
>> Cc: Vadlamani, Satish {FLNA}; r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> In perl the 'unpack' command makes it very easy to parse fixed fielded data.
>>
>> On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck
>>  wrote:
>>> Note that the filter= argument on read.csv.sql can be used to pass the
>>> input through a filter written in perl, [g]awk or other language.
>>> For example: read.csv.sql(..., filter = "gawk -f myfilter.awk")
>>>
>>> gawk has the FIELDWIDTHS variable for automatically parsing fixed
>>> width fields, e.g.
>>> http://www.delorie.com/gnu/docs/gawk/gawk_44.html
>>> making this very easy but perl or whatever you are most used to would
>>> be fine too.
>>>
>>> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA}
>>>  wrote:
>>>> Hi Gabor:
>>>> Thanks. My files are all in fixed width format. They are a lot of them. It 
>

Re: [R] Reading large files

2010-02-06 Thread Gabor Grothendieck
No.

On Sat, Feb 6, 2010 at 1:01 PM, Vadlamani, Satish {FLNA}
 wrote:
> Gabor:
> Can I pass colClasses as a vector to read.csv.sql? Thanks.
> Satish
>
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 9:41 AM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> Its just any Windows batch command string that filters stdin to
> stdout.  What the command consists of should not be important.   An
> invocation of perl that runs a perl script that filters stdin to
> stdout might look like this:
>  read.csv.sql("myfile.dat", filter = "perl myprog.pl")
>
> For an actual example see the source of read.csv2.sql which defaults
> to using a Windows vbscript program as a filter.
>
> On Sat, Feb 6, 2010 at 10:16 AM, Vadlamani, Satish {FLNA}
>  wrote:
>> Jim, Gabor:
>> Thanks so much for the suggestions where I can use read.csv.sql and embed 
>> Perl (or gawk). I just want to mention that I am running on Windows. I am 
>> going to read the documentation the filter argument and see if it can take a 
>> decent sized Perl script and then use its output as input.
>>
>> Suppose that I write a Perl script that parses this fwf file and creates a 
>> CSV file. Can I embed this within the read.csv.sql call? Or, can it only be 
>> a statement or something? If you know the answer, please let me know. 
>> Otherwise, I will try a few things and report back the results.
>>
>> Thanks again.
>> Saitsh
>>
>>
>> -Original Message-
>> From: jim holtman [mailto:jholt...@gmail.com]
>> Sent: Saturday, February 06, 2010 6:16 AM
>> To: Gabor Grothendieck
>> Cc: Vadlamani, Satish {FLNA}; r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> In perl the 'unpack' command makes it very easy to parse fixed fielded data.
>>
>> On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck
>>  wrote:
>>> Note that the filter= argument on read.csv.sql can be used to pass the
>>> input through a filter written in perl, [g]awk or other language.
>>> For example: read.csv.sql(..., filter = "gawk -f myfilter.awk")
>>>
>>> gawk has the FIELDWIDTHS variable for automatically parsing fixed
>>> width fields, e.g.
>>> http://www.delorie.com/gnu/docs/gawk/gawk_44.html
>>> making this very easy but perl or whatever you are most used to would
>>> be fine too.
>>>
>>> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA}
>>>  wrote:
>>>> Hi Gabor:
>>>> Thanks. My files are all in fixed width format. They are a lot of them. It 
>>>> would take me some effort to convert them to CSV. I guess this cannot be 
>>>> avoided? I can write some Perl scripts to convert fixed width format to 
>>>> CSV format and then start with your suggestion. Could you let me know your 
>>>> thoughts on the approach?
>>>> Satish
>>>>
>>>>
>>>> -Original Message-
>>>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>>>> Sent: Friday, February 05, 2010 5:16 PM
>>>> To: Vadlamani, Satish {FLNA}
>>>> Cc: r-help@r-project.org
>>>> Subject: Re: [R] Reading large files
>>>>
>>>> If your problem is just how long it takes to load the file into R try
>>>> read.csv.sql in the sqldf package.  A single read.csv.sql call can
>>>> create an SQLite database and table layout for you, read the file into
>>>> the database (without going through R so R can't slow this down),
>>>> extract all or a portion into R based on the sql argument you give it
>>>> and then remove the database.  See the examples on the home page:
>>>> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>>>>
>>>> On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
>>>>  wrote:
>>>>>
>>>>> Matthew:
>>>>> If it is going to help, here is the explanation. I have an end state in
>>>>> mind. It is given below under "End State" header. In order to get there, I
>>>>> need to start somewhere right? I started with a 850 MB file and could not
>>>>> load in what I think is reasonable time (I waited for an hour).
>>>>>
>>>>> There are references to 64 bit. How will that help? It is a 4GB RAM 
>>>>> machine
>>>>> and there is no paging activity 

Re: [R] Reading large files

2010-02-06 Thread Vadlamani, Satish {FLNA}
Gabor:
Can I pass colClasses as a vector to read.csv.sql? Thanks.
Satish
 

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Saturday, February 06, 2010 9:41 AM
To: Vadlamani, Satish {FLNA}
Cc: r-help@r-project.org
Subject: Re: [R] Reading large files

It's just any Windows batch command string that filters stdin to
stdout.  What the command consists of should not be important.   An
invocation of perl that runs a perl script that filters stdin to
stdout might look like this:
  read.csv.sql("myfile.dat", filter = "perl myprog.pl")

For an actual example see the source of read.csv2.sql which defaults
to using a Windows vbscript program as a filter.

Re: [R] Reading large files

2010-02-06 Thread Gabor Grothendieck
It's just any Windows batch command string that filters stdin to
stdout.  What the command consists of should not be important.   An
invocation of perl that runs a perl script that filters stdin to
stdout might look like this:
  read.csv.sql("myfile.dat", filter = "perl myprog.pl")

For an actual example see the source of read.csv2.sql which defaults
to using a Windows vbscript program as a filter.
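
Filled out a little, a sketch only; myprog.pl is the same hypothetical script
that reads fixed-width records on stdin and writes CSV on stdout, and the data
file name is made up:

  # The filter is just a command string, so it can be tried on its own first
  # (shell() on Windows) before being handed to read.csv.sql:
  shell("perl myprog.pl < myfile.dat > check.csv")
  # and then embedded directly in the read:
  library(sqldf)
  DF <- read.csv.sql("myfile.dat", header = FALSE,
                     filter = "perl myprog.pl")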

On Sat, Feb 6, 2010 at 10:16 AM, Vadlamani, Satish {FLNA}
 wrote:
> Jim, Gabor:
> Thanks so much for the suggestions where I can use read.csv.sql and embed 
> Perl (or gawk). I just want to mention that I am running on Windows. I am 
> going to read the documentation on the filter argument and see if it can take a 
> decent sized Perl script and then use its output as input.
>
> Suppose that I write a Perl script that parses this fwf file and creates a 
> CSV file. Can I embed this within the read.csv.sql call? Or, can it only be a 
> statement or something? If you know the answer, please let me know. 
> Otherwise, I will try a few things and report back the results.
>
> Thanks again.
> Satish
>
>
> -Original Message-
> From: jim holtman [mailto:jholt...@gmail.com]
> Sent: Saturday, February 06, 2010 6:16 AM
> To: Gabor Grothendieck
> Cc: Vadlamani, Satish {FLNA}; r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> In perl the 'unpack' command makes it very easy to parse fixed fielded data.
>
> On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck
>  wrote:
>> Note that the filter= argument on read.csv.sql can be used to pass the
>> input through a filter written in perl, [g]awk or other language.
>> For example: read.csv.sql(..., filter = "gawk -f myfilter.awk")
>>
>> gawk has the FIELDWIDTHS variable for automatically parsing fixed
>> width fields, e.g.
>> http://www.delorie.com/gnu/docs/gawk/gawk_44.html
>> making this very easy but perl or whatever you are most used to would
>> be fine too.
>>
>> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA}
>>  wrote:
>>> Hi Gabor:
>>> Thanks. My files are all in fixed width format. There are a lot of them. It 
>>> would take me some effort to convert them to CSV. I guess this cannot be 
>>> avoided? I can write some Perl scripts to convert fixed width format to CSV 
>>> format and then start with your suggestion. Could you let me know your 
>>> thoughts on the approach?
>>> Satish
>>>
>>>
>>> -Original Message-
>>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>>> Sent: Friday, February 05, 2010 5:16 PM
>>> To: Vadlamani, Satish {FLNA}
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] Reading large files
>>>
>>> If your problem is just how long it takes to load the file into R try
>>> read.csv.sql in the sqldf package.  A single read.csv.sql call can
>>> create an SQLite database and table layout for you, read the file into
>>> the database (without going through R so R can't slow this down),
>>> extract all or a portion into R based on the sql argument you give it
>>> and then remove the database.  See the examples on the home page:
>>> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>>>
>>> On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
>>>  wrote:
>>>>
>>>> Matthew:
>>>> If it is going to help, here is the explanation. I have an end state in
>>>> mind. It is given below under "End State" header. In order to get there, I
>>>> need to start somewhere right? I started with a 850 MB file and could not
>>>> load in what I think is reasonable time (I waited for an hour).
>>>>
>>>> There are references to 64 bit. How will that help? It is a 4GB RAM machine
>>>> and there is no paging activity when loading the 850 MB file.
>>>>
>>>> I have seen other threads on the same types of questions. I did not see any
>>>> clear cut answers or errors that I could have been making in the process. 
>>>> If
>>>> I am missing something, please let me know. Thanks.
>>>> Satish
>>>>
>>>>
>>>> End State
>>>>> Satish wrote: "at one time I will need to load say 15GB into R"
>>>>
>>>>
>>>> -
>>>> Satish Vadlamani
>>>> --
>>>> View this message in context: 
>>>> http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html
>>>>

Re: [R] Reading large files

2010-02-06 Thread Vadlamani, Satish {FLNA}
Jim, Gabor:
Thanks so much for the suggestions where I can use read.csv.sql and embed Perl 
(or gawk). I just want to mention that I am running on Windows. I am going to 
read the documentation on the filter argument and see if it can take a decent 
sized Perl script and then use its output as input.

Suppose that I write a Perl script that parses this fwf file and creates a CSV 
file. Can I embed this within the read.csv.sql call? Or, can it only be a 
statement or something? If you know the answer, please let me know. Otherwise, 
I will try a few things and report back the results.

Thanks again.
Satish
 

-Original Message-
From: jim holtman [mailto:jholt...@gmail.com] 
Sent: Saturday, February 06, 2010 6:16 AM
To: Gabor Grothendieck
Cc: Vadlamani, Satish {FLNA}; r-help@r-project.org
Subject: Re: [R] Reading large files

In perl the 'unpack' command makes it very easy to parse fixed fielded data.

On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck
 wrote:
> Note that the filter= argument on read.csv.sql can be used to pass the
> input through a filter written in perl, [g]awk or other language.
> For example: read.csv.sql(..., filter = "gawk -f myfilter.awk")
>
> gawk has the FIELDWIDTHS variable for automatically parsing fixed
> width fields, e.g.
> http://www.delorie.com/gnu/docs/gawk/gawk_44.html
> making this very easy but perl or whatever you are most used to would
> be fine too.
>
> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA}
>  wrote:
>> Hi Gabor:
>> Thanks. My files are all in fixed width format. There are a lot of them. It 
>> would take me some effort to convert them to CSV. I guess this cannot be 
>> avoided? I can write some Perl scripts to convert fixed width format to CSV 
>> format and then start with your suggestion. Could you let me know your 
>> thoughts on the approach?
>> Satish
>>
>>
>> -Original Message-
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Friday, February 05, 2010 5:16 PM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> If your problem is just how long it takes to load the file into R try
>> read.csv.sql in the sqldf package.  A single read.csv.sql call can
>> create an SQLite database and table layout for you, read the file into
>> the database (without going through R so R can't slow this down),
>> extract all or a portion into R based on the sql argument you give it
>> and then remove the database.  See the examples on the home page:
>> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>>
>> On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
>>  wrote:
>>>
>>> Matthew:
>>> If it is going to help, here is the explanation. I have an end state in
>>> mind. It is given below under "End State" header. In order to get there, I
>>> need to start somewhere right? I started with a 850 MB file and could not
>>> load in what I think is reasonable time (I waited for an hour).
>>>
>>> There are references to 64 bit. How will that help? It is a 4GB RAM machine
>>> and there is no paging activity when loading the 850 MB file.
>>>
>>> I have seen other threads on the same types of questions. I did not see any
>>> clear cut answers or errors that I could have been making in the process. If
>>> I am missing something, please let me know. Thanks.
>>> Satish
>>>
>>>
>>> End State
>>>> Satish wrote: "at one time I will need to load say 15GB into R"
>>>
>>>
>>> -
>>> Satish Vadlamani
>>> --
>>> View this message in context: 
>>> http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files

2010-02-06 Thread jim holtman
In perl the 'unpack' command makes it very easy to parse fixed fielded data.

On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck
 wrote:
> Note that the filter= argument on read.csv.sql can be used to pass the
> input through a filter written in perl, [g]awk or other language.
> For example: read.csv.sql(..., filter = "gawk -f myfilter.awk")
>
> gawk has the FIELDWIDTHS variable for automatically parsing fixed
> width fields, e.g.
> http://www.delorie.com/gnu/docs/gawk/gawk_44.html
> making this very easy but perl or whatever you are most used to would
> be fine too.
>
> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA}
>  wrote:
>> Hi Gabor:
>> Thanks. My files are all in fixed width format. There are a lot of them. It 
>> would take me some effort to convert them to CSV. I guess this cannot be 
>> avoided? I can write some Perl scripts to convert fixed width format to CSV 
>> format and then start with your suggestion. Could you let me know your 
>> thoughts on the approach?
>> Satish
>>
>>
>> -Original Message-
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Friday, February 05, 2010 5:16 PM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> If your problem is just how long it takes to load the file into R try
>> read.csv.sql in the sqldf package.  A single read.csv.sql call can
>> create an SQLite database and table layout for you, read the file into
>> the database (without going through R so R can't slow this down),
>> extract all or a portion into R based on the sql argument you give it
>> and then remove the database.  See the examples on the home page:
>> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>>
>> On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
>>  wrote:
>>>
>>> Matthew:
>>> If it is going to help, here is the explanation. I have an end state in
>>> mind. It is given below under "End State" header. In order to get there, I
>>> need to start somewhere right? I started with a 850 MB file and could not
>>> load in what I think is reasonable time (I waited for an hour).
>>>
>>> There are references to 64 bit. How will that help? It is a 4GB RAM machine
>>> and there is no paging activity when loading the 850 MB file.
>>>
>>> I have seen other threads on the same types of questions. I did not see any
>>> clear cut answers or errors that I could have been making in the process. If
>>> I am missing something, please let me know. Thanks.
>>> Satish
>>>
>>>
>>> End State
>>>> Satish wrote: "at one time I will need to load say 15GB into R"
>>>
>>>
>>> -
>>> Satish Vadlamani
>>> --
>>> View this message in context: 
>>> http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files

2010-02-05 Thread Gabor Grothendieck
Note that the filter= argument on read.csv.sql can be used to pass the
input through a filter written in perl, [g]awk or other language.
For example: read.csv.sql(..., filter = "gawk -f myfilter.awk")

gawk has the FIELDWIDTHS variable for automatically parsing fixed
width fields, e.g.
http://www.delorie.com/gnu/docs/gawk/gawk_44.html
making this very easy but perl or whatever you are most used to would
be fine too.
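
For instance, a sketch of that arrangement; the widths, the awk file name and
the data file name below are placeholders only:

  # myfilter.awk might contain something like:
  #   BEGIN { FIELDWIDTHS = "1 3 3 4 2 8"; OFS = "," }
  #   { $1 = $1; print }
  # i.e. gawk re-splits each fixed-width record on the given widths and prints
  # it comma-separated, which read.csv.sql then loads via SQLite.
  library(sqldf)
  DF <- read.csv.sql("myfile.dat", header = FALSE,
                     filter = "gawk -f myfilter.awk")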

On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA}
 wrote:
> Hi Gabor:
> Thanks. My files are all in fixed width format. There are a lot of them. It 
> would take me some effort to convert them to CSV. I guess this cannot be 
> avoided? I can write some Perl scripts to convert fixed width format to CSV 
> format and then start with your suggestion. Could you let me know your 
> thoughts on the approach?
> Satish
>
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Friday, February 05, 2010 5:16 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> If your problem is just how long it takes to load the file into R try
> read.csv.sql in the sqldf package.  A single read.csv.sql call can
> create an SQLite database and table layout for you, read the file into
> the database (without going through R so R can't slow this down),
> extract all or a portion into R based on the sql argument you give it
> and then remove the database.  See the examples on the home page:
> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>
> On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
>  wrote:
>>
>> Matthew:
>> If it is going to help, here is the explanation. I have an end state in
>> mind. It is given below under "End State" header. In order to get there, I
>> need to start somewhere right? I started with a 850 MB file and could not
>> load in what I think is reasonable time (I waited for an hour).
>>
>> There are references to 64 bit. How will that help? It is a 4GB RAM machine
>> and there is no paging activity when loading the 850 MB file.
>>
>> I have seen other threads on the same types of questions. I did not see any
>> clear cut answers or errors that I could have been making in the process. If
>> I am missing something, please let me know. Thanks.
>> Satish
>>
>>
>> End State
>>> Satish wrote: "at one time I will need to load say 15GB into R"
>>
>>
>> -
>> Satish Vadlamani
>> --
>> View this message in context: 
>> http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files

2010-02-05 Thread Vadlamani, Satish {FLNA}
Hi Gabor:
Thanks. My files are all in fixed width format. There are a lot of them. It 
would take me some effort to convert them to CSV. I guess this cannot be 
avoided? I can write some Perl scripts to convert fixed width format to CSV 
format and then start with your suggestion. Could you let me know your thoughts 
on the approach?
Satish
 

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Friday, February 05, 2010 5:16 PM
To: Vadlamani, Satish {FLNA}
Cc: r-help@r-project.org
Subject: Re: [R] Reading large files

If your problem is just how long it takes to load the file into R try
read.csv.sql in the sqldf package.  A single read.csv.sql call can
create an SQLite database and table layout for you, read the file into
the database (without going through R so R can't slow this down),
extract all or a portion into R based on the sql argument you give it
and then remove the database.  See the examples on the home page:
http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql

On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
 wrote:
>
> Matthew:
> If it is going to help, here is the explanation. I have an end state in
> mind. It is given below under "End State" header. In order to get there, I
> need to start somewhere right? I started with a 850 MB file and could not
> load in what I think is reasonable time (I waited for an hour).
>
> There are references to 64 bit. How will that help? It is a 4GB RAM machine
> and there is no paging activity when loading the 850 MB file.
>
> I have seen other threads on the same types of questions. I did not see any
> clear cut answers or errors that I could have been making in the process. If
> I am missing something, please let me know. Thanks.
> Satish
>
>
> End State
>> Satish wrote: "at one time I will need to load say 15GB into R"
>
>
> -
> Satish Vadlamani
> --
> View this message in context: 
> http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files

2010-02-05 Thread Gabor Grothendieck
If your problem is just how long it takes to load the file into R try
read.csv.sql in the sqldf package.  A single read.csv.sql call can
create an SQLite database and table layout for you, read the file into
the database (without going through R so R can't slow this down),
extract all or a portion into R based on the sql argument you give it
and then remove the database.  See the examples on the home page:
http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
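
A short sketch of that pattern; "file" is the table name read.csv.sql uses for
the incoming data, and the csv name and where clause are only examples:

  library(sqldf)
  # Load the file through SQLite and pull just the rows of interest into R;
  # the temporary database is created and dropped automatically.
  dat <- read.csv.sql("sales.csv",
                      sql = "select * from file where zone = '001'",
                      dbname = tempfile())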

On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
 wrote:
>
> Matthew:
> If it is going to help, here is the explanation. I have an end state in
> mind. It is given below under "End State" header. In order to get there, I
> need to start somewhere right? I started with a 850 MB file and could not
> load in what I think is reasonable time (I waited for an hour).
>
> There are references to 64 bit. How will that help? It is a 4GB RAM machine
> and there is no paging activity when loading the 850 MB file.
>
> I have seen other threads on the same types of questions. I did not see any
> clear cut answers or errors that I could have been making in the process. If
> I am missing something, please let me know. Thanks.
> Satish
>
>
> End State
>> Satish wrote: "at one time I will need to load say 15GB into R"
>
>
> -
> Satish Vadlamani
> --
> View this message in context: 
> http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files

2010-02-05 Thread jim holtman
What you need to do is to take a smaller sample of your data (e.g.
50-100MB) and load that data and determine how big the resulting
object is. Depends a lot on how you are loading it.  Are you using
'scan' or 'read.table'; if 'read.table', have you defined the class of
the columns?  I typically read in files of 40MB in about 15 seconds
(300K rows with 16 columns).  The resulting object is about 24MB.  I
would expect you to be able to read in 100MB in under a minute.  The
other part of the question is how much of the data do you really need
to read in and process at once.  I assume that it is not all of it.
You might structure your data to only require reading in the data that
you need to analyze.  Just because you have a file that large does not
mean you need all the data.

I have 2GB on my Windows box and try to keep the maximum object I
process to under 400MB since I know copies will be made at different
stages.  There are packages that let you do some of the analysis on
data that is larger than can fit in memory.  I would also suggest you
use a database so that you do not have to continually read in the
data.
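
A sketch of that kind of sizing experiment; the file name, column classes and
sample size below are only examples:

  # Read a modest sample first, with the column classes declared, and see how
  # big the resulting object is before committing to the full file.
  classes <- c(rep("character", 2), rep("numeric", 14))
  smp <- read.table("bigfile.dat", colClasses = classes,
                    nrows = 50000, comment.char = "")
  format(object.size(smp), units = "Mb")
  # multiply by (total rows / sample rows) for a rough full-size estimate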

If your pockets are deep, go for a 64-bit version with 64GB if you
want to process files that are 10-15GB.  Otherwise rethink the problem
you are trying to solve with respect to some of the
boundaries/constraints that are imposed by most systems.

On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
 wrote:
>
> Matthew:
> If it is going to help, here is the explanation. I have an end state in
> mind. It is given below under "End State" header. In order to get there, I
> need to start somewhere right? I started with a 850 MB file and could not
> load in what I think is reasonable time (I waited for an hour).
>
> There are references to 64 bit. How will that help? It is a 4GB RAM machine
> and there is no paging activity when loading the 850 MB file.
>
> I have seen other threads on the same types of questions. I did not see any
> clear cut answers or errors that I could have been making in the process. If
> I am missing something, please let me know. Thanks.
> Satish
>
>
> End State
>> Satish wrote: "at one time I will need to load say 15GB into R"
>
>
> -
> Satish Vadlamani
> --
> View this message in context: 
> http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files

2010-02-05 Thread Charlie Sharpsteen
On Thu, Feb 4, 2010 at 5:27 PM, Vadlamani, Satish {FLNA}
 wrote:
> Folks:
> I am trying to read in a large file. Definition of large is:
> Number of lines: 333,250
> Size: 850 MB

Perhaps this post by JD Long will provide an example that is suitable
to your situation:

 http://www.cerebralmastication.com/2009/11/loading-big-data-into-r/

Hope it helps!

-Charlie

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files

2010-02-05 Thread Matthew Dowle
I can't help you further than what's already been posted to you. Maybe 
someone else can.
Best of luck.

"Satish Vadlamani"  wrote in message 
news:1265397089104-1470667.p...@n4.nabble.com...
>
> Matthew:
> If it is going to help, here is the explanation. I have an end state in
> mind. It is given below under "End State" header. In order to get there, I
> need to start somewhere right? I started with a 850 MB file and could not
> load in what I think is reasonable time (I waited for an hour).
>
> There are references to 64 bit. How will that help? It is a 4GB RAM 
> machine
> and there is no paging activity when loading the 850 MB file.
>
> I have seen other threads on the same types of questions. I did not see 
> any
> clear cut answers or errors that I could have been making in the process. 
> If
> I am missing something, please let me know. Thanks.
> Satish
>
>
> End State
>> Satish wrote: "at one time I will need to load say 15GB into R"
>
>
> -
> Satish Vadlamani
> -- 
> View this message in context: 
> http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html
> Sent from the R help mailing list archive at Nabble.com.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files

2010-02-05 Thread Satish Vadlamani

Matthew:
If it is going to help, here is the explanation. I have an end state in
mind. It is given below under "End State" header. In order to get there, I
need to start somewhere right? I started with a 850 MB file and could not
load in what I think is reasonable time (I waited for an hour).

There are references to 64 bit. How will that help? It is a 4GB RAM machine
and there is no paging activity when loading the 850 MB file.

I have seen other threads on the same types of questions. I did not see any
clear cut answers or errors that I could have been making in the process. If
I am missing something, please let me know. Thanks.
Satish


End State
> Satish wrote: "at one time I will need to load say 15GB into R" 


-
Satish Vadlamani
-- 
View this message in context: 
http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files

2010-02-05 Thread Matthew Dowle
I agree with Jim.  The term "do analysis" is almost meaningless, the posting 
guide makes reference to statements such as that. At least he tried to 
define large, but inconsistently (first of all 850MB, then changed to 
10-20-15GB).

> Satish wrote: "at one time I will need to load say 15GB into R"

Assuming the user is always right then, here is some information :

R has been 64bit on unix for a very long time (over a decade).  64bit R is 
also available for Win64.
It uses as much RAM as you install on the box, e.g. 64GB.
Yes R users do that,  and they've been doing that for years and years.
The data.table package was mainly designed for 64bit, although it's a point 
of consternation when people think that's all it's useful for.
If you don't have the hardware, then you can rent the time on EC2. There are 
tools and packages to make that easy e.g. pre-built images you can just use. 
Look at the HPC task view. Search the archives. Don't miss Biocep at 
http://biocep-distrib.r-forge.r-project.org/doc.html.

Albert Einstein said "A clever person solves a problem. A wise person avoids 
it.".   So an option for you is to be wise and move to 64bit.


"jim holtman"  wrote in message 
news:644e1f321002050513y242304der84b5674930b54...@mail.gmail.com...
> Where should we shine it?  No information provided on operating
> system, version, memory, size of files, what you want to do with them,
> etc.  Lots of options: put it in a database, read partial file (lines
> and/or columns), preprocess, etc.  Your option.
>
> On Fri, Feb 5, 2010 at 8:03 AM, Satish Vadlamani
>  wrote:
>>
>> Folks:
>> Can anyone throw some light on this? Thanks.
>> Satish
>>
>>
>> -
>> Satish Vadlamani
>> --
>> View this message in context: 
>> http://n4.nabble.com/Reading-large-files-tp1469691p1470169.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files

2010-02-05 Thread jim holtman
Where should we shine it?  No information provided on operating
system, version, memory, size of files, what you want to do with them,
etc.  Lots of options: put it in a database, read partial file (lines
and/or columns), preprocess, etc.  Your option.

On Fri, Feb 5, 2010 at 8:03 AM, Satish Vadlamani
 wrote:
>
> Folks:
> Can anyone throw some light on this? Thanks.
> Satish
>
>
> -
> Satish Vadlamani
> --
> View this message in context: 
> http://n4.nabble.com/Reading-large-files-tp1469691p1470169.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files

2010-02-05 Thread Satish Vadlamani

Folks:
Can anyone throw some light on this? Thanks.
Satish


-
Satish Vadlamani
-- 
View this message in context: 
http://n4.nabble.com/Reading-large-files-tp1469691p1470169.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files

2010-02-04 Thread satishv

Folks:
Suppose I divide USA into 16 regions. My end goal is to run data mining /
analysis on each of these 16 regions. The data for each of these regions
(sales, forecast, etc.) will be in the range of 10-20 GB. At one time, I
will need to load say 15 GB into R and then do analysis.

Is this something other R users are doing? Or, is it better to switch to
SAS? Could you help me with any information on this? Thanks.
Satish


-- 
View this message in context: 
http://n4.nabble.com/Reading-large-files-tp1469691p1469700.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Reading large files

2010-02-04 Thread Vadlamani, Satish {FLNA}
Folks:
I am trying to read in a large file. Definition of large is:
Number of lines: 333,250
Size: 850 MB

The machine is a dual core Intel, with 4 GB RAM and nothing else running on it. 
I read the previous threads on read.fwf and did not see any conclusive 
statements on how to read fast. Example record and R code given below. I was 
hoping to purchase a better machine and do analysis with larger datasets - but 
these preliminary results do not look good.

Does anyone have any experience with large files (> 1GB) and using them with 
Revolution-R?


Thanks.

Satish

Example Code
# 18 key columns followed by 209 twelve-character numeric buckets
key_vec   <- c(1,3,3,4,2,8,8,2,2,3,2,2,1,3,3,3,3,9)
key_names <- c("allgeo","area1","zone","dist","ccust1","whse","bindc","ccust2","account","area2","ccust3","customer","allprod","cat","bu","class","size","bdc")
key_info  <- data.frame(key_vec, key_names)
col_names <- c(key_names, sas_time$week)  # sas_time is built elsewhere in the session
num_buckets <- rep(12, 209)
width_vec   <- c(key_vec, num_buckets)
col_classes <- c(rep("factor", 18), rep("numeric", 209))
# commented-out trial read of the first 100 lines:
# threewkoutstat <- read.fwf(file="3wkoutstatfcst_file02.dat", widths=width_vec, header=FALSE, colClasses=col_classes, n=100)
threewkoutstat <- read.fwf(file="3wkoutstatfcst_file02.dat", widths=width_vec,
                           header=FALSE, colClasses=col_classes)
names(threewkoutstat) <- col_names

Example record (only one record pasted below)
A00400100379949254925004A0010020020150020150090.00  0.00  0.00  ...
(one fixed-width record, wrapped over many lines in the original post: the 18
key fields followed by 209 twelve-character numeric values, almost all 0.00)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files quickly; resolved

2009-05-11 Thread Rob Steele
Rob Steele wrote:
> I'm finding that readLines() and read.fwf() take nearly two hours to
> work through a 3.5 GB file, even when reading in large (100 MB) chunks.
>  The unix command wc by contrast processes the same file in three
> minutes.  Is there a faster way to read files in R?
> 
> Thanks!
> 

readChar() is fast.  I use strsplit(..., fixed = TRUE) to separate the
input data into lines and then use substr() to separate the lines into
fields.  I do a little light processing and write the result back out
with writeChar().  The whole thing takes thirty minutes where read.fwf()
took nearly two hours just to read the data.
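
A compressed sketch of that scheme, assuming Unix newlines and made-up field
positions; writeLines() stands in for the writeChar() step, and the carry
variable keeps any partial line left at the end of a chunk:

  con   <- file("big.fwf", open = "rb")
  out   <- file("big_processed.csv", open = "wt")
  carry <- ""
  repeat {
    txt <- readChar(con, nchars = 100e6, useBytes = TRUE)   # ~100 MB per chunk
    if (length(txt) == 0 || nchar(txt) == 0) break
    lines <- strsplit(paste0(carry, txt), "\n", fixed = TRUE)[[1]]
    n     <- length(lines)
    carry <- lines[n]                  # may be an incomplete final line
    lines <- lines[-n]
    key   <- substr(lines, 1, 8)       # example field positions only
    value <- substr(lines, 9, 20)
    writeLines(paste(key, value, sep = ","), out)
  }
  if (nchar(carry) > 0) writeLines(carry, out)   # flush any trailing partial line
  close(con); close(out)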

Thanks for the help!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files quickly

2009-05-10 Thread Rob Steele
At the moment I'm just reading the large file to see how fast it goes.
Eventually, if I can get the read time down, I'll write out a processed
version.  Thanks for suggesting scan(); I'll try it.

Rob

jim holtman wrote:
> Since you are reading it in chunks, I assume that you are writing out each
> segment as you read it in.  How are you writing it out to save it?  Is the
> time you are quoting both the reading and the writing?  If so, can you break
> down the differences in what these operations are taking?
> 
> How do you plan to use the data?  Is it all numeric?  Are you keeping it in
> a dataframe?  Have you considered using 'scan' to read in the data and to
> specify what the columns are?  If you would like some more help, the answer
> to these questions will help.
> 
> On Sat, May 9, 2009 at 10:09 PM, Rob Steele 
> wrote:
> 
>> Thanks guys, good suggestions.  To clarify, I'm running on a fast
>> multi-core server with 16 GB RAM under 64 bit CentOS 5 and R 2.8.1.
>> Paging shouldn't be an issue since I'm reading in chunks and not trying
>> to store the whole file in memory at once.  Thanks again.
>>
>> Rob Steele wrote:
>>> I'm finding that readLines() and read.fwf() take nearly two hours to
>>> work through a 3.5 GB file, even when reading in large (100 MB) chunks.
>>>  The unix command wc by contrast processes the same file in three
>>> minutes.  Is there a faster way to read files in R?
>>>
>>> Thanks!
>>  >
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> 
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files quickly

2009-05-09 Thread jim holtman
Since you are reading it in chunks, I assume that you are writing out each
segment as you read it in.  How are you writing it out to save it?  Is the
time you are quoting both the reading and the writing?  If so, can you break
down the differences in what these operations are taking?

How do you plan to use the data?  Is it all numeric?  Are you keeping it in
a dataframe?  Have you considered using 'scan' to read in the data and to
specify what the columns are?  If you would like some more help, the answer
to these questions will help.
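
For the 'scan' route, a sketch with the column layout declared up front; the
file, column names and types are placeholders:

  # Declaring what= tells scan the type of every column, so no type guessing
  # is needed; quote = "" also skips quote processing.
  cols <- scan("bigfile.txt",
               what = list(key = character(), x = numeric(),
                           y = numeric(), z = numeric()),
               sep = "", quote = "", multi.line = FALSE)
  df <- as.data.frame(cols, stringsAsFactors = FALSE)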

On Sat, May 9, 2009 at 10:09 PM, Rob Steele wrote:

> Thanks guys, good suggestions.  To clarify, I'm running on a fast
> multi-core server with 16 GB RAM under 64 bit CentOS 5 and R 2.8.1.
> Paging shouldn't be an issue since I'm reading in chunks and not trying
> to store the whole file in memory at once.  Thanks again.
>
> Rob Steele wrote:
> > I'm finding that readLines() and read.fwf() take nearly two hours to
> > work through a 3.5 GB file, even when reading in large (100 MB) chunks.
> >  The unix command wc by contrast processes the same file in three
> > minutes.  Is there a faster way to read files in R?
> >
> > Thanks!
>  >
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files quickly

2009-05-09 Thread Rob Steele
Thanks guys, good suggestions.  To clarify, I'm running on a fast
multi-core server with 16 GB RAM under 64 bit CentOS 5 and R 2.8.1.
Paging shouldn't be an issue since I'm reading in chunks and not trying
to store the whole file in memory at once.  Thanks again.

Rob Steele wrote:
> I'm finding that readLines() and read.fwf() take nearly two hours to
> work through a 3.5 GB file, even when reading in large (100 MB) chunks.
>  The unix command wc by contrast processes the same file in three
> minutes.  Is there a faster way to read files in R?
> 
> Thanks!
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files quickly

2009-05-09 Thread Jakson Alves de Aquino
Rob Steele wrote:
> I'm finding that readLines() and read.fwf() take nearly two hours to
> work through a 3.5 GB file, even when reading in large (100 MB) chunks.
>  The unix command wc by contrast processes the same file in three
> minutes.  Is there a faster way to read files in R?

I use statist to convert the fixed width data file into a csv file
because read.table() is considerably faster than read.fwf(). For example:

system("statist --na-string NA --xcols collist big.txt big.csv")
bigdf <- read.table(file = "big.csv", header=T, as.is=T)

The file collist is a text file whose lines contain the following
information:

variable begin end

where "variable" is the column name, and "begin" and "end" are integer
numbers indicating where in big.txt the columns begin and end.

Statist can be downloaded from: http://statist.wald.intevation.org/

-- 
Jakson Aquino
Social Sciences Department
Federal University of Ceará, Brazil

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files quickly

2009-05-09 Thread jim holtman
First 'wc' and readLines are doing vastly different functions.  'wc' is just
reading through the file without having to allocate memory to it;
'readLines' is actually storing the data in memory.

I have a 150MB file I was trying it on, and here is what 'wc' did on my
Windows system:

/cygdrive/c: time wc tempxx.txt
  1055808  13718468 151012320 tempxx.txt
real    0m2.343s
user    0m1.702s
sys     0m0.436s
/cygdrive/c:

If I multiply that by 25 to extrapolate to a 3.5GB file, it should take
a little less than one minute to process on my relatively slow laptop.

'readLines' on the same file takes:

> system.time(x <- readLines('/tempxx.txt'))
   user  system elapsed
   37.82    0.47   39.23
If I extrapolate that to 3.5GB, it would take about 16 minutes.  Now
considering that I only have 2GB on my system, I would not be able to read
the whole file in at once.

You never did specify what type of system you were running on and how much
memory you had.  Were you 'paging' due to lack of memory?

> system.time(x <- readLines('/tempxx.txt'))
   user  system elapsed
   37.82    0.47   39.23
> object.size(x)
84814016 bytes



On Sat, May 9, 2009 at 12:25 PM, Rob Steele wrote:

> I'm finding that readLines() and read.fwf() take nearly two hours to
> work through a 3.5 GB file, even when reading in large (100 MB) chunks.
>  The unix command wc by contrast processes the same file in three
> minutes.  Is there a faster way to read files in R?
>
> Thanks!
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files quickly

2009-05-09 Thread Gabor Grothendieck
You could try it with sqldf and see if that is any faster.
It uses RSQLite/sqlite to read the data into a database without
going through R and from there it reads all or a portion as
specified into R.  It requires two lines of code of the form:

f <- file("myfile.dat")
DF <- sqldf("select * from f", dbname = tempfile())

with appropriate modification to specify the format of your file and
possibly to indicate a portion only.  See example 6 on the sqldf
home page: http://sqldf.googlecode.com
and ?sqldf


On Sat, May 9, 2009 at 12:25 PM, Rob Steele
 wrote:
> I'm finding that readLines() and read.fwf() take nearly two hours to
> work through a 3.5 GB file, even when reading in large (100 MB) chunks.
>  The unix command wc by contrast processes the same file in three
> minutes.  Is there a faster way to read files in R?
>
> Thanks!
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Reading large files quickly

2009-05-09 Thread Rob Steele
I'm finding that readLines() and read.fwf() take nearly two hours to
work through a 3.5 GB file, even when reading in large (100 MB) chunks.
 The unix command wc by contrast processes the same file in three
minutes.  Is there a faster way to read files in R?

Thanks!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.