How about using Python, Perl, or Ruby, which are designed for precisely this
kind of routine data munging, and piping the processed output into an R data
frame?

 

msci <- read.table(pipe("python steve/python/msci.py"), header=TRUE, as.is=TRUE)

Iteratively, you could deliver the python output in chunks, something like:

 

msci <- read.table(pipe("python steve/python/msci.py 1 500000"),
                   header=TRUE, as.is=TRUE)

 

msci <- rbind(msci,
              read.table(pipe("python steve/python/msci.py 500001 1000000"),
                         header=TRUE, as.is=TRUE))

 

etc.
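
For what it's worth, here is a minimal sketch of what such a script might look like. Everything in it is an assumption for illustration: the field names and column positions in FIELDS, the meaning of the two arguments (first and last row, 1-based and inclusive), and the choice to print a header line for every chunk so that header=TRUE works on each call. It is not the actual msci.py.

```python
import sys
from itertools import islice

# Hypothetical layout: (name, first column, last column), 1-based and inclusive.
FIELDS = [("id", 2, 3), ("value", 6, 8)]


def extract(lines, fields=FIELDS, first=1, last=None):
    """Yield a header line, then one space-separated line per input row.

    Only rows numbered first..last are emitted (1-based, inclusive);
    last=None means "through the end of the input".
    """
    yield " ".join(name for name, _, _ in fields)
    for line in islice(lines, first - 1, last):
        # Slice each fixed-width field; emit NA for blank fields so that
        # read.table still sees the same number of columns on every row.
        yield " ".join(line[lo - 1:hi].strip() or "NA" for _, lo, hi in fields)


def main(argv, source):
    """argv mirrors sys.argv: script name, then an optional first/last pair."""
    first = int(argv[1]) if len(argv) > 2 else 1
    last = int(argv[2]) if len(argv) > 2 else None
    for row in extract(source, first=first, last=last):
        print(row)
```

A real script would finish by calling main(sys.argv, f) on the opened data file. The R side then only ever sees a small, whitespace-delimited table, whether it asks for the whole file or for one row range at a time.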

 

Steve Miller

 

 

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jason Barnhart
Sent: Wednesday, September 13, 2006 11:52 AM
To: Gabor Grothendieck; Anupam Tyagi
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Reading fixed column format

 

Another possibility:

 

    1) Split the original file into smaller chunks of xx,xxx rows each.

    2) Process each chunk using read.fwf, saving only the requisite variables.
       (If necessary, save each intermediate matrix/data.frame to disk
       to conserve memory.)

    3) 'rbind' the results.

 

Not exactly elegant but it works.
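
Step 1 can be done with any tool that copies a fixed number of lines at a time. A rough sketch in Python follows; the function name split_file, the chunk-file naming scheme, and the default prefix are all made up for illustration.

```python
from itertools import islice


def split_file(path, rows_per_chunk, prefix="chunk"):
    """Split a text file into pieces of at most rows_per_chunk lines each.

    Pieces are written as <prefix>_0001.dat, <prefix>_0002.dat, ...
    Returns the list of file names in order.
    """
    names = []
    with open(path) as src:
        while True:
            # Pull the next block of lines; only one block is in memory at a time.
            block = list(islice(src, rows_per_chunk))
            if not block:
                break
            name = f"{prefix}_{len(names) + 1:04d}.dat"
            with open(name, "w") as out:
                out.writelines(block)
            names.append(name)
    return names
```

Each resulting file can then be read with read.fwf in turn (steps 2 and 3), so R never has to hold more than one chunk of the raw fixed-width text at once.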

 

----- Original Message ----- 

From: "Gabor Grothendieck" <[EMAIL PROTECTED]>

To: "Anupam Tyagi" <[EMAIL PROTECTED]>

Cc: <r-help@stat.math.ethz.ch>

Sent: Wednesday, September 13, 2006 7:21 AM

Subject: Re: [R] Reading fixed column format

 

 

> On 9/13/06, Anupam Tyagi <[EMAIL PROTECTED]> wrote:

>> Gabor Grothendieck <ggrothendieck <at> gmail.com> writes:

>> 

>> > C:\bin>cut -c2-3,6-8 a.dat

>> > 23678

>> > 23678

>> > 23678

>> 

>> Thanks. I think this will work. How do I redirect the output to a file on

>> windows?

> 

> Same as on UNIX

> 

> cut -c2-3,6-8 a.dat > a2.dat

> 

>> Is there a simple way to convert the cut command to a script on Windows,

> 

> Using notepad or other text editor put it in file a.bat and then

> issue this command from the console

> 

> a.bat

> 

> Note that you could process it multiple times if you like:

> 

> cut -c1-3,6-8 a.dat > a2.dat

> cut -c2- a2.dat > a3.dat

> 

> produces the same thing but uses 2 passes and so keeps each line shorter.

> Be sure you do it from the tail end forward as shown above to avoid having

> to recalculate the positions.

> 

>> because the entire command may not fit on one line? Anupam.

>> 

> 

> ______________________________________________

> R-help@stat.math.ethz.ch mailing list

> https://stat.ethz.ch/mailman/listinfo/r-help

> PLEASE do read the posting guide 

> http://www.R-project.org/posting-guide.html

> and provide commented, minimal, self-contained, reproducible code.

> 

 



