Murray Jorgensen <[EMAIL PROTECTED]> wrote:
        I'm wondering if anyone has written some functions or code for handling 
        very large files in R. I am working with a data file that is 41 
        variables times who knows how many observations making up 27MB altogether.
        
Does that really count as "very large"?
I tried making a file where each line was
"1 2 3 ... 39 40 41";
with 240,000 lines it came to 27.36 million bytes.
You can *hold* that amount of data in R quite easily.
The problem is the time it takes to read it using scan() or read.table().
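
Much of that time can be clawed back by telling the readers what to
expect.  A rough sketch, assuming a file "big.dat" of 41 numeric
columns and about 240,000 rows (the name and sizes are invented to
match the test above):

    ## Pre-declaring the column types and row count lets
    ## read.table() skip its type-guessing pass.
    dat <- read.table("big.dat",
                      colClasses = rep("numeric", 41),
                      nrows = 240000, comment.char = "")

    ## scan() is faster still when every field is numeric:
    v <- scan("big.dat", what = double())
    dat <- as.data.frame(matrix(v, ncol = 41, byrow = TRUE))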

        The sort of thing that I am thinking of having R do is
        
        - count the number of lines in a file
        
        - form a data frame by selecting all cases whose line numbers are in a 
        supplied vector (which could be used to extract random subfiles of 
        particular sizes)
        
        Does anyone know of a package that might be useful for this?
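
Both are easy enough to improvise in base R.  A rough sketch, with
the file name, chunk size, and sample size invented:

    ## Count the lines without holding the file in memory,
    ## reading it in 10,000-line chunks.
    con <- file("big.dat", open = "r")
    nlines <- 0
    while ((got <- length(readLines(con, n = 10000))) > 0)
        nlines <- nlines + got
    close(con)

    ## Build a data frame from the cases whose line numbers
    ## are in a supplied vector 'wanted' (sorted, no duplicates).
    wanted <- sort(sample(nlines, 1000))
    con <- file("big.dat", open = "r")
    kept <- character(length(wanted))
    prev <- 0
    for (i in seq(along = wanted)) {
        skip <- wanted[i] - prev - 1        # lines to discard
        if (skip > 0) readLines(con, n = skip)
        kept[i] <- readLines(con, n = 1)
        prev <- wanted[i]
    }
    close(con)
    tc <- textConnection(kept)
    subdf <- read.table(tc)
    close(tc)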
        
There's a Unix program I posted to comp.sources years ago called "sample":
    sample -(how many) < (where from)
selects the given number of lines without replacement from its standard input
and writes them in random order to its standard output.  Hook it up to a
decent random number generator and you're pretty much done: read.table()
and scan() can read from a pipe.
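
For instance, assuming "sample" is on the path and the data live in
"big.dat":

    ## Draw 1000 random lines via the external 'sample'
    ## program and parse them in one step.
    subdf <- read.table(pipe("sample -1000 < big.dat"))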
