On Mon, 2004-12-06 at 14:00 -0500, Liaw, Andy wrote:
> Marc,
>
> I wrote the following function to read the file in chunks:
>
> countLines <- function(file, chunk=1e3) {
>     f <- file(file, "r")
>     on.exit(close(f))
>     nLines <- 0
>     while ((n <- length(readLines(f, chunk))) > 0) nLines <- nLines + n
>     nLines
> }
>
> To my surprise:
>
> > system.time(n4 <- countLines("hcv.ap"), gcFirst=TRUE)
> [1] 35.24 0.26 35.53 0.00 0.00
> > system.time(n4 <- countLines("hcv.ap", 1), gcFirst=TRUE)
> [1] 36.10 0.32 36.43 0.00 0.00
>
> There's almost no penalty (in time) in reading one line at a time.
> One does save quite a bit of memory, though.
Andy,

I suspect that the near-identical times for reading one line at a time versus larger chunks come down to disk caching and read-ahead in the disk subsystem and the OS. Even though your function requests one line per call, each physical read of the file actually pulls a larger block into cache memory, where it sits until needed or flushed by new data. Given the serial nature of the read, your function is therefore mostly doing high-speed memory-to-memory transfers rather than disk-to-memory transfers. As you point out, though, reading line by line does conserve system memory.

Best,

Marc
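P.S. One quick way to see the effect is to time your countLines over a range of chunk sizes against the same file. A minimal sketch (untested here; "hcv.ap" is just your file name from the timings above, substitute any large text file):

countLines <- function(file, chunk = 1e3) {
    ## Your function, repeated so the sketch is self-contained.
    f <- file(file, "r")
    on.exit(close(f))
    nLines <- 0
    while ((n <- length(readLines(f, chunk))) > 0) nLines <- nLines + n
    nLines
}

## Time the same file at several chunk sizes. After the first pass the
## file is likely in the OS cache, so later timings mostly measure
## memory-to-memory transfers rather than physical disk reads.
for (chunk in c(1, 10, 1e3, 1e5)) {
    tm <- system.time(countLines("hcv.ap", chunk), gcFirst = TRUE)
    cat("chunk =", chunk, "\telapsed =", tm[3], "sec\n")
}

If read-ahead is doing the work, the elapsed times should stay nearly flat across chunk sizes, just as you observed; to time genuinely cold reads you would have to clear or bypass the OS cache between runs, which is OS-specific.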