On Mon, 2004-12-06 at 14:00 -0500, Liaw, Andy wrote:
> Marc,
>
> I wrote the following function to read the file in chunks:
>
> countLines <- function(file, chunk=1e3) {
>     f <- file(file, "r")
>     on.exit(close(f))
>     nLines <- 0
>     while ((n <- length(readLines(f, chunk))) > 0) nLines <- nLines + n
>     nLines
> }
>
> To my surprise:
>
> > system.time(n4 <- countLines("hcv.ap"), gcFirst=TRUE)
> [1] 35.24 0.26 35.53 0.00 0.00
> > system.time(n4 <- countLines("hcv.ap", 1), gcFirst=TRUE)
> [1] 36.10 0.32 36.43 0.00 0.00
>
> There's almost no penalty (in time) in reading one line at a time.
> One does save quite a bit of memory, though.
Andy,

I suspect that the near-identical times for reading one line at a time versus larger chunks come down to disk caching and read-ahead in the disk subsystem and the OS. Even though your function requests one line per call, each physical read of the file actually pulls a larger block into cache memory, where it sits until needed or flushed by new data. Given the serial nature of the read, your function is therefore mostly doing high-speed memory-to-memory transfers rather than disk-to-memory transfers. As you point out, though, reading line by line does conserve system memory.

Best,

Marc
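P.S. One quick way to see the effect is to time your countLines over a range of chunk sizes against the same file. A minimal sketch (untested here; "hcv.ap" is just your file name from the timings above, substitute any large text file):

countLines <- function(file, chunk = 1e3) {
    ## Your function, repeated so the sketch is self-contained.
    f <- file(file, "r")
    on.exit(close(f))
    nLines <- 0
    while ((n <- length(readLines(f, chunk))) > 0) nLines <- nLines + n
    nLines
}

## Time the same file at several chunk sizes. After the first pass the
## file is likely in the OS cache, so later timings mostly measure
## memory-to-memory transfers rather than physical disk reads.
for (chunk in c(1, 10, 1e3, 1e5)) {
    tm <- system.time(countLines("hcv.ap", chunk), gcFirst = TRUE)
    cat("chunk =", chunk, "\telapsed =", tm[3], "sec\n")
}

If read-ahead is doing the work, the elapsed times should stay nearly flat across chunk sizes, just as you observed; to time genuinely cold reads you would have to clear or bypass the OS cache between runs, which is OS-specific.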