Thanks for the great suggestions. I guess the code you suggested would look something like this:
fregexpr = function(pattern, filename)
{ # same as gregexpr but operating on files not strings
  # Only single string 'pattern's allowed
  # Note: a match that straddles two buffers is not found
  buf.size=1024
  n = file.info(filename)$size
  pos = NULL
  fp = file(filename, "rb")
  for (d in seq(1,n,by=buf.size)) {
    m = min(buf.size, n-d+1)  # read the rest of the file; n-d would drop the final character
    p = gregexpr(pattern, readChar(fp, m))[[1]]
    if(p[1]>0) pos=c(pos, p+d-1)  # shift chunk positions to file positions
  }
  close(fp)
  if (is.null(pos)) pos=-1
  return (pos)
}

> fname = file.path(R.home(),"COPYING")
> fregexpr("right", fname)
 [1]    73  1347  1422  1460  1727  1879  1908  1939  3106  3350  4240  5530
[13]  6637  6661  6740  9460  9534 10503 11756 12528 12566 13805 15907 16056
[25] 17053 17681 17813
> gregexpr("right", readChar(fname,file.info(fname)$size))[[1]]
 [1]    73  1347  1422  1460  1727  1879  1908  1939  3106  3350  4240  5530
[13]  6637  6661  6740  9460  9534 10503 11756 12528 12566 13805 15907 16056
[25] 17053 17681 17813
attr(,"match.length")
 [1] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5

The function above does what I need. If someone needs a function that parallels gregexpr but operates on files rather than strings, then most of the work would be in modifying the line "if(p[1]>0) pos=c(pos, p+d-1)" to do concatenation and addition on lists.

Thanks
Jarek Tuszynski

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Seth Falcon
Sent: Friday, November 04, 2005 1:44 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] Search within a file

On 3 Nov 2005, [EMAIL PROTECTED] wrote:
> I am looking for a way to search a file for position of some
> expression, from within R. My current code:
>
> sha1Pos = gregexpr("<sha1>", readChar(filename,
> file.info(filename)$size))[[1]]
>
> Works fine for small files, but text files I will be working with
> might get up to Gb range, so I was trying to accomplish the same
> without loading the whole file into R.

I would think you could use readLines to read in a batch of lines, run (g)regexpr, and keep track of matches and position.
Create a connection to the file using file() first, and then subsequent calls to readLines will start where you left off. But you will need to adjust the position indices returned by gregexpr by how far into the file you are. Seems very doable.

+ seth

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
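For comparison, Seth's readLines suggestion might be sketched as below. This is a minimal sketch under stated assumptions, not code posted in the thread: the name fregexpr_lines and the batch.size argument are hypothetical. It assumes "\n" line endings (readLines drops the terminator, so each line is counted as nchar(line) + 1 characters) and a pattern that never spans a line break. Because batches always end on whole lines, a match cannot be split between two reads, and the "match.length" attribute from gregexpr is kept.

```r
# Hypothetical helper (not from the thread): Seth's readLines approach.
# Assumes "\n" line endings and a pattern that never spans a line break.
fregexpr_lines <- function(pattern, filename, batch.size = 10000L) {
  con <- file(filename, open = "r")  # open once so each readLines() resumes
  on.exit(close(con))
  pos <- numeric(0)
  len <- integer(0)
  offset <- 0  # characters before the current batch; double avoids overflow
  repeat {
    lines <- readLines(con, n = batch.size, warn = FALSE)
    if (length(lines) == 0L) break
    # characters preceding each line of this batch in the file;
    # every earlier line contributes nchar(line) + 1 (its "\n")
    starts <- offset + cumsum(c(0, nchar(lines[-length(lines)]) + 1))
    m <- gregexpr(pattern, lines)
    for (i in seq_along(lines)) {
      p <- m[[i]]
      if (p[1] > 0) {  # gregexpr() returns -1 when a line has no match
        pos <- c(pos, p + starts[i])
        len <- c(len, attr(p, "match.length"))
      }
    }
    offset <- offset + sum(nchar(lines) + 1)
  }
  if (length(pos) == 0L) {  # mimic gregexpr()'s "no match" result
    pos <- -1
    len <- -1L
  }
  attr(pos, "match.length") <- len
  pos
}
```

On a file with Unix line endings, fregexpr_lines("right", fname) should return the same positions as the whole-file gregexpr() call. The running offset is kept as a double so that positions past .Machine$integer.max, which the Gb-sized files in the original question could reach, do not overflow integer arithmetic.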