I'm assuming your problem is with memory, and not multithreaded reading. Given 
that:

I also work with files much too big to fit into memory.

You could just use java.util.Scanner. That has a useDelimiter method, so you 
can set the pattern to break on:

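(defn lazy-read-records
  "Returns a lazy seq of the records in file, split wherever regex matches.
  file should be a java.io.File; a String would be scanned as literal text."
  [file regex]
  (let [scanner (doto (java.util.Scanner. file)
                  (.useDelimiter regex))
        get-next (fn get-next []
                   (lazy-seq
                     (if (.hasNext scanner)
                       (cons (.next scanner) (get-next))
                       ;; End of input: release the file handle.
                       (.close scanner))))]
    (get-next)))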

The trick here is that the sequence is lazy: it reads only as much of the 
file as it needs in order to return the next element.
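
To see the laziness in action, here is a minimal sketch. It scans a String 
instead of a file just to keep it small, and scanner-seq is a hypothetical 
helper made up for this example (it is not part of Scanner or of Clojure):

```clojure
(import 'java.util.Scanner)

;; Hypothetical helper for this example only: a lazy seq of the tokens
;; that a Scanner produces.
(defn scanner-seq [^Scanner scanner]
  (lazy-seq
    (when (.hasNext scanner)
      (cons (.next scanner) (scanner-seq scanner)))))

;; Scanning a String here rather than a file, purely for brevity.
(def scanner (doto (Scanner. "rec1;rec2;rec3;rec4")
               (.useDelimiter ";")))

(def records (scanner-seq scanner))

;; Realizing two records pulls exactly two tokens from the scanner.
(def first-two (doall (take 2 records)))   ; ("rec1" "rec2")

;; The scanner has not moved past the third record yet.
(def remaining (.next scanner))            ; "rec3"
```

Calling .next directly at the end is only there to show where the scanner 
stopped; in real code you would keep consuming the sequence instead.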

If you don't hold onto the head of the sequence, the front part can be garbage 
collected while you are working further down.
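
A rough illustration of what holding onto the head means. fake-records is a 
hypothetical stand-in for a lazy sequence of records from a big file, so the 
example runs without one:

```clojure
;; Stand-in for a lazy seq of records read from a big file
;; (hypothetical helper, used only so the example is self-contained).
(defn fake-records [n]
  (map #(str "record-" %) (range n)))

;; OK: the seq is handed straight to reduce, so nothing keeps a reference
;; to its head and already-processed records can be collected.
(defn total-length [records]
  (reduce + (map count records)))

(def total (total-length (fake-records 1000)))   ; 9890

;; Risky with a huge file: a top-level def keeps the head reachable, so
;; every record realized so far stays in memory.
;; (def all-records (fake-records 1000))
;; (total-length all-records)
```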

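PS: If, for some reason, you want character indices rather than the actual 
records, replace (.next scanner) with:

(do (.next scanner)
    (.start (.match scanner)))

Note that this gives the index at which each record starts, counted in 
characters rather than bytes. Scanner reports it relative to its internal 
buffer, so check it against a large file before relying on it as an 
absolute position.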
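
If what you need is the absolute character offset of each place the regex 
matches, one possible approach that does not depend on Scanner's match 
offsets is to split on a zero-width lookahead and keep a running total of 
the record lengths. This is only a sketch: match-offsets is a hypothetical 
name, and it assumes the matches do not overlap and that the regex never 
matches the empty string.

```clojure
(import 'java.util.Scanner)

(defn match-offsets
  "Hypothetical helper: lazy seq of the character offsets at which regex
  matches in source (anything Scanner accepts, e.g. a java.io.File)."
  [source regex]
  (let [pattern (re-pattern (str regex))
        ;; Zero-width lookahead delimiter: every record after the first
        ;; starts exactly where a match of regex starts.
        scanner (doto (Scanner. source)
                  (.useDelimiter (re-pattern (str "(?=" pattern ")"))))
        step (fn step [offset]
               (lazy-seq
                 (when (.hasNext scanner)
                   (let [token (.next scanner)
                         ;; The next record starts where this one ends.
                         more  (step (+ offset (count token)))]
                     ;; The first record may be text that precedes any
                     ;; match, so only report records that begin with one.
                     (if (.lookingAt (re-matcher pattern token))
                       (cons offset more)
                       more)))))]
    (step 0)))

;; Scanning Strings here instead of files, just to keep the example short.
(def offsets-a (doall (match-offsets "xxRECaRECbb" "REC")))   ; (2 6)
(def offsets-b (doall (match-offsets "RECaRECb" "REC")))      ; (0 4)
```

The offsets are counted in characters, so they only equal byte positions 
for single-byte encodings.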

On Aug 16, 2010, at 5:22 PM, cej38 wrote:

> Hello,
>  I work with text files that are, at times, too large to read in all
> at one time.  In searching for a way to read in only part of the file
> I came across
> http://meshy.org/2009/12/13/widefinder-2-with-clojure.html
> 
> I am only interested in the chunk-file and read-lines-range functions.
> 
> My problem is that I would like to change chunk-file, so that instead
> of looking for the next line break, it would look for some regular
> expression (to be given as part of the function call), and would then
> report the position of the first character of every instance of that
> regular expression.
> 
> After working on this for a couple of days I am raising the white
> flag.  Is there someone that can help me with this?
> 
> Thanks.
> 
> 
> 
> -- 
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with your 
> first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
