Hi everybody!

I'm experimenting with Clojure and, as an exercise, I'm using the Facebook
puzzles (http://www.facebook.com/careers/puzzles.php?puzzle_id=20).
Most puzzles require reading a text file "efficiently", so I try not
to read the whole file at once, but to process it lazily.

For that I wrote a very small helper library that tries to take
advantage of lazy sequences:

;; Pattern instances are immutable and thread-safe, so one can be shared.
;; #"\s+" is a regex literal that compiles to a java.util.regex.Pattern.
(def split-pattern #"\s+")

(defn split-words
  "Split the provided string into words. Separators are runs of
  whitespace (spaces, tabs, ...). Returns nil for a nil string."
  [string]
  (when string
    (vec (remove empty? (.split split-pattern string)))))

(defn read-text-file
  "Read a text file lazily, line by line. Each line is a vector of
  words; the sequence ends when the end of the file is reached."
  [file-name]
  (let [reader (java.io.BufferedReader. (java.io.FileReader. file-name))]
    ;; line-seq stops at end of file, whereas (repeatedly #(.readLine reader))
    ;; keeps producing nil forever. Note: the reader is never closed here.
    (map split-words (line-seq reader))))

(defn next-line [lines]
  ;; (first (take 1 lines)) is equivalent to (first lines)
  (first lines))

So basically, a file is a lazy sequence of lines, and each line is a
vector of words.
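To see that shape without a real file, here is a minimal sketch that feeds an in-memory string through `line-seq`. A `java.io.StringReader` stands in for the `FileReader`, and both the sample text and the `line->words` helper are made up for illustration. The point is that `line-seq` ends the sequence at end of input instead of yielding `nil` forever:

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, similar in spirit to split-words: split a line
;; on runs of whitespace and drop the empty string that leading
;; whitespace would produce.
(defn line->words [line]
  (vec (remove str/blank? (str/split line #"\s+"))))

;; A StringReader stands in for the FileReader used in read-text-file.
;; line-seq returns a lazy sequence of lines that ends at end of input.
(def sample-lines
  (map line->words
       (line-seq (java.io.BufferedReader.
                  (java.io.StringReader. "5\nStephen 1\nTommaso\n")))))

(prn sample-lines)
;; prints (["5"] ["Stephen" "1"] ["Tommaso"])
```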

The lazy behavior seems to work at first. If I write:

(let [data (read-text-file "liars.txt")]
  (take 5 data))
=>(["5"] ["Stephen" "1"] ["Tommaso"] ["Tommaso" "1"] ["Galileo"])

It correctly returns the first 5 lines of my file. Perfect, that's
exactly what I want.

But when I actually use it, it doesn't work. If I call take several
times on the same sequence, it always returns the first lines instead
of the next ones:

(let [data (read-text-file "liars.txt")]
  [(take 5 data) (take 5 data)])
=>[(["5"] ["Stephen" "1"] ["Tommaso"] ["Tommaso" "1"] ["Galileo"])
(["5"] ["Stephen" "1"] ["Tommaso"] ["Tommaso" "1"] ["Galileo"])]

If I call take 10 directly, it works as expected:

(let [data (read-text-file "liars.txt")]
  (take 10 data))
=>(["5"] ["Stephen" "1"] ["Tommaso"] ["Tommaso" "1"] ["Galileo"]
["Isaac" "1"] ["Tommaso"] ["Galileo" "1"] ["Tommaso"] ["George" "2"])

You might ask: why not just read all the data from the stream and
then process it?

Well, the file has a specific format: the first line contains one kind
of data, the next few lines contain another, and so on. So I want one
function that reads only one part of the file, another function that
reads the next part, and to call them sequentially. But as the simple
example above shows, that doesn't work.
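One possible structure for this is sketched below. It is only a sketch, and the record layout is an assumption inferred from the sample lines above: a count N on the first line, then N records, each a `name count` line followed by `count` lines holding one name each. Each reading function takes the remaining lines and returns both its result and the lines it did not consume, so the next function continues where the previous one stopped. The names `read-count`, `read-record` and `read-records` are hypothetical:

```clojure
;; Each reader function takes a sequence of lines (each line a vector of
;; words, as produced by read-text-file) and returns
;; [result remaining-lines].

(defn read-count [lines]
  ;; Assumed layout: the first line holds a single number.
  [(Long/parseLong (first (first lines))) (rest lines)])

(defn read-record [lines]
  ;; Assumed layout: "name n" followed by n lines, each holding one name.
  (let [[person n] (first lines)
        n (Long/parseLong n)
        [accused more] (split-at n (rest lines))]
    [{:name person :accuses (mapv first accused)} more]))

(defn read-records [n lines]
  ;; Apply read-record n times, threading the remaining lines through.
  (loop [i 0, lines lines, acc []]
    (if (< i n)
      (let [[record more] (read-record lines)]
        (recur (inc i) more (conj acc record)))
      [acc lines])))

;; Made-up sample input in the same shape as read-text-file's output.
(def sample
  [["2"] ["Stephen" "1"] ["Tommaso"] ["George" "2"] ["Isaac"] ["Galileo"]])

(def parsed
  (let [[n more] (read-count sample)]
    (first (read-records n more))))
```

Because each step only walks forward through `lines`, the same functions should work on the lazy sequence returned by `read-text-file`, reading the file only as far as they need.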

My understanding is that something immutable sits in the middle and
acts as if the data reference doesn't change between calls. That's
obviously not what I want, since I'm getting data from a Java stream,
which isn't supposed to be immutable.
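A small sketch shows what seems to be happening. Here `next-value!` is a hypothetical stand-in for `#(.readLine reader)`. Each element of a lazy sequence is computed once and then cached, and `take` returns a new sequence without advancing the original. To continue past the first elements, you keep the rest of the sequence, for example with `split-at`:

```clojure
;; Hypothetical stand-in for #(.readLine reader): every call has a side
;; effect (incrementing counter) and returns the next integer.
(def counter (atom 0))
(defn next-value! [] (swap! counter inc))

(def data (repeatedly next-value!))

;; Both takes see the same first five elements: they were computed
;; once, cached in the sequence, and `data` itself never advances.
(def first-try  (doall (take 5 data)))
(def second-try (doall (take 5 data)))

;; split-at returns [(take n coll) (drop n coll)]; keeping the second
;; part is how to continue after the first five elements.
(def parts (split-at 5 data))
(def next-five (doall (take 5 (second parts))))

(prn first-try second-try next-five @counter)
;; prints (1 2 3 4 5) (1 2 3 4 5) (6 7 8 9 10) 10
```

The final counter value of 10 shows that the side-effecting function ran only once per element, even though the first five elements were read twice.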

So how can I handle this kind of case correctly, efficiently, and
idiomatically?
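A related point about correctness: `read-text-file` never closes its reader. A common Clojure pattern is to open the file with `with-open`, pass the lazy line sequence to a processing function, and force that function's result before the file is closed. Here is a minimal sketch, where `with-file-lines` is a hypothetical name and a temp file replaces liars.txt:

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: open the file, pass the lazy sequence of its
;; lines to f, and fully realize f's result before with-open closes the
;; reader. Assumes f returns a sequence (doall needs one).
(defn with-file-lines [file-name f]
  (with-open [reader (io/reader file-name)]
    (doall (f (line-seq reader)))))

;; Usage with a throwaway temp file instead of liars.txt.
(def tmp (doto (java.io.File/createTempFile "lines" ".txt")
           (.deleteOnExit)))
(spit tmp "5\nStephen 1\nTommaso\n")

(def first-two (with-file-lines tmp #(take 2 %)))
(prn first-two)
;; prints ("5" "Stephen 1")
```

Without the `doall`, the lazy result would escape `with-open` and would then try to read from a reader that is already closed.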

Thanks in advance,

Nicolas.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.