Re: Lazily extract lines from large file
On Aug 17, 2012 4:53 PM, "David Jacobs" wrote:
> Okay that's great. Thanks, you guys. Was read-lines only holding onto
> the head of the line seq because I bound it in the let statement?

No; (partial nth values) holds on to values, and map holds on to the
function you give it. Omitting needless lets is a matter of style in
this case.

--
Stephen Compall
If anyone in the MSA is online, you should watch this flythrough.

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
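Stephen's point can be sketched in a few lines: the closure produced by `partial` captures `values` itself, so the head of the lazy seq stays reachable through that closure (and through the lazy `map` wrapping it) whether or not the seq was ever named in a `let`. A minimal illustration; the `xs` sequence here is just a stand-in for `line-seq`, not anything from the thread:

```clojure
;; (partial nth values) closes over `values`, so the head of the lazy
;; seq is pinned by the closure, not by any `let` binding.
(defn multi-nth [values indices]
  (map (partial nth values) indices))    ; map also retains the fn

(let [xs (map inc (range 10))]           ; lazy seq standing in for line-seq
  (doall (multi-nth xs [0 4 9])))
;; => (1 5 10)
```

Dropping the `let` is cosmetic; what actually pins the head is the reference held by the partially-applied `nth`.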
Re: Lazily extract lines from large file
On Fri, Aug 17, 2012 at 10:53 PM, David Jacobs wrote:
> Okay that's great. Thanks, you guys. Was read-lines only holding onto
> the head of the line seq because I bound it in the let statement?

Yea... I think so... I don't know if that's a case that the compiler's
"locals clearing" handles. In any event, that's why I chose to pass the
lazy sequence directly to the called function without binding it in a
let first.

// Ben

> On Fri, Aug 17, 2012 at 11:09 AM, Ben Smith-Mannschott wrote:
>> On Thu, Aug 16, 2012 at 11:47 PM, David Jacobs wrote:
>>> I'm trying to grab 5 lines by their line numbers from a large (> 1GB)
>>> file with Clojure.
>>>
>>> So far I've got:
>>>
>>> (defn multi-nth [values indices]
>>>   (map (partial nth values) indices))
>>>
>>> (defn read-lines [file indices]
>>>   (with-open [rdr (clojure.java.io/reader file)]
>>>     (let [lines (line-seq rdr)]
>>>       (multi-nth lines indices))))
>>>
>>> Now, (read-lines "my-file" [0]) works without a problem. However,
>>> passing in [0 1] gives me the following error:
>>> "java.lang.RuntimeException: java.io.IOException: Stream closed"
>>>
>>> It seems that the stream is being closed before I can read the second
>>> line from the file. Interestingly, if I manually pull out a line from
>>> the file with something like `(nth lines 200)`, the `multi-nth` call
>>> works for all values <= 200.
>>>
>>> Any idea what's going on?
>>>
>>> PS This question is on SO if someone wants points:
>>> http://stackoverflow.com/questions/11995807/lazily-extract-lines-from-large-file
>>
>> The laziness of map is biting you. The result of read-lines will not
>> have been fully realized before the file is closed. Also, calling nth
>> repeatedly is not going to do wonders for efficiency. Try this on for
>> size:
>>
>> (ns nthlines.core
>>   (:require [clojure.java.io :as io]))
>>
>> (defn multi-nth [values indices]
>>   (let [matches-index? (set indices)]
>>     (keep-indexed #(when (matches-index? %1) %2) values)))
>>
>> (defn read-lines [file indices]
>>   (with-open [r (io/reader file)]
>>     (doall (multi-nth (line-seq r) indices))))
>>
>> (comment
>>
>>   (def words "/Users/bsmith/w/nthlines/words.txt")
>>   (def nlines 84918960) ;; 856MB with one word per line
>>
>>   (time (read-lines words [0 1 2 (- nlines 2) (- nlines 1)]))
>>
>>   ;;=> "Elapsed time: 18778.904 msecs"
>>   ;; ("A" "a" "aa" "Zyzomys" "Zyzzogeton")
>>
>> )
>>
>> // Ben
Re: Lazily extract lines from large file
Okay that's great. Thanks, you guys. Was read-lines only holding onto
the head of the line seq because I bound it in the let statement?

On Fri, Aug 17, 2012 at 11:09 AM, Ben Smith-Mannschott wrote:
> On Thu, Aug 16, 2012 at 11:47 PM, David Jacobs wrote:
>> I'm trying to grab 5 lines by their line numbers from a large (> 1GB)
>> file with Clojure.
>>
>> So far I've got:
>>
>> (defn multi-nth [values indices]
>>   (map (partial nth values) indices))
>>
>> (defn read-lines [file indices]
>>   (with-open [rdr (clojure.java.io/reader file)]
>>     (let [lines (line-seq rdr)]
>>       (multi-nth lines indices))))
>>
>> Now, (read-lines "my-file" [0]) works without a problem. However,
>> passing in [0 1] gives me the following error:
>> "java.lang.RuntimeException: java.io.IOException: Stream closed"
>>
>> It seems that the stream is being closed before I can read the second
>> line from the file. Interestingly, if I manually pull out a line from
>> the file with something like `(nth lines 200)`, the `multi-nth` call
>> works for all values <= 200.
>>
>> Any idea what's going on?
>>
>> PS This question is on SO if someone wants points:
>> http://stackoverflow.com/questions/11995807/lazily-extract-lines-from-large-file
>
> The laziness of map is biting you. The result of read-lines will not
> have been fully realized before the file is closed. Also, calling nth
> repeatedly is not going to do wonders for efficiency. Try this on for
> size:
>
> (ns nthlines.core
>   (:require [clojure.java.io :as io]))
>
> (defn multi-nth [values indices]
>   (let [matches-index? (set indices)]
>     (keep-indexed #(when (matches-index? %1) %2) values)))
>
> (defn read-lines [file indices]
>   (with-open [r (io/reader file)]
>     (doall (multi-nth (line-seq r) indices))))
>
> (comment
>
>   (def words "/Users/bsmith/w/nthlines/words.txt")
>   (def nlines 84918960) ;; 856MB with one word per line
>
>   (time (read-lines words [0 1 2 (- nlines 2) (- nlines 1)]))
>
>   ;;=> "Elapsed time: 18778.904 msecs"
>   ;; ("A" "a" "aa" "Zyzomys" "Zyzzogeton")
>
> )
>
> // Ben
Re: Lazily extract lines from large file
On Thu, Aug 16, 2012 at 11:47 PM, David Jacobs wrote:
> I'm trying to grab 5 lines by their line numbers from a large (> 1GB)
> file with Clojure.
>
> So far I've got:
>
> (defn multi-nth [values indices]
>   (map (partial nth values) indices))
>
> (defn read-lines [file indices]
>   (with-open [rdr (clojure.java.io/reader file)]
>     (let [lines (line-seq rdr)]
>       (multi-nth lines indices))))
>
> Now, (read-lines "my-file" [0]) works without a problem. However,
> passing in [0 1] gives me the following error:
> "java.lang.RuntimeException: java.io.IOException: Stream closed"
>
> It seems that the stream is being closed before I can read the second
> line from the file. Interestingly, if I manually pull out a line from
> the file with something like `(nth lines 200)`, the `multi-nth` call
> works for all values <= 200.
>
> Any idea what's going on?
>
> PS This question is on SO if someone wants points:
> http://stackoverflow.com/questions/11995807/lazily-extract-lines-from-large-file

The laziness of map is biting you. The result of read-lines will not
have been fully realized before the file is closed. Also, calling nth
repeatedly is not going to do wonders for efficiency. Try this on for
size:

(ns nthlines.core
  (:require [clojure.java.io :as io]))

(defn multi-nth [values indices]
  (let [matches-index? (set indices)]
    (keep-indexed #(when (matches-index? %1) %2) values)))

(defn read-lines [file indices]
  (with-open [r (io/reader file)]
    (doall (multi-nth (line-seq r) indices))))

(comment

  (def words "/Users/bsmith/w/nthlines/words.txt")
  (def nlines 84918960) ;; 856MB with one word per line

  (time (read-lines words [0 1 2 (- nlines 2) (- nlines 1)]))

  ;;=> "Elapsed time: 18778.904 msecs"
  ;; ("A" "a" "aa" "Zyzomys" "Zyzzogeton")

)

// Ben
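Ben's single-pass version deserves one note on ordering: because `keep-indexed` walks the line seq once from the start, results come back in file order, not in the order the indices were supplied. A small in-memory sketch (the vector of strings is a stand-in for a real file's `line-seq`):

```clojure
(defn multi-nth [values indices]
  (let [matches-index? (set indices)]    ; set gives O(1) membership tests
    (keep-indexed #(when (matches-index? %1) %2) values)))

;; Indices [2 0] still yield file order, not request order:
(multi-nth ["alpha" "beta" "gamma" "delta"] [2 0])
;; => ("alpha" "gamma")
```

If the caller needs results keyed by their index, emitting `[index line]` pairs from `keep-indexed` and pouring them into a map would be a straightforward variation.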
Re: Lazily extract lines from large file
On Thu, Aug 16, 2012 at 5:47 PM, David Jacobs wrote:
> I'm trying to grab 5 lines by their line numbers from a large (> 1GB)
> file with Clojure.
>
> So far I've got:
>
> (defn multi-nth [values indices]
>   (map (partial nth values) indices))
>
> (defn read-lines [file indices]
>   (with-open [rdr (clojure.java.io/reader file)]
>     (let [lines (line-seq rdr)]
>       (multi-nth lines indices))))
>
> Now, (read-lines "my-file" [0]) works without a problem. However,
> passing in [0 1] gives me the following error:
> "java.lang.RuntimeException: java.io.IOException: Stream closed"
>
> It seems that the stream is being closed before I can read the second
> line from the file. Interestingly, if I manually pull out a line from
> the file with something like `(nth lines 200)`, the `multi-nth` call
> works for all values <= 200.
>
> Any idea what's going on?

Laziness is biting you in this case, I imagine. You're not realizing
the result you seek until after with-open closes the file. You could
try throwing in a doall around the let, perhaps. However, you probably
would want to change your algorithm a bit, as your code will hold on to
the head of "lines" until all indices have been processed, consuming
memory for all lines read regardless of whether you are interested in
them. Also, nth with sequences is probably not what you want most of
the time.

Lars Nilsson
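The `doall` Lars suggests hinges on placement: the forcing has to happen inside the `with-open`, so every requested line is realized before the reader closes. A sketch of that shape, keeping the original `multi-nth` (and, as Lars warns, its head-holding and repeated-`nth` costs); the temp-file usage below is illustrative only:

```clojure
(require '[clojure.java.io :as io])

(defn multi-nth [values indices]
  (map (partial nth values) indices))

(defn read-lines [file indices]
  (with-open [rdr (io/reader file)]
    ;; doall forces realization while rdr is still open; moving it
    ;; outside the with-open would reintroduce "Stream closed".
    (doall (multi-nth (line-seq rdr) indices))))
```

For example, on a three-line file, `(read-lines f [0 2])` would return the first and third lines as fully realized strings.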
Lazily extract lines from large file
I'm trying to grab 5 lines by their line numbers from a large (> 1GB)
file with Clojure.

So far I've got:

(defn multi-nth [values indices]
  (map (partial nth values) indices))

(defn read-lines [file indices]
  (with-open [rdr (clojure.java.io/reader file)]
    (let [lines (line-seq rdr)]
      (multi-nth lines indices))))

Now, (read-lines "my-file" [0]) works without a problem. However,
passing in [0 1] gives me the following error:
"java.lang.RuntimeException: java.io.IOException: Stream closed"

It seems that the stream is being closed before I can read the second
line from the file. Interestingly, if I manually pull out a line from
the file with something like `(nth lines 200)`, the `multi-nth` call
works for all values <= 200.

Any idea what's going on?

PS This question is on SO if someone wants points:
http://stackoverflow.com/questions/11995807/lazily-extract-lines-from-large-file
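For what it's worth, the asymmetry described above ([0] works, [0 1] throws, and forcing `(nth lines 200)` makes all indices <= 200 work) falls out of how `line-seq` is built: it reads its first line eagerly, and every cell of the seq, once realized, stays cached. A paraphrase of `clojure.core/line-seq` (renamed here to avoid shadowing the real one):

```clojure
;; The first .readLine happens immediately, so index 0 is realized (and
;; cached) before with-open ever closes the reader; any line not yet
;; realized at that point forces a read from a closed stream.
(defn line-seq' [^java.io.BufferedReader rdr]
  (when-let [line (.readLine rdr)]              ; eager first read
    (cons line (lazy-seq (line-seq' rdr)))))    ; the rest is deferred
```

That also explains the `(nth lines 200)` observation: walking to index 200 realizes and caches lines 0-200 while the reader is still open, so those indices remain readable afterwards.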