Re: Lazily extract lines from large file

2012-08-22 Thread Stephen Compall
On Aug 17, 2012 4:53 PM, "David Jacobs"  wrote:
> Okay that's great. Thanks, you guys. Was read-lines only holding onto
> the head of the line seq because I bound it in the let statement?

No; (partial nth values) holds on to values, and map holds on to the
function you give it.

Omitting needless lets is a matter of style in this case.
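
A minimal sketch of that retention (using the multi-nth from the
original post, rewritten with an explicit fn for clarity):

(defn multi-nth [values indices]
  ;; `values` is captured by the anonymous fn (which is what
  ;; (partial nth values) builds), and the lazy seq returned by map
  ;; keeps a reference to that fn until it is fully realized -- so the
  ;; head of `values` stays reachable whether or not a let binds it.
  (map (fn [i] (nth values i)) indices))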

--
Stephen Compall
If anyone in the MSA is online, you should watch this flythrough.


Re: Lazily extract lines from large file

2012-08-17 Thread Ben Smith-Mannschott
On Fri, Aug 17, 2012 at 10:53 PM, David Jacobs  wrote:
> Okay that's great. Thanks, you guys. Was read-lines only holding onto
> the head of the line seq because I bound it in the let statement?

Yeah... I think so... I don't know whether that's a case the compiler's
"locals clearing" handles. In any event, that's why I chose to pass
the lazy sequence directly to the called function instead of binding it
in a let first (both shapes are sketched below).
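
A rough sketch of the two shapes (the names here are just for
illustration; both force the result inside with-open, only the binding
style differs):

;; bind the seq in a let first
(defn read-lines-let [file indices]
  (with-open [rdr (clojure.java.io/reader file)]
    (let [lines (line-seq rdr)]
      (doall (multi-nth lines indices)))))

;; pass the seq straight through, sidestepping the locals-clearing
;; question entirely
(defn read-lines-direct [file indices]
  (with-open [rdr (clojure.java.io/reader file)]
    (doall (multi-nth (line-seq rdr) indices))))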

// Ben

> On Fri, Aug 17, 2012 at 11:09 AM, Ben Smith-Mannschott wrote:
>> On Thu, Aug 16, 2012 at 11:47 PM, David Jacobs  wrote:
>>> I'm trying to grab 5 lines by their line numbers from a large (> 1GB) file
>>> with Clojure.
>>>
>>> So far I've got:
>>>
>>> (defn multi-nth [values indices]
>>>   (map (partial nth values) indices))
>>>
>>> (defn read-lines [file indices]
>>>   (with-open [rdr (clojure.java.io/reader file)]
>>>     (let [lines (line-seq rdr)]
>>>       (multi-nth lines indices))))
>>>
>>> Now, (read-lines "my-file" [0]) works without a problem. However, passing in
>>> [0 1] gives me the following error: "java.lang.RuntimeException:
>>> java.io.IOException: Stream closed"
>>>
>>> It seems that the stream is being closed before I can read the second line
>>> from the file. Interestingly, if I manually pull out a line from the file
>>> with something like `(nth lines 200)`, the `multi-nth` call works for all
>>> values <= 200.
>>>
>>> Any idea what's going on?
>>>
>>> PS This question is on SO if someone wants points:
>>> http://stackoverflow.com/questions/11995807/lazily-extract-lines-from-large-file
>>
>> The laziness of map is biting you. The result of read-lines will not
>> have been fully realized before the file is closed. Also, calling nth
>> repeatedly is not going to do wonders for efficiency. Try this on for
>> size:
>>
>>
>> (ns nthlines.core
>>   (:require [clojure.java.io :as io]))
>>
>> (defn multi-nth [values indices]
>>   (let [matches-index? (set indices)]
>>     (keep-indexed #(when (matches-index? %1) %2) values)))
>>
>> (defn read-lines [file indices]
>>   (with-open [r (io/reader file)]
>>     (doall (multi-nth (line-seq r) indices))))
>>
>> (comment
>>
>>   (def words "/Users/bsmith/w/nthlines/words.txt")
>>   (def nlines 84918960) ;; 856MB with one word per line
>>
>>   (time (read-lines words [0 1 2 (- nlines 2) (- nlines 1)]))
>>
>>   ;;=> "Elapsed time: 18778.904 msecs"
>>   ;;   ("A" "a" "aa" "Zyzomys" "Zyzzogeton")
>>
>> )
>>
>> // Ben
>>


Re: Lazily extract lines from large file

2012-08-17 Thread David Jacobs
Okay that's great. Thanks, you guys. Was read-lines only holding onto
the head of the line seq because I bound it in the let statement?

On Fri, Aug 17, 2012 at 11:09 AM, Ben Smith-Mannschott wrote:
> On Thu, Aug 16, 2012 at 11:47 PM, David Jacobs  wrote:
>> I'm trying to grab 5 lines by their line numbers from a large (> 1GB) file
>> with Clojure.
>>
>> So far I've got:
>>
>> (defn multi-nth [values indices]
>>   (map (partial nth values) indices))
>>
>> (defn read-lines [file indices]
>>   (with-open [rdr (clojure.java.io/reader file)]
>>     (let [lines (line-seq rdr)]
>>       (multi-nth lines indices))))
>>
>> Now, (read-lines "my-file" [0]) works without a problem. However, passing in
>> [0 1] gives me the following error: "java.lang.RuntimeException:
>> java.io.IOException: Stream closed"
>>
>> It seems that the stream is being closed before I can read the second line
>> from the file. Interestingly, if I manually pull out a line from the file
>> with something like `(nth lines 200)`, the `multi-nth` call works for all
>> values <= 200.
>>
>> Any idea what's going on?
>>
>> PS This question is on SO if someone wants points:
>> http://stackoverflow.com/questions/11995807/lazily-extract-lines-from-large-file
>
> The laziness of map is biting you. The result of read-lines will not
> have been fully realized before the file is closed. Also, calling nth
> repeatedly is not going to do wonders for efficiency. Try this on for
> size:
>
>
> (ns nthlines.core
>   (:require [clojure.java.io :as io]))
>
> (defn multi-nth [values indices]
>   (let [matches-index? (set indices)]
>     (keep-indexed #(when (matches-index? %1) %2) values)))
>
> (defn read-lines [file indices]
>   (with-open [r (io/reader file)]
>     (doall (multi-nth (line-seq r) indices))))
>
> (comment
>
>   (def words "/Users/bsmith/w/nthlines/words.txt")
>   (def nlines 84918960) ;; 856MB with one word per line
>
>   (time (read-lines words [0 1 2 (- nlines 2) (- nlines 1)]))
>
>   ;;=> "Elapsed time: 18778.904 msecs"
>   ;;   ("A" "a" "aa" "Zyzomys" "Zyzzogeton")
>
> )
>
> // Ben
>


Re: Lazily extract lines from large file

2012-08-17 Thread Ben Smith-Mannschott
On Thu, Aug 16, 2012 at 11:47 PM, David Jacobs  wrote:
> I'm trying to grab 5 lines by their line numbers from a large (> 1GB) file
> with Clojure.
>
> So far I've got:
>
> (defn multi-nth [values indices]
>   (map (partial nth values) indices))
>
> (defn read-lines [file indices]
>   (with-open [rdr (clojure.java.io/reader file)]
>     (let [lines (line-seq rdr)]
>       (multi-nth lines indices))))
>
> Now, (read-lines "my-file" [0]) works without a problem. However, passing in
> [0 1] gives me the following error: "java.lang.RuntimeException:
> java.io.IOException: Stream closed"
>
> It seems that the stream is being closed before I can read the second line
> from the file. Interestingly, if I manually pull out a line from the file
> with something like `(nth lines 200)`, the `multi-nth` call works for all
> values <= 200.
>
> Any idea what's going on?
>
> PS This question is on SO if someone wants points:
> http://stackoverflow.com/questions/11995807/lazily-extract-lines-from-large-file

The laziness of map is biting you. The result of read-lines will not
have been fully realized before the file is closed. Also, calling nth
repeatedly is not going to do wonders for efficiency. Try this on for
size:


(ns nthlines.core
  (:require [clojure.java.io :as io]))

(defn multi-nth [values indices]
  (let [matches-index? (set indices)]
    (keep-indexed #(when (matches-index? %1) %2) values)))

(defn read-lines [file indices]
  (with-open [r (io/reader file)]
    (doall (multi-nth (line-seq r) indices))))

(comment

  (def words "/Users/bsmith/w/nthlines/words.txt")
  (def nlines 84918960) ;; 856MB with one word per line

  (time (read-lines words [0 1 2 (- nlines 2) (- nlines 1)]))

  ;;=> "Elapsed time: 18778.904 msecs"
  ;;   ("A" "a" "aa" "Zyzomys" "Zyzzogeton")

)

// Ben



Re: Lazily extract lines from large file

2012-08-16 Thread Lars Nilsson
On Thu, Aug 16, 2012 at 5:47 PM, David Jacobs  wrote:
> I'm trying to grab 5 lines by their line numbers from a large (> 1GB) file
> with Clojure.
>
> So far I've got:
>
> (defn multi-nth [values indices]
>   (map (partial nth values) indices))
>
> (defn read-lines [file indices]
>   (with-open [rdr (clojure.java.io/reader file)]
>     (let [lines (line-seq rdr)]
>       (multi-nth lines indices))))
>
> Now, (read-lines "my-file" [0]) works without a problem. However, passing in
> [0 1] gives me the following error: "java.lang.RuntimeException:
> java.io.IOException: Stream closed"
>
> It seems that the stream is being closed before I can read the second line
> from the file. Interestingly, if I manually pull out a line from the file
> with something like `(nth lines 200)`, the `multi-nth` call works for all
> values <= 200.
>
> Any idea what's going on?

Laziness is biting you in this case, I imagine. You're not realizing
the result you seek until after with-open closes the file. You could
try throwing in a doall around the let, perhaps. However, you probably
want to change your algorithm a bit, as your code will hold on to the
head of "lines" until all indices have been processed, consuming memory
for all lines read regardless of whether you are interested in them.
Also, nth with sequences is probably not what you want most of the
time.
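
A minimal sketch of the doall suggestion, applied to the original
map-based multi-nth (wrapping the multi-nth call rather than the whole
let, which comes to the same thing): the result is now realized while
the reader is still open, though it still pays the repeated-nth cost
and holds the head of the seq, as noted above.

(defn read-lines [file indices]
  (with-open [rdr (clojure.java.io/reader file)]
    (let [lines (line-seq rdr)]
      ;; doall forces the lazy result before with-open closes the reader
      (doall (multi-nth lines indices)))))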

Lars Nilsson



Lazily extract lines from large file

2012-08-16 Thread David Jacobs
I'm trying to grab 5 lines by their line numbers from a large (> 1GB) file 
with Clojure.

So far I've got:

(defn multi-nth [values indices]
  (map (partial nth values) indices))

(defn read-lines [file indices]
  (with-open [rdr (clojure.java.io/reader file)]
    (let [lines (line-seq rdr)]
      (multi-nth lines indices))))

Now, (read-lines "my-file" [0]) works without a problem. However, passing 
in [0 1] gives me the following error: "java.lang.RuntimeException: 
java.io.IOException: Stream closed" 

It seems that the stream is being closed before I can read the second line 
from the file. Interestingly, if I manually pull out a line from the file 
with something like `(nth lines 200)`, the `multi-nth` call works for all 
values <= 200.

Any idea what's going on?

PS This question is on SO if someone wants 
points: 
http://stackoverflow.com/questions/11995807/lazily-extract-lines-from-large-file
