Re: puzzlement over lazy sequences
On Sep 12, 2011, at 11:28 PM, Ken Wesson wrote: But if, as you say, take, drop, etc. work for larger n, it should be easy to make nth work with larger n and non-random-access seqs, just by changing the non-random-access case to (first (drop n the-seq)). I'd be rather surprised if nth suddenly started giving linear performance on arrays for large values of n. If nth can't be made to work in constant time on arrays for n 2**31, then I'd favor the IllegalArgumentException approach. One can always do (first (drop …)) manually if linear performance is acceptable. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: puzzlement over lazy sequences
On Tue, Sep 13, 2011 at 2:39 AM, Michael Gardner gardne...@gmail.com wrote: On Sep 12, 2011, at 11:28 PM, Ken Wesson wrote: But if, as you say, take, drop, etc. work for larger n, it should be easy to make nth work with larger n and non-random-access seqs, just by changing the non-random-access case to (first (drop n the-seq)). I'd be rather surprised if nth suddenly started giving linear performance on arrays for large values of n. If nth can't be made to work in constant time on arrays for n 2**31, then I'd favor the IllegalArgumentException approach. One can always do (first (drop …)) manually if linear performance is acceptable. It already gives log32 rather than constant time performance on vectors, I think. That would stay the same. It can be extended to sorted-set and sorted-map with log2 time performance, again including with large-n support. On actual java.util collections and Java arrays it can just check if n exceeds 2^31 - 1, give not-found behavior if so, and otherwise punt to Java. And on non-random-access stuff it can use (first (drop n thingy)). -- Protege: What is this seething mass of parentheses?! Master: Your father's Lisp REPL. This is the language of a true hacker. Not as clumsy or random as C++; a language for a more civilized age. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: puzzlement over lazy sequences
Hi, On Tuesday, September 13, 2011 6:28:01 AM UTC+2, Ken Wesson wrote: They're trees of arrays of 32 items, and the trees can in principle have arbitrary depth. So the 2^31 limit on Java arrays doesn't impact the Clojure collections, it seems. are you sure? As far as I understood things (reading PersistentVector.java and [1]), there can only be 6 levels of nodes, because of the 32bit which are used to store an integer and because of the bit-shifting which is used to find the correct level. That would lead to a limit of 32^6 which is half of the 2^31: user= (= (Math/pow 2 31) (* 2 (Math/pow 32 6))) true :-) Regards, Stefan [1] http://blog.higher-order.net/2009/02/01/understanding-clojures-persistentvector-implementation/ -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: puzzlement over lazy sequences
On Tue, Sep 13, 2011 at 4:18 AM, Stefan Kamphausen ska2...@googlemail.com wrote: Hi, On Tuesday, September 13, 2011 6:28:01 AM UTC+2, Ken Wesson wrote: They're trees of arrays of 32 items, and the trees can in principle have arbitrary depth. So the 2^31 limit on Java arrays doesn't impact the Clojure collections, it seems. are you sure? As far as I understood things (reading PersistentVector.java and [1]), there can only be 6 levels of nodes, because of the 32bit which are used to store an integer and because of the bit-shifting which is used to find the correct level. I said the tree can *in principle* have arbitrary depth. Bit-shifting can be done on BigIntegers, too. -- Protege: What is this seething mass of parentheses?! Master: Your father's Lisp REPL. This is the language of a true hacker. Not as clumsy or random as C++; a language for a more civilized age. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: puzzlement over lazy sequences
On Mon, Sep 12, 2011 at 12:54 AM, Alan Malloy a...@malloys.org wrote: Integer overflow. user (mod 9876543210 (bigint (Math/pow 2 32))) 1286608618 Oops. But nth can probably be fixed while keeping good performance: (defn- small-drop [s n] (loop [n (int n) s (seq s)] (if (zero? n) s (recur (dec n) (next s) (def pow231 (bigint (Math/pow 2 31))) (defn my-nth [s n] (let [s (small-drop s (rem n pow231))] (loop [n (quot n pow231) s s] (if s (if (zero? n) (first s) (recur (dec n) (small-drop s pow231))) Given a bigint n, this does only two arithmetic operations on bigints at the start and one possibly-bigint dec every 2^31 items, instead of doing a bigint dec every single item. So it should be nearly as fast for large n as for small n, per item. I'm guessing there are similar bugs in drop, take, and so forth with large n and large (or infinite) seqs. They should all be fixed. Note: the above code is untested, and (nth n too-short-a-seq) returns nil. For further speed you'd want small-drop to use an unchecked dec. You'd also want my-nth to punt to the special implementations for vectors and other random-access seqables, and maybe change the behavior on long seqs. -- Protege: What is this seething mass of parentheses?! Master: Your father's Lisp REPL. This is the language of a true hacker. Not as clumsy or random as C++; a language for a more civilized age. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: puzzlement over lazy sequences
On Mon, Sep 12, 2011 at 11:55 AM, Stuart Halloway stuart.hallo...@gmail.com wrote: I'm guessing there are similar bugs in drop, take, and so forth with large n and large (or infinite) seqs. They should all be fixed. The other fns are ok, thanks to their separate heritage. drop, take, et al are sequence functions, and proceed iteratively. nth is of a different lineage. It was designed to target collections that support constant time-lookup. Collection-y things in the Java world provide APIs that take int, not long, because that is what arrays do at the bottom. FWIW, Sun (now Oracle) considers this a low-priority problem: http://bugs.sun.com/bugdatabase/view_bug.do;jsessionid=5d84e2e6a65a5c03eacaafb73e91?bug_id=4963452. nth has a doc bug: it should be documented as a function of a collection and a 32-bit int. A patch updating the docstring, and even enforcing it with an IllegalArgumentException, would be welcome. (George: want to write your first patch? :-) ) Stu Stuart Halloway Clojure/core http://clojure.com The Clojure collections aren't just flat arrays, though, are they? They're trees of arrays of 32 items, and the trees can in principle have arbitrary depth. So the 2^31 limit on Java arrays doesn't impact the Clojure collections, it seems. And nth should therefore work for larger n, on all of them. But if, as you say, take, drop, etc. work for larger n, it should be easy to make nth work with larger n and non-random-access seqs, just by changing the non-random-access case to (first (drop n the-seq)). -- Protege: What is this seething mass of parentheses?! Master: Your father's Lisp REPL. This is the language of a true hacker. Not as clumsy or random as C++; a language for a more civilized age. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: puzzlement over lazy sequences
Hi George, Once again, I make a globally referenced, infinitely long stream. But now I use lazy-seq instead of cycle: user= (def ev-stream (lazy-seq (cons true (cons false ev- stream #'user/ev-stream user= (defn ev? [n] (nth ev-stream n)) #'user/ev? user= (time (ev? 9876543210)) Elapsed time: 47244.061 msecs true OMG! Not only did it NOT hose the heap and crash, it actually ran much faster than the version with the unreferenced (cycle [true false]). The consing version of ev-stream is self-referential, because you explicitly made it so by consing it back onto itself. So it only has two items in it, though it bounces back and forth between them forever. The cycling version is not self-referential. The only reason I can think of, for this to NOT exhaust memory, is that the lazy-seq macro knows when to construct a circular list. Is that what happens? If so, why DOESN'T it happen with cycle, where it's obviously the behavior one would want? cycle actually calls lazy-seq. A quick way to check such things at the REPL is with source: user= (source cycle) (defn cycle Returns a lazy (infinite!) sequence of repetitions of the items in coll. {:added 1.0 :static true} [coll] (lazy-seq (when-let [s (seq coll)] (concat s (cycle s) snip Now I'll test it with 9876543210, a number which ev? was able to handle: user= (time (mod3 9876543210)) Elapsed time: 37759.615 msecs 1 user= (mod 987654321 3) 0 Whoa! The computation finished in reasonable time, but with the WRONG answer! How did that happen? Did I find a bug? No, there is simply a typo in your input arg. Stu Stuart Halloway Clojure/core http://clojure.com -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: puzzlement over lazy sequences
On Sun, Sep 11, 2011 at 8:28 AM, Stuart Halloway stuart.hallo...@gmail.com wrote: cycle actually calls lazy-seq. A quick way to check such things at the REPL is with source: user= (source cycle) (defn cycle Returns a lazy (infinite!) sequence of repetitions of the items in coll. {:added 1.0 :static true} [coll] (lazy-seq (when-let [s (seq coll)] (concat s (cycle s) snip Is there any reason not to define it as (defn cycle Returns a lazy (infinite!) sequence of repetitions of the items in coll. {:added 1.0 :static true} [coll] (let [l (lazy-seq (when-let [s (seq coll)] (concat s l This will have the effect that holding any element of the cycle holds the head; however, even the original holds the head of s as long as the cycle is referenced, so that at worst doubles memory use for finite s, and for finite walks along the cycle with infinite s, as s (or the first N elements of s) get held onto in both cases, but an equal number of cycle elements get held onto with the self-referencing cycle instead of potentially only one at a time. Now I'll test it with 9876543210, a number which ev? was able to handle: user= (time (mod3 9876543210)) Elapsed time: 37759.615 msecs 1 user= (mod 987654321 3) 0 Whoa! The computation finished in reasonable time, but with the WRONG answer! How did that happen? Did I find a bug? No, there is simply a typo in your input arg. True, but both numbers are congruent to 0 mod 3, so where did the 1 come from in (time (mod3 9876543210))? -- Protege: What is this seething mass of parentheses?! Master: Your father's Lisp REPL. This is the language of a true hacker. Not as clumsy or random as C++; a language for a more civilized age. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: puzzlement over lazy sequences
Hi, Stu, Loving your book! I posted a reply earlier, through a different interface, which went to moderators. Sorry for the clumsiness, but I'm not familiar with the mechanics of newsgroups. On Sep 11, 7:28 am, Stuart Halloway stuart.hallo...@gmail.com wrote: The consing version of ev-stream is self-referential, because you explicitly made it so by consing it back onto itself. So it only has two items in it, though it bounces back and forth between them forever. The cycling version is not self-referential. Since it's the self reference that gives the nice results (finite memory consumption, and apparently better speed), I came up with a little macro to provide it: (defmacro defcycle [name coll] `(def ~name (lazy-seq (concat ~coll ~name))) ) This is probably not the most useful way to do it, since the user has to provide name. Now I'll test it with 9876543210, a number which ev? was able to handle: user= (time (mod3 9876543210)) Elapsed time: 37759.615 msecs 1 user= (mod 987654321 3) 0 Whoa! The computation finished in reasonable time, but with the WRONG answer! How did that happen? Did I find a bug? No, there is simply a typo in your input arg. with the typo fixed, i.e. (mod 9876543210 3), the result is still 0. regards, George -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: puzzlement over lazy sequences
Integer overflow. user (mod 9876543210 (bigint (Math/pow 2 32))) 1286608618 On Sep 11, 9:44 pm, George Kangas gwkan...@gmail.com wrote: I believe the bug can be blamed on nth. Using nth, I make a function which should be identity on natural numbers: Clojure 1.2.1 user= (defn ident [n] (nth (iterate inc 0) n)) #'user/ident And it works, for reasonable size numbers: user= (ident 12345) 12345 user= (ident 7654321) 7654321 But, with a ridiculously huge number... user= (ident 9876543210) 1286608618 ...something goes wrong. Also, when I replace nth in my code (in the top post) with mth, then it works. Here's mth: user= (defn mth [seq m] (if (zero? m) (first seq) (recur (rest seq) (dec m #'user/mth user= (mth (iterate inc 0) 123456) 123456 user= (mth (iterate inc 0) 9876543210) 9876543210 Of course, mth is godawful slow. I guess nobody wanted to find the 9876543210th element of a sequence, until this wanking dilettante came along! cheers, George -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
puzzlement over lazy sequences
Greetings, Clojurers! I've been playing with clojure, particularly with lazy sequences. Some of the results have left me puzzled, so I saved a REPL session wherein I illustrate the points of puzzlement. REPL lines are indented below; added comments are unindented. Clojure 1.2.1 I define a silly version of even? which uses a lazy sequence: user= (defn ev? [n] (nth (cycle [true false]) n)) #'user/ev? It works, albeit slowly, for large arguments: user= (time (ev? 9876543210)) Elapsed time: 337225.287 msecs true This won't work, if any reference is retained to that long sequence: user= (def ev-stream (cycle [true false])) #'user/ev-stream user= (defn ev? [n] (nth ev-stream n)) #'user/ev? user= (time (ev? 9876543210)) Exception in thread main java.lang.OutOfMemoryError: Java heap space at java.lang.reflect.Method.copy(Method.java:143) at java.lang.reflect.ReflectAccess.copyMethod(ReflectAccess.java:118) at sun.reflect.ReflectionFactory.copyMethod(ReflectionFactory.java: 282) at java.lang.Class.copyMethods(Class.java:2748) at java.lang.Class.getMethods(Class.java:1410) at clojure.lang.Reflector.getMethods(Reflector.java:310) at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:27) at clojure.main$repl_caught.invoke(main.clj:116) at clojure.main$repl$fn__5637.invoke(main.clj:206) at clojure.main$repl.doInvoke(main.clj:204) at clojure.lang.RestFn.invoke(RestFn.java:421) at clojure.main$repl_opt.invoke(main.clj:262) at clojure.main$main.doInvoke(main.clj:355) at clojure.lang.RestFn.invoke(RestFn.java:397) at clojure.lang.Var.invoke(Var.java:361) at clojure.lang.AFn.applyToHelper(AFn.java:159) at clojure.lang.Var.applyTo(Var.java:482) at clojure.main.main(main.java:37) Process scheme exited abnormally with code 1 Okay, that was the expected java.lang.barf. No puzzlement so far. Now I restart the REPL. Clojure 1.2.1 Once again, I make a globally referenced, infinitely long stream. But now I use lazy-seq instead of cycle: user= (def ev-stream (lazy-seq (cons true (cons false ev- stream #'user/ev-stream user= (defn ev? [n] (nth ev-stream n)) #'user/ev? user= (time (ev? 9876543210)) Elapsed time: 47244.061 msecs true OMG! Not only did it NOT hose the heap and crash, it actually ran much faster than the version with the unreferenced (cycle [true false]). The only reason I can think of, for this to NOT exhaust memory, is that the lazy-seq macro knows when to construct a circular list. Is that what happens? If so, why DOESN'T it happen with cycle, where it's obviously the behavior one would want? Okay, now I'll play the same game again, but with mod 3: user= (def mod3-stream (lazy-seq (cons 0 (cons 1 (cons 2 mod3- stream) #'user/mod3-stream user= (defn mod3 [n] (nth mod3-stream n)) #'user/mod3 It passes a simple test: user= (map mod3 (range 20)) (0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1) It passes a test with a pretty big number: user= (time (mod3 987654321)) Elapsed time: 28998.966 msecs 0 user= (mod 987654321 3) 0 Now I'll test it with 9876543210, a number which ev? was able to handle: user= (time (mod3 9876543210)) Elapsed time: 37759.615 msecs 1 user= (mod 987654321 3) 0 Whoa! The computation finished in reasonable time, but with the WRONG answer! How did that happen? Did I find a bug? Thanks for reading this far, and best regards, George Kangas -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en