Re: puzzlement over lazy sequences

2011-09-13 Thread Michael Gardner
On Sep 12, 2011, at 11:28 PM, Ken Wesson wrote:

 But if, as you say, take, drop, etc. work for larger n, it should be
 easy to make nth work with larger n and non-random-access seqs, just
 by changing the non-random-access case to (first (drop n the-seq)).

I'd be rather surprised if nth suddenly started giving linear performance on 
arrays for large values of n. If nth can't be made to work in constant time on 
arrays for n  2**31, then I'd favor the IllegalArgumentException approach. One 
can always do (first (drop …)) manually if linear performance is acceptable.

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: puzzlement over lazy sequences

2011-09-13 Thread Ken Wesson
On Tue, Sep 13, 2011 at 2:39 AM, Michael Gardner gardne...@gmail.com wrote:
 On Sep 12, 2011, at 11:28 PM, Ken Wesson wrote:

 But if, as you say, take, drop, etc. work for larger n, it should be
 easy to make nth work with larger n and non-random-access seqs, just
 by changing the non-random-access case to (first (drop n the-seq)).

 I'd be rather surprised if nth suddenly started giving linear performance on 
 arrays for large values of n. If nth can't be made to work in constant time 
 on arrays for n  2**31, then I'd favor the IllegalArgumentException 
 approach. One can always do (first (drop …)) manually if linear performance 
 is acceptable.

It already gives log32 rather than constant time performance on
vectors, I think. That would stay the same. It can be extended to
sorted-set and sorted-map with log2 time performance, again including
with large-n support. On actual java.util collections and Java arrays
it can just check if n exceeds 2^31 - 1, give not-found behavior if
so, and otherwise punt to Java. And on non-random-access stuff it can
use (first (drop n thingy)).

-- 
Protege: What is this seething mass of parentheses?!
Master: Your father's Lisp REPL. This is the language of a true
hacker. Not as clumsy or random as C++; a language for a more
civilized age.

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: puzzlement over lazy sequences

2011-09-13 Thread Stefan Kamphausen
Hi,

On Tuesday, September 13, 2011 6:28:01 AM UTC+2, Ken Wesson wrote:


 They're trees of arrays of 32 items, and the trees can in principle
 have arbitrary depth. So the 2^31 limit on Java arrays doesn't impact
 the Clojure collections, it seems.


are you sure?  As far as I understood things (reading PersistentVector.java 
and [1]), there can only be 6 levels of nodes, because of the 32bit which 
are used to store an integer and because of the bit-shifting which is used 
to find the correct level. 
That would lead to a limit of 32^6 which is half of the 2^31:

user= (= (Math/pow 2 31) (* 2 (Math/pow 32 6)))
true

:-)

Regards,
Stefan

[1] 
http://blog.higher-order.net/2009/02/01/understanding-clojures-persistentvector-implementation/

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: puzzlement over lazy sequences

2011-09-13 Thread Ken Wesson
On Tue, Sep 13, 2011 at 4:18 AM, Stefan Kamphausen
ska2...@googlemail.com wrote:
 Hi,

 On Tuesday, September 13, 2011 6:28:01 AM UTC+2, Ken Wesson wrote:

 They're trees of arrays of 32 items, and the trees can in principle
 have arbitrary depth. So the 2^31 limit on Java arrays doesn't impact
 the Clojure collections, it seems.

 are you sure?  As far as I understood things (reading PersistentVector.java
 and [1]), there can only be 6 levels of nodes, because of the 32bit which
 are used to store an integer and because of the bit-shifting which is used
 to find the correct level.

I said the tree can *in principle* have arbitrary depth. Bit-shifting
can be done on BigIntegers, too.

-- 
Protege: What is this seething mass of parentheses?!
Master: Your father's Lisp REPL. This is the language of a true
hacker. Not as clumsy or random as C++; a language for a more
civilized age.

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: puzzlement over lazy sequences

2011-09-12 Thread Ken Wesson
On Mon, Sep 12, 2011 at 12:54 AM, Alan Malloy a...@malloys.org wrote:
 Integer overflow.

 user (mod 9876543210 (bigint (Math/pow 2 32)))
 1286608618

Oops.

But nth can probably be fixed while keeping good performance:

(defn- small-drop [s n]
  (loop [n (int n) s (seq s)]
(if (zero? n) s (recur (dec n) (next s)

(def pow231 (bigint (Math/pow 2 31)))

(defn my-nth [s n]
  (let [s (small-drop s (rem n pow231))]
(loop [n (quot n pow231) s s]
  (if s
(if (zero? n)
  (first s)
  (recur (dec n) (small-drop s pow231)))

Given a bigint n, this does only two arithmetic operations on bigints
at the start and one possibly-bigint dec every 2^31 items, instead of
doing a bigint dec every single item. So it should be nearly as fast
for large n as for small n, per item.

I'm guessing there are similar bugs in drop, take, and so forth with
large n and large (or infinite) seqs. They should all be fixed.

Note: the above code is untested, and (nth n too-short-a-seq) returns
nil. For further speed you'd want small-drop to use an unchecked dec.
You'd also want my-nth to punt to the special implementations for
vectors and other random-access seqables, and maybe change the
behavior on long seqs.

-- 
Protege: What is this seething mass of parentheses?!
Master: Your father's Lisp REPL. This is the language of a true
hacker. Not as clumsy or random as C++; a language for a more
civilized age.

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: puzzlement over lazy sequences

2011-09-12 Thread Ken Wesson
On Mon, Sep 12, 2011 at 11:55 AM, Stuart Halloway
stuart.hallo...@gmail.com wrote:
 I'm guessing there are similar bugs in drop, take, and so forth with
 large n and large (or infinite) seqs. They should all be fixed.

 The other fns are ok, thanks to their separate heritage. drop, take, et al
 are sequence functions, and proceed iteratively.
 nth is of a different lineage. It was designed to target collections that
 support constant time-lookup. Collection-y things in the Java world provide
 APIs that take int, not long, because that is what arrays do at the bottom.
 FWIW, Sun (now Oracle) considers this a low-priority
 problem: http://bugs.sun.com/bugdatabase/view_bug.do;jsessionid=5d84e2e6a65a5c03eacaafb73e91?bug_id=4963452.
 nth has a doc bug: it should be documented as a function of a collection and
 a 32-bit int.  A patch updating the docstring, and even enforcing it with an
 IllegalArgumentException, would be welcome. (George: want to write your
 first patch? :-) )
 Stu
 Stuart Halloway
 Clojure/core
 http://clojure.com

The Clojure collections aren't just flat arrays, though, are they?
They're trees of arrays of 32 items, and the trees can in principle
have arbitrary depth. So the 2^31 limit on Java arrays doesn't impact
the Clojure collections, it seems.

And nth should therefore work for larger n, on all of them.

But if, as you say, take, drop, etc. work for larger n, it should be
easy to make nth work with larger n and non-random-access seqs, just
by changing the non-random-access case to (first (drop n the-seq)).


-- 
Protege: What is this seething mass of parentheses?!
Master: Your father's Lisp REPL. This is the language of a true
hacker. Not as clumsy or random as C++; a language for a more
civilized age.

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: puzzlement over lazy sequences

2011-09-11 Thread Stuart Halloway

Hi George,

 Once again, I make a globally referenced, infinitely long stream.  But
 now I use lazy-seq
 instead of cycle:
 
user= (def ev-stream (lazy-seq (cons true (cons false ev-
 stream
#'user/ev-stream
user= (defn ev? [n] (nth ev-stream n))
#'user/ev?
user= (time (ev? 9876543210))
Elapsed time: 47244.061 msecs
true
 
 OMG! Not only did it NOT hose the heap and crash, it actually ran much
 faster than the version
 with the unreferenced (cycle [true false]).

The consing version of ev-stream is self-referential, because you explicitly 
made it so by consing it back onto itself. So it only has two items in it, 
though it bounces back and forth between them forever. The cycling version is 
not self-referential.

 The only reason I can think of, for this to NOT exhaust memory, is
 that the lazy-seq macro knows
 when to construct a circular list. Is that what happens?  If so, why
 DOESN'T it happen with cycle,
 where it's obviously the behavior one would want?

cycle actually calls lazy-seq.  A quick way to check such things at the REPL is 
with source:

user= (source cycle)
(defn cycle
  Returns a lazy (infinite!) sequence of repetitions of the items in coll.
  {:added 1.0
   :static true}
  [coll] (lazy-seq 
  (when-let [s (seq coll)] 
  (concat s (cycle s)

snip

 Now I'll test it with 9876543210, a number which ev? was able to
 handle:
 
user= (time (mod3 9876543210))
Elapsed time: 37759.615 msecs
1
user= (mod 987654321 3)
0
 
 Whoa! The computation finished in reasonable time, but with the WRONG
 answer! How did that happen?
 Did I find a bug?

No, there is simply a typo in your input arg.

Stu


Stuart Halloway
Clojure/core
http://clojure.com

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: puzzlement over lazy sequences

2011-09-11 Thread Ken Wesson
On Sun, Sep 11, 2011 at 8:28 AM, Stuart Halloway
stuart.hallo...@gmail.com wrote:
 cycle actually calls lazy-seq.  A quick way to check such things at the REPL
 is with source:
 user= (source cycle)
 (defn cycle
   Returns a lazy (infinite!) sequence of repetitions of the items in coll.
   {:added 1.0
    :static true}
   [coll] (lazy-seq
           (when-let [s (seq coll)]
               (concat s (cycle s)
 snip

Is there any reason not to define it as

(defn cycle
   Returns a lazy (infinite!) sequence of repetitions of the items in coll.
   {:added 1.0
    :static true}
   [coll]
   (let [l (lazy-seq
 (when-let [s (seq coll)]
   (concat s l

This will have the effect that holding any element of the cycle holds
the head; however, even the original holds the head of s as long as
the cycle is referenced, so that at worst doubles memory use for
finite s, and for finite walks along the cycle with infinite s, as s
(or the first N elements of s) get held onto in both cases, but an
equal number of cycle elements get held onto with the self-referencing
cycle instead of potentially only one at a time.

 Now I'll test it with 9876543210, a number which ev? was able to
 handle:

    user= (time (mod3 9876543210))
    Elapsed time: 37759.615 msecs
    1
    user= (mod 987654321 3)
    0

 Whoa! The computation finished in reasonable time, but with the WRONG
 answer! How did that happen?
 Did I find a bug?

 No, there is simply a typo in your input arg.

True, but both numbers are congruent to 0 mod 3, so where did the 1
come from in (time (mod3 9876543210))?

-- 
Protege: What is this seething mass of parentheses?!
Master: Your father's Lisp REPL. This is the language of a true
hacker. Not as clumsy or random as C++; a language for a more
civilized age.

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: puzzlement over lazy sequences

2011-09-11 Thread George Kangas
Hi, Stu,

  Loving your book!

  I posted a reply earlier, through a different interface, which went
to moderators.  Sorry for the clumsiness, but I'm not familiar with
the mechanics of newsgroups.

On Sep 11, 7:28 am, Stuart Halloway stuart.hallo...@gmail.com wrote:

 The consing version of ev-stream is self-referential, because you explicitly 
 made it so by consing it back onto itself. So it only has two items in it, 
 though it bounces back and forth between them forever. The cycling version is 
 not self-referential.

Since it's the self reference that gives the nice results (finite
memory consumption, and apparently better speed), I came up with a
little macro to provide it:

(defmacro defcycle [name coll]
`(def ~name (lazy-seq (concat ~coll ~name))) )

This is probably not the most useful way to do it, since the user has
to provide name.

  Now I'll test it with 9876543210, a number which ev? was able to
  handle:

     user= (time (mod3 9876543210))
     Elapsed time: 37759.615 msecs
     1
     user= (mod 987654321 3)
     0

  Whoa! The computation finished in reasonable time, but with the WRONG
  answer! How did that happen?
  Did I find a bug?

 No, there is simply a typo in your input arg.

with the typo fixed, i.e. (mod 9876543210 3), the result is still 0.

regards,

George

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: puzzlement over lazy sequences

2011-09-11 Thread Alan Malloy
Integer overflow.

user (mod 9876543210 (bigint (Math/pow 2 32)))
1286608618

On Sep 11, 9:44 pm, George Kangas gwkan...@gmail.com wrote:
 I believe the bug can be blamed on nth.

 Using nth, I make a function which should be identity on natural
 numbers:

 Clojure 1.2.1
 user= (defn ident [n] (nth (iterate inc 0) n))
 #'user/ident

 And it works, for reasonable size numbers:

 user= (ident 12345)
 12345
 user= (ident 7654321)
 7654321

 But, with a ridiculously huge number...

 user= (ident 9876543210)
 1286608618

 ...something goes wrong.

 Also, when I replace nth in my code (in the top post) with mth,
 then it works.  Here's mth:

 user= (defn mth [seq m] (if (zero? m) (first seq) (recur (rest seq)
 (dec m
 #'user/mth
 user= (mth (iterate inc 0) 123456)
 123456
 user= (mth (iterate inc 0) 9876543210)
 9876543210

 Of course, mth is godawful slow.

 I guess nobody wanted to find the 9876543210th element of a sequence,
 until this wanking dilettante came along!

 cheers,
 George

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


puzzlement over lazy sequences

2011-09-10 Thread George Kangas
Greetings, Clojurers!

I've been playing with clojure, particularly with lazy sequences.
Some of the
results have left me puzzled, so I saved a REPL session wherein I
illustrate the points
of puzzlement.  REPL lines are indented below; added comments are
unindented.


Clojure 1.2.1

I define a silly version of even? which uses a lazy sequence:

user= (defn ev? [n] (nth (cycle [true false]) n))
#'user/ev?

It works, albeit slowly, for large arguments:

user= (time (ev? 9876543210))
Elapsed time: 337225.287 msecs
true

This won't work, if any reference is retained to that long sequence:

user= (def ev-stream (cycle [true false]))
#'user/ev-stream
user= (defn ev? [n] (nth ev-stream n))
#'user/ev?
user= (time (ev? 9876543210))
Exception in thread main java.lang.OutOfMemoryError: Java heap
space
at java.lang.reflect.Method.copy(Method.java:143)
at java.lang.reflect.ReflectAccess.copyMethod(ReflectAccess.java:118)
at sun.reflect.ReflectionFactory.copyMethod(ReflectionFactory.java:
282)
at java.lang.Class.copyMethods(Class.java:2748)
at java.lang.Class.getMethods(Class.java:1410)
at clojure.lang.Reflector.getMethods(Reflector.java:310)
at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:27)
at clojure.main$repl_caught.invoke(main.clj:116)
at clojure.main$repl$fn__5637.invoke(main.clj:206)
at clojure.main$repl.doInvoke(main.clj:204)
at clojure.lang.RestFn.invoke(RestFn.java:421)
at clojure.main$repl_opt.invoke(main.clj:262)
at clojure.main$main.doInvoke(main.clj:355)
at clojure.lang.RestFn.invoke(RestFn.java:397)
at clojure.lang.Var.invoke(Var.java:361)
at clojure.lang.AFn.applyToHelper(AFn.java:159)
at clojure.lang.Var.applyTo(Var.java:482)
at clojure.main.main(main.java:37)

Process scheme exited abnormally with code 1

Okay, that was the expected java.lang.barf.  No puzzlement so far.
Now I restart the REPL.

Clojure 1.2.1

Once again, I make a globally referenced, infinitely long stream.  But
now I use lazy-seq
instead of cycle:

user= (def ev-stream (lazy-seq (cons true (cons false ev-
stream
#'user/ev-stream
user= (defn ev? [n] (nth ev-stream n))
#'user/ev?
user= (time (ev? 9876543210))
Elapsed time: 47244.061 msecs
true

OMG! Not only did it NOT hose the heap and crash, it actually ran much
faster than the version
with the unreferenced (cycle [true false]).

The only reason I can think of, for this to NOT exhaust memory, is
that the lazy-seq macro knows
when to construct a circular list. Is that what happens?  If so, why
DOESN'T it happen with cycle,
where it's obviously the behavior one would want?

Okay, now I'll play the same game again, but with mod 3:

user= (def mod3-stream (lazy-seq (cons 0 (cons 1 (cons 2 mod3-
stream)
#'user/mod3-stream
user= (defn mod3 [n] (nth mod3-stream n))
#'user/mod3

It passes a simple test:

user= (map mod3 (range 20))
(0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1)

It passes a test with a pretty big number:

user= (time (mod3 987654321))
Elapsed time: 28998.966 msecs
0
user= (mod 987654321 3)
0

Now I'll test it with 9876543210, a number which ev? was able to
handle:

user= (time (mod3 9876543210))
Elapsed time: 37759.615 msecs
1
user= (mod 987654321 3)
0

Whoa! The computation finished in reasonable time, but with the WRONG
answer! How did that happen?
Did I find a bug?

Thanks for reading this far, and best regards,

George Kangas

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en