Re: Cons.count overflows stack (with patch)

2009-01-07 Thread Chouser

On Wed, Jan 7, 2009 at 2:41 AM, Christian Vest Hansen
karmazi...@gmail.com wrote:

 On Wed, Jan 7, 2009 at 5:26 AM, Chouser chou...@gmail.com wrote:
 Since I couldn't find any other class that uses this kind of
 recursion for count(), it may be impossible to build a seq that
 would still cause Cons.count() to overflow the stack, but I'm not
 completely certain.

 We're all allowed to implement our own IPersistentCollections, but if
 they break 'count, then that would be a different bug, no?

Good question.

If your new collection (FooColl) implements seq() by returning a new
kind of Seq (FooColl$Seq) which in turn implements FooColl$Seq.count()
recursively as Cons did, it will break for long chains of FooColl$Seq.
This already strikes me as very unlikely, because it would only be
tempting to do in collections that accept an unmodified ISeq as part
of their contents.  But regardless, this would be entirely your own
new code and there's nothing I can do about it.

However, if FooColl$Seq avoids that error by using code like in this
patch, then we would have a more subtle issue.  A FooColl$Seq that
includes a Cons which in turn was cons'ed onto a FooColl$Seq that
includes a Cons ... and continues alternating like that for sufficient
depth would defeat this patch's fix.  Cons.count() would see that it's
'rest' is not a Cons, and would delegate to RT.count().  FooColl$Seq,
doing the same, would create a mutual recursion situation that could
overflow the stack.

I think we're now firmly off in the weeds far enough to declare we
can cross that bridge when we get to it, happy in the belief that we
never will.  But even so, FooColl$Seq could push things a little
further by checking for Cons *or* FooColl$Seq and avoid recursing into
either one.  This would then postpone the problem until someone makes
an alternating chain of FooColl$Seq and BarColl$Seq.

I suppose the right way to handle this would be a signal Interface (or
perhaps just a method?) declaring if a seq's count() is fast or not.
Then Cons.count() could check this: if 'rest' is fast, defer to
RT.count(), otherwise inc in it's own loop.  This sounds to me like a
lot of work and a big patch for a problem nobody's likely to have.

--Chouser

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
Clojure group.
To post to this group, send email to clojure@googlegroups.com
To unsubscribe from this group, send email to 
clojure+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/clojure?hl=en
-~--~~~~--~~--~--~---



Re: Cons.count overflows stack (with patch)

2009-01-07 Thread Christian Vest Hansen

On Wed, Jan 7, 2009 at 2:10 PM, Chouser chou...@gmail.com wrote:

 On Wed, Jan 7, 2009 at 2:41 AM, Christian Vest Hansen
 karmazi...@gmail.com wrote:

 On Wed, Jan 7, 2009 at 5:26 AM, Chouser chou...@gmail.com wrote:
 Since I couldn't find any other class that uses this kind of
 recursion for count(), it may be impossible to build a seq that
 would still cause Cons.count() to overflow the stack, but I'm not
 completely certain.

 We're all allowed to implement our own IPersistentCollections, but if
 they break 'count, then that would be a different bug, no?

 Good question.

 If your new collection (FooColl) implements seq() by returning a new
 kind of Seq (FooColl$Seq) which in turn implements FooColl$Seq.count()
 recursively as Cons did, it will break for long chains of FooColl$Seq.
 This already strikes me as very unlikely, because it would only be
 tempting to do in collections that accept an unmodified ISeq as part
 of their contents.  But regardless, this would be entirely your own
 new code and there's nothing I can do about it.

 However, if FooColl$Seq avoids that error by using code like in this
 patch, then we would have a more subtle issue.  A FooColl$Seq that
 includes a Cons which in turn was cons'ed onto a FooColl$Seq that
 includes a Cons ... and continues alternating like that for sufficient
 depth would defeat this patch's fix.  Cons.count() would see that it's
 'rest' is not a Cons, and would delegate to RT.count().  FooColl$Seq,
 doing the same, would create a mutual recursion situation that could
 overflow the stack.

 I think we're now firmly off in the weeds far enough to declare we
 can cross that bridge when we get to it, happy in the belief that we
 never will.  But even so, FooColl$Seq could push things a little
 further by checking for Cons *or* FooColl$Seq and avoid recursing into
 either one.  This would then postpone the problem until someone makes
 an alternating chain of FooColl$Seq and BarColl$Seq.

Or those people doing tricksy stuff like this could make a careful and
considered decision to design their Seq types such that they are kinds
of Cons, which, logically, I'd say they are.

But considering your patch, my spidey-senses have also failed to
unearth a way to use this approach to blow the stack.


 I suppose the right way to handle this would be a signal Interface (or
 perhaps just a method?) declaring if a seq's count() is fast or not.
 Then Cons.count() could check this: if 'rest' is fast, defer to
 RT.count(), otherwise inc in it's own loop.  This sounds to me like a
 lot of work and a big patch for a problem nobody's likely to have.

And when it does happen in a couple of years time, we can point to
this discussion and ask the guy to kindly provide a patch :-P

it reminds of a JRE bug that remain undescovered for something like
five years, until someone had the audacity to try and Arrays.sort an
array larger than half of Integer.MAX_VALUE :)


 --Chouser

 




-- 
Venlig hilsen / Kind regards,
Christian Vest Hansen.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
Clojure group.
To post to this group, send email to clojure@googlegroups.com
To unsubscribe from this group, send email to 
clojure+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/clojure?hl=en
-~--~~~~--~~--~--~---



Cons.count overflows stack (with patch)

2009-01-06 Thread Chouser
As discovered by cmvkk in IRC:

user= (count (reduce #(cons %2 %) [0] (range 4601)))
java.lang.StackOverflowError (NO_SOURCE_FILE:0)

This is because Cons.count() is recursive via RT.count().  Presumably
it's to allow efficient counting of heterogeneous seqs, like:

 user= (def big-vec (into [] (range 1000))) ; this is slow
#'user/big-vec
user= (count (cons :a big-vec)) ; but this is fast
1001

I've attached a patch that keeps this latter efficiency, but won't
blow the stack for pure chains of Cons's.  However, it has
Cons.count() still defer to RT.count() for non-Cons rests.  Since I
couldn't find any other class that uses this kind of recursion for
count(), it may be impossible to build a seq that would still cause
Cons.count() to overflow the stack, but I'm not completely certain.

An alternative, of course, would be to do a simple loop as
ASeq.count() does, at the cost of the efficient vector counting
demonstrated above.

If you have comments or questions, don't hesitate or I'll move this to
the issues page!

--Chouser

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
Clojure group.
To post to this group, send email to clojure@googlegroups.com
To unsubscribe from this group, send email to 
clojure+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/clojure?hl=en
-~--~~~~--~~--~--~---

commit 5573feb9cdfc16533e2b27fa27013cc0b68822ba
Author: Chouser chou...@n01se.net
Date:   Tue Jan 6 23:08:26 2009 -0500

Avoid stack overflow on Cons.count()

diff --git a/src/jvm/clojure/lang/Cons.java b/src/jvm/clojure/lang/Cons.java
index c9e65eb..c7ef54f 100644
--- a/src/jvm/clojure/lang/Cons.java
+++ b/src/jvm/clojure/lang/Cons.java
@@ -38,7 +38,11 @@ public ISeq rest(){
 }
 
 public int count(){
-	return 1 + RT.count(_rest);
+	int i = 1;
+	for(ISeq s = rest(); s != null; s = s.rest(), i++)
+		if( ! (s instanceof Cons) )
+			return i + RT.count(s);
+	return i;
 }
 
 public ISeq seq(){


Re: Cons.count overflows stack (with patch)

2009-01-06 Thread Christian Vest Hansen

On Wed, Jan 7, 2009 at 5:26 AM, Chouser chou...@gmail.com wrote:
 As discovered by cmvkk in IRC:

 user= (count (reduce #(cons %2 %) [0] (range 4601)))
 java.lang.StackOverflowError (NO_SOURCE_FILE:0)

 This is because Cons.count() is recursive via RT.count().  Presumably
 it's to allow efficient counting of heterogeneous seqs, like:

  user= (def big-vec (into [] (range 1000))) ; this is slow
 #'user/big-vec
 user= (count (cons :a big-vec)) ; but this is fast
 1001

 I've attached a patch that keeps this latter efficiency, but won't
 blow the stack for pure chains of Cons's.  However, it has
 Cons.count() still defer to RT.count() for non-Cons rests.  Since I
 couldn't find any other class that uses this kind of recursion for
 count(), it may be impossible to build a seq that would still cause
 Cons.count() to overflow the stack, but I'm not completely certain.

We're all allowed to implement our own IPersistentCollections, but if
they break 'count, then that would be a different bug, no?


 An alternative, of course, would be to do a simple loop as
 ASeq.count() does, at the cost of the efficient vector counting
 demonstrated above.

 If you have comments or questions, don't hesitate or I'll move this to
 the issues page!

 --Chouser

 




-- 
Venlig hilsen / Kind regards,
Christian Vest Hansen.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
Clojure group.
To post to this group, send email to clojure@googlegroups.com
To unsubscribe from this group, send email to 
clojure+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/clojure?hl=en
-~--~~~~--~~--~--~---