Re: Cons.count overflows stack (with patch)
On Wed, Jan 7, 2009 at 2:41 AM, Christian Vest Hansen karmazi...@gmail.com wrote: On Wed, Jan 7, 2009 at 5:26 AM, Chouser chou...@gmail.com wrote: Since I couldn't find any other class that uses this kind of recursion for count(), it may be impossible to build a seq that would still cause Cons.count() to overflow the stack, but I'm not completely certain. We're all allowed to implement our own IPersistentCollections, but if they break 'count, then that would be a different bug, no? Good question. If your new collection (FooColl) implements seq() by returning a new kind of Seq (FooColl$Seq) which in turn implements FooColl$Seq.count() recursively as Cons did, it will break for long chains of FooColl$Seq. This already strikes me as very unlikely, because it would only be tempting to do in collections that accept an unmodified ISeq as part of their contents. But regardless, this would be entirely your own new code and there's nothing I can do about it. However, if FooColl$Seq avoids that error by using code like in this patch, then we would have a more subtle issue. A FooColl$Seq that includes a Cons which in turn was cons'ed onto a FooColl$Seq that includes a Cons ... and continues alternating like that for sufficient depth would defeat this patch's fix. Cons.count() would see that it's 'rest' is not a Cons, and would delegate to RT.count(). FooColl$Seq, doing the same, would create a mutual recursion situation that could overflow the stack. I think we're now firmly off in the weeds far enough to declare we can cross that bridge when we get to it, happy in the belief that we never will. But even so, FooColl$Seq could push things a little further by checking for Cons *or* FooColl$Seq and avoid recursing into either one. This would then postpone the problem until someone makes an alternating chain of FooColl$Seq and BarColl$Seq. I suppose the right way to handle this would be a signal Interface (or perhaps just a method?) declaring if a seq's count() is fast or not. Then Cons.count() could check this: if 'rest' is fast, defer to RT.count(), otherwise inc in it's own loop. This sounds to me like a lot of work and a big patch for a problem nobody's likely to have. --Chouser --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: Cons.count overflows stack (with patch)
On Wed, Jan 7, 2009 at 2:10 PM, Chouser chou...@gmail.com wrote: On Wed, Jan 7, 2009 at 2:41 AM, Christian Vest Hansen karmazi...@gmail.com wrote: On Wed, Jan 7, 2009 at 5:26 AM, Chouser chou...@gmail.com wrote: Since I couldn't find any other class that uses this kind of recursion for count(), it may be impossible to build a seq that would still cause Cons.count() to overflow the stack, but I'm not completely certain. We're all allowed to implement our own IPersistentCollections, but if they break 'count, then that would be a different bug, no? Good question. If your new collection (FooColl) implements seq() by returning a new kind of Seq (FooColl$Seq) which in turn implements FooColl$Seq.count() recursively as Cons did, it will break for long chains of FooColl$Seq. This already strikes me as very unlikely, because it would only be tempting to do in collections that accept an unmodified ISeq as part of their contents. But regardless, this would be entirely your own new code and there's nothing I can do about it. However, if FooColl$Seq avoids that error by using code like in this patch, then we would have a more subtle issue. A FooColl$Seq that includes a Cons which in turn was cons'ed onto a FooColl$Seq that includes a Cons ... and continues alternating like that for sufficient depth would defeat this patch's fix. Cons.count() would see that it's 'rest' is not a Cons, and would delegate to RT.count(). FooColl$Seq, doing the same, would create a mutual recursion situation that could overflow the stack. I think we're now firmly off in the weeds far enough to declare we can cross that bridge when we get to it, happy in the belief that we never will. But even so, FooColl$Seq could push things a little further by checking for Cons *or* FooColl$Seq and avoid recursing into either one. This would then postpone the problem until someone makes an alternating chain of FooColl$Seq and BarColl$Seq. Or those people doing tricksy stuff like this could make a careful and considered decision to design their Seq types such that they are kinds of Cons, which, logically, I'd say they are. But considering your patch, my spidey-senses have also failed to unearth a way to use this approach to blow the stack. I suppose the right way to handle this would be a signal Interface (or perhaps just a method?) declaring if a seq's count() is fast or not. Then Cons.count() could check this: if 'rest' is fast, defer to RT.count(), otherwise inc in it's own loop. This sounds to me like a lot of work and a big patch for a problem nobody's likely to have. And when it does happen in a couple of years time, we can point to this discussion and ask the guy to kindly provide a patch :-P it reminds of a JRE bug that remain undescovered for something like five years, until someone had the audacity to try and Arrays.sort an array larger than half of Integer.MAX_VALUE :) --Chouser -- Venlig hilsen / Kind regards, Christian Vest Hansen. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Cons.count overflows stack (with patch)
As discovered by cmvkk in IRC: user= (count (reduce #(cons %2 %) [0] (range 4601))) java.lang.StackOverflowError (NO_SOURCE_FILE:0) This is because Cons.count() is recursive via RT.count(). Presumably it's to allow efficient counting of heterogeneous seqs, like: user= (def big-vec (into [] (range 1000))) ; this is slow #'user/big-vec user= (count (cons :a big-vec)) ; but this is fast 1001 I've attached a patch that keeps this latter efficiency, but won't blow the stack for pure chains of Cons's. However, it has Cons.count() still defer to RT.count() for non-Cons rests. Since I couldn't find any other class that uses this kind of recursion for count(), it may be impossible to build a seq that would still cause Cons.count() to overflow the stack, but I'm not completely certain. An alternative, of course, would be to do a simple loop as ASeq.count() does, at the cost of the efficient vector counting demonstrated above. If you have comments or questions, don't hesitate or I'll move this to the issues page! --Chouser --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~--- commit 5573feb9cdfc16533e2b27fa27013cc0b68822ba Author: Chouser chou...@n01se.net Date: Tue Jan 6 23:08:26 2009 -0500 Avoid stack overflow on Cons.count() diff --git a/src/jvm/clojure/lang/Cons.java b/src/jvm/clojure/lang/Cons.java index c9e65eb..c7ef54f 100644 --- a/src/jvm/clojure/lang/Cons.java +++ b/src/jvm/clojure/lang/Cons.java @@ -38,7 +38,11 @@ public ISeq rest(){ } public int count(){ - return 1 + RT.count(_rest); + int i = 1; + for(ISeq s = rest(); s != null; s = s.rest(), i++) + if( ! (s instanceof Cons) ) + return i + RT.count(s); + return i; } public ISeq seq(){
Re: Cons.count overflows stack (with patch)
On Wed, Jan 7, 2009 at 5:26 AM, Chouser chou...@gmail.com wrote: As discovered by cmvkk in IRC: user= (count (reduce #(cons %2 %) [0] (range 4601))) java.lang.StackOverflowError (NO_SOURCE_FILE:0) This is because Cons.count() is recursive via RT.count(). Presumably it's to allow efficient counting of heterogeneous seqs, like: user= (def big-vec (into [] (range 1000))) ; this is slow #'user/big-vec user= (count (cons :a big-vec)) ; but this is fast 1001 I've attached a patch that keeps this latter efficiency, but won't blow the stack for pure chains of Cons's. However, it has Cons.count() still defer to RT.count() for non-Cons rests. Since I couldn't find any other class that uses this kind of recursion for count(), it may be impossible to build a seq that would still cause Cons.count() to overflow the stack, but I'm not completely certain. We're all allowed to implement our own IPersistentCollections, but if they break 'count, then that would be a different bug, no? An alternative, of course, would be to do a simple loop as ASeq.count() does, at the cost of the efficient vector counting demonstrated above. If you have comments or questions, don't hesitate or I'll move this to the issues page! --Chouser -- Venlig hilsen / Kind regards, Christian Vest Hansen. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---