Re: [Chicken-users] Unbounded stack growth
At the risk of charging in like an entire stampede of cluelessness,

Quoting Matthew Flatt <mfl...@cs.utah.edu>:

> At Thu, 12 Jul 2012 11:25:44 +0900, Alex Shinn wrote:
>> I disagree - I think a stack grown too large is likely indicative of a
>> programming error, or at the very least an inefficient algorithm. In the
>> general case I want my programs to be able to allocate as much heap as
>> possible, but have a separate limitation on the stack.
>
> Amen. Just because a computation is naturally expressed as a recursion
> does not mean that you should write it that way.

Just because a computation can be expressed inefficiently does not mean it
should not be allowed to be expressed inefficiently. If things were
otherwise, how many of your programs would've never compiled? I know a
large number of mine never would've seen the light of day.

> To take a toy example, suppose you're summing numbers from a binary tree.
> For this toy, suppose that a tree is either a number or a `cons' of two
> trees. Then, only a novice would write
>
>   ; A tree-of-number is either
>   ;  - num
>   ;  - (cons tree-of-number tree-of-number)
>
>   ; sum : tree-of-number -> number
>   (define (sum t)
>     (cond
>       [(number? t) t]
>       [else (+ (sum (car t)) (sum (cdr t)))]))
>
> That `sum' will work fine on relatively balanced trees, but it will crash
> (as it should!) if the tree is too large and too list-like. Every real
> programmer knows that you should create your own stack to sum the numbers.

Why should it? Shouldn't it just run slowly, the sign of inefficient code?
Since when is the sign of /inefficient/ code that it crashes? That's the
sign of incorrect code, although incorrect code may certainly be
inefficient.

> (I assume that we can all extrapolate from the toy to real programs. A
> compiler processes a tree of expressions, right? It may be given a
> too-deeply nested pile of function calls, and only a naive compiler
> writer would simply recur on sub-expressions to compile an expression.
> On second thought, maybe the compiler writer should just recur, and the
> right response for too-deeply nested code is a seg fault; that would
> serve the compiler user right for producing such a strange input.)

I believe a naive compiler writer should be allowed to write a compiler,
without being 'served right' by those who would take such a stance in this.
The performance of his compiler will be his quite-sufficient punishment.

And, finally: if this bug is fixed, but the option to disable the fix is
introduced, `badcheney' is a fantastic (although not necessarily wise)
option name ;)

Cheers,
-Cam

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users
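The "create your own stack" alternative that the quoted text alludes to can be sketched in C. This is an illustrative sketch only (the tree representation and names are mine, not from the thread): pending subtrees are kept on a heap-allocated explicit stack, so the traversal depth is bounded by available heap rather than by the C call stack, even for a severely list-like tree.

```c
#include <stdlib.h>

/* A tree is either a leaf holding a number or an interior node with two
   subtrees, mirroring the Scheme representation (num | (cons t t)). */
typedef struct tree {
    int is_leaf;
    int value;                  /* used when is_leaf */
    struct tree *left, *right;  /* used when !is_leaf */
} tree;

/* Sum the leaves iteratively. Pending subtrees live on a heap-allocated
   explicit stack instead of the C call stack. */
long sum(const tree *t) {
    size_t cap = 64, top = 0;
    const tree **stack = malloc(cap * sizeof *stack);
    long total = 0;
    stack[top++] = t;
    while (top > 0) {
        const tree *cur = stack[--top];
        if (cur->is_leaf) {
            total += cur->value;
        } else {
            if (top + 2 > cap) {  /* grow the explicit stack on demand */
                cap *= 2;
                stack = realloc(stack, cap * sizeof *stack);
            }
            stack[top++] = cur->left;
            stack[top++] = cur->right;
        }
    }
    free(stack);
    return total;
}
```

The point of the sketch is only that the "stack" here is ordinary heap memory, so it is subject to the same limits (and the same graceful out-of-memory handling) as any other allocation.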
Re: [Chicken-users] Unbounded stack growth
Hallo,

On Thu, Jul 12, 2012 at 4:25 AM, Alex Shinn <alexsh...@gmail.com> wrote:
> I disagree - I think a stack grown too large is likely indicative of a
> programming error, or at the very least an inefficient algorithm. In the
> general case I want my programs to be able to allocate as much heap as
> possible, but have a separate limitation on the stack.

Programming errors or inefficient algorithms should crash C programs, not
Scheme programs.

--
-alex
http://www.artisancoder.com/
Re: [Chicken-users] Unbounded stack growth
On Thu, Jul 12, 2012 at 4:39 PM, Alex Queiroz <asand...@gmail.com> wrote:
> Hallo,
>
> On Thu, Jul 12, 2012 at 4:25 AM, Alex Shinn <alexsh...@gmail.com> wrote:
>> I disagree - I think a stack grown too large is likely indicative of a
>> programming error, or at the very least an inefficient algorithm. In the
>> general case I want my programs to be able to allocate as much heap as
>> possible, but have a separate limitation on the stack.
>
> Programming errors or inefficient algorithms should crash C programs, not
> Scheme programs.

Wow, if you've got a magical Scheme compiler that can read my mind and fix
all my bugs for me I'll switch right now! :)

--
Alex
Re: [Chicken-users] Unbounded stack growth
On Thu, Jul 12, 2012 at 9:47 AM, Alex Shinn <alexsh...@gmail.com> wrote:
> Wow, if you've got a magical Scheme compiler that can read my mind and fix
> all my bugs for me I'll switch right now! :)

Are you really saying that it is ok for a Scheme program to crash with a
segmentation fault because of programming errors, and not just because of
compiler bugs?

--
-alex
http://www.artisancoder.com/
Re: [Chicken-users] Unbounded stack growth
On Thu, Jul 12, 2012 at 4:51 PM, Alex Queiroz <asand...@gmail.com> wrote:
> Are you really saying that it is ok for a Scheme program to crash with a
> segmentation fault because of programming errors, and not just because of
> compiler bugs?

No, and I never said nor implied that. I think the continuum here is, all
else being equal:

  raise continuable exception
  abort with meaningful message
  segfault

though often all else is not equal.

For the specific case of handling programs which use unbounded stack, most
implementations just blow up, and the question is how much heap they
allocate in the process. Are they optimistic and think "it can't be much
longer now" as they allocate that last 100MB, or do they bail out a little
earlier? Whether you set a fixed limit or just let it use up all available
memory, there is still a limit. Setting a separate limit does leave you
with some heap space to try to recover with, though, and is friendlier to
other processes. But now we're just negotiating the price.

--
Alex
Re: [Chicken-users] Unbounded stack growth
Marc Feeley scripsit:

> In this example, there will be an arbitrarily long sequence of C calls
> (in the unwinding of the recursion to even) with no Scheme call, so stack
> size checks will not be performed during the unwinding, yet the C stack
> grows during the unwinding. There is no stack overflow during the winding
> phase of the recursion because the stack size checks are performed at
> regular intervals (at each entry to even).

While you're right, it's not clear that this matters enough to fix. It's
not a *correctness* error, since every implementation will blow up on
excessive recursion sooner or later when memory is exhausted. If the
overflow check were done, the maximum recursion depth would be bounded by
the C heap, not the C stack.

However, inserting all those checks has a cost. So it would be a question
of measuring the added cost of the checks over a large variety of
programs. If it's consistently small, they should be added; if not, there
should be an option to provide them or to turn them off.

--
John Cowan          co...@ccil.org          http://www.ccil.org/~cowan
Any day you get all five woodpeckers is a good day.
        --Elliotte Rusty Harold
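For scale, the per-call cost being weighed here can be as small as one address comparison against a precomputed limit. A minimal sketch of such a check (illustrative only, not CHICKEN's actual implementation; it assumes the C stack grows downward, which holds on the common platforms discussed in this thread):

```c
#include <stddef.h>
#include <stdint.h>

/* A stack check reduced to one pointer comparison against a limit
   precomputed from the stack base and a chosen budget. */
static uintptr_t stack_limit;

/* Record (approximately) the current stack position and reserve
   `budget` bytes below it. Call this once near program startup. */
void set_stack_limit(size_t budget) {
    char base;
    stack_limit = (uintptr_t)&base - budget;
}

/* The check a compiler would emit at entry (and return) points:
   compare the address of a local against the precomputed limit. */
int stack_exhausted(void) {
    char probe;
    return (uintptr_t)&probe < stack_limit;
}
```

The question raised above is whether even this single compare-and-branch, emitted at every return point, is measurable across real programs.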
Re: [Chicken-users] Unbounded stack growth
It seems that compiling with clang (llvm 3.0) prevents the crash, at least
for values up to 20 million, on OS X and Linux. Any higher and I start to
hit swap. I don't know why this works.

Plain gcc on Linux, and llvm-gcc (llvm 2.7) on OS X 10.7, do crash at
about 600k here w/ stack ulimit 8M. Based on your results I am guessing
your stack ulimit is 4M.

I tried removing __attribute__((noreturn)) from chicken.h, as is done for
clang, and it made no difference, as you'd expect. The only other
difference from gcc is the stack pointer retrieval code, but that is
identical between clang and llvm-gcc. So I have to assume LLVM 3.0 has
something to do with the crash avoidance, which could very well be luck.

Tested with Chicken 4.7.0.6.

Jim

On Jul 11, 2012, at 12:23 PM, Marc Feeley wrote:

> I have been experimenting with the Spock Scheme to JavaScript compiler.
> I encountered a bug in its implementation of the Cheney on the MTA
> approach to tail-calls and continuations. The bug also exists for
> Chicken.
>
> In the Cheney on the MTA approach there needs to be a check, at regular
> intervals, of the stack size, so that the stack can be GC'd using a
> throw/catch or longjmp/setjmp. Chicken (and Spock) perform this check at
> the entry of Scheme functions. For correctness, it must also be performed
> at the *return point* of every non-tail call. This is because the code
> has been CPS converted, so a Scheme return is actually a C (or JS) *call*
> to the continuation function, which grows the stack.
>
> Here's a simple example where this matters:
>
>   ;; File: even.scm
>
>   (define (even n)
>     (if (= n 0)
>         #t
>         (if (even (- n 1)) #f #t)))
>
>   (print (even (string->number (car (command-line-arguments)))))
>
> and a shell transcript of the behavior on a Mac OS X machine:
>
>   % csc even.scm
>   % ./even 11
>   #f
>   % ./even 20
>   #t
>   % ./even 30
>   Segmentation fault: 11
>
> The segmentation fault is due to a C stack overflow. In this example,
> there will be an arbitrarily long sequence of C calls (in the unwinding
> of the recursion to even) with no Scheme call, so stack size checks will
> not be performed during the unwinding, yet the C stack grows during the
> unwinding. There is no stack overflow during the winding phase of the
> recursion because the stack size checks are performed at regular
> intervals (at each entry to even).
>
> I believe the fix should be simple (i.e. a stack size check should be
> added to return points).
>
> Marc
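The CPS-conversion point quoted above can be modeled in C. The following is an illustrative sketch, not CHICKEN's generated code (the names `even_cps`, `even_k`, `cont`, and `frame` are mine): a Scheme return becomes a C *call* to a continuation function, and if the stack check happens only at function entry, nothing bounds the chain of nested continuation calls during unwinding.

```c
#include <stddef.h>

/* Continuations are C functions taking the result and an environment. */
typedef void (*cont)(int result, void *env);

/* Frame saved for the non-tail call (even (- n 1)). */
struct frame { cont k; void *env; };

/* Continuation of the recursive call: negate the result and "return"
   by calling the next continuation. Note this is a plain C call, with
   no stack check: this is exactly the unchecked unwinding path. */
static void even_k(int result, void *env) {
    struct frame *f = env;
    f->k(!result, f->env);
}

static void even_cps(long n, cont k, void *env) {
    /* a Cheney-on-the-MTA system performs its stack check here, at entry */
    if (n == 0) { k(1, env); return; }
    struct frame f = { k, env };
    even_cps(n - 1, even_k, &f);
}

/* Final continuation: capture the answer. */
static int final_result;
static void done(int result, void *env) { (void)env; final_result = result; }
```

Calling `even_cps(n, done, NULL)` builds a chain of n `even_k` frames that is consumed, once n reaches 0, by n nested C calls with no intervening entry check, which is where the real program's C stack overflows.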
Re: [Chicken-users] Unbounded stack growth
On 2012-07-11, at 2:31 PM, John Cowan wrote:
> Marc Feeley scripsit:
>> In this example, there will be an arbitrarily long sequence of C calls
>> (in the unwinding of the recursion to even) with no Scheme call, so
>> stack size checks will not be performed during the unwinding, yet the C
>> stack grows during the unwinding. There is no stack overflow during the
>> winding phase of the recursion because the stack size checks are
>> performed at regular intervals (at each entry to even).
>
> While you're right, it's not clear that this matters enough to fix. It's
> not a *correctness* error,

Saying that this is not a correctness error is lawyer-speak.

> since every implementation will blow up on excessive recursion sooner or
> later when memory is exhausted.

Part of the problem is that it truly blows up. A high-level language
should never give a segmentation violation. If there isn't enough memory,
a software exception should be raised (possibly catchable), or at a bare
minimum a readable error message (such as "stack overflow") should be
produced.

> If the overflow check were done, the maximum recursion depth would be
> bounded by the C heap, not the C stack.

On my 64-bit machine, the C heap is several orders of magnitude larger
than the C stack.

> However, inserting all those checks has a cost. So it would be a question
> of measuring the added cost of the checks over a large variety of
> programs. If it's consistently small, they should be added; if not, there
> should be an option to provide them or to turn them off.

Performance should not trump safety and correctness.

Marc
Re: [Chicken-users] Unbounded stack growth
On 2012-07-11, at 3:44 PM, Jim Ursetto wrote:
> It seems that compiling with clang (llvm 3.0) prevents the crash, at
> least for values up to 20 million, on OS X and Linux. Any higher and I
> start to hit swap. I don't know why this works.
>
> Plain gcc on Linux, and llvm-gcc (llvm 2.7) on OS X 10.7, do crash at
> about 600k here w/ stack ulimit 8M. Based on your results I am guessing
> your stack ulimit is 4M.
>
> I tried removing __attribute__((noreturn)) from chicken.h, as is done
> for clang, and it made no difference, as you'd expect. The only other
> difference from gcc is the stack pointer retrieval code, but that is
> identical between clang and llvm-gcc. So I have to assume LLVM 3.0 has
> something to do with the crash avoidance, which could very well be luck.
>
> Tested with Chicken 4.7.0.6.
>
> Jim

I'm pretty sure LLVM is simply implementing the tail-call itself. So there
is no stack growth in the unwinding phase. Unfortunately, one can't rely
on the C compiler doing this in general.

Marc
Re: [Chicken-users] Unbounded stack growth
On 2012-07-11, at 2:31 PM, John Cowan wrote:
> Marc Feeley scripsit:
>> In this example, there will be an arbitrarily long sequence of C calls
>> (in the unwinding of the recursion to even) with no Scheme call, so
>> stack size checks will not be performed during the unwinding, yet the C
>> stack grows during the unwinding. There is no stack overflow during the
>> winding phase of the recursion because the stack size checks are
>> performed at regular intervals (at each entry to even).
>
> While you're right, it's not clear that this matters enough to fix. It's
> not a *correctness* error, since every implementation will blow up on
> excessive recursion sooner or later when memory is exhausted.

And I should have mentioned that in the case of Spock, which compiles to
JavaScript, it crashes much sooner (with n 1 on V8).

Marc
Re: [Chicken-users] Unbounded stack growth
On Jul 11, 2012, at 2:47 PM, Marc Feeley wrote:
> I'm pretty sure LLVM is simply implementing the tail-call itself. So
> there is no stack growth in the unwinding phase.

That is possible. We do currently disable __attribute__((noreturn)) on
functions across the board when using clang, and last I checked this
caused it to generate a bunch of (unreachable) return code. But perhaps
LLVM 3 is optimizing the tail-call anyway.

Jim
Re: [Chicken-users] Unbounded stack growth
The issue can be split in two:

 1. Whether the implementation checks stack usage more often
 2. Whether the implementation terminates with a more descriptive message
    than SEGV

I think John argues on the first ground; however, Marc's argument can
cover both. These days PCs have lots of heap, and exhausting it with an
incorrect program can take a long time, with some inconveniences. (When I
see the problem, the machine is thrashing crazily.) So addressing point 2
by itself makes sense, I guess. I don't know enough about Chicken
internals to say whether handling SEGV in this situation is feasible,
though.

--shiro

From: John Cowan <co...@mercury.ccil.org>
Subject: Re: [Chicken-users] Unbounded stack growth
Date: Wed, 11 Jul 2012 14:31:58 -0400

> While you're right, it's not clear that this matters enough to fix. It's
> not a *correctness* error, since every implementation will blow up on
> excessive recursion sooner or later when memory is exhausted. If the
> overflow check were done, the maximum recursion depth would be bounded by
> the C heap, not the C stack.
>
> However, inserting all those checks has a cost. So it would be a question
> of measuring the added cost of the checks over a large variety of
> programs. If it's consistently small, they should be added; if not, there
> should be an option to provide them or to turn them off.
Re: [Chicken-users] Unbounded stack growth
From: Marc Feeley <fee...@iro.umontreal.ca>
Subject: [Chicken-users] Unbounded stack growth
Date: Wed, 11 Jul 2012 13:23:30 -0400

> I have been experimenting with the Spock Scheme to JavaScript compiler.
> I encountered a bug in its implementation of the Cheney on the MTA
> approach to tail-calls and continuations. The bug also exists for
> Chicken.
>
> In the Cheney on the MTA approach there needs to be a check, at regular
> intervals, of the stack size, so that the stack can be GC'd using a
> throw/catch or longjmp/setjmp. Chicken (and Spock) perform this check at
> the entry of Scheme functions. For correctness, it must also be performed
> at the *return point* of every non-tail call. This is because the code
> has been CPS converted, so a Scheme return is actually a C (or JS) *call*
> to the continuation function, which grows the stack.

Yes, this is a known bug (see also https://bugs.call-cc.org/ticket/876).
For performance reasons, the stack check is not done in continuation
procedures. As John has pointed out, this usually only matters for code
that would blow the stack anyway on normal stack-based implementations
that don't evict stack frames to the heap or do other trickery.
Nevertheless it would be nice to have a heap-bounded virtual stack and
handle every recursion without crashing. I think the best approach is to
add a compiler option for this (not that we haven't got enough already),
to (a) give the user the chance to switch to the safe behaviour and (b)
measure the performance impact.

cheers,
felix
Re: [Chicken-users] Unbounded stack growth
> Performance should not trump safety and correctness.

Absolutely right, yet everybody has a different perception of what
performance, safety and correctness mean. Segfaulting on _stack-overflow_
is not something that I'd call incorrect or unsafe - I'd call it
inconvenient, and it may be the case that handling the overflow gracefully
isn't such a big deal at all. On the other hand, an extremely deep
recursion could in such a case (stack checks everywhere) bring the machine
to a halt due to excessive thrashing. I don't know whether I'd perhaps
prefer the segfault in such a situation...

cheers,
felix
Re: [Chicken-users] Unbounded stack growth
On 2012-07-11, at 5:59 PM, Felix wrote:
>> Performance should not trump safety and correctness.
>
> Absolutely right, yet everybody has a different perception of what
> performance, safety and correctness means. Segfaulting on
> _stack-overflow_ is not something that I'd call incorrect or unsafe -
> I'd call it inconvenient and it may be the case that handling the
> overflow gracefully isn't such a big deal at all. On the other hand, an
> extremely deep recursion could in such a case (stack checks everywhere)
> bring the machine to a halt due to excessive thrashing. I don't know
> whether I'd perhaps prefer the segfault in such a situation...

The point is that, in a high-level language, a segfault represents a loss
of control of the language implementation. On unix systems, it is usually
caused by a protected virtual memory page that has been touched, and a
segfault signal is generated. Not very informative, but not a big safety
issue. On other operating systems (e.g. in a Nintendo DS, an FPGA, or a
toaster), some unrelated zone of memory (say the heap containing Scheme
objects) has been corrupted silently and the program has started executing
random stuff, possibly burning your toast and setting fire to the house.

The programmer already has control (with the csc -heap-limit N option)
over how much memory is available to the program, to avoid thrashing
caused by an infinite recursion or an allocation loop gone wild. There
should be no difference between heap and stack allocation. These are
implementation terms. It is all just memory. After all, the Cheney on the
MTA approach was designed to migrate the stack frames to the heap
transparently, so the user should be isolated from the concern of where
the continuation is stored.

What is unfortunate with this bug is that it goes against the programmer's
intuition. If you slightly modify the code to this:

  ;; File: even.scm

  (define (even i n)
    (if (= 0 (modulo i 10)) (print i))
    (if (= i n)
        #t
        (not (even (+ i 1) n))))

  (print (even 0 (string->number (car (command-line-arguments)))))

so that it is easy to see how deep in the recursion the program has gone,
then you get this output:

  % ./even 90
  0
  10
  20
  30
  40
  50
  60
  70
  80
  90
  Segmentation fault: 11

  % ./even 30
  0
  10
  20
  30
  Segmentation fault: 11

This shows that the stack overflow is occurring during the unwinding phase
of the recursion. This is quite unintuitive for me. Try explaining to a
beginner that the unwinding of the recursion is causing memory allocation!

Moreover, if you slightly modify the code so that there is a call to the
my-not function instead of the builtin not, then there are no problems
(because the call to my-not at each step of the unwinding is going to
gracefully handle the growing C stack and garbage collect it):

  ;; File: even.scm (slightly modified)

  (define (my-not x) (not x))

  (define (even i n)
    (if (= 0 (modulo i 10)) (print i))
    (if (= i n)
        #t
        (my-not (even (+ i 1) n))))

  (print (even 0 (string->number (car (command-line-arguments)))))

  % ./even 90
  0
  10
  20
  30
  40
  50
  60
  70
  80
  90
  #t

The decision for not testing the stack limit on returns is based on a
performance concern. Adding this test at return points would allow the
above code to work correctly, for very large values of n.

By the way, I'm surprised that the very similar looking code for rev-iota:

  (define (rev-iota n)
    (if (= n 0)
        '()
        (cons n (rev-iota (- n 1)))))

does not trigger the bug. Is cons treated differently from not?

Marc
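In CPS terms, the proposed fix of "testing the stack limit on returns" amounts to performing the entry-style stack check inside generated continuation procedures as well. A self-contained C sketch (illustrative names only, not CHICKEN's generated code; the check body is stubbed to a counter, where a real system would compare the stack pointer against a limit and longjmp back to the trampoline so live frames get evacuated to the heap):

```c
#include <stddef.h>

typedef void (*cont)(int result, void *env);

struct frame { cont k; void *env; };

/* Stubbed stack check: just counts invocations, to show where checks
   now occur. A real implementation compares the stack pointer to a
   limit and, past it, returns to the trampoline to collect the stack. */
static long checks_performed;
static void check_stack(void) { checks_performed++; }

static void even_k(int result, void *env) {
    check_stack();               /* the proposed return-point check */
    struct frame *f = env;
    f->k(!result, f->env);
}

static void even_cps(long n, cont k, void *env) {
    check_stack();               /* the existing entry check */
    if (n == 0) { k(1, env); return; }
    struct frame f = { k, env };
    even_cps(n - 1, even_k, &f);
}

static int final_result;
static void done(int result, void *env) { (void)env; final_result = result; }
```

With the check also in `even_k`, the unwinding phase is now punctuated by checks at every step, which is exactly the effect the `my-not` workaround above achieves by forcing a checked function entry into the unwinding.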
Re: [Chicken-users] Unbounded stack growth
On Thu, Jul 12, 2012 at 10:52 AM, Marc Feeley <fee...@iro.umontreal.ca> wrote:
> The point is that, in a high-level language, a segfault represents a loss
> of control of the language implementation. On unix systems, it is usually
> caused by a protected virtual memory page that has been touched, and a
> segfault signal is generated. Not very informative, but not a big safety
> issue. On other operating systems (e.g. in a Nintendo DS, a FPGA, or a
> toaster), some unrelated zone of memory (say the heap containing Scheme
> objects) has been corrupted silently and the program has started
> executing random stuff, possibly burning your toast and setting fire to
> the house.
>
> The programmer already has control (with the csc -heap-limit N option)
> over how much memory is available to the program to avoid thrashing
> caused by an infinite recursion, or an allocation loop gone wild. There
> should be no difference in heap and stack allocation. These are
> implementation terms. It is all just memory. After all, the Cheney on the
> MTA approach was designed to migrate the stack frames to the heap
> transparently, so the user should be isolated from the concern of where
> the continuation is stored.

I disagree - I think a stack grown too large is likely indicative of a
programming error, or at the very least an inefficient algorithm. In the
general case I want my programs to be able to allocate as much heap as
possible, but have a separate limitation on the stack.

The default Chibi stack is only 1k. It can grow, but again to a limit of
only 1M (by default). This coincidentally causes it to also return #t on
the 200,000 input to even and raise an out-of-stack error on 300,000
(with a ridiculously long stack trace, making me wonder if 1M is too
large).

--
Alex