Stefan O'Rear wrote:
On Thu, May 03, 2007 at 05:36:45PM -0700, Brandon Michael Moore wrote:
On Thu, May 03, 2007 at 04:59:58PM -0700, John Meacham wrote:
I believe it is because a stack cannot be garbage collected, and must be
traversed as roots for every garbage collection. I don't think there are
any issues with a huge stack per se, but it does not play nice with
garbage collection so may hurt your performance and memory usage in
unforeseen ways.
Isn't it just the top of the stack that has to be treadted as a root?
(maybe you need to walk the stack to find exception handlers and so on.)
Maybe it shouldn't be so much worse than a heap. The Chicken Scheme
system allocates everything on the C stack, and runs some sort of
compacting collector when it is about to fill.

GHC uses a simple exponential-backoff algorithm for handling stacks.
When the stack pointer passes the stack limit, the thread yields to
the scheduler, where the stack size is doubled, and the old stack is
moved.  Perhaps instead we could modify the algorithm such that up to
16K stack size the behaivor is the same, but use linked lists for
larger?
1. Allocate a new chunk, of size 16KB.

2. Copy the topmost 1KB of stack to the new block.  This decreases
   storage efficiency slightly (but not much in time - memcpy is
   several times faster than anything the current ghc code generator
   can generate), but it avoids a nasty corner case (stack size
   fluctuating across 0 mod 16K) by acting as a form of hysteresis.

3. Create a special frame at the bottom of the new stack chunk that
   returns into a stack underflow handler, thus avoiding the need for
yet another conditional.

GHC in the distant past (pre-4.0) had linked stack chunks, but we discarded the idea when the RTS was rewritten, mainly for simplicity. It was before my time, so I don't know all the war stories, but I think it turned out to be hairy enough to avoid it in the redesign. At least it would require separate TSO and STACK objects, the underflow frame, code to avoid performance degradation at the boundary (e.g. as you say, copy the top 1k of stack to the new chunk), stack walking gets more complicated (several places in the RTS do this).

My impression is that the performance of stack growth isn't usually a limiting factor, so I'm not sure it's worth adding this complexity. Benchmarks can be convincing, though...

Cheers,
        Simon


_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Reply via email to