Hi all,

As you're probably aware, we've been seeing a lot of seemingly random
breakage on the Salmonella boxes.  Take for example today's report:
http://salmonella-linux-x86-64.call-cc.org/master-debugbuild/gcc/linux/x86-64/2015/09/30/salmonella-report/

Several eggs fail on installation or test with this message:
[panic] out of memory - heap full while resizing - execution terminated

I've been doing some digging and I'm not 100% sure, but it looks like
these problems have been triggered (but not _caused_) by the argvector
re-use patch (d04f591d / a33fc6209).  It looks like in some cases,
argvectors are "allocated" on the temporary stack.  This stack is
supposed to be used for setting aside live objects before performing
a GC, so that they get copied to the new heap, and so the argvector
won't get clobbered by the longjmp.

Currently, the argvector is *retained* on the temporary stack when
calling the trampoline function, but the temporary stack pointer is
reset to the bottom of the stack.  That means, the memory is seen as
"free".  This is wrong: if something _else_ will be allocated on the
temporary stack (like for example the save() calls in
C_allocate_vector()), that means the argvector data gets clobbered.
This may or may not be a problem, depending on whether the argvector
will be re-used.  Note that the argvector re-use patch itself only
_increases_ the number of places where an argvector may be re-used;
in the original argvector patches there are also some situations in
which this may happen (for example in C_apply and C_apply_values).

The attached patch tries to ensure that argvectors are always
stack-allocated.  After a GC, the argvector is moved from the
temporary stack to the C stack.  After that, the temporary stack is
reset as usual.

I also noticed that the C_make_structure() function can be simplified
quite a bit by just using the save_and_reclaim() function and jumping
back to itself if memory is insufficient.  There's no real reason
to have a second make_structure_2() function around.  I think the same
may be possible for C_allocate_vector()/allocate_vector_2(), but the
function is a bit too tricky and large for me to grok and refactor.
Hopefully, removing the _2 function will result in a bit less stack
usage.  But if nothing else, it makes the code more readble IMHO.

I'd appreciate it if someone could take a look at this ASAP, so we can
apply it and check whether it fixes Salmonella.  For the remainder of
this week, I'll have plenty time to dig in further if this doesn't
solve it, but next week I'll get back to work and will have less time.

Cheers,
Peter
From 34dd47358ad1071028c925984a88fb448573a9c2 Mon Sep 17 00:00:00 2001
From: Peter Bex <pe...@more-magic.net>
Date: Thu, 1 Oct 2015 15:57:56 +0200
Subject: [PATCH 1/2] Avoid allocating argvectors on the temporary stack.

The temporary stack should be strictly reserved for setting aside live
data just before performing a GC & longjump (which resets the stack).

There are several problems with (ab)using the temporary stack in non-GC
situations, which surfaced only after the re-use of argvectors was made
more prevalent:

1) Anything allocated on the temp stack will be kept around during a GC,
which may cause objects to unnecessarily stick around for another round
of GC.
2) When the longjmp is performed, if the argvector is allocated in the
temporary stack and then the temporary_stack pointer is reset, we may
scribble over the argvector whenever something is put on the temp stack.
With argvector re-use, this will more commonly trigger errors.

Now the restart trampoline will move the saved data from the temporary
stack to the C stack.  This should probably slightly more cause garbage
collection events, but it will also make it easier to eventually make
a dynamically allocated temporary stack (see also #1098).
---
 runtime.c | 82 +++++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 46 insertions(+), 36 deletions(-)

diff --git a/runtime.c b/runtime.c
index a2da0db..6e4cd1a 100644
--- a/runtime.c
+++ b/runtime.c
@@ -1499,8 +1499,10 @@ C_word CHICKEN_run(void *toplevel)
   serious_signal_occurred = 0;
 
   if(!return_to_host) {
-    C_word *p = C_temporary_stack;
+    int argcount = C_temporary_stack_bottom - C_temporary_stack;
+    C_word *p = C_alloc(argcount);
 
+    C_memcpy(p, C_temporary_stack, argcount * sizeof(C_word));
     C_temporary_stack = C_temporary_stack_bottom;
     ((C_proc)C_restart_trampoline)(C_restart_c, p);
   }
@@ -3105,11 +3107,13 @@ C_regparm C_word C_fcall C_mutate_scratch_slot(C_word *slot, C_word val)
 
 void C_save_and_reclaim(void *trampoline, int n, C_word *av)
 {
-  if(C_temporary_stack != av) { /* used in apply */
-    C_temporary_stack = C_temporary_stack_bottom - n;
-    C_memmove(C_temporary_stack, av, n * sizeof(C_word));
-  }
+  assert(av > C_temporary_stack_bottom || av < C_temporary_stack_limit);
+  assert(C_temporary_stack == C_temporary_stack_bottom);
 
+  C_temporary_stack = C_temporary_stack_bottom - n;
+  assert(C_temporary_stack >= C_temporary_stack_limit);
+
+  C_memmove(C_temporary_stack, av, n * sizeof(C_word));
   C_reclaim(trampoline, n);
 }
 
@@ -3238,6 +3242,7 @@ C_regparm void C_fcall C_reclaim(void *trampoline, C_word c)
   /* Clear the mutated slot stack: */
   mutation_stack_top = mutation_stack_bottom;
 
+  assert(C_temporary_stack >= C_temporary_stack_limit);
   /* Mark live values: */
   for(p = C_temporary_stack; p < C_temporary_stack_bottom; ++p)
     mark(p);
@@ -7136,41 +7141,40 @@ void C_ccall C_apply(C_word c, C_word *av)
     /* closure = av[ 0 ] */
     k = av[ 1 ],
     fn = av[ 2 ];
-  int i, n = c - 3;
-  int m = n - 1;    
-  C_word x, skip, *ptr;
+  int new_av_size, i, n = c - 3;
+  int non_list_args = n - 1;
+  C_word rest, *ptr, *new_av;
 
   if(c < 4) C_bad_min_argc(c, 4);
 
   if(C_immediatep(fn) || C_header_bits(fn) != C_CLOSURE_TYPE)
     barf(C_NOT_A_CLOSURE_ERROR, "apply", fn);
 
-  ptr = C_temporary_stack_limit;
-  *(ptr++) = fn;
-  *(ptr++) = k;
+  rest = av[ c - 1 ];
+  if(rest != C_SCHEME_END_OF_LIST && (C_immediatep(rest) || C_block_header(rest) != C_PAIR_TAG))
+    barf(C_BAD_ARGUMENT_TYPE_ERROR, "apply", rest);
 
-  if(n > 1) {
-    C_memmove(ptr, av + 3, m * sizeof(C_word));
-    ptr += m;
-  }
+  new_av_size = 2 + non_list_args + C_unfix(C_u_i_length(rest));
 
-  x = av[ c - 1 ];
+  if (!C_demand(new_av_size))
+    C_save_and_reclaim((void *)C_apply, c, av);
 
-  if(x != C_SCHEME_END_OF_LIST && (C_immediatep(x) || C_block_header(x) != C_PAIR_TAG))
-    barf(C_BAD_ARGUMENT_TYPE_ERROR, "apply", x);
+  new_av = ptr = C_alloc(new_av_size);
+  *(ptr++) = fn;
+  *(ptr++) = k;
 
-  for(skip = x; !C_immediatep(skip) && C_block_header(skip) == C_PAIR_TAG; skip = C_u_i_cdr(skip)) {
-    x = C_u_i_car(skip);
-    
-    if(ptr >= C_temporary_stack_bottom)
-      barf(C_TOO_MANY_PARAMETERS_ERROR, "apply");
+  if(non_list_args > 0) {
+    C_memcpy(ptr, av + 3, non_list_args * sizeof(C_word));
+    ptr += non_list_args;
+  }
 
-    *(ptr++) = x;
-    ++m;
+  while(!C_immediatep(rest) && C_block_header(rest) == C_PAIR_TAG) {
+    *(ptr++) = C_u_i_car(rest);
+    rest = C_u_i_cdr(rest);
   }
+  assert((ptr - new_av) == new_av_size);
 
-  C_temporary_stack = C_temporary_stack_bottom;
-  ((C_proc)(void *)C_block_item(fn, 0))(m + 2, C_temporary_stack_limit);
+  ((C_proc)(void *)C_block_item(fn, 0))(new_av_size, new_av);
 }
 
 
@@ -7287,22 +7291,27 @@ void C_ccall C_apply_values(C_word c, C_word *av)
 
   lst = av[ 2 ];
 
+  if(lst != C_SCHEME_END_OF_LIST && (C_immediatep(lst) || C_block_header(lst) != C_PAIR_TAG))
+    barf(C_BAD_ARGUMENT_TYPE_ERROR, "apply", lst);
+
   /* Check continuation wether it receives multiple values: */
   if(C_block_item(k, 0) == (C_word)values_continuation) {
-    C_word 
-      *av2,
-      *ptr = C_temporary_stack_limit;
+    C_word *av2, *ptr;
+
+    n = C_unfix(C_u_i_length(lst)) + 1;
+
+    if (!C_demand(n))
+      C_save_and_reclaim((void *)C_apply_values, c, av);
 
-    for(n = 0; !C_immediatep(lst) && C_block_header(lst) == C_PAIR_TAG; ++n) {
+    av2 = C_alloc(n + 1);
+    av2[ 0 ] = k;
+    ptr = av2 + 1;
+    while(!C_immediatep(lst) && C_block_header(lst) == C_PAIR_TAG) {
       *(ptr++) = C_u_i_car(lst);
       lst = C_u_i_cdr(lst);
     }
 
-    /* copy into new array */
-    av2 = C_alloc(n + 1);
-    av2[ 0 ] = k;
-    C_memcpy(av2 + 1, C_temporary_stack_limit, n * sizeof(C_word));
-    C_do_apply(n + 1, av2);
+    C_do_apply(n, av2);
   }
   
   if(C_immediatep(lst)) {
@@ -11352,6 +11361,7 @@ static void bignum_to_str_2(C_word c, C_word *av)
 void C_ccall C_make_structure(C_word c, C_word *av)
 {
   if(!C_demand(c - 1)) {
+    assert(C_temporary_stack == C_temporary_stack_bottom);
     C_temporary_stack = C_temporary_stack_bottom - (c - 1);
     C_memmove(C_temporary_stack, av + 1, (c - 1) * sizeof(C_word));
     C_reclaim((void *)make_structure_2, c - 1);
-- 
2.1.4

From b8028845e93800a288c1fa16c56a2b9361918a00 Mon Sep 17 00:00:00 2001
From: Peter Bex <pe...@more-magic.net>
Date: Thu, 1 Oct 2015 17:03:21 +0200
Subject: [PATCH 2/2] Simplify C_make_structure by using standard
 save_and_reclaim

This gets rid of the extra make_structure_2 function which doesn't
really add anything except make the code less clear & more confusing.

Conflicts:
	runtime.c
---
 runtime.c | 32 +++++++++++---------------------
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/runtime.c b/runtime.c
index 6e4cd1a..1b5451f 100644
--- a/runtime.c
+++ b/runtime.c
@@ -551,7 +551,6 @@ static C_cpsproc(call_cc_wrapper) C_noret;
 static C_cpsproc(call_cc_values_wrapper) C_noret;
 static C_cpsproc(gc_2) C_noret;
 static C_cpsproc(allocate_vector_2) C_noret;
-static C_cpsproc(make_structure_2) C_noret;
 static C_cpsproc(generic_trampoline) C_noret;
 static void handle_interrupt(void *trampoline) C_noret;
 static C_cpsproc(callback_return_continuation) C_noret;
@@ -11360,30 +11359,21 @@ static void bignum_to_str_2(C_word c, C_word *av)
 
 void C_ccall C_make_structure(C_word c, C_word *av)
 {
-  if(!C_demand(c - 1)) {
-    assert(C_temporary_stack == C_temporary_stack_bottom);
-    C_temporary_stack = C_temporary_stack_bottom - (c - 1);
-    C_memmove(C_temporary_stack, av + 1, (c - 1) * sizeof(C_word));
-    C_reclaim((void *)make_structure_2, c - 1);
-  }
-
-  make_structure_2(c - 1, av + 1);
-}
-
-
-void C_ccall make_structure_2(C_word c, C_word *av)
-{
   C_word
-    k = av[ 0 ],
-    type = av[ 1 ],
-    size = c - 2,
-    *a = C_alloc(C_SIZEOF_STRUCTURE(size+1)),
-    *s = a,
-    s0 = (C_word)s;
+    /* closure = av[ 0 ] */
+    k = av[ 1 ],
+    type = av[ 2 ],
+    size = c - 3,
+    *s, s0;
 
+  if(!C_demand(size + 2))
+    C_save_and_reclaim((void *)C_make_structure, c, av);
+
+  s = C_alloc(C_SIZEOF_STRUCTURE(size + 1)),
+  s0 = (C_word)s;
   *(s++) = C_STRUCTURE_TYPE | (size + 1);
   *(s++) = type;
-  av += 2;
+  av += 3;
 
   while(size--)
     *(s++) = *(av++);
-- 
2.1.4

From b916f53a04df5e9c3823487e06ab22ec57d05151 Mon Sep 17 00:00:00 2001
From: Peter Bex <pe...@more-magic.net>
Date: Thu, 1 Oct 2015 15:57:56 +0200
Subject: [PATCH 1/2] Avoid allocating argvectors on the temporary stack.

The temporary stack should be strictly reserved for setting aside live
data just before performing a GC & longjump (which resets the stack).

There are several problems with (ab)using the temporary stack in non-GC
situations, which surfaced only after the re-use of argvectors was made
more prevalent:

1) Anything allocated on the temp stack will be kept around during a GC,
which may cause objects to unnecessarily stick around for another round
of GC.
2) When the longjmp is performed, if the argvector is allocated in the
temporary stack and then the temporary_stack pointer is reset, we may
scribble over the argvector whenever something is put on the temp stack.
With argvector re-use, this will more commonly trigger errors.

Now the restart trampoline will move the saved data from the temporary
stack to the C stack.  This should probably slightly more cause garbage
collection events, but it will also make it easier to eventually make
a dynamically allocated temporary stack (see also #1098).
---
 runtime.c | 82 +++++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 46 insertions(+), 36 deletions(-)

diff --git a/runtime.c b/runtime.c
index ac4a84a..105a7e7 100644
--- a/runtime.c
+++ b/runtime.c
@@ -1410,8 +1410,10 @@ C_word CHICKEN_run(void *toplevel)
   serious_signal_occurred = 0;
 
   if(!return_to_host) {
-    C_word *p = C_temporary_stack;
+    int argcount = C_temporary_stack_bottom - C_temporary_stack;
+    C_word *p = C_alloc(argcount);
 
+    C_memcpy(p, C_temporary_stack, argcount * sizeof(C_word));
     C_temporary_stack = C_temporary_stack_bottom;
     ((C_proc)C_restart_trampoline)(C_restart_c, p);
   }
@@ -2706,11 +2708,13 @@ C_mutate_slot(C_word *slot, C_word val)
 
 void C_save_and_reclaim(void *trampoline, int n, C_word *av)
 {
-  if(C_temporary_stack != av) { /* used in apply */
-    C_temporary_stack = C_temporary_stack_bottom - n;
-    C_memmove(C_temporary_stack, av, n * sizeof(C_word));
-  }
+  assert(av > C_temporary_stack_bottom || av < C_temporary_stack_limit);
+  assert(C_temporary_stack == C_temporary_stack_bottom);
 
+  C_temporary_stack = C_temporary_stack_bottom - n;
+  assert(C_temporary_stack >= C_temporary_stack_limit);
+
+  C_memmove(C_temporary_stack, av, n * sizeof(C_word));
   C_reclaim(trampoline, n);
 }
 
@@ -2839,6 +2843,7 @@ C_regparm void C_fcall C_reclaim(void *trampoline, C_word c)
   /* Clear the mutated slot stack: */
   mutation_stack_top = mutation_stack_bottom;
 
+  assert(C_temporary_stack >= C_temporary_stack_limit);
   /* Mark live values: */
   for(p = C_temporary_stack; p < C_temporary_stack_bottom; ++p)
     mark(p);
@@ -5958,41 +5963,40 @@ void C_ccall C_apply(C_word c, C_word *av)
     /* closure = av[ 0 ] */
     k = av[ 1 ],
     fn = av[ 2 ];
-  int i, n = c - 3;
-  int m = n - 1;    
-  C_word x, skip, *ptr;
+  int new_av_size, i, n = c - 3;
+  int non_list_args = n - 1;
+  C_word rest, *ptr, *new_av;
 
   if(c < 4) C_bad_min_argc(c, 4);
 
   if(C_immediatep(fn) || C_header_bits(fn) != C_CLOSURE_TYPE)
     barf(C_NOT_A_CLOSURE_ERROR, "apply", fn);
 
-  ptr = C_temporary_stack_limit;
-  *(ptr++) = fn;
-  *(ptr++) = k;
+  rest = av[ c - 1 ];
+  if(rest != C_SCHEME_END_OF_LIST && (C_immediatep(rest) || C_block_header(rest) != C_PAIR_TAG))
+    barf(C_BAD_ARGUMENT_TYPE_ERROR, "apply", rest);
 
-  if(n > 1) {
-    C_memmove(ptr, av + 3, m * sizeof(C_word));
-    ptr += m;
-  }
+  new_av_size = 2 + non_list_args + C_unfix(C_u_i_length(rest));
 
-  x = av[ c - 1 ];
+  if (!C_demand(new_av_size))
+    C_save_and_reclaim((void *)C_apply, c, av);
 
-  if(x != C_SCHEME_END_OF_LIST && (C_immediatep(x) || C_block_header(x) != C_PAIR_TAG))
-    barf(C_BAD_ARGUMENT_TYPE_ERROR, "apply", x);
+  new_av = ptr = C_alloc(new_av_size);
+  *(ptr++) = fn;
+  *(ptr++) = k;
 
-  for(skip = x; !C_immediatep(skip) && C_block_header(skip) == C_PAIR_TAG; skip = C_u_i_cdr(skip)) {
-    x = C_u_i_car(skip);
-    
-    if(ptr >= C_temporary_stack_bottom)
-      barf(C_TOO_MANY_PARAMETERS_ERROR, "apply");
+  if(non_list_args > 0) {
+    C_memcpy(ptr, av + 3, non_list_args * sizeof(C_word));
+    ptr += non_list_args;
+  }
 
-    *(ptr++) = x;
-    ++m;
+  while(!C_immediatep(rest) && C_block_header(rest) == C_PAIR_TAG) {
+    *(ptr++) = C_u_i_car(rest);
+    rest = C_u_i_cdr(rest);
   }
+  assert((ptr - new_av) == new_av_size);
 
-  C_temporary_stack = C_temporary_stack_bottom;
-  ((C_proc)(void *)C_block_item(fn, 0))(m + 2, C_temporary_stack_limit);
+  ((C_proc)(void *)C_block_item(fn, 0))(new_av_size, new_av);
 }
 
 
@@ -6109,22 +6113,27 @@ void C_ccall C_apply_values(C_word c, C_word *av)
 
   lst = av[ 2 ];
 
+  if(lst != C_SCHEME_END_OF_LIST && (C_immediatep(lst) || C_block_header(lst) != C_PAIR_TAG))
+    barf(C_BAD_ARGUMENT_TYPE_ERROR, "apply", lst);
+
   /* Check continuation wether it receives multiple values: */
   if(C_block_item(k, 0) == (C_word)values_continuation) {
-    C_word 
-      *av2,
-      *ptr = C_temporary_stack_limit;
+    C_word *av2, *ptr;
+
+    n = C_unfix(C_u_i_length(lst)) + 1;
+
+    if (!C_demand(n))
+      C_save_and_reclaim((void *)C_apply_values, c, av);
 
-    for(n = 0; !C_immediatep(lst) && C_block_header(lst) == C_PAIR_TAG; ++n) {
+    av2 = C_alloc(n + 1);
+    av2[ 0 ] = k;
+    ptr = av2 + 1;
+    while(!C_immediatep(lst) && C_block_header(lst) == C_PAIR_TAG) {
       *(ptr++) = C_u_i_car(lst);
       lst = C_u_i_cdr(lst);
     }
 
-    /* copy into new array */
-    av2 = C_alloc(n + 1);
-    av2[ 0 ] = k;
-    C_memcpy(av2 + 1, C_temporary_stack_limit, n * sizeof(C_word));
-    C_do_apply(n + 1, av2);
+    C_do_apply(n, av2);
   }
   
   if(C_immediatep(lst)) {
@@ -7804,6 +7813,7 @@ void C_ccall C_fixnum_to_string(C_word c, C_word *av)
 void C_ccall C_make_structure(C_word c, C_word *av)
 {
   if(!C_demand(c - 1)) {
+    assert(C_temporary_stack == C_temporary_stack_bottom);
     C_temporary_stack = C_temporary_stack_bottom - (c - 1);
     C_memmove(C_temporary_stack, av + 1, (c - 1) * sizeof(C_word));
     C_reclaim((void *)make_structure_2, c - 1);
-- 
2.1.4

From 57334c7dccd173af59c6616e17d1d0f8d995b9e1 Mon Sep 17 00:00:00 2001
From: Peter Bex <pe...@more-magic.net>
Date: Thu, 1 Oct 2015 16:18:16 +0200
Subject: [PATCH 2/2] Simplify C_make_structure by using standard
 save_and_reclaim

This gets rid of the extra make_structure_2 function which doesn't
really add anything except make the code less clear & more confusing.
---
 runtime.c | 32 +++++++++++---------------------
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/runtime.c b/runtime.c
index 105a7e7..947b74f 100644
--- a/runtime.c
+++ b/runtime.c
@@ -494,7 +494,6 @@ static C_cpsproc(call_cc_wrapper) C_noret;
 static C_cpsproc(call_cc_values_wrapper) C_noret;
 static C_cpsproc(gc_2) C_noret;
 static C_cpsproc(allocate_vector_2) C_noret;
-static C_cpsproc(make_structure_2) C_noret;
 static C_cpsproc(generic_trampoline) C_noret;
 static void handle_interrupt(void *trampoline) C_noret;
 static C_cpsproc(callback_return_continuation) C_noret;
@@ -7812,30 +7811,21 @@ void C_ccall C_fixnum_to_string(C_word c, C_word *av)
 
 void C_ccall C_make_structure(C_word c, C_word *av)
 {
-  if(!C_demand(c - 1)) {
-    assert(C_temporary_stack == C_temporary_stack_bottom);
-    C_temporary_stack = C_temporary_stack_bottom - (c - 1);
-    C_memmove(C_temporary_stack, av + 1, (c - 1) * sizeof(C_word));
-    C_reclaim((void *)make_structure_2, c - 1);
-  }
-
-  make_structure_2(c - 1, av + 1);
-}
-
-
-void C_ccall make_structure_2(C_word c, C_word *av)
-{
   C_word
-    k = av[ 0 ],
-    type = av[ 1 ],
-    size = c - 2,
-    *a = C_alloc(size + 2),
-    *s = a,
-    s0 = (C_word)s;
+    /* closure = av[ 0 ] */
+    k = av[ 1 ],
+    type = av[ 2 ],
+    size = c - 3,
+    *s, s0;
 
+  if(!C_demand(size + 2))
+    C_save_and_reclaim((void *)C_make_structure, c, av);
+
+  s = C_alloc(size + 2),
+  s0 = (C_word)s;
   *(s++) = C_STRUCTURE_TYPE | (size + 1);
   *(s++) = type;
-  av += 2;
+  av += 3;
 
   while(size--)
     *(s++) = *(av++);
-- 
2.1.4

Attachment: signature.asc
Description: Digital signature

_______________________________________________
Chicken-hackers mailing list
Chicken-hackers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-hackers

Reply via email to