On Sat, Oct 17, 2015 at 04:58:06PM +0200, Peter Bex wrote:
> Well, here's a bunch of patches to fix this issue (#1221) as well as a
> handful of memory-related issues.  I still haven't managed to pinpoint
> the crashes we're seeing on Salmonella, but this is a separate issue
> which seems to be fixed by one of the attached patches.

OK, I've been doing some more digging, and I think I managed to figure
out why the crashes are happening.  It's extremely obscure and happens
so rarely because it relies a *lot* on the particular call pattern, and
even on timing because it gets triggered in handle_interrupt, which is
time-driven.

The strange "state vector" which we sometimes(?) use for capturing a
thread's continuation will be abused as an argvector by passing a
pointer straight into the middle of it.  This means that due to the
argvector re-use, this state vector may inadvertently mutated during
the execution of the program, but only if the "right" (wrong?) functions
are called in sequence.

Please, apply the other patches (which this mail is a reply to) first,
as I think they're pretty important as well.  Then apply the attached
patch.  It applies to both master and chicken-5.

Cheers,
Peter
From f2b33d64a2fd98b66e3eecaf894de04130f74ccf Mon Sep 17 00:00:00 2001
From: Peter Bex <pe...@more-magic.net>
Date: Sat, 24 Oct 2015 21:17:07 +0200
Subject: [PATCH 1/2] Copy thread "state vector" to a fresh argvector before
 invoking the trampoline function.

Upon resumption of a thread from a "state vector" (rather than a
closure/continuation), the trampoline function was passed a pointer
right into the state vector, which doubled as an argument vector.

Now that argvectors may be re-used, their slots can be trashed any
time in the call history, depending on the exact situation.  These
argvector slots will eventually contain temporary data which becomes
invalid over time.  When the original state vector is then copied
during garbage collection, the mutated slot is also copied.  When the
Cheney algorithm traverses over this slot, it will break down because
the data it points to is now invalid.

This hopefully fixes the "random crashes" we've been seeing on
Salmonella the last few weeks.
---
 runtime.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/runtime.c b/runtime.c
index 43edfe9..2d477c0 100644
--- a/runtime.c
+++ b/runtime.c
@@ -7979,10 +7979,17 @@ void C_ccall C_context_switch(C_word c, C_word *av)
     /* closure = av[ 0 ] */
     state = av[ 2 ],
     n = C_header_size(state) - 1,
-    adrs = C_block_item(state, 0);
+    adrs = C_block_item(state, 0),
+    *av2;
   C_proc tp = (C_proc)C_block_item(adrs,0);
 
-  tp(n, (C_word *)state + 2);
+  /* Copy argvector because it may be mutated in-place.  The state
+   * vector should not be re-invoked(?), but it can be kept alive
+   * during GC, so the mutated argvector/state slots may turn stale.
+   */
+  av2 = C_alloc(n);
+  C_memcpy(av2, (C_word *)state + 2, n * sizeof(C_word));
+  tp(n, av2);
 }
 
 
-- 
2.1.4

Attachment: signature.asc
Description: Digital signature

_______________________________________________
Chicken-hackers mailing list
Chicken-hackers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-hackers

Reply via email to