[NOTE: this was sent to Simon Marlow and Simon Peyton Jones, and
erroneously to [EMAIL PROTECTED]]
Great job! I had actually worked on this problem for some time but
did not have enough experience with the source code and backtracing
from assembler through Stg to find the exact problem. Would you
have time to answer a few questions I have?
(1) How do I obtain the latest 6.4.3 release? It is no longer on
CVS; the darcs branch, http://darcs.haskell.org/ghc.ghc-6.4, seems to
have been last updated in January; and I had been working on the
ghc-6.4.2.tar.bz2 snapshot from
http://www.haskell.org/ghc/dist/6.4.2/ghc-6.4.2-src.tar.bz2.
Is 6.4.3 a darcs tag?
(2) To get a debug build of ghc-6.4.2 working I had to modify the file
ghc/compiler/nativeGen/RegisterAlloc.hs by adding a deriving
declaration:
ghc/compiler/nativeGen/RegisterAlloc.hs:158
> data FreeRegs = FreeRegs !Word32 !Word32
+ deriving (Show)
This fix was in the 6.6 branch. Is it also now in the 6.4.3 branch?
(3) I cheated and modified the ghc script that invokes the
executable ..lib/ghc-6.4.2/ghc-6.4.2, inserting a gdb invocation
after the exec statement. (I was compiling Crypto with the original
Cabal setup and didn't want to resort to makefiles.) The script now
reads:
# Mini-driver for GHC
exec gdb --args $GHCBIN $TOPDIROPT ${1+"$@"}
Is there a better way to go about this?
(4) Would you please elaborate on the problem and the fix? The
problems consistently showed up in ghc/rts/GC.c:threadSqueezeStack,
in the variable frame (note: comments *follow* code):
(gdb) disas threadSqueezeStack
...
0x00c24678 <threadSqueezeStack+16>: mr r24,r3
    ; tso, tso
0x00c2467c <threadSqueezeStack+20>: addi r0,r3,56
    ; r0 = tso->stack[0]
0x00c24680 <threadSqueezeStack+24>: lwz r2,44(r3)
    ; <variable>.stack_size
0x00c24684 <threadSqueezeStack+28>: rlwinm r2,r2,2,0,29
    ; stack_size
0x00c24688 <threadSqueezeStack+32>: add r25,r0,r2
    ; bottom, tso->stack[0], stack_size
0x00c2468c <threadSqueezeStack+36>: lwz r31,52(r3)
    ; <variable>.sp (tso->sp)
0x00c24690 <threadSqueezeStack+40>: cmplw cr7,r25,r31
    ; assert(frame < stack)
0x00c24694 <threadSqueezeStack+44>: bgt+ cr7,0xc246a8
    ; <threadSqueezeStack+64>
0x00c24698 <threadSqueezeStack+48>: lis r3,211
0x00c2469c <threadSqueezeStack+52>: addi r3,r3,304
0x00c246a0 <threadSqueezeStack+56>: li r4,4356
0x00c246a4 <threadSqueezeStack+60>: bl 0xcb9758
    ; <_assertFail>
0x00c246a8 <threadSqueezeStack+64>: addi r29,r31,-8
    ; gap, <variable>.sp,
0x00c246ac <threadSqueezeStack+68>: li r23,0
    ; updatee,
0x00c246b0 <threadSqueezeStack+72>: li r27,0
    ; prev_was_update_frame,
0x00c246b4 <threadSqueezeStack+76>: li r28,0
    ; current_gap_size,
0x00c246b8 <threadSqueezeStack+80>: mr r11,r31
0x00c246bc <threadSqueezeStack+84>: lwz r2,0(r31)
    ; <variable>.header.info, D.xxxx
0x00c246c0 <threadSqueezeStack+88>: addi r9,r2,-12
    ; info, <variable>.header.info
0x00c246c4 <threadSqueezeStack+92>: lhz r2,8(r9)
    ; crash point: <variable>.i.type, r9=0xfffffffc, sometimes r9=0x70000100
    ; 8(r9) overflows to 0x00000004
    ; NOTE: 8 in 8(r9) derived from:
    ; sizeof((StgInt)srt_offset) + sizeof((StgClosureInfo)layout)
Sorry if my comments seem pedantic--I only started seriously learning
assembler in August, and I partly used gcc -S -fverbose-asm to help
out. Over many runs of the Crypto build there were a few times when
the assert (frame < stack) failed, so I was also following registers
(r31, r2 and r9) through the functions used in other threads. (The
problem was in r9.) After that it was a matter of traceback.
(5) One other avenue I was exploring was the use of Zero Length
Arrays (ZLA's) and potential gcc bugs (a few of this sort have been
noticed in gcc-3.3 through 4.0). Why do you use ZLA's in the code?
The reasons not to are:
a. ZLA's are largely supported by GNU extensions.
As noted in the GCC manual, Section 5.12, at
http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html : "A structure
containing a flexible array member, or a union containing such a
structure (possibly recursively), may not be a member of a structure
or an element of an array. (However, these uses are permitted by GCC
as extensions.)"
You are therefore forced to include structures containing ZLA's as
pointers-to-structures, for example:
# 280 "ghc/includes/InfoTables.h"
typedef struct _StgInfoTable {
StgClosureInfo layout;
StgHalfWord type;
StgHalfWord srt_bitmap;
StgCode code[];
} StgInfoTable;
# 44 "ghc/includes/Closures.h"
typedef struct {
const struct _StgInfoTable* info;
} StgHeader;
b. the C sizeof() operator does not correctly report the size of
structures containing ZLA's, so sizeof(StgInfoTable) reports 8, not
12, although the gcc compiler correctly produces the assembler for
manipulating such a structure:
0x00c246c0 <threadSqueezeStack+88>: addi r9,r2,-12
; (((StgRetInfoTable *)(((StgClosure *)frame)->header.info) - 1))
; the -12 is the size of StgRetInfoTable
So macros such as ghc/includes/TSO.h:TSO_STRUCT_SIZE can't simply be
defined as
#define TSO_STRUCT_SIZE sizeof(StgTSO)
and gdb has trouble accessing the members of the structure:
(gdb) p *tso
$30 = {
header = {
info = 0xaf5ff0
},
link = 0xded718,
mut_link = 0xded71c,
global_link = 0x2bce000,
what_next = 1,
why_blocked = 11,
block_info = {
closure = 0x0,
tso = 0x0,
fd = 0,
target = 0
},
blocked_exceptions = 0xded718,
id = 2,
saved_errno = 25,
main = 0x0,
trec = 0xded714,
stack_size = 242,
max_stack_size = 2080754,
sp = 0x2bddc78
}
// note the lack of member tso.stack
Finally, if there are alignment issues, wouldn't that be better
controlled explicitly through pragmas?
Please don't think I am being critical here: I just don't know enough
to understand your reasons.
-Pete
on 2006/10/16 06:50:02 PDT Simon Marlow wrote:
Modified files: (Branch: ghc-6-4-branch)
ghc/rts Capability.c
Log:
Fix crash in the threaded RTS caused by spurious wakeups of
pthread_cond_wait(). This is certainly affecting the threaded RTS in
6.4.x on Solaris, and possibly other platforms too. I'm currently
testing to see whether there are any further problems on Solaris, but
with luck this may be the final fix for the threaded RTS problems in
the 6.4.x branch.
Does not affect 6.6; the corresponding code in 6.6 is already
spurious-wakeup-safe.
Revision Changes Path
1.31.6.2 +32 -7 fptools/ghc/rts/Capability.c