Hi,
At the moment, under Win32, virtually all of the Perl 6 test suite fails
unless run with -G (disabling garbage collection). The problem for at
least one of them is that the free list appears to be corrupted.
First, a few notes on the free list. Parrot allocates large chunks of
memory, called pools, and then allocates objects out of these pools
itself. All garbage collectible objects start out with the same two things:

    typedef struct pobj_t {
        UnionVal u;
        Parrot_UInt flags;
    } pobj_t;

The UnionVal contains a range of things, but is at least the size of two
integers or two pointers. Normally this is used to store some of the
data for the object itself. However, after the object is freed, the
first pointer-sized chunk of the UnionVal is used for another thing: to
store a linked list of free objects. When we want a new object, unless
the free list is empty (and thus NULL), we take the object on the front
and set the free list to whatever that referred to as the next thing on
the free list, like this:

    if (!pool->free_list)
        (*pool->more_objects)(interp, pool);
    ptr = pool->free_list;
    pool->free_list = *(void **)ptr;

The segfault was occurring on the third line of this, namely because ptr
was coming back as 0xFFFFFFFF. After some messing around, I realized
that if I changed the code to read:
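
    if (!pool->free_list)
        (*pool->more_objects)(interp, pool);
    ptr = pool->free_list;
    /* debugging aid: catch the bogus "next" pointer before it is installed */
    if (*(void **)ptr == (void *)0xFFFFFFFF) {
        PMC *check = (PMC*)ptr;  /* inspect this in the debugger */
        return NULL;
    }
    pool->free_list = *(void **)ptr;
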
And set a breakpoint on the "return NULL;", then I'd get some better
idea of what ptr actually was. Turns out it is a PMC - a Key PMC in
fact. And if you look in the Key PMC, you see a comment like:

    PMC_int_val(-1) means end of iteration.

PMC_int_val is the same memory location as the free list pointer would
be, -1 is 0xFFFFFFFF and...well, you can see where this is going. So
somehow this Key PMC is not getting marked live, when it is still being
used, right?
Well, maybe. Next I looked at the flags of this PMC.

    00000100 00010000 00000110 00000011

We'd expect that:

    b_PObj_on_free_list_FLAG = 1 << 19,

Would be set, but it ain't. So the PMC is on the free list, but hasn't
got the "I'm on the free list" flag. Thus it was, in theory, never
actually placed onto the free list. Changing the test condition from
earlier to:

    if (!PObj_on_free_list_TEST((PObj*)ptr)) {
        PMC *check = (PMC*)ptr;
        return NULL;
    }

And setting the breakpoint gave check as the Key PMC, just like before.
From which I infer that perhaps it's not a Key PMC that is not being
marked, but something else that keeps a Key PMC referenced from the
first pointer in the UnionVal. Perhaps an iterator; from Iterator.pmc's
mark routine:

    /* the KEY */
    if (PMC_struct_val(SELF))
        pobject_lives(INTERP, (PObj *) PMC_struct_val(SELF));

Or perhaps not, that's all I have time for today. But this post is for
those of you who wonder how one goes about tracing GC problems - I have
been asked before and just wanted to make things a little less
mysterious. Hope this helps. Or that it spurs someone else to continue
the hunt for this bug, which would be rather nice to nail.
Happy hacking,
Jonathan