Re: Odd Memory Corruption
On Thursday 05 July 2007 19:58:50 Patrick R. Michaud wrote: I also get segfaults after applying this patch. However, if I change the patch such that the size_t sentinel; line goes at the end of the struct PMC instead of the beginning, then everything appears to compile and run. In fact, if size_t sentinel; is placed as the second or third line in the struct (i.e., after pobj_t obj; or VTABLE *vtable;) then things still appear to compile and run. It's only when the sentinel is given as the first line that I get errors. So, I'm guessing there's some code somewhere that assumes that obj is always the first component of PMC. Right, and the clue was also in parrot/pobj.h. Because buffers, strings, and PMCs all are PObjs, their structs must be isomorphic. Hoisting the sentinel into the pobj_t struct gets past that segfault. Structural subtyping hates me. -- c
Re: Odd Memory Corruption
chromatic wrote: In theory, this patch should apply and run cleanly. It doesn't. Thus, something somewhere pokes into memory it shouldn't. Any ideas? Alternately, any comments on this analysis? I think adding memory checks is a brilliant idea, especially because memory is sometimes recycled with GC, thus malloc/free checks being less useful. The bug where a PMC was still used while already being put on the free list, and aggravated by the fact that the free list pointer is put at the same address as the data comes into mind. I really like what Microsoft has done with their memory guards. The Structure of a Page Heap Block http://msdn2.microsoft.com/en-us/library/ms220938(vs.80).aspx It would be great if someone could have a look at patches 43432 and 43529. They might be memory relevant too. Ron
Odd Memory Corruption
In theory, this patch should apply and run cleanly. It doesn't. Thus, something somewhere pokes into memory it shouldn't. Any ideas? Alternately, any comments on this analysis? -- c === include/parrot/pobj.h == --- include/parrot/pobj.h (revision 4520) +++ include/parrot/pobj.h (local) @@ -139,6 +139,7 @@ #define PMC_DATA_IN_EXT 1 struct PMC { +size_t sentinel; pobj_t obj; VTABLE *vtable; PMC *real_self; @@ -183,7 +184,7 @@ #ifdef NDEBUG # define PMC_ext_checked(pmc) (pmc)-pmc_ext #else -# define PMC_ext_checked(pmc) (assert((pmc)-pmc_ext), (pmc)-pmc_ext) +# define PMC_ext_checked(pmc) (assert((pmc)-pmc_ext (pmc)-sentinel == 0xbeefbeef), (pmc)-pmc_ext) #endif /* NDEBUG */ #if PMC_DATA_IN_EXT # define PMC_data(pmc) PMC_ext_checked(pmc)-data === src/headers.c == --- src/headers.c (revision 4520) +++ src/headers.c (local) @@ -249,6 +249,7 @@ ? interp-arena_base-constant_pmc_pool : interp-arena_base-pmc_pool; PMC * const pmc = (PMC *)pool-get_free_object(interp, pool); +pmc-sentinel = 0xbeefbeef; /* clear flags, set is_PMC_FLAG */ if (flags PObj_is_PMC_EXT_FLAG) { === src/pmc/hash.pmc == --- src/pmc/hash.pmc (revision 4520) +++ src/pmc/hash.pmc (local) @@ -526,6 +531,9 @@ */ PMC* get_pmc_keyed_str(STRING *key) { +if (!PMC_struct_val(SELF)) +return PMCNULL; + HashBucket * const b = parrot_hash_get_bucket(INTERP, (Hash*) PMC_struct_val(SELF), key); === src/pmc.c == --- src/pmc.c (revision 4520) +++ src/pmc.c (local) @@ -199,6 +199,7 @@ pmc-real_self = pmc; VTABLE_set_pointer(interp, pmc, pmc); } +pmc-sentinel = 0xdeadbeef; return pmc; } if (vtable-flags VTABLE_IS_CONST_PMC_FLAG) { @@ -245,6 +246,7 @@ fprintf(stderr, \t= new %p type %d\n, pmc, (int)base_type); } #endif +pmc-sentinel = 0xbeefbeef; return pmc; }
Re: Odd Memory Corruption
On Thu, Jul 05, 2007 at 06:30:44PM -0700, chromatic wrote: In theory, this patch should apply and run cleanly. It doesn't. Thus, something somewhere pokes into memory it shouldn't. Any ideas? Alternately, any comments on this analysis? I also get segfaults after applying this patch. However, if I change the patch such that the size_t sentinel; line goes at the end of the struct PMC instead of the beginning, then everything appears to compile and run. In fact, if size_t sentinel; is placed as the second or third line in the struct (i.e., after pobj_t obj; or VTABLE *vtable;) then things still appear to compile and run. It's only when the sentinel is given as the first line that I get errors. So, I'm guessing there's some code somewhere that assumes that obj is always the first component of PMC. Pm
Re: [perl #31493] Overlapping memory corruption
Steve Fink wrote: Hey, your reason is much better than my reason. Still, why do the _noinit stuff and duplicate the creation code? Why not just call pmc_new as in my replacement code? Cpmc_new would create a Hash already. But the clone has to create one of the source type, which might not be quite the same. leo
[perl #31493] Overlapping memory corruption
# New Ticket Created by Steve Fink # Please include the string: [perl #31493] # in the subject line of all future correspondence about this issue. # URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=31493 --- osname= linux osvers= 2.4.21-1.1931.2.382.entsmp arch= i386-linux-thread-multi cc= gcc 3.2.2 20030222 (Red Hat Linux 3.2.2-5) --- Flags: category=core severity=high ack=no --- I have a dynamic PMC named 'Match' that extends the PerlHash PMC. I have occasionally been seeing destroy() not implemented in class 'Match' on program shutdown. Match does not define destroy(), nor does it set the active destroy flag. Looking deeper, I find that it's trying to dispose of a PMC with its -obj.flags member completely trashed, and in particular it has an erroneous bit set for the active destroy flag. I won't go through all the details of what I looked at (though I'll post them in my blog eventually), but what's happening is that this line (from perlhash.pmc's clone() implementation) is corrupting the flags field: ((Hash*)PMC_struct_val(dest))-container = dest; The problem is that the dest PMC contains a Hash structure in its struct_val field, but the address of the Hash is only 0x18 bytes before the address of the PMC itself, and Hashes are 0x40 bytes each. The offset of the container field within a Hash is 0x20, and the offset of the flags field within a PMC is 0x04. So assigning to that container field overwrites the flags field, since 0x18 + 0x04 == 0x20. It seems like an initialization ordering problem, especially since I can make the problem go away by changing the clone() method from PMC* clone () { PMC* dest = pmc_new_noinit(INTERP, SELF-vtable-base_type); PObj_custom_mark_SET(dest); ((Hash*)PMC_struct_val(dest))-container = dest; hash_clone(INTERP, (Hash *)PMC_struct_val(SELF), (Hash**)PMC_struct_val(dest)); return dest; } to PMC* clone () { PMC* dest = pmc_new(INTERP, SELF-vtable-base_type); hash_clone(INTERP, (Hash *)PMC_struct_val(SELF), (Hash**)PMC_struct_val(dest)); return dest; } But I'm not sure why, and I'm not sure if this is a bug in perlhash's clone(), or some weird side effect of using dynamic PMCs. (For now, I'm just reimplementing clone() in my subclass.) --- Summary of my parrot 0.1.0 configuration: configdate='Wed Sep 8 23:20:30 2004' Platform: osname=linux, archname=i386-linux-thread-multi jitcapable=1, jitarchname=i386-linux, jitosname=LINUX, jitcpuarch=i386 execcapable=1 perl=/usr/bin/perl Compiler: cc='ccache gcc', ccflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O1', Linker and Libraries: ld='gcc', ldflags=' -L/usr/local/lib', cc_ldflags='', libs='-lnsl -ldl -lm -lpthread -lcrypt -lutil -lrt -lgmp' Dynamic Linking: so='.so', ld_shared='-shared -L/usr/local/lib -fPIC', ld_shared_flags='' Types: iv=long, intvalsize=4, intsize=4, opcode_t=long, opcode_t_size=4, ptrsize=4, ptr_alignment=1 byteorder=1234, nv=double, numvalsize=8, doublesize=8 --- Environment: HOMELANGLANGUAGELC_ALLLD_LIBRARY_PATHLOGDIRPATHSHELL
Re: [perl #31493] Overlapping memory corruption
Steve Fink (via RT) wrote: I won't go through all the details of what I looked at (though I'll post them in my blog eventually), but what's happening is that this line (from perlhash.pmc's clone() implementation) is corrupting the flags field: ((Hash*)PMC_struct_val(dest))-container = dest; Ah, yep. PMC_struct_val(dest) doesn't hold the hash yet, it is created in hash_clone() only after this line. The problem is that the dest PMC contains a Hash structure in its struct_val field No. That's the pointer of the free_list, pointing to the previous PMC in that size class. Putting above line after the hash_clone() fixes that bug. Thanks for reporting, leo
Re: [perl #31493] Overlapping memory corruption
On Sep-09, Leopold Toetsch wrote: Steve Fink (via RT) wrote: I won't go through all the details of what I looked at (though I'll post them in my blog eventually), but what's happening is that this line (from perlhash.pmc's clone() implementation) is corrupting the flags field: ((Hash*)PMC_struct_val(dest))-container = dest; Ah, yep. PMC_struct_val(dest) doesn't hold the hash yet, it is created in hash_clone() only after this line. The problem is that the dest PMC contains a Hash structure in its struct_val field No. That's the pointer of the free_list, pointing to the previous PMC in that size class. Putting above line after the hash_clone() fixes that bug. Hey, your reason is much better than my reason. Still, why do the _noinit stuff and duplicate the creation code? Why not just call pmc_new as in my replacement code?
Re: Memory corruption
Leopold Toetsch [EMAIL PROTECTED] wrote: Steve Fink [EMAIL PROTECTED] wrote: It crashes on a memcpy inside compact_pool Some remarks what I could find out: - current COW copying of stacks is AFAIK borken - both copies of one COWed stack share the same Buffer header I have now more signs for that: - First: I have a smaller program, that shows a simimlar corruption bug (a Random PMC turns suddenly into a RetContinuation PMC - while its called Random it shouldn't expose randomness on itself :) - Turning GC off stops that bug - as well as in your program - Using the LEA/system allocator (Configure.pl --gc=libc) makes the bug vanish the GC/compact_pool in resources.c does correctly deal with such Buffers. COWed strings have distinct (String)Buffer headers. I'm quite sure now, that the bug is caused by the new COWed register chunks, which share *one* Buffer header. leo
Re: Memory corruption
Steve Fink [EMAIL PROTECTED] wrote: Here's another obnoxious test case. [ ... ] So, in case anyone is curious (hi leo!), attached is a 56KB (9KB gzipped) imc file. It crashes on a memcpy inside compact_pool Some remarks what I could find out: - the damaged Buffer is in a pool of object_size 60 - Array/IntList (i.e. List) buffer_likes have that size - valgrind reports a lot of unitialised values during DOD [1] - some of these are in mark_context - the context contains all kinds of aggregates - current COW copying of stacks is AFAIK borken - both copies of one COWed stack share the same Buffer header - I don't think the the GC/compact_pool in resources.c does correctly deal with such Buffers. COWed strings have distinct (String)Buffer headers. It could be that e.g. lex Pads (which contain Arrays) aren't marked properly (valgrind has indications for that): ==16728== Conditional jump or move depends on uninitialised value(s) ==16728==at 0x80C4BE0: pobject_lives (src/dod.c:91) ==16728==by 0x81C1B72: mark_stack (src/method_util.c:162) ==16728==by 0x81BD2B8: mark_context (src/sub.c:59) And finally I have to admit, that the list code is too complex. I don't think, that the error is directly caused by the list code (DOD/GC is turned off during each allocation). But it seems to be overkill for the average usage (mainly small Arrays). It can deal with huge sparse arrays, tries to be space efficient, with small arrays and is pretty fast. But all current array usage (except a few tests) just put a few items into arrays and do not much mess around. leo [1] some of these are during stack trace, where valgrind doesn't see what Parrot is doing - we can ignore these
Memory corruption
Here's another obnoxious test case. I started to try to strip it down, but it starts working again if I even delete nonsense lines from a subroutine that is never called. And I'm working on something else and not at all in the mood to re-learn how to debug parrot internals. It turns out that I don't get the crash when running JITted, so I think I'll just do that for now. So, in case anyone is curious (hi leo!), attached is a 56KB (9KB gzipped) imc file. It crashes on a memcpy inside compact_pool (triggered by new_hash). b-buflen is obviously corrupted. Using -G to disable garbage collection (does that work?) doesn't seem to help matters at all. Deleting the __setup sub at the end of the file makes the problem go away. (Note that __setup is never actually called, and the body of the routine is irrelevant other than its length.) dead.imc.gz Description: GNU Zip compressed data
Do my debugging for me? (memory corruption)
I'm staring at a crash, my eyes are glazing over, and I need sleep. So I was wondering if anyone would be interested in taking a look at a .imc file that is giving me a seg fault while marking a hash in a gc run triggered by a hash PMC allocation. Or at least tell me whether it's seg faulting on your machine too. I'll attach the 5KB compressed .imc file (25KB uncompressed; PIR code is redundant!) It's generated from the following Perl6 code, but you'd need my local changes in order to regenerate it: rule thing() { [a-z]+ } rule parens() { { print entering parens\n; } \( [ thing | parens | \s ]* \) { print leaving parens\n; } } sub main() { my $t = ()()(((blah blah () blah))(blah)); my $t2 = $t _ ); print ok 8\n if $t2 !~ /^ parens $/; } (the entering/leaving parens printouts have no effect on the bug; they're just remnants of my flailing.) If you run with --gc-debug, it dies a little earlier, but in what appears to be the same op. Hopefully, Steve hash_bug.imc.gz Description: GNU Zip compressed data
Re: Do my debugging for me? (memory corruption)
Steve Fink [EMAIL PROTECTED] wrote: I'm staring at a crash I'll attach the 5KB compressed .imc file (25KB uncompressed; PIR code Its really good, to have such short code snippets, that clearly show, where a bug is coming from ;) Anyway, it was again me causing this bug - sorry. Fixed and updated the comment which I didn't understand when removing it. Hopefully, Steve Thanks for the failure report, leo
Memory Corruption Bug
I've been using clintp's pasm test script for much of my testing, which is supposed to be an infix expression evaluator. I've been stress-testing parrot and finding bugs by putting increasingly-large expressions into the pasm file for it to evaluate. Attached is a .pasm file which causes some string data to be written into the middle of the string_pool-pool_buffer list of entries, such that when it tries to dereference foo in new_pmc_header, it's pointing to garbage memory. 0x20202020 for me, which is four spaces. Changing the save/restore of spaces in the pasm file to use periods causes the pointer to be 0x2e2e2e2e. I tried for a bit on this, but couldn't really track it down any more than that. Hopefully someone else can figure it out. To change the expression to make it more stressful (on both your debugging sessions, and parrot), add to the 'save ' at the very bottom of the script. Currently, it's along the lines of: 2 . ( +(45*2+6)-3*5/(2+9/7-(5*6+3-5*7/2-2)*31+52/67*10-32-56) x $somelargenumberofpastes ) Mike Lambert PS: Clint, I hope you don't mind me giving out the pasm file. I believe it is in the best interest of your master plan. :) gc_string_stress.pasm Description: gc_string_stress.pasm
Re: Memory Corruption Bug
Michel J Lambert [EMAIL PROTECTED] wrote: Attached is a .pasm file which causes some string data to be written into the middle of the string_pool-pool_buffer list of entries, such that when it tries to dereference foo in new_pmc_header, it's pointing to garbage memory. 0x20202020 for me, which is four spaces. Changing the save/restore of spaces in the pasm file to use periods causes the pointer to be 0x2e2e2e2e. I tried for a bit on this, but couldn't really track it down any more than that. Hopefully someone else can figure it out. Two problems found so far: 1) mem_realloc passes the incorrect size to Parrot_allocate (this causes the specific error mentioned above) 2) add_header_to_free calls mem_realloc calls Parrot_allocate calls go_collect which moves the free header pool The first is a simple fix; the second needs either suppression of collection during the procedure, or not using mem_realloc. The patch below puts the reallocation code into add_header_to_free and add_pmc_to_free, this means that, in the scenario described, the free pool will be moved by the garbage collector, and then immediately moved again. Since the free pools are not compressed by go_collect, perhaps they should be allocated independently and not copied around all the time?? After fixing the above, the test program still abends, this time with subend somehow is less than substart - I have not yet followed up on this. -- Peter Gibbs EmKel Systems Index: memory.c === RCS file: /home/perlcvs/parrot/memory.c,v retrieving revision 1.29 diff -u -r1.29 memory.c --- memory.c 18 Mar 2002 16:42:06 - 1.29 +++ memory.c 29 Mar 2002 17:41:17 - @@ -145,7 +145,7 @@ { size_t copysize = (fromsize tosize ? tosize : fromsize); void *mem; -mem = Parrot_allocate(interpreter, copysize); +mem = Parrot_allocate(interpreter, tosize); if (!mem) { return NULL; } Index: resources.c === RCS file: /home/perlcvs/parrot/resources.c,v retrieving revision 1.35 diff -u -r1.35 resources.c --- resources.c 26 Mar 2002 16:33:01 - 1.35 +++ resources.c 29 Mar 2002 17:42:15 - @@ -18,22 +18,26 @@ static void add_header_to_free(struct Parrot_Interp *interpreter, struct free_pool *pool, void *to_add); -/* Add a string header to the free string header pool */ +/* Add a PMC header to the free pool */ static void add_pmc_to_free(struct Parrot_Interp *interpreter, struct free_pool *pool, void *to_add) { PMC **temp_ptr; /* First, check and see if there's enough space in the free pool. If - we're within the size of a STRING pointer, we make it bigger */ + we're within the size of a pointer, we make it bigger */ if (pool-entries_in_pool * sizeof(PMC *) = pool-pool_buffer.buflen - sizeof(PMC *)) { /* If not, make the free pool bigger. We enlarge it by 20% */ -pool-pool_buffer.bufstart = mem_realloc(interpreter, - pool-pool_buffer.bufstart, - pool-pool_buffer.buflen, - (UINTVAL)(pool-pool_buffer.buflen * 1.2)); -pool-pool_buffer.buflen = (UINTVAL)(pool-pool_buffer.buflen * 1.2); - +/* don't use mem_realloc, because garbage collection may occur */ +size_t new_size = pool-pool_buffer.buflen * 1.2; +void *new_ptr = Parrot_allocate(interpreter, new_size); +if (!new_ptr) { +internal_exception(ALLOCATION_ERROR, + No room to expand free PMC pool); +} +memcpy(new_ptr, pool-pool_buffer.bufstart, pool-pool_buffer.buflen); +pool-pool_buffer.bufstart = new_ptr; +pool-pool_buffer.buflen = new_size; } /* Okay, so there's space. Add the header on */ @@ -269,12 +273,16 @@ if (pool-entries_in_pool * sizeof(STRING *) = pool-pool_buffer.buflen - sizeof(STRING *)) { /* If not, make the free pool bigger. We enlarge it by 20% */ -pool-pool_buffer.bufstart = mem_realloc(interpreter, - pool-pool_buffer.bufstart, - pool-pool_buffer.buflen, - (UINTVAL)(pool-pool_buffer.buflen * 1.2)); -pool-pool_buffer.buflen = (UINTVAL)(pool-pool_buffer.buflen * 1.2); - +/* don't use mem_realloc, because garbage collection may occur */ +size_t new_size = pool-pool_buffer.buflen * 1.2; +void *new_ptr = Parrot_allocate(interpreter, new_size); +if (!new_ptr) { +internal_exception(ALLOCATION_ERROR, + No room to expand free buffer header pool); +} +memcpy(new_ptr, pool-pool_buffer.bufstart, pool-pool_buffer.buflen); +pool-pool_buffer.bufstart = new_ptr; +pool-pool_buffer.buflen = new_size; } /* Okay, so there's space. Add the header on */
[APPLIED (partially)] Re: Memory Corruption Bug
At 07:57 PM 3/29/2002 +0200, you wrote: Michel J Lambert [EMAIL PROTECTED] wrote: Attached is a .pasm file which causes some string data to be written into the middle of the string_pool-pool_buffer list of entries, such that when it tries to dereference foo in new_pmc_header, it's pointing to garbage memory. 0x20202020 for me, which is four spaces. Changing the save/restore of spaces in the pasm file to use periods causes the pointer to be 0x2e2e2e2e. I tried for a bit on this, but couldn't really track it down any more than that. Hopefully someone else can figure it out. Two problems found so far: 1) mem_realloc passes the incorrect size to Parrot_allocate (this causes the specific error mentioned above) Clearly a bug. Applied, thanks. 2) add_header_to_free calls mem_realloc calls Parrot_allocate calls go_collect which moves the free header pool The first is a simple fix; the second needs either suppression of collection during the procedure, or not using mem_realloc. The patch below puts the reallocation code into add_header_to_free and add_pmc_to_free, this means that, in the scenario described, the free pool will be moved by the garbage collector, and then immediately moved again. Since the free pools are not compressed by go_collect, perhaps they should be allocated independently and not copied around all the time?? After fixing the above, the test program still abends, this time with subend somehow is less than substart - I have not yet followed up on this. I left the second part concering realloc() out, pending further discussion. Not saying this is wrong, just that is sounds as if you may suspect a more elegant fix that may warrant comments from the list. -Melvin