On 5/3/2012 5:26 PM, Charles Steinkuehler wrote:
> If you run with electric fence, you'll get a segfault well before the
> malloc crash, where one of the debug writes to stderr is stepping on
> memory that it shouldn't (which is likely what is eventually causing
> malloc to barf).  Looking at what might be corrupting the stderr
> structure is what led me to point the finger at the rtapi_print_msg
> from rtapi_clock_set_period.

Just wanted to say I'm going to be going off-line for a while (until
next Wednesday).  I'll have e-mail, but no access to a development platform.

I was hoping to track down the cause of this bug first, but didn't quite
make it.  I do think the call chain leading up to the function
default_rtapi_msg_handler is the culprit, but I haven't figured out
exactly why.  What I do know is:

Result of calls to default_rtapi_msg_handler
NOTE: Using electric fence to identify memory problems!
=============================================
1st call (via rtapi_clock_set_period) causes the memory region holding
_IO_FILE * stderr to be overwritten

2nd call (via rtapi_task_new) ???

3rd call (via rtapi_reset_pagefault_count) !!-CRASH-!! ...or hit the
"electric fence"
=============================================

Note that this is *WITHOUT* the patch I made, so the code is still trying
to write to stderr (which is apparently getting corrupted).  It is easy to
reproduce this in gdb: run rtapi_app, set a breakpoint on
default_rtapi_msg_handler, and hit 'c' a few times.  The breakpoint
fires before the nastiness (which occurs inside the vfprintf routine), so
you can poke around memory and the call stack before letting the
vfprintf run if you want (example gdb session attached).

That's as close to a smoking gun as I've been able to uncover...

...maybe someone a bit more versed in C and passing variable length
argument lists can spot what's going wrong.  For all I know, the
electric fence segfault above is unrelated to the 'real' crash later on
in malloc(), and is instead some issue between the electric fence library
and pthreads.  The overwrite of the stderr memory structure could also be
expected behavior: both watchpoint hits in the attached session are
inside libc itself (_IO_setb and _IO_new_file_overflow).
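
For anyone following along who hasn't chased va_list problems before, here
is a minimal sketch of the general pattern in question: a variadic front end
handing its va_list to a handler that formats it.  The names demo_print_msg
and demo_msg_handler are made up for illustration and this is not the
LinuxCNC code; I have not confirmed that anything like this is the actual
bug.  It only shows one classic pitfall, which is that a va_list that has
been consumed once must not be reused without va_copy.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical handler with the same general shape as the (level, fmt, ap)
 * signature seen in the gdb session; NOT the LinuxCNC implementation.
 * It formats the same arguments twice, which requires va_copy. */
static int demo_msg_handler(char *first, char *second, size_t size,
                            const char *fmt, va_list ap)
{
    va_list ap2;
    int n1, n2;

    va_copy(ap2, ap);                        /* independent second cursor */
    n1 = vsnprintf(first, size, fmt, ap);    /* consumes ap */
    n2 = vsnprintf(second, size, fmt, ap2);  /* safe: uses the copy */
    va_end(ap2);

    return (n1 == n2) ? n1 : -1;
}

/* Hypothetical variadic front end, playing the role of a print_msg wrapper. */
static int demo_print_msg(char *first, char *second, size_t size,
                          const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = demo_msg_handler(first, second, size, fmt, ap);
    va_end(ap);
    return n;
}
```

Calling demo_print_msg(a, b, sizeof a, "rtapi task %d: %s\n", 7, "ok") with
two equally sized buffers fills both with the same text.  If the va_copy is
dropped and ap is passed to both vsnprintf calls, the behavior is undefined,
and on x86_64 the second call typically reads garbage arguments.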

Anyway, thanks for all the help so far!!!

-- 
Charles Steinkuehler
char...@steinkuehler.net
GNU gdb (GDB) 7.4-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/charles/src/linuxcnc-bitmuster/bin/rtapi_app...done.
(gdb) underfence
Enabled Electric Fence for undeflow detection
(gdb) start load threads name1=fast period1=50000
Temporary breakpoint 1 at 0x404450: file rtapi/linux_rtapi_app.cc, line 673.
Starting program: /home/charles/src/linuxcnc-bitmuster/bin/rtapi_app load threads name1=fast period1=50000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Temporary breakpoint 1, main (argc=5, argv=0x7fffffffebc8) at rtapi/linux_rtapi_app.cc:673
673     {
(gdb) break default_rtapi_msg_handler
Breakpoint 2 at 0x40658c: file ./rtapi/linux_common.h, line 171.
(gdb) p stderr
$1 = (struct _IO_FILE *) 0x7ffff6e0d880
(gdb) watch -l *0x7ffff6e0d880
Hardware watchpoint 3: -location *0x7ffff6e0d880
(gdb) c
Continuing.

Breakpoint 2, default_rtapi_msg_handler (level=RTAPI_MSG_INFO, fmt=0x40866a "rtapi_clock_set_period (res=%ld) -> %d\n", ap=0x7fffffffe2f8) at ./rtapi/linux_common.h:171
171         if (level == RTAPI_MSG_ALL)
(gdb) 
Continuing.
Hardware watchpoint 3: -location *0x7ffff6e0d880

Old value = -72540026
New value = -72540025
_IO_setb (f=0x7ffff6e0d880, b=0x7ffff6e0d903 "", eb=0x7ffff6e0d904 "", a=0) at genops.c:413
(gdb) 
Continuing.
Hardware watchpoint 3: -location *0x7ffff6e0d880

Old value = -72540025
New value = -72537977
_IO_new_file_overflow (f=0x7ffff6e0d880, ch=-1) at fileops.c:873
(gdb) 
Continuing.

Breakpoint 2, default_rtapi_msg_handler (level=RTAPI_MSG_INFO, fmt=0x408608 "Creating new task with requested priority %d (highest=%d lowest=%d)\n", ap=0x7fffffffe2b8) at ./rtapi/linux_common.h:171
171         if (level == RTAPI_MSG_ALL)
(gdb) 
Continuing.
[New Thread 0x7ffff7f13700 (LWP 12993)]
[Switching to Thread 0x7ffff7f13700 (LWP 12993)]

Breakpoint 2, default_rtapi_msg_handler (level=RTAPI_MSG_INFO, fmt=0x4083d6 "rtapi task %d: Reset pagefault counter\n", ap=0x7ffff7f12c98) at ./rtapi/linux_common.h:171
171         if (level == RTAPI_MSG_ALL)
(gdb) 
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6ace4b3 in _IO_vfprintf_internal (s=0x7ffff7f124e0, format=0x4083d6 "rtapi task %d: Reset pagefault counter\n", ap=0x7ffff7f12c98) at vfprintf.c:245
(gdb) q
A debugging session is active.

        Inferior 1 [process 12990] will be killed.

Quit anyway? (y or n) 
_______________________________________________
Emc-developers mailing list
Emc-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/emc-developers
