Re: ptrace crash on PREEMPT 2.6.18-128.7.1.el5 kernel

2010-03-01 Thread Régis Odeyé

Hi,

We are experiencing the same issue. It is related to this fix:
linux-2.6-misc-ptrace-fix-exec-report.patch and the related bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=455060


We patched ptrace.c with this:

*** /root/rpmbuild_r5/BUILD/kernel-2.6.18/linux-2.6.18.i386/kernel/ptrace.c 2010-02-17 18:02:49.0 +0100
--- ptrace.c 2010-02-17 19:23:05.0 +0100
***************
*** 1976,1981 ****
--- 1976,1982 ----
   * The difference is in where the real stop takes place and
   * what ptrace can do with tsk->exit_code there.
   */
+   preempt_enable_no_resched();
    send_sig(SIGTRAP, tsk, 0);
    return UTRACE_ACTION_RESUME;
  }

It seems to be fine so far: gdb and strace are now working properly.
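
For anyone following along, here is a toy userspace model of why an unbalanced preemption count leads to a "scheduling while atomic" report. It is only a sketch under stated assumptions: the names preempt_count_model, model_preempt_disable, model_preempt_enable_no_resched, model_schedule and model_report_path are hypothetical stand-ins, not kernel functions, and it assumes (nothing in this thread proves it) that the report path is entered with preemption disabled exactly once. The printed message is only loosely modeled on the one quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the per-task preemption counter (hypothetical name). */
static int preempt_count_model = 0;

/* Each disable bumps the counter; code is "atomic" while it is non-zero. */
static void model_preempt_disable(void)
{
    preempt_count_model++;
}

/* Drops the counter without triggering a reschedule check. */
static void model_preempt_enable_no_resched(void)
{
    assert(preempt_count_model > 0); /* catch an unbalanced enable */
    preempt_count_model--;
}

/*
 * Model of the scheduler's sanity check: scheduling is only legal
 * when the counter is zero. Returns true if the call was legal.
 */
static bool model_schedule(const char *comm, int pid)
{
    if (preempt_count_model != 0) {
        printf("BUG: scheduling while atomic: %s/0x%08x/%d\n",
               comm, (unsigned int)preempt_count_model, pid);
        return false;
    }
    return true;
}

/*
 * Hypothetical report path. ASSUMPTION: an earlier step left
 * preemption disabled once. With apply_fix set, the counter is
 * rebalanced before anything that may sleep is called.
 */
static bool model_report_path(bool apply_fix)
{
    preempt_count_model = 0;
    model_preempt_disable();
    if (apply_fix)
        model_preempt_enable_no_resched();
    return model_schedule("true", 2526);
}
```

Called from a small main(), model_report_path(false) returns false and prints the diagnostic, while model_report_path(true) returns true and prints nothing. This only shows the bookkeeping; it says nothing about whether the one-line change is the right fix for the RHEL5 utrace code.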

Regards
Régis.

Roland McGrath wrote:

>> But I'm happy to have tracked it down to the utrace-based ptrace
>> emulation, and was mostly just interested in knowing if preempt and
>> utrace are fundamentally incompatible on x86_64, or something like
>> that. I'll fight through the 2.6.18-164 issues instead, since the
>> ptrace problem doesn't seem to be happening on that version.
>
> It is probably the case that the RHEL5 utrace code cannot easily be made to
> work reliably with CONFIG_PREEMPT.
>
> Thanks,
> Roland

--
Régis ODEYE

Kontron Modular Computers SA
150, rue M. Berthelot / ZI Toulon Est / BP 244 / Fr 83078 TOULON Cedex 9
Phone: (33) 4 98 16 34 86   Fax: (33) 4 98 16 34 01
E-mail: regis.od...@kontron.com  Web : www.kontron.com




ptrace crash on PREEMPT 2.6.18-128.7.1.el5 kernel

2010-02-26 Thread Steve Fink
I'm not sure if this is the place for this, but:

I have an x86_64 machine that gets an immediate SIGSEGV when ptracing anything:

[r...@dl360g6gs1 kernel-2.6.18]# strace true
execve("/bin/true", ["true"], [/* 28 vars */]) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

I have recompiled the kernel (2.6.18-128.7.1.el5), but the only
significant change I can think of making is that I enabled preemption.
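
To confirm which preemption model a given build actually uses, the kernel config can be scanned for the option. A minimal sketch in C, assuming the config text has already been read into a string (on many RHEL-style systems it lives under /boot/config-<version>, but that location is an assumption here); config_option_enabled is a hypothetical helper name, not an existing API.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if config_text contains a line that is exactly
 * "<option>=y". Lines such as "# CONFIG_PREEMPT is not set" or
 * "CONFIG_PREEMPT_NONE=y" do not match "CONFIG_PREEMPT".
 */
static bool config_option_enabled(const char *config_text, const char *option)
{
    size_t len = strlen(option);
    const char *line = config_text;

    while (line != NULL && *line != '\0') {
        /* Match the option name at the start of the line, then "=y" and end of line. */
        if (strncmp(line, option, len) == 0 &&
            line[len] == '=' && line[len + 1] == 'y' &&
            (line[len + 2] == '\n' || line[len + 2] == '\0'))
            return true;

        /* Advance to the character after the next newline, if any. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return false;
}
```

The exact-line match matters because 2.6-era configs also carry CONFIG_PREEMPT_NONE and CONFIG_PREEMPT_VOLUNTARY, which a plain substring search for CONFIG_PREEMPT would wrongly accept.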

I also have an x86_64 VM under VirtualBox using a slightly different
kernel. It was initially working, but when I installed an updated
kernel RPM, it started crashing as well -- even before rebooting into
the new kernel! However, it is crashing differently. It gives me a
kernel stack trace (pasted below). It looks like some sort of locking
issue.

Is this problem fixed in later patched kernels? I know
kernel-2.6.18-164.11.1.el5.x86_64.rpm is available, but the last time
I tried that particular one it caused me some unrelated problems so
I'm hesitant to go there.

I can post the kernel config if it would be helpful.



BUG: warning at kernel/ptrace.c:1674/ptrace_report() (Not tainted)

Call Trace:
 [800c44d4] ptrace_report+0xeb/0x120
 [800c3f52] utrace_report_syscall+0x74/0x227
 [80070f41] syscall_trace_leave+0x5e/0x87
 [80060312] int_very_careful+0x35/0x3f

BUG: scheduling while atomic: true/0x0001/2526

Call Trace:
 [800655dd] __sched_text_start+0x7d/0xc22
 [800a0484] kernel_text_address+0x1a/0x26
 [8006f17b] dump_trace+0x214/0x23d
 [800c2212] utrace_quiescent+0xde/0x261
 [800c4095] utrace_report_syscall+0x1b7/0x227
 [80070f41] syscall_trace_leave+0x5e/0x87
 [80060312] int_very_careful+0x35/0x3f

BUG: warning at kernel/ptrace.c:1674/ptrace_report() (Not tainted)

Call Trace:
 [800c44d4] ptrace_report+0xeb/0x120
 [80060312] int_very_careful+0x35/0x3f
 [800c45ea] ptrace_report_signal+0x4c/0x5c
 [800c1955] report_signal+0x7f/0x179
 [800c3282] utrace_get_signal+0x3e3/0x62b
 [8006e73c] __switch_to+0x2e/0x22d
 [8002c12a] get_signal_to_deliver+0x177/0x461
 [8005d4ed] do_notify_resume+0x9c/0x7b0
 [800c4095] utrace_report_syscall+0x1b7/0x227
 [8006032e] int_signal+0x12/0x17

BUG: scheduling while atomic: true/0x0001/2526

Call Trace:
 [800655dd] __sched_text_start+0x7d/0xc22
 [800c44ec] ptrace_report+0x103/0x120
 [80060312] int_very_careful+0x35/0x3f
 [800c45ea] ptrace_report_signal+0x4c/0x5c
 [800c1955] report_signal+0x7f/0x179
 [800c2212] utrace_quiescent+0xde/0x261
 [800c3450] utrace_get_signal+0x5b1/0x62b
 [8006e73c] __switch_to+0x2e/0x22d
 [8002c12a] get_signal_to_deliver+0x177/0x461
 [8005d4ed] do_notify_resume+0x9c/0x7b0
 [800c4095] utrace_report_syscall+0x1b7/0x227
 [8006032e] int_signal+0x12/0x17
Call Trace:
 [800655dd] __sched_text_start+0x7d/0xc22
 [800c44ec] ptrace_report+0x103/0x120
 [80060312] int_very_careful+0x35/0x3f
 [800c45ea] ptrace_report_signal+0x4c/0x5c
 [800c1955] report_signal+0x7f/0x179
 [800c2212] utrace_quiescent+0xde/0x261
 [800c3450] utrace_get_signal+0x5b1/0x62b
 [8006e73c] __switch_to+0x2e/0x22d
 [8002c12a] get_signal_to_deliver+0x177/0x461
 [8005d4ed] do_notify_resume+0x9c/0x7b0
 [800c4095] utrace_report_syscall+0x1b7/0x227
 [8006032e] int_signal+0x12/0x17

BUG: warning at kernel/ptrace.c:1674/ptrace_report() (Not tainted)

Call Trace:
 [800c44d4] ptrace_report+0xeb/0x120
 [800c45ea] ptrace_report_signal+0x4c/0x5c
 [800c1955] report_signal+0x7f/0x179
 [800c3282] utrace_get_signal+0x3e3/0x62b
 [8002c12a] get_signal_to_deliver+0x177/0x461
 [8005d4ed] do_notify_resume+0x9c/0x7b0
 [8009af62] specific_send_sig_info+0xa1/0xac
 [800683c9] _spin_unlock_irqrestore+0x16/0x31
 [8009b243] force_sig_info+0xae/0xb9
 [8006a707] do_page_fault+0x81e/0x830
 [800c4095] utrace_report_syscall+0x1b7/0x227
 [800606e0] retint_signal+0x3d/0x79

BUG: scheduling while atomic: true/0x0001/2526

Call Trace:
 [800655dd] __sched_text_start+0x7d/0xc22
 [800c44ec] ptrace_report+0x103/0x120
 [800c45ea] ptrace_report_signal+0x4c/0x5c
 [800c1955] report_signal+0x7f/0x179
 [800c2212] utrace_quiescent+0xde/0x261
 [800c3450] utrace_get_signal+0x5b1/0x62b
 [8002c12a] get_signal_to_deliver+0x177/0x461
 [8005d4ed] do_notify_resume+0x9c/0x7b0
 [8009af62] specific_send_sig_info+0xa1/0xac
 [800683c9] _spin_unlock_irqrestore+0x16/0x31
 [8009b243] force_sig_info+0xae/0xb9
 [8006a707] do_page_fault+0x81e/0x830
 [800c4095] utrace_report_syscall+0x1b7/0x227
 [800606e0] retint_signal+0x3d/0x79

BUG: warning at 

Re: ptrace crash on PREEMPT 2.6.18-128.7.1.el5 kernel

2010-02-26 Thread Steve Fink
On Fri, Feb 26, 2010 at 12:33 PM, Roland McGrath rol...@redhat.com wrote:
> If you are using actual RHEL5, you can go through your normal support
> channels for help on that.  I don't know off hand of anybody who wants to
> help you with support for using RHEL5 kernel source built with a set of
> options different from what RHEL5's own builds use.  For kernel developers,
> that is a really ancient kernel now.  For enterprise support folks,
> changing big important config options for rebuilding from the stable old
> kernel's source is outside the scope of stable and support.

Fair enough. Thanks for the quick response. For what I'm working on, I
really do need the preemptive kernel (I'm generating a few thousand
different live video streams, so latency=glitches, and preempt
measurably helps.) Which, as you say, pretty much pushes me outside of
the supportable envelope unless we track the bleeding edge, which is
not a good idea for our setup.

But I'm happy to have tracked it down to the utrace-based ptrace
emulation, and was mostly just interested in knowing if preempt and
utrace are fundamentally incompatible on x86_64, or something like
that. I'll fight through the 2.6.18-164 issues instead, since the
ptrace problem doesn't seem to be happening on that version.

Thanks,
Steve



Re: ptrace crash on PREEMPT 2.6.18-128.7.1.el5 kernel

2010-02-26 Thread Roland McGrath
> But I'm happy to have tracked it down to the utrace-based ptrace
> emulation, and was mostly just interested in knowing if preempt and
> utrace are fundamentally incompatible on x86_64, or something like
> that. I'll fight through the 2.6.18-164 issues instead, since the
> ptrace problem doesn't seem to be happening on that version.

It is probably the case that the RHEL5 utrace code cannot easily be made to
work reliably with CONFIG_PREEMPT.


Thanks,
Roland