Configuration: sparc-sun-solaris10 gdb-6.4 gcc-3.4.3

R500.ramses.267> ./gdb --nx
GNU gdb 6.4
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10".
(gdb)

Problem:

The sigstep.exp tests exercise the interaction of various forms of single stepping and signal handling. For the tests to pass completely, the following two conditions must hold:

1. A signal can be delivered to a process during a single step operation.
2. The signal trampoline frame detection code can accurately detect the entry to a trampoline and the exit from the trampoline.

Neither condition holds on Solaris, which leads to multiple failures in sigstep.exp (and in other tests, for example sigbpt.exp).

The first issue is:

The Solaris single stepping function is implemented using the /proc filesystem and the PCRUN command with the PRSTEP flag. All gdb tests that try to deliver a signal while single stepping hang indefinitely, because signals pending against the process are never delivered when single stepping. Investigation shows that if a command that is not based on single stepping, such as "continue", is used, the signal is delivered as expected.

Use the following gdb command to see the problem:

    ./gdb --nx --command=gdb.cmd testsuite/gdb.base/sigstep

where gdb.cmd contains:

    br main
    r
    set done = 1
    set itimer = itimer_real
    break 66
    continue
    advance 65
    break handler
    step

Further investigation identified the specific scenario. If a PCRUN command is issued with the PRSTEP flag while the process is in the PR_FAULTED state, any signals pending against the process are not delivered. If the process is first transitioned to the PR_REQUESTED state and a PCRUN command with the PRSTEP flag is then issued, the pending signals are delivered as expected.
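
The ordering described above can be modelled as a small decision function. This is only a sketch of the observed behaviour, not the patch itself: the enum values and the function plan_single_step are hypothetical stand-ins, the real stop reasons come from <sys/procfs.h> on Solaris, and how the process is actually moved to the PR_REQUESTED state is deliberately left abstract.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the stop reasons discussed in the report.
   The real constants live in <sys/procfs.h> on Solaris. */
enum stop_why { WHY_REQUESTED, WHY_FAULTED, WHY_OTHER };

/* Hypothetical actions a debugger could queue before resuming. */
enum step_action { ACT_MOVE_TO_REQUESTED, ACT_RUN_WITH_STEP };

/* Fill `plan` with the actions needed to single step a process that is
   stopped for reason `why`, and return how many actions were stored.
   A process stopped on a fault is first moved to the requested-stop
   state so that pending signals are delivered by the step. */
size_t plan_single_step(enum stop_why why, enum step_action plan[2])
{
    size_t n = 0;

    assert(plan != NULL);
    if (why == WHY_FAULTED)
        plan[n++] = ACT_MOVE_TO_REQUESTED;
    plan[n++] = ACT_RUN_WITH_STEP;
    return n;
}
```

For a process stopped on a fault the plan has two entries; for any other stop reason it contains only the step itself.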

I have a patch to implement the above fix.

The second issue is:

The Solaris signal trampoline detection code in sparc-sol2-tdep.c detects the signal trampoline by looking for the functions sigacthandler, ucbsigvechandler or __sighndlr in the next frame. This is fine for detecting that you are in a stack frame reached via a signal trampoline, but it cannot accurately detect the beginning and end of the trampoline.

The Solaris 10 signal trampoline looks something like this:

    sigacthandler
    call_user_handler
    unsleep_self
    setup_schedctl
    __schedctl
    set_parking_flag
    lmutex_lock
    lmutex_unlock
    sigaddset
    sigvalid
    __sigfillset
    __lwp_sigmask
    __systemcall6
    __sighndlr
    <user handler code called>
    setcontext
    __setcontext_syscall
    _syscall6

This represents only one path through the trampoline; depending on the signal number and critical sections, the control flow can change or be deferred. As such it is very difficult to track whether the current PC is inside a signal trampoline using the function names of the implementation.

To make matters worse:

1. In the last two patch cluster updates the signal trampoline mechanism has changed; functions have been added and then removed.
2. The call to call_user_handler reuses the frame of sigacthandler, therefore sigacthandler cannot be detected on the stack.

Because of point 2 above, handle_inferior_event incorrectly identifies a call to call_user_handler in a signal trampoline at infrun.c:2364 as a subroutine call, i.e. the sigacthandler frame is trashed and replaced with the call_user_handler frame, which is identified as a subroutine call of the current frame. Running the same test as for the first issue, but with "set debug infrun 1" turned on, shows that a call to call_user_handler is incorrectly identified as a subroutine call. This actually lets the stepping mechanism step over signal handlers as if they were subroutines; it works, but not as intended.
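
To illustrate why a lookup on function names misses part of the trampoline, here is a minimal sketch. It is not the code in sparc-sol2-tdep.c; the function name_is_sigtramp is hypothetical and only the three names listed in this report are checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three names this report says the detection code looks for. */
static const char *const sigtramp_names[] = {
    "sigacthandler", "ucbsigvechandler", "__sighndlr"
};

/* Return 1 if `func_name` is one of the known trampoline functions,
   0 otherwise.  Hypothetical helper for illustration only. */
int name_is_sigtramp(const char *func_name)
{
    size_t i;

    assert(func_name != NULL);
    for (i = 0; i < sizeof sigtramp_names / sizeof sigtramp_names[0]; i++)
        if (strcmp(func_name, sigtramp_names[i]) == 0)
            return 1;
    return 0;
}
```

With such a check, a frame that now belongs to call_user_handler does not match, even though execution is still inside the trampoline.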

If the signal trampoline detection code is corrected so that it can fully detect a signal trampoline from beginning to end, the test again fails, but now at infrun.c:2557. There it is detected that single stepping has reached a different line, so stepping is stopped. It is correct that stepping is on a different line, but the test expects stepping to continue through the user handler and out through the signal trampoline until we return to the faulting instruction (and to continue stepping at that point if required).

The problems I see are:

1. The mechanism I implemented to identify the complete signal trampoline includes the names of all possibly invoked functions and a backtrace mechanism to ensure they were called from a signal handling function, i.e. sigacthandler or call_user_handler. When the C library implementation changes, this mechanism will break.
2. The logic in handle_inferior_event seems to be wrong for user signal handling functions. If it is detected that we are at a different line, it should be determined whether this point was reached due to signal handling; if it was, stepping should continue through the signal handler and any subsequently called functions. I think this would require unwinding the frame stack looking for a SIGTRAMP frame. The test at infrun.c:2348 could be modified to look for a SIGTRAMP_FRAME not only in the current frame, but in any previous frame too.

Alternative sigtramp detection:

Any fix depends on a reliable way to detect the signal trampoline. I think a better way to detect the trampoline would be to use the /proc filesystem. The lwpstatus_t for the current lwp, or for the representative lwp of the process, contains a member "pr_oldcontext". If the process or lwp is currently handling a signal, this member will be non-null and will be the address of the first ucontext_t on the inferior process stack.
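
A minimal sketch of this idea follows. It models the two structures with plain C structs that hold only the members mentioned in this report; the type and function names are hypothetical, and in a real debugger pr_oldcontext is an address in the inferior, so each context would have to be read from inferior memory rather than followed as a local pointer. The sketch also assumes that saved contexts are chained through uc_link.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ucontext_t: only the link is modelled. */
struct model_ucontext {
    struct model_ucontext *uc_link;        /* next saved context, or NULL */
};

/* Hypothetical stand-in for lwpstatus_t: only pr_oldcontext is modelled. */
struct model_lwpstatus {
    struct model_ucontext *pr_oldcontext;  /* NULL when no signal is handled */
};

/* Return 1 if the lwp is currently handling a signal. */
int lwp_in_signal_handler(const struct model_lwpstatus *st)
{
    assert(st != NULL);
    return st->pr_oldcontext != NULL;
}

/* Count how many saved contexts are chained from pr_oldcontext. */
size_t signal_nesting_depth(const struct model_lwpstatus *st)
{
    const struct model_ucontext *uc;
    size_t depth = 0;

    assert(st != NULL);
    for (uc = st->pr_oldcontext; uc != NULL; uc = uc->uc_link)
        depth++;
    return depth;
}
```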

(If the process is handling multiple nested signals, the member uc_link in the ucontext_t will be the address of the next context structure.) A signal trampoline could be reliably detected by just checking for the presence of a pr_oldcontext in the lwpstatus. The correct ucontext could be selected by comparing the frame stack pointer passed to the signal trampoline detection code with the stack pointers saved in the ucontext.

I currently have no satisfactory patch for this problem. Any additional feedback on the way signal trampolines currently work in gdb for Solaris, and on a possible change to use the /proc filesystem, would be appreciated.

Regards,
Steve Williams

------------------------------------
UTStarcom Canada Co.
Stephen J Williams
Director System Development
[EMAIL PROTECTED]
4600 Jacombs Road
Richmond, British Columbia
V6V 3B1 Canada
tel: +1 (604) 720-2309
fax: +1 (604) 276-0501
mobile: +1 (604) 720-3325
------------------------------------