On Wed, 26 Oct 2005, Tobias Ulmer wrote:

> Hi
>
> I'm running a 3.7 (all patches applied, everything else default) on an
> old box (dmesg at the end). It fetches mail for me with the following
> script:
>
> ---8<---
> #! /bin/sh
>
> LOCK="$HOME/.getmail.lock"
>
> if ! [ -f $LOCK ]
> then
>     touch $LOCK
>     getmail 2>&1 > /dev/null
>     rm $LOCK
> fi
> ---8<---
>
> This script is run from crontab every minute. Sometimes ksh segfaults
> and dumps core. It only happens once a day or two, so this is not a big
> problem for me. I was however curious and compiled ksh with -g to get
> more information.
>
> [EMAIL PROTECTED]:~# gdb /bin/sh /home/tobiasu/core/sh.core
> GNU gdb 6.3
> [...]
> This GDB was configured as "i386-unknown-openbsd3.7"...
> Core was generated by `sh'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x1c027ed6 in _weak__thread_fd_unlock ()
> (gdb) backtrace full
> #0  0x1c027ed6 in _weak__thread_fd_unlock ()
> No symbol table info available.
> #1  0x1c028025 in _weak__thread_fd_unlock ()
> No symbol table info available.
> #2  0x1c027b48 in _weak__thread_fd_unlock ()
> No symbol table info available.
> #3  0x1c028095 in _weak__thread_fd_unlock ()
> No symbol table info available.
> #4  0x1c028395 in malloc ()
> No symbol table info available.
> #5  0x1c03c90e in atexit ()
> No symbol table info available.
> #6  0x1c0002e9 in __register_frame_info ()
> No symbol table info available.
> #7  0x1c000155 in __init ()
> No symbol table info available.
> #8  0x1c0001ee in ___start ()
> No symbol table info available.
> #9  0x1c00016f in _start ()
> No symbol table info available.
> (gdb) quit
> [EMAIL PROTECTED]:~# gdb /bin/sh /home/tobiasu/core/sh2.core
> GNU gdb 6.3
> [...]
> This GDB was configured as "i386-unknown-openbsd3.7"...
> Core was generated by `sh'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x1c027ed6 in _weak__thread_fd_unlock ()
> (gdb) backtrace full
> #0  0x1c027ed6 in _weak__thread_fd_unlock ()
> No symbol table info available.
> #1  0x1c028025 in _weak__thread_fd_unlock ()
> No symbol table info available.
> #2  0x1c027b48 in _weak__thread_fd_unlock ()
> No symbol table info available.
> #3  0x1c028095 in _weak__thread_fd_unlock ()
> No symbol table info available.
> #4  0x1c028395 in malloc ()
> No symbol table info available.
> #5  0x1c03c90e in atexit ()
> No symbol table info available.
> #6  0x1c0002e9 in __register_frame_info ()
> No symbol table info available.
> #7  0x1c000155 in __init ()
> No symbol table info available.
> #8  0x1c0001ee in ___start ()
> No symbol table info available.
> #9  0x1c00016f in _start ()
> No symbol table info available.
> (gdb) info registers
> eax            0x0         0
> ecx            0x5         5
> edx            0x0         0
> ebx            0x0         0
> esp            0xcfbf3fd4  0xcfbf3fd4
> ebp            0xcfbf3fec  0xcfbf3fec
> esi            0x0         0
> edi            0xcfbf4034  -809549772
> eip            0x1c027ed6  0x1c027ed6
> eflags         0x10202     66050
> cs             0x1f        31
> ss             0x27        39
> ds             0x27        39
> es             0x27        39
> fs             0x27        39
> gs             0x27        39
>
> My _guess_ is that it has something to do with the test condition if the
> lock-file still exists and then is deleted shortly after (This is called
> a race condition, right?). I tried to grep /usr/src but it takes hours
> (PIO4, no DMA...) and I didn't find out where this thread_fd_unlock
> function is nor what it does.

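On the race condition question: yes, in the quoted script the test for the lock file and the touch that creates it are two separate steps, so two cron runs could both pass the test before either one creates the file. Below is a minimal sketch of one common way to close that gap, assuming mkdir is used as the lock primitive (it creates the directory and fails if it already exists, in a single step). The function name run_locked and the .getmail.lock.d path are made up for illustration; they are not from the original script.

```sh
#!/bin/sh
# Sketch only: run_locked and the lock directory name are hypothetical.
# mkdir both tests for and creates the lock in one step, so there is no
# window between "does the lock exist?" and "create the lock".

run_locked() {
    lockdir=$1
    shift
    if mkdir "$lockdir" 2>/dev/null
    then
        # We own the lock: run the command and remember its exit status.
        "$@"
        status=$?
        rmdir "$lockdir"
        return $status
    fi
    # Lock already held by another run: skip this one.
    # 75 is an arbitrary non-zero status chosen for this sketch.
    return 75
}

# Possible crontab usage, in the spirit of the original script:
# run_locked "$HOME/.getmail.lock.d" getmail > /dev/null 2>&1
```

Like the original, this leaves a stale lock behind if the script is killed between mkdir and rmdir. The usage line puts `2>&1` after `> /dev/null` on the assumption that both output streams should be discarded; in the order used in the quoted script, only stdout goes to /dev/null. Note also that losing this race would only let two getmail runs overlap; on its own it should not make sh crash.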
This is strange. From the trace it looks like you are crashing in
startup code that is executed before sh itself is running. What is
extra strange is that this code is executing thread-specific stuff,
which isn't supposed to happen in a single-threaded program like sh.

> I might also be completely wrong. Can someone bring some light into this
> and give me a clue why it happens? Maybe it can even be fixed :)

No clues so far...

    -Otto