Re: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1
On Wed, 12 Oct 2005, Pavel Tsekov wrote:

> On Tue, 11 Oct 2005, Christopher Faylor wrote:
>
> > I don't see how ignoring blocked signals would cause a SEGV however.
>
> Well... indirectly they do :) I hope you are not too annoyed already,
> because this time I really found the cause of the problem.
>
> Assume a signal is sent to a thread with pthread_kill(), but the thread
> is blocking the signal and it doesn't get processed during the thread's
> lifetime. The thread dies but the signal still remains in the signal
> queue. Something triggers the processing of the signal -
> sig_dispatch_pending() in my case (which is called as part of
> pthread_sigmask()). As part of the processing the 'tls' member of
> sigpacket is dereferenced, but at that time it is already invalid.
>
> I'll try to post a testcase ASAP which demonstrates the problem.

Find the testcase attached. The interesting part starts when SIGUSR2 is
sent from the main thread.

#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

static pid_t the_pid;

static void
empty_handler (int signo)
{
  printf ("in empty_handler(): signo = %d\n", signo);
}

static void *
thread_loop (void *unused)
{
  int i;
  sigset_t block_set, pending_set;

  sigemptyset (&block_set);
  sigaddset (&block_set, SIGUSR2);
  if (pthread_sigmask (SIG_BLOCK, &block_set, NULL) != 0)
    {
      printf ("failed to set the list of blocked signals\n");
    }

  /* All done - let the main thread know that it can send us a signal. */
  kill (the_pid, SIGUSR1);

  for (i = 0; i < INT_MAX; i++);

  printf ("exiting thread_loop()\n");
  return NULL;
}

int
main (int argc, char **argv)
{
  int rv;
  int i;
  pthread_t thr_id;
  sigset_t new_set, old_set;
  void *thr_result;

  the_pid = getpid ();

  /* Dummy synchronization scheme so that we know that the second thread
     initialized its list of blocked signals.  */
  signal (SIGUSR1, empty_handler);
  sigemptyset (&new_set);
  sigaddset (&new_set, SIGUSR1);
  sigprocmask (SIG_BLOCK, &new_set, &old_set);

  rv = pthread_create (&thr_id, NULL, thread_loop, NULL);
  if (rv != 0)
    {
      printf ("failed to create thread.\n");
      exit (1);
    }

  /* Wait until the second thread signals the main thread. */
  sigsuspend (&old_set);
  sigprocmask (SIG_UNBLOCK, &new_set, NULL);

  /* Send a SIGUSR2 signal to the second thread while it is blocking SIGUSR2. */
  pthread_kill (thr_id, SIGUSR2);

  /* Wait for the thread to terminate. */
  pthread_join (thr_id, &thr_result);

  /* Trigger sig_dispatch_pending() */
  signal (SIGUSR1, SIG_IGN);

  /* Just wait for the program to crash. */
  sleep (600);

  exit (0);
}

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
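The failure Pavel describes is a dangling-pointer pattern: a queued signal keeps a pointer to per-thread state (the 'tls' member of sigpacket) that outlives the thread it points to. A minimal C model of one way to close that hole follows. It is a sketch under stated assumptions, not Cygwin's code: every type and function in it (struct thread_state, struct queued_sig, sigq_add(), sigq_purge_thread(), sigq_len()) is hypothetical, and it assumes the runtime can run a hook on thread exit, before the thread's state is released, while holding whatever lock protects the queue.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-thread state such as Cygwin's tls block. */
struct thread_state {
    int id;
};

/* Hypothetical queued signal; like sigpacket, it remembers its target. */
struct queued_sig {
    int signo;
    struct thread_state *target;   /* NULL: process-directed */
    struct queued_sig *next;
};

struct sig_queue {
    struct queued_sig *head;
};

/* Append a signal aimed at `target`; returns 0, or -1 if out of memory. */
int sigq_add(struct sig_queue *q, int signo, struct thread_state *target)
{
    struct queued_sig *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->signo = signo;
    e->target = target;
    e->next = NULL;

    /* Walk to the tail so signals stay in arrival order. */
    struct queued_sig **pp = &q->head;
    while (*pp != NULL)
        pp = &(*pp)->next;
    *pp = e;
    return 0;
}

/* Intended (hypothetical) use: call on thread exit, before the thread's
   state is freed.  Unlinks and frees every entry aimed at that thread and
   returns how many were dropped. */
size_t sigq_purge_thread(struct sig_queue *q, const struct thread_state *dying)
{
    size_t removed = 0;
    struct queued_sig **pp = &q->head;
    while (*pp != NULL) {
        if ((*pp)->target == dying) {
            struct queued_sig *victim = *pp;
            *pp = victim->next;
            free(victim);
            removed++;
        } else {
            pp = &(*pp)->next;
        }
    }
    return removed;
}

/* Number of entries still queued. */
size_t sigq_len(const struct sig_queue *q)
{
    size_t n = 0;
    for (const struct queued_sig *e = q->head; e != NULL; e = e->next)
        n++;
    return n;
}
```

In this model the thread-exit path calls sigq_purge_thread() for the exiting thread before its state goes away, so a later dispatch pass (such as the sig_dispatch_pending() call triggered at the end of the testcase) never walks an entry holding a stale pointer. Dropping the entries is the choice sketched here because a signal aimed at one specific thread has no recipient once that thread is gone; Linux, for instance, discards a thread's private pending signals when the thread exits.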
Re: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1
On Thu, 6 Oct 2005, Pavel Tsekov wrote:

> On Thu, 6 Oct 2005, Christopher Faylor wrote:
>
> > > It might be a different problem but the message is the same.
> >
> > It *is* a different problem.
>
> Ok.
>
> > Some thread is sending a signal 31 (SIGUSR1). Which thread is doing this?
>
> An application thread signaling another thread to stop its execution.
> I am on it - I'll report back if I manage to find something.

While tracking this problem I found what I suspect is a small bug in the
way sigpending() works when it is used to retrieve the list of pending
signals for a thread other than the main one. I think this is related to
the crash I am seeing in some way, though this has yet to be determined.

As I read the code, when retrieving the list of pending signals
sigpending() inspects only the list of blocked signals for the main
thread - it doesn't look in the thread-specific list of blocked signals of
the calling thread. The code I am referring to is the following block
from wait_sig():

    case __SIGPENDING:
      *pack.mask = 0;
      unsigned bit;
      sigq.reset ();
      while ((q = sigq.next ()))
        if (myself->getsigmask () & (bit = SIGTOMASK (q->si.si_signo)))
          *pack.mask |= bit;
      break;

On the other hand the code in sigpacket::process() does the right thing
when it delivers a signal, i.e. it looks at the list of blocked signals in
both the main thread and the target thread.

Attached is a simple test case which demonstrates the problem.

On Linux:

pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000800
exiting thread_loop()

On Cygwin:

pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
[ keeps looping forever ]

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

static void *
thread_loop (void *unused)
{
  sigset_t block_set, pending_set;

  sigemptyset (&block_set);
  sigaddset (&block_set, SIGUSR2);
  if (pthread_sigmask (SIG_BLOCK, &block_set, NULL) != 0)
    {
      printf ("failed to set the list of blocked signals\n");
    }

  while (1)
    {
      sigpending (&pending_set);
      printf ("pending_set = %08X\n", pending_set);
      if (sigismember (&pending_set, SIGUSR2) != 0)
        break;
      sleep (1);
    }

  printf ("exiting thread_loop()\n");
  return NULL;
}

int
main (int argc, char **argv)
{
  int rv;
  pthread_t thr_id;

  rv = pthread_create (&thr_id, NULL, thread_loop, NULL);
  if (rv != 0)
    {
      printf ("failed to create thread.\n");
      exit (1);
    }

  /* give the second thread a chance to run */
  sleep (5);

  while (1)
    {
      if (pthread_kill (thr_id, SIGUSR2) != 0)
        break;
    }

  exit (0);
}
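The fix Pavel points at amounts to judging queued signals against the calling thread's own blocked mask instead of only the main thread's. A minimal C model of that computation follows, as a sketch rather than Cygwin's actual code: the queue layout, the target_tid field (0 meaning process-directed), pending_for_thread() and MASK_OF() are all hypothetical, with MASK_OF() assuming a one-bit-per-signal layout like the one SIGTOMASK() is used for in the quoted wait_sig() block.

```c
#include <stddef.h>

/* Hypothetical signal-mask type: one bit per signal, bit (sig - 1). */
typedef unsigned long sigmask_bits;
#define MASK_OF(sig) (1UL << ((sig) - 1))

/* Hypothetical queue entry: target_tid == 0 means process-directed. */
struct queued_sig {
    int signo;
    int target_tid;
};

/* Return the mask of queued signals pending for thread `tid`, judged
   against that thread's own blocked mask (not the main thread's). */
sigmask_bits pending_for_thread(const struct queued_sig *q, size_t n,
                                int tid, sigmask_bits thread_blocked)
{
    sigmask_bits pending = 0;
    for (size_t i = 0; i < n; i++) {
        /* Skip signals aimed at some other thread. */
        if (q[i].target_tid != 0 && q[i].target_tid != tid)
            continue;
        sigmask_bits bit = MASK_OF(q[i].signo);
        /* Only signals this thread blocks remain pending for it. */
        if (thread_blocked & bit)
            pending |= bit;
    }
    return pending;
}
```

A queued signal counts as pending for the caller when it is process-directed or aimed at that thread, and the thread itself blocks it. Evaluated with the thread's own mask, Pavel's second testcase would see SIGUSR2 reported by sigpending(), which matches the Linux run where the loop notices the signal and exits.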
Re: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1
On Tue, Oct 11, 2005 at 05:51:00PM +0300, Pavel Tsekov wrote:
> On Thu, 6 Oct 2005, Pavel Tsekov wrote:
> > On Thu, 6 Oct 2005, Christopher Faylor wrote:
> > > > It might be a different problem but the message is the same.
> > >
> > > It *is* a different problem.
> >
> > Ok.
> >
> > > Some thread is sending a signal 31 (SIGUSR1). Which thread is doing this?
> >
> > An application thread signaling another thread to stop its execution.
> > I am on it - I'll report back if I manage to find something.
>
> While tracking this problem I found what I suspect is a small bug in the
> way sigpending() works when it is used to retrieve the list of pending
> signals for a thread other than the main one. I think this is related
> to the crash I am seeing in some way, though this has yet to be
> determined.

See the FIXME a few lines down from that.

> As I read the code, when retrieving the list of pending signals
> sigpending() inspects only the list of blocked signals for the main
> thread - it doesn't look in the thread-specific list of blocked signals
> of the calling thread.

So, it sounds like you're pretty close to a PTC.

I don't see how ignoring blocked signals would cause a SEGV however.

cgf
Re: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1
On Thu, 1 Sep 2005, Christopher Faylor wrote:

> On Thu, Sep 01, 2005 at 03:25:17PM +0100, Dave Korn wrote:
> > Anyone else seeing quite a lot of these with current cvs HEAD? Often
> > when pressing Ctrl-C, sometimes when things exit for other
> > (signal-related?) reasons?
> >
> > I think this error indicates that a signal has been received but
> > either find_tls hasn't yet been called, or something has overwritten
> > the threadlist index. There's a lot that goes on at startup/fork time,
> > though, and I'm not deeply familiar with it. Since I'm set up for
> > debugging ATM, does anyone have any suggestions where I could look next?
>
> How about looking in the direction of a simple test scenario which
> demonstrates what you are reporting?

I can reproduce this repeatedly - I'll try to isolate the cause and post a
test case.
Re: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1
On Thu, Oct 06, 2005 at 04:18:39PM +0300, Pavel Tsekov wrote:
> On Thu, 1 Sep 2005, Christopher Faylor wrote:
> > On Thu, Sep 01, 2005 at 03:25:17PM +0100, Dave Korn wrote:
> > > Anyone else seeing quite a lot of these with current cvs HEAD? Often
> > > when pressing Ctrl-C, sometimes when things exit for other
> > > (signal-related?) reasons?
> > >
> > > I think this error indicates that a signal has been received but
> > > either find_tls hasn't yet been called, or something has overwritten
> > > the threadlist index. There's a lot that goes on at startup/fork
> > > time, though, and I'm not deeply familiar with it. Since I'm set up
> > > for debugging ATM, does anyone have any suggestions where I could
> > > look next?
> >
> > How about looking in the direction of a simple test scenario which
> > demonstrates what you are reporting?
>
> I can reproduce this repeatedly - I'll try to isolate the cause and post
> a test case.

Did you happen to notice the age of the message to which you're
responding?

Dave figured out the problem subsequent to sending the above. It was due
to some object files not getting rebuilt after a change to cygtls.h.
Re: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1
On Thu, 6 Oct 2005, Christopher Faylor wrote:

> Did you happen to notice the age of the message to which you're
> responding?
>
> Dave figured out the problem subsequent to sending the above. It was due
> to some object files not getting rebuilt after a change to cygtls.h.

Yes. When I saw the error message I remembered that I had seen it on the
mailing list already, so I used the search engine to find the date on
which the message was posted and replied to that post. It might be a
different problem but the message is the same.

And this is with a clean build of the dll from yesterday and a clean build
of the software from today.

Here is a backtrace (just for the record):

(gdb) bt
#0  sigismember (set=0x162f090, sig=31)
    at ../../../../src/winsup/cygwin/signal.cc:429
#1  0x61017710 in sigpacket::process (this=0x6113b3ec)
    at ../../../../src/winsup/cygwin/exceptions.cc:1072
#2  0x61092b18 in wait_sig () at ../../../../src/winsup/cygwin/sigproc.cc:1128
#3  0x610033ef in cygthread::stub (arg=0xa2eff0)
    at ../../../../src/winsup/cygwin/cygthread.cc:73
#4  0x00000f94 in ?? ()
#5  0x00000000 in ?? () from

In frame 0 'set' is pointing at invalid memory - I am trying to determine
why.
Re: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1
On Thu, Oct 06, 2005 at 05:39:35PM +0300, Pavel Tsekov wrote:
> On Thu, 6 Oct 2005, Christopher Faylor wrote:
> > Did you happen to notice the age of the message to which you're
> > responding?
> >
> > Dave figured out the problem subsequent to sending the above. It was
> > due to some object files not getting rebuilt after a change to
> > cygtls.h.
>
> Yes. When I saw the error message I remembered that I had seen it on the
> mailing list already, so I used the search engine to find the date on
> which the message was posted and replied to that post. It might be a
> different problem but the message is the same.

It *is* a different problem.

> And this is with a clean build of the dll from yesterday and a clean
> build of the software from today.
>
> Here is a backtrace (just for the record):
>
> (gdb) bt
> #0  sigismember (set=0x162f090, sig=31)
>     at ../../../../src/winsup/cygwin/signal.cc:429
> #1  0x61017710 in sigpacket::process (this=0x6113b3ec)
>     at ../../../../src/winsup/cygwin/exceptions.cc:1072
> #2  0x61092b18 in wait_sig () at ../../../../src/winsup/cygwin/sigproc.cc:1128
> #3  0x610033ef in cygthread::stub (arg=0xa2eff0)
>     at ../../../../src/winsup/cygwin/cygthread.cc:73
> #4  0x00000f94 in ?? ()
> #5  0x00000000 in ?? () from
>
> In frame 0 'set' is pointing at invalid memory - I am trying to
> determine why.

Some thread is sending a signal 31 (SIGUSR1). Which thread is doing this?

cgf
Re: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1
On Thu, 6 Oct 2005, Christopher Faylor wrote:

> > It might be a different problem but the message is the same.
>
> It *is* a different problem.

Ok.

> Some thread is sending a signal 31 (SIGUSR1). Which thread is doing this?

An application thread signaling another thread to stop its execution. I am
on it - I'll report back if I manage to find something.
handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1
Anyone else seeing quite a lot of these with current cvs HEAD? Often when
pressing Ctrl-C, sometimes when things exit for other (signal-related?)
reasons?

I think this error indicates that a signal has been received but either
find_tls hasn't yet been called, or something has overwritten the
threadlist index. There's a lot that goes on at startup/fork time, though,
and I'm not deeply familiar with it. Since I'm set up for debugging ATM,
does anyone have any suggestions where I could look next?

cheers,
  DaveK
--
Can't think of a witty .sigline today
Re: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1
On Thu, Sep 01, 2005 at 03:25:17PM +0100, Dave Korn wrote:
> Anyone else seeing quite a lot of these with current cvs HEAD? Often
> when pressing Ctrl-C, sometimes when things exit for other
> (signal-related?) reasons?
>
> I think this error indicates that a signal has been received but either
> find_tls hasn't yet been called, or something has overwritten the
> threadlist index. There's a lot that goes on at startup/fork time,
> though, and I'm not deeply familiar with it. Since I'm set up for
> debugging ATM, does anyone have any suggestions where I could look next?

How about looking in the direction of a simple test scenario which
demonstrates what you are reporting?

cgf
RE: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1
Original Message
From: Christopher Faylor
Sent: 01 September 2005 15:44

> On Thu, Sep 01, 2005 at 03:25:17PM +0100, Dave Korn wrote:
> > Anyone else seeing quite a lot of these with current cvs HEAD? Often
> > when pressing Ctrl-C, sometimes when things exit for other
> > (signal-related?) reasons?
> >
> > I think this error indicates that a signal has been received but
> > either find_tls hasn't yet been called, or something has overwritten
> > the threadlist index. There's a lot that goes on at startup/fork time,
> > though, and I'm not deeply familiar with it. Since I'm set up for
> > debugging ATM, does anyone have any suggestions where I could look next?
>
> How about looking in the direction of a simple test scenario which
> demonstrates what you are reporting?
>
> cgf

Well, "run programs and sometimes it happens when you press Ctrl-C" isn't
exactly reproducible, so I was trying to find out what the message
_means_, so that I could make a few guesses at how to trip whatever
condition it indicates and so have a chance of being able to make a
testcase.

cheers,
  DaveK
--
Can't think of a witty .sigline today