Re: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1

2005-10-13 Thread Pavel Tsekov
On Wed, 12 Oct 2005, Pavel Tsekov wrote:

 On Tue, 11 Oct 2005, Christopher Faylor wrote:

  I don't see how ignoring blocked signals would cause a SEGV however.

 Well... indirectly they do :) I hope you are not too annoyed already
 because this time I really found the cause of the problem.

 Assume a signal is sent to a thread with pthread_kill() but the thread is
 blocking the signal and it doesn't get processed during the thread's lifetime.
 The thread dies but the signal still remains in the signal queue.
 Something triggers the processing of the signal - sig_dispatch_pending()
 in my case (which is called as part of pthread_sigmask()). As part of the
 processing the 'tls' member of sigpacket is dereferenced but at that time
 it is already invalid.
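
For illustration, a minimal single-threaded C model of the failure described above. The names (thread_state, packet, queue_push, queue_purge_thread, queue_process) are hypothetical and are not Cygwin's actual data structures. The sketch only shows that a queued packet must not outlive the per-thread state it points to, and one possible guard: dropping the packets aimed at a thread when that thread exits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT Cygwin's real types. */
struct thread_state { unsigned blocked; };              /* per-thread blocked mask */
struct packet { int signo; struct thread_state *tls; }; /* one queued signal */

#define QMAX 8
struct queue { struct packet items[QMAX]; size_t len; };

/* Assumed mapping from signal number (1..32) to a mask bit. */
static unsigned sig_bit (int signo) { return 1u << (signo - 1); }

/* Queue a signal aimed at one thread; returns 0 on success, -1 if full. */
static int queue_push (struct queue *q, int signo, struct thread_state *t)
{
  assert (signo >= 1 && signo <= 32);
  if (q->len == QMAX)
    return -1;
  q->items[q->len].signo = signo;
  q->items[q->len].tls = t;
  q->len++;
  return 0;
}

/* Called when a thread exits: drop every packet that still points at it,
   so that later processing never dereferences freed state. */
static void queue_purge_thread (struct queue *q, const struct thread_state *t)
{
  size_t kept = 0;
  for (size_t i = 0; i < q->len; i++)
    if (q->items[i].tls != t)
      q->items[kept++] = q->items[i];
  q->len = kept;
}

/* Deliver every packet whose target does not block it and return the
   number delivered.  Blocked packets stay queued.  This is the place
   where a stale 'tls' would be dereferenced if the purge were skipped. */
static size_t queue_process (struct queue *q)
{
  size_t kept = 0, delivered = 0;
  for (size_t i = 0; i < q->len; i++)
    {
      struct packet *p = &q->items[i];
      if (p->tls->blocked & sig_bit (p->signo))
        q->items[kept++] = *p;
      else
        delivered++;
    }
  q->len = kept;
  return delivered;
}
```

With the purge step, a packet that stayed blocked for the thread's whole lifetime is removed when the thread exits, so a later pass over the queue has nothing stale to touch.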

 I'll try to post a testcase ASAP which demonstrates the problem.

Find the testcase attached. The interesting part starts when SIGUSR2 is
sent from the main thread.

#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

static pid_t the_pid;

static void empty_handler (int signo)
{
  printf ("in empty_handler(): signo = %d\n", signo);
}

static void *thread_loop (void *unused)
{
  int i;
  sigset_t block_set, pending_set;

  sigemptyset (&block_set);
  sigaddset (&block_set, SIGUSR2);
  if (pthread_sigmask (SIG_BLOCK, &block_set, NULL) != 0)
    {
      printf ("failed to set the list of blocked signals\n");
    }

  /* All done - let the main thread know that it
     can send us a signal. */
  kill (the_pid, SIGUSR1);

  /* Busy-wait so that the thread is still alive when SIGUSR2 arrives. */
  for (i = 0; i < INT_MAX; i++)
    ;

  printf ("exiting thread_loop()\n");

  return NULL;
}

int main (int argc, char **argv)
{
  int rv;
  int i;
  pthread_t thr_id;
  sigset_t new_set, old_set;
  void *thr_result;

  the_pid = getpid ();

  /* Dummy synchronization scheme so that we know that
     the second thread initialized its list of blocked
     signals. */
  signal (SIGUSR1, empty_handler);
  sigemptyset (&new_set);
  sigaddset (&new_set, SIGUSR1);
  sigprocmask (SIG_BLOCK, &new_set, &old_set);

  rv = pthread_create (&thr_id, NULL, thread_loop, NULL);
  if (rv != 0)
    {
      printf ("failed to create thread.\n");
      exit (1);
    }

  /* Wait until the second thread signals the main thread. */
  sigsuspend (&old_set);
  sigprocmask (SIG_UNBLOCK, &new_set, NULL);

  /* Send a SIGUSR2 signal to the second thread while
     it is blocking SIGUSR2. */
  pthread_kill (thr_id, SIGUSR2);

  /* Wait for the thread to terminate. */
  pthread_join (thr_id, &thr_result);

  /* Trigger sig_dispatch_pending() */
  signal (SIGUSR1, SIG_IGN);

  /* Just wait for the program to crash. */
  sleep (600);

  exit (0);
}
--
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple
Problem reports:   http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ:   http://cygwin.com/faq/

Re: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1

2005-10-11 Thread Pavel Tsekov
On Thu, 6 Oct 2005, Pavel Tsekov wrote:

 On Thu, 6 Oct 2005, Christopher Faylor wrote:

  It might be a different problem but the message is the same.
 
  It *is* a different problem.

 Ok.

  Some thread is sending a signal 31 (SIGUSR1).  Which thread is doing this?

 An application thread signaling another thread to stop its execution. I am
 on it - I'll report back if I manage to find something.

While tracking this problem I found what I suspect is a small bug in
the way sigpending() works when it is used to retrieve the list of pending
signals for a thread other than the main one. I think this is related in
some way to the crash I am seeing, though this has yet to be determined.

As I read the code, when retrieving the list of pending signals
sigpending() inspects only the list of blocked signals for the main
thread - it doesn't look in the thread specific list of blocked signals
of the calling thread.

The code which I refer to is the following block from wait_sig():

	case __SIGPENDING:
	  *pack.mask = 0;
	  unsigned bit;
	  sigq.reset ();
	  while ((q = sigq.next ()))
	    if (myself->getsigmask () & (bit = SIGTOMASK (q->si.si_signo)))
	      *pack.mask |= bit;
	  break;

On the other hand, the code in sigpacket::process() does the right thing
when it delivers a signal, i.e. it checks the list of blocked signals of
both the main thread and the target thread.
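
To make the difference concrete, here is a small sketch using plain bit masks. It is not the Cygwin code: sig_to_mask, pending_main_only and pending_for_thread are hypothetical names, and the 1 << (signo - 1) mapping is an assumption (it maps signal 12 to 0x800). The first function models the behaviour reported above, where only the main thread's mask is consulted. The second also consults the calling thread's own mask, mirroring what the text says sigpacket::process() does on delivery.

```c
#include <assert.h>

/* Assumed mapping from a signal number (1..32) to a mask bit. */
static unsigned sig_to_mask (int signo)
{
  assert (signo >= 1 && signo <= 32);
  return 1u << (signo - 1);
}

/* Reported behaviour: a queued signal counts as pending only if the
   main thread's mask blocks it. */
static unsigned pending_main_only (const int *queued, int n,
                                   unsigned main_mask)
{
  unsigned pending = 0;
  for (int i = 0; i < n; i++)
    {
      unsigned bit = sig_to_mask (queued[i]);
      if (main_mask & bit)
        pending |= bit;
    }
  return pending;
}

/* Sketch of the alternative: a queued signal counts as pending if either
   the main thread's mask or the calling thread's mask blocks it. */
static unsigned pending_for_thread (const int *queued, int n,
                                    unsigned main_mask, unsigned thread_mask)
{
  return pending_main_only (queued, n, main_mask | thread_mask);
}
```

With signal 12 queued and blocked only in the calling thread, pending_main_only() reports nothing, while pending_for_thread() reports 0x800.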

Attached is a simple test case which demonstrates the problem.

On Linux:

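pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000800
exiting thread_loop()

On Cygwin:

pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000
pending_set = 00000000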

[ keeps looping forever ]

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

static void *thread_loop (void *unused)
{
  sigset_t block_set, pending_set;

  sigemptyset (&block_set);
  sigaddset (&block_set, SIGUSR2);
  if (pthread_sigmask (SIG_BLOCK, &block_set, NULL) != 0)
    {
      printf ("failed to set the list of blocked signals\n");
    }

  while (1)
    {
      sigpending (&pending_set);

      /* Non-portable: assumes sigset_t can be printed as an unsigned int. */
      printf ("pending_set = %08X\n", pending_set);

      if (sigismember (&pending_set, SIGUSR2) != 0)
        break;

      sleep (1);
    }

  printf ("exiting thread_loop()\n");

  return NULL;
}

int main (int argc, char **argv)
{
  int rv;
  pthread_t thr_id;

  rv = pthread_create (&thr_id, NULL, thread_loop, NULL);
  if (rv != 0)
    {
      printf ("failed to create thread.\n");
      exit (1);
    }

  /* give the second thread a chance to run */
  sleep (5);

  while (1)
    {
      if (pthread_kill (thr_id, SIGUSR2) != 0)
        break;
    }

  exit (0);
}

Re: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1

2005-10-11 Thread Christopher Faylor
On Tue, Oct 11, 2005 at 05:51:00PM +0300, Pavel Tsekov wrote:
 On Thu, 6 Oct 2005, Pavel Tsekov wrote:

  On Thu, 6 Oct 2005, Christopher Faylor wrote:

   It might be a different problem but the message is the same.

   It *is* a different problem.

  Ok.

   Some thread is sending a signal 31 (SIGUSR1).  Which thread is doing this?

  An application thread signaling another thread to stop its execution. I am
  on it - I'll report back if I manage to find something.

 While tracking this problem I found what I suspect is a small bug in
 the way sigpending() works when it is used to retrieve the list of pending
 signals for a thread other than the main one. I think this is related in
 some way to the crash I am seeing, though this has yet to be determined.

See the FIXME a few lines down from that.

 As I read the code, when retrieving the list of pending signals
 sigpending() inspects only the list of blocked signals for the main
 thread - it doesn't look in the thread specific list of blocked signals
 of the calling thread.

So, it sounds like you're pretty close to a PTC.

I don't see how ignoring blocked signals would cause a SEGV however.

cgf


Re: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1

2005-10-06 Thread Pavel Tsekov
On Thu, 1 Sep 2005, Christopher Faylor wrote:

 On Thu, Sep 01, 2005 at 03:25:17PM +0100, Dave Korn wrote:
 
   Anyone else seeing quite a lot of these with current cvs HEAD?  Often when
 pressing Ctrl-C, sometimes when things exit for other (signal-related?)
 reasons?
 
   I think this error indicates that a signal has been received but either
 find_tls hasn't yet been called, or something has overwritten the threadlist
 index.  There's a lot that goes on at startup/fork time, though, and I'm not
 deeply familiar with it.  Since I'm set up for debugging ATM, does anyone
 have any suggestions where I could look next?

 How about looking in the direction of a simple test scenario which
 demonstrates what you are reporting?

I can reproduce this repeatedly - I'll try to isolate the cause and
post a test case.


Re: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1

2005-10-06 Thread Christopher Faylor
On Thu, Oct 06, 2005 at 04:18:39PM +0300, Pavel Tsekov wrote:
 On Thu, 1 Sep 2005, Christopher Faylor wrote:

  On Thu, Sep 01, 2005 at 03:25:17PM +0100, Dave Korn wrote:

    Anyone else seeing quite a lot of these with current cvs HEAD?  Often when
  pressing Ctrl-C, sometimes when things exit for other (signal-related?)
  reasons?

    I think this error indicates that a signal has been received but either
  find_tls hasn't yet been called, or something has overwritten the threadlist
  index.  There's a lot that goes on at startup/fork time, though, and I'm not
  deeply familiar with it.  Since I'm set up for debugging ATM, does anyone
  have any suggestions where I could look next?

  How about looking in the direction of a simple test scenario which
  demonstrates what you are reporting?

 I can reproduce this repeatedly - I'll try to isolate the cause and
 post a test case.

Did you happen to notice the age of the message to which you're
responding?  Dave figured out the problem subsequent to sending the
above.  It was due to some object files not getting rebuilt after a
change to cygtls.h.


Re: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1

2005-10-06 Thread Pavel Tsekov

On Thu, 6 Oct 2005, Christopher Faylor wrote:

 Did you happen to notice the age of the message to which you're
 responding?  Dave figured out the problem subsequent to sending the
 above.  It was due to some object files not getting rebuilt after a
 change to cygtls.h.

Yes. When I saw the error message I remembered that I had seen it on
the mailing list already. So, I used the search engine to find the
date on which the message was posted and replied to that post.

It might be a different problem but the message is the same. And this is
with a clean build of the dll from yesterday and a clean build of the
software from today. Here is a backtrace (just for the record):

(gdb) bt
#0  sigismember (set=0x162f090, sig=31) at
../../../../src/winsup/cygwin/signal.cc:429
#1  0x61017710 in sigpacket::process (this=0x6113b3ec) at
../../../../src/winsup/cygwin/exceptions.cc:1072
#2  0x61092b18 in wait_sig () at
../../../../src/winsup/cygwin/sigproc.cc:1128
#3  0x610033ef in cygthread::stub (arg=0xa2eff0) at
../../../../src/winsup/cygwin/cygthread.cc:73
#4  0x0f94 in ?? ()
#5  0x in ?? () from

In frame 0 'set' is pointing at invalid memory - I am trying to determine
why.



Re: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1

2005-10-06 Thread Christopher Faylor
On Thu, Oct 06, 2005 at 05:39:35PM +0300, Pavel Tsekov wrote:
 On Thu, 6 Oct 2005, Christopher Faylor wrote:

  Did you happen to notice the age of the message to which you're
  responding?  Dave figured out the problem subsequent to sending the
  above.  It was due to some object files not getting rebuilt after a
  change to cygtls.h.

 Yes. When I saw the error message I remembered that I had seen it on
 the mailing list already. So, I used the search engine to find the
 date on which the message was posted and replied to that post.

 It might be a different problem but the message is the same.

It *is* a different problem.

 And this is with a clean build of the dll from yesterday and a clean
 build of the software from today.  Here is a backtrace (just for the
 record):

 (gdb) bt
 #0  sigismember (set=0x162f090, sig=31) at
 ../../../../src/winsup/cygwin/signal.cc:429
 #1  0x61017710 in sigpacket::process (this=0x6113b3ec) at
 ../../../../src/winsup/cygwin/exceptions.cc:1072
 #2  0x61092b18 in wait_sig () at
 ../../../../src/winsup/cygwin/sigproc.cc:1128
 #3  0x610033ef in cygthread::stub (arg=0xa2eff0) at
 ../../../../src/winsup/cygwin/cygthread.cc:73
 #4  0x0f94 in ?? ()
 #5  0x in ?? () from

 In frame 0 'set' is pointing at invalid memory - I am trying to determine
 why.

Some thread is sending a signal 31 (SIGUSR1).  Which thread is doing this?

cgf


Re: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1

2005-10-06 Thread Pavel Tsekov
On Thu, 6 Oct 2005, Christopher Faylor wrote:

 It might be a different problem but the message is the same.

 It *is* a different problem.

Ok.

 Some thread is sending a signal 31 (SIGUSR1).  Which thread is doing this?

An application thread signaling another thread to stop its execution. I am
on it - I'll report back if I manage to find something.


handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1

2005-09-01 Thread Dave Korn


  Anyone else seeing quite a lot of these with current cvs HEAD?  Often when
pressing Ctrl-C, sometimes when things exit for other (signal-related?)
reasons?

  I think this error indicates that a signal has been received but either
find_tls hasn't yet been called, or something has overwritten the threadlist
index.  There's a lot that goes on at startup/fork time, though, and I'm not
deeply familiar with it.  Since I'm set up for debugging ATM, does anyone
have any suggestions where I could look next?


cheers, 
  DaveK
-- 
Can't think of a witty .sigline today



Re: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1

2005-09-01 Thread Christopher Faylor
On Thu, Sep 01, 2005 at 03:25:17PM +0100, Dave Korn wrote:

   Anyone else seeing quite a lot of these with current cvs HEAD?  Often when
 pressing Ctrl-C, sometimes when things exit for other (signal-related?)
 reasons?

   I think this error indicates that a signal has been received but either
 find_tls hasn't yet been called, or something has overwritten the threadlist
 index.  There's a lot that goes on at startup/fork time, though, and I'm not
 deeply familiar with it.  Since I'm set up for debugging ATM, does anyone
 have any suggestions where I could look next?

How about looking in the direction of a simple test scenario which demonstrates
what you are reporting?

cgf


RE: handle_threadlist_exception: handle_threadlist_exception called with threadlist_ix -1

2005-09-01 Thread Dave Korn
----Original Message----
From: Christopher Faylor
Sent: 01 September 2005 15:44

 On Thu, Sep 01, 2005 at 03:25:17PM +0100, Dave Korn wrote:
 
  Anyone else seeing quite a lot of these with current cvs HEAD?  Often
 when pressing Ctrl-C, sometimes when things exit for other
 (signal-related?) reasons? 
 
  I think this error indicates that a signal has been received but either
 find_tls hasn't yet been called, or something has overwritten the
 threadlist index.  There's a lot that goes on at startup/fork time,
 though, and I'm not deeply familiar with it.  Since I'm set up for
 debugging ATM, does anyone have any suggestions where I could look next?
 
 How about looking in the direction of a simple test scenario which
 demonstrates what you are reporting?
 
 cgf


  Well, "run programs and sometimes it happens when you press Ctrl-C" isn't
exactly reproducible, so I was trying to find out what the message _means_
so that I could try and make a few guesses at how to trip whatever condition
it indicates so that I might have a chance of being able to make a testcase.


cheers,
  DaveK
-- 
Can't think of a witty .sigline today

