Hi Alan,

On Wed, Mar 18, 2015 at 01:11:32PM +0000, Alan Fitton wrote:
> Basically the signal_queue isn't being updated with a reference to SIGTTOU,
> because signal_state[SIGTTOU].count is > 0. I guess there's an assumption in
> the code that if any given signal already has events counted up in
> signal_state, then it must have updated signal_queue so they will get
> processed soon.

This is indeed what the code does :

        if (!signal_state[sig].count) {
                /* signal was not queued yet */
                if (signal_queue_len < MAX_SIGNAL)
                        signal_queue[signal_queue_len++] = sig;
                else
                        qfprintf(stderr, "Signal %d : signal queue is 
unexpectedly full.\n", sig);
        }

        signal_state[sig].count++;

So there's theorically no way to have a non-zero count value
with a zero signal_queue_len, unless one of these gets corrupted
at some point.

Also, __signal_process_queue() seems to properly count these :

        for (cur_pos = 0; cur_pos < signal_queue_len; cur_pos++) {
                sig  = signal_queue[cur_pos];
                desc = &signal_state[sig];
                if (desc->count) {
                        struct sig_handler *sh, *shb;
                        list_for_each_entry_safe(sh, shb, &desc->handlers, 
list) {
                                if ((sh->flags & SIG_F_TYPE_FCT) && sh->handler)
                                        ((void (*)(struct sig_handler 
*))sh->handler)(sh);
                                else if ((sh->flags & SIG_F_TYPE_TASK) && 
sh->handler)
                                        task_wakeup(sh->handler, sh->arg | 
TASK_WOKEN_SIGNAL);
                        }
                        desc->count = 0;
                }
        }
        signal_queue_len = 0;

> But from what I see below, this doesn't seem to be the case
> always, and then all events of a particular signal can end up getting "lost".
> I think there is some timing or logic issue here.
> 
> (22 = SIGTTOU)
> 
> /* Break on SIGTTOU. There are 805 events in the
> Program received signal SIGTTOU, Stopped (tty output).
> 0x00002b369ab6a373 in __epoll_wait_nocancel () from /lib64/libc.so.6
> (gdb) print signal_state[22]
> $16 = {count = 805, handlers = {n = 0xe1efa80, p = 0xe1efa80}}
> (gdb) print signal_queue_len
> $17 = 0

That clearly demonstrates a bug! Well, thinking about it now, there would
be a possibility : if the signal is delivered while we're in
__signal_process_queue(), what you observe could indeed happen, because
we'd miss the desc->count and clear signal_queue_len afterwards.

Could you please try to instrument this function to confirm if the issue
is there ? If so we need to use a different set of variables to process
this and protect the loop.

I'll try to do something about it. I've already got a report of a reload
not working once in a while but had no info around it so I attributed it
to a PEBKAC-style issue. If you could share a reproducer, it would really
help. Given your sig count, I guess you send signals in loops ?

Thanks,
Willy


Reply via email to