Hi,

We have encountered a problem where the system hangs. We are running a 4.7
SMP kernel with kernel polling on a dual Xeon with hyperthreading enabled
(effectively a four-processor system). As a result, the only hardware
interrupts in the system are hardclock (8254), the rtc, the serial console and
SCSI. The synchronous interrupts are the 8254 and the rtc. When the system is
hung, I have found that the ipending and iactive bits for the 8254 and the rtc
are set (meaning the interrupt is both pending and active), even though the
Giant lock is not held and all processors are idle (and halted). This led me
to believe that the ipending bit was somehow set "just before" the last
interrupt returned. The only way the system could run that interrupt again is
if another interrupt came in, noticed that ipending was set, and ran the
pending handler (which would show up as an interrupt delay). In a non-polling
system, I imagine the ethernet interrupts would wake it up. I believe I have
found a potential hole where this could happen.

In i386/isa/ipl.s:

#ifdef SMP
        cli                             /* early to prevent INT deadlock */
doreti_next2:
#endif
        movl    %eax,%ecx
        notl    %ecx                    /* set bit = unmasked level */
#ifndef SMP
        cli
#endif
        andl    _ipending,%ecx          /* set bit = unmasked pending INT */
        jne     doreti_unpend
        movl    %eax,_cpl
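For readers less used to the assembly, here is a rough C model of what the
doreti snippet above computes. It is only a sketch: cpl and ipending are
hypothetical plain globals standing in for _cpl and _ipending, new_cpl plays
the role of %eax, and cli, locking and the real doreti_unpend path are not
modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel globals _cpl and _ipending.
 * A set bit in cpl means "this IRQ is masked"; a set bit in ipending
 * means "this IRQ arrived but has not been serviced yet". */
static uint32_t cpl;
static uint32_t ipending;

/* Models the doreti check; new_cpl plays the role of %eax.
 * Returns the unmasked pending bits (what ends up in %ecx). A nonzero
 * result corresponds to "jne doreti_unpend" and leaves cpl unchanged;
 * only a zero result stores the new level into cpl. */
static uint32_t doreti_check(uint32_t new_cpl)
{
    uint32_t ecx = ~new_cpl;      /* movl %eax,%ecx ; notl %ecx */
    ecx &= ipending;              /* andl _ipending,%ecx */
    if (ecx != 0)
        return ecx;               /* jne doreti_unpend */
    cpl = new_cpl;                /* movl %eax,_cpl */
    /* True in this single-threaded model. On SMP, another CPU can set
     * a bit in ipending right after the andl, which breaks it. */
    assert((~cpl & ipending) == 0);
    return 0;
}
```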

My concern is the case where ipending is checked and found to be clear, but
another interrupt arrives just afterwards and sets its bit in ipending.
Because cpl has not yet been lowered, that interrupt is not forwarded. In
particular, in i386/isa/apic_vector.s:

3: ;                    /* other cpu has isr lock */                    \
        APIC_ITRACE(apic_itrace_noisrlock, irq_num, APIC_ITRACE_NOISRLOCK) ;\
        lock ;                                                          \
        orl     $IRQ_BIT(irq_num), _ipending ;                          \
        testl   $IRQ_BIT(irq_num), _cpl ;                               \
        jne     4f ;                            /* this INT masked */   \
        call    forward_irq ;    /* forward irq to lock holder */       \
        POP_FRAME ;                             /* and return */        \
        iret ;                                                          \
        ALIGN_TEXT ;                                                    \
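Roughly the same logic in C, as a hypothetical model of the macro's "other cpu
has isr lock" branch. forward_irq here is a stub that only counts calls, and
the locked orl is modeled as a plain |= because this model is single-threaded;
what happens at label 4 is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t cpl;        /* hypothetical stand-in for _cpl */
static uint32_t ipending;   /* hypothetical stand-in for _ipending */
static int forwarded;       /* number of calls to the forward_irq stub */

/* Same idea as IRQ_BIT(): one bit per IRQ number. */
static uint32_t irq_bit(int irq)
{
    assert(irq >= 0 && irq < 32);
    return 1u << irq;
}

/* Stub for forward_irq: only records that it was called. */
static void forward_irq(void)
{
    forwarded++;
}

/* Models label 3 (other CPU holds the isr lock). The IRQ is always
 * recorded in ipending, but it is forwarded to the lock holder only if
 * cpl does not mask it at the moment of the test. Returns true if the
 * IRQ was forwarded. */
static bool noisrlock_path(int irq)
{
    ipending |= irq_bit(irq);       /* lock ; orl $IRQ_BIT(irq_num),_ipending */
    if (cpl & irq_bit(irq))         /* testl $IRQ_BIT(irq_num),_cpl */
        return false;               /* jne 4f: masked, not forwarded */
    forward_irq();                  /* call forward_irq */
    return true;
}
```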

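The test of _cpl happens right after the bit is set in _ipending, so there is
a race between this CPU checking _cpl and the doreti path on another CPU
modifying it: if _cpl still masks the IRQ here, the interrupt is only left in
_ipending, and if doreti has already done its _ipending check, nothing picks
it up.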
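To make the suspected interleaving concrete, here is a hypothetical
single-threaded replay in C. The struct, the function names and the choice of
IRQ are all illustrative: "CPU A" is the doreti path lowering cpl, and "CPU B"
is the interrupt stub that finds the isr lock held and runs in the window
between A's ipending check and its store to cpl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Snapshot of the hypothetical shared state after the replay. */
struct state {
    uint32_t cpl;       /* stand-in for _cpl */
    uint32_t ipending;  /* stand-in for _ipending */
    bool forwarded;     /* did CPU B call forward_irq? */
    bool unpended;      /* did CPU A branch to doreti_unpend? */
};

static struct state replay_race(int irq, uint32_t old_cpl, uint32_t new_cpl)
{
    assert(irq >= 0 && irq < 32);
    uint32_t bit = 1u << irq;
    struct state s = { old_cpl, 0, false, false };

    /* CPU A: movl %eax,%ecx ; notl %ecx ; andl _ipending,%ecx */
    uint32_t ecx = ~new_cpl & s.ipending;
    if (ecx != 0) {
        s.unpended = true;            /* jne doreti_unpend */
        return s;
    }

    /* CPU B, before A stores the new cpl:
     * orl $IRQ_BIT,_ipending ; testl $IRQ_BIT,_cpl */
    s.ipending |= bit;
    if (!(s.cpl & bit))
        s.forwarded = true;           /* call forward_irq */

    /* CPU A: movl %eax,_cpl; the IRQ is now unmasked but still pending */
    s.cpl = new_cpl;
    return s;
}

/* The hang condition: pending, unmasked, and nobody was told about it. */
static bool irq_stranded(struct state s, int irq)
{
    uint32_t bit = 1u << irq;
    return (s.ipending & bit) && !(s.cpl & bit) && !s.forwarded && !s.unpended;
}
```

For example, replay_race(0, 1u << 0, 0) models IRQ 0 (the 8254) masked by the
old level and unmasked by the new one; the result has the bit set in ipending,
cpl equal to 0, and neither a forward nor an unpend, so irq_stranded() reports
it as stuck.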

One quick fix that I thought might correct this would be, in ipl.s, to check
ipending again right after modifying cpl to see whether it has changed, for
example:


#ifdef SMP
        cli                             /* early to prevent INT deadlock */
doreti_next2:
#endif
        movl    %eax,%ecx
        notl    %ecx                    /* set bit = unmasked level */
#ifndef SMP
        cli
#endif
        andl    _ipending,%ecx          /* set bit = unmasked pending INT */
        jne     doreti_unpend
        movl    %eax,_cpl
        movl    %eax,%ecx               /* %ecx is 0 here; recompute it */
        notl    %ecx                    /* set bit = unmasked level */
        andl    _ipending,%ecx          /* recheck: unmasked pending INT? */
        jne     doreti_unpend
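A hypothetical C model of the re-check idea. After the first jne falls
through, %ecx is 0, so a second andl only helps if %ecx is first recomputed
from ~%eax; the reload flag switches between reusing %ecx and recomputing it.
window_hook is a made-up way of injecting the racing interrupt between the
first check and the store to cpl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t cpl;       /* hypothetical stand-in for _cpl */
static uint32_t ipending;  /* hypothetical stand-in for _ipending */

/* Hypothetical hook that injects the racing interrupt (another CPU
 * setting a bit in ipending) between the first check and the store
 * to cpl. NULL means no race. */
static void (*window_hook)(void);

/* Models doreti with a second check after cpl is stored. Returns true
 * if the sequence would branch to doreti_unpend. With reload false,
 * the second andl reuses %ecx from the first check; with reload true,
 * %ecx is recomputed from ~%eax first. */
static bool doreti_with_recheck(uint32_t new_cpl, bool reload)
{
    uint32_t ecx = ~new_cpl & ipending;   /* first check */
    if (ecx != 0)
        return true;                      /* jne doreti_unpend */
    assert(ecx == 0);                     /* fall-through: %ecx is 0 */
    if (window_hook != NULL)
        window_hook();                    /* racing interrupt lands here */
    cpl = new_cpl;                        /* movl %eax,_cpl */
    if (reload)
        ecx = ~new_cpl;                   /* movl %eax,%ecx ; notl %ecx */
    ecx &= ipending;                      /* andl _ipending,%ecx */
    return ecx != 0;                      /* jne doreti_unpend */
}

/* Example hook: IRQ 0 (the 8254) becomes pending inside the window. */
static void set_irq0_pending(void)
{
    ipending |= 1u << 0;
}
```

With window_hook set to set_irq0_pending, starting from cpl = 1 and lowering
it to 0, the reuse variant returns false even though IRQ 0 ends up pending and
unmasked, while the recompute variant sends it to doreti_unpend.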


Any opinions/insight?

thanks.
_______________________________________________
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"