On Wed, May 09, 2018 at 11:01:58AM +0200, Alexander Bluhm wrote:
> Hi,
> 
> While running my nightly regression tests, I compiled
> /ports/misc/posixtestsuite.  It was the first time that I was running
> regress while having some other load on the machine.  During
> regress/lib/libc/ieeefp/except the machine hung.  It has 2 CPUs.
> 

Based on the discussion below, this sounds like the same bug mpi and I noticed
a few weeks ago in Nantes. A CPU gets stuck with interrupts disabled, and a
TLB shootdown can't complete because the IPI isn't being received by that CPU.

You might want to apply mpi's changes to see if it spins out waiting for the
lock, and where. The output of "show all locks" in ddb might also be useful.

-ml

> The final output of the test:
> 
> ===> ieeefp/except
> cc -O2 -pipe   -MD -MP  -c /usr/src/regress/lib/libc/ieeefp/except/except.c
> cc   -o except except.o 
> ./except fltdiv
> 
> This kernel was running:
> 
> OpenBSD 6.3-current (GENERIC.MP) #592: Mon May  7 10:07:12 MDT 2018
>     dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP
> 
> I could break into ddb:
> 
> Stopped at      db_enter+0x4:   popl    %ebp
> ddb{0}> trace
> db_enter() at db_enter+0x4
> comintr(d577d000) at comintr+0x21e
> intr_handler(f58be8e4,d577c840) at intr_handler+0x30
> Xintr_ioapic3_untramp() at Xintr_ioapic3_untramp+0xd7
> --- interrupt ---
> pmap_tlb_shootwait() at pmap_tlb_shootwait+0x12
> pmap_do_remove_pae(d0d33ce0,f55f2000,f55f3000,0) at pmap_do_remove_pae+0x2ac
> pmap_remove(d0d33ce0,f55f2000,f55f3000) at pmap_remove+0x18
> uvm_unmap_kill_entry(d0d2d2b4,d4c810dc) at uvm_unmap_kill_entry+0xde
> uvm_unmap_remove(d0d2d2b4,f55f2000,f55f3000,f58bea00,0,1) at uvm_unmap_remove+0x194
> sys_kbind(d435dcf0,f58bea80,f58bea78) at sys_kbind+0x295
> syscall() at syscall+0x25e
> --- syscall (number -813868376) ---
> end of kernel
> 0x7d6558e8:
> 
> CPU 0 is running clang, CPU 1 is running the except test script.
> 
> ddb{0}> ps
>    PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
>  92284  394442  70506      0  7         0x2                except
> *47266  113041  37786     55  7         0x2                cc
>  37786  281652  35994     55  3    0x10008a  pause         sh
>  70506  372899  71391      0  3    0x10008a  pause         make
>  71391  488915  75345      0  3    0x10008a  pause         sh
>  75345  253329  29923      0  3    0x10008a  pause         make
>  29923   89609  68217      0  3    0x10008a  pause         sh
>  68217  294846  81420      0  3    0x10008a  pause         make
>  51311  445816  20823      0  2       0x491                perl
>  81420  149032  81906      0  3    0x10008a  pause         sh
>  81906  389989  44981      0  3    0x10008a  pause         make
>  24237   35914  94782      0  3    0x100082  piperd        gzip
>  94782  375463  44981      0  3    0x100082  piperd        pax
>  44981  114211  25893      0  3        0x82  piperd        perl
>  25893  239558   5387      0  3    0x10008a  pause         ksh
>   5387  100109  39691      0  3        0x92  select        sshd
>  65456  428886  57598      0  3    0x100083  kqread        tail
>  57598  364467  56435      0  3    0x10008b  pause         ksh
>  39040  394741  84200     55  2       0x482                perl
>  84200   57590  22769     55  3    0x10008a  pause         sh
>  22769  388112  71080     55  3    0x10008a  pause         make
>  71080  289240  55503     55  3    0x10008a  pause         sh
>  55503  177103  20823     55  3    0x10008a  pause         make
>  20823  473630  90353      0  3        0x93  wait          perl
>  35994  500360  35455     55  3        0x82  piperd        gmake
>  35455   82895  18413     55  3    0x10008a  pause         make
>  18413    9872   9766     55  3    0x10008a  pause         sh
>   9766   29157  60819     55  3    0x10008a  pause         make
>  60819  198028  51400     55  3    0x10008a  pause         sh
>  51400  455284      1     55  3    0x10008a  pause         make
>  90353  444304  56435      0  3    0x10008b  pause         ksh
>  56435  213296      1      0  2    0x100480                tmux
>  12943  273120  79318      0  3    0x100083  kqread        tmux
>  79318   90427  49332      0  3    0x10008b  pause         ksh
>  49332  480938  39691      0  3        0x92  select        sshd
>  79215  221858      1      0  2    0x100083                getty
>   5182   91398      1      0  3    0x100083  ttyin         getty
>  68061  353121      1      0  3    0x100083  ttyin         getty
>  61973  471346      1      0  3    0x100083  ttyin         getty
>  58677  314567      1      0  3    0x100083  ttyin         getty
>  26310   59684      1      0  3    0x100083  ttyin         getty
>  77772  266793      1      0  2    0x100498                cron
>  69017  469788      1     99  3    0x100090  poll          sndiod
>  67250  378711      1    110  3    0x100090  poll          sndiod
>   7419  486904  35256     95  3    0x100092  kqread        smtpd
>  87223  110989  35256    103  3    0x100092  kqread        smtpd
>  22973  257799  35256     95  3    0x100092  kqread        smtpd
>  22893  197212  35256     95  3    0x100092  kqread        smtpd
>  55776  302222  35256     95  3    0x100092  kqread        smtpd
>  67856  519997  35256     95  3    0x100092  kqread        smtpd
>  35256  194026      1      0  3    0x100080  kqread        smtpd
>  39691  482995      1      0  3        0x80  select        sshd
>  91848  227431      0      0  2     0x14600                acct
>  57929  439430      0      0  3     0x14280  nfsidl        nfsio
>  22984  278690      0      0  3     0x14280  nfsidl        nfsio
>  68247  280175      0      0  3     0x14280  nfsidl        nfsio
>  84145   68638      0      0  3     0x14280  nfsidl        nfsio
>  64212  518189      1      0  3    0x100080  poll          ntpd
>  55093  242273  10888     83  3    0x100092  poll          ntpd
>  10888  104846      1     83  2    0x100492                ntpd
>  49017    6336  81641     74  2    0x100492                pflogd
>  81641   88482      1      0  3        0x80  netio         pflogd
>  83731  475689  54758     73  2    0x100490                syslogd
>  54758  368953      1      0  3    0x100082  netio         syslogd
>  43387  397146      1     77  3    0x100090  poll          dhclient
>  32556  319307      1      0  3        0x80  poll          dhclient
>  52892  503177   8869    115  3    0x100092  kqread        slaacd
>  12458  116305   8869    115  3    0x100092  kqread        slaacd
>   8869  503520      1      0  3        0x80  kqread        slaacd
>  71440  285881      0      0  3     0x14200  bored         radeon-crtc
>   6819  122004      0      0  3     0x14200  bored         ttm_swap
>  95559  405938      0      0  2     0x14200                zerothread
>  19820  329414      0      0  3     0x14200  aiodoned      aiodoned
>  50141  351707      0      0  2     0x14200                update
>  82368  523939      0      0  3     0x14200  cleaner       cleaner
>  45425  519091      0      0  3     0x14200  reaper        reaper
>  42430  481629      0      0  3     0x14200  pgdaemon      pagedaemon
>   4558  298373      0      0  3     0x14200  bored         crynlk
>  87050  360681      0      0  3     0x14200  bored         crypto
>  41228  194479      0      0  3     0x14200  usbtsk        usbtask
>  67492   97981      0      0  3     0x14200  usbatsk       usbatsk
>  98064  318614      0      0  2     0x14200                sensors
>  49963  220000      0      0  3  0x40014200  acpi0         acpi0
>  66040  163919      0      0  3  0x40014200                idle1
>  36632  386505      0      0  2     0x14200                softnet
>  73380  151363      0      0  2     0x14200                systqmp
>  28485   26161      0      0  2     0x14200                systq
>  77840  491220      0      0  2  0x40014200                softclock
>    383   63150      0      0  3  0x40014200                idle0
>  56474  514734      0      0  3     0x14200  kmalloc       kmthread
>      1  392229      0      0  3        0x82  wait          init
>      0       0     -1      0  2     0x10200                swapper
> 
> ddb{0}> show register
> ds                  0x10
> es                  0x10
> fs                  0x20
> gs                     0
> edi           0xd577d000        end+0x4980000
> esi           0xd57810b0        end+0x49840b0
> ebp           0xf58be894
> ebx           0xd04af8f9        i386_bus_space_io_read_multi_4+0x19
> edx                0x3f8
> ecx            0x8000000        __kernel_end_phys+0x7203000
> eax           0xd04af800        ami_refresh_sensors+0xe0
> eip           0xd02c7054        db_enter+0x4
> cs                  0x50
> eflags             0x202
> esp           0xf58be894
> ss                  0x10
> db_enter+0x4:   popl    %ebp
> 
> When I tried to examine the other CPU, ddb locked up.
> 
> ddb{0}> machine ddbcpu 1
> 
> bluhm
> 
