On Wed, May 09, 2018 at 11:01:58AM +0200, Alexander Bluhm wrote:
> Hi,
>
> While running my nightly regression tests, I compiled
> /ports/misc/posixtestsuite. It was the first time that I was running
> regress while having some other load on the machine. During
> regress/lib/libc/ieeefp/except the machine hang. It has 2 CPUs.
>

Based on the discussion below, it sounds like the same bug mpi and I noticed
a few weeks ago in Nantes. A CPU gets stuck with interrupts disabled, and a
shootdown can't happen because that CPU never receives the IPI.

You might want to apply mpi's changes to see if it spins out waiting for the
lock, and where. The output of "show all locks" might also be useful.

-ml

> The final output of the test:
>
> ===> ieeefp/except
> cc -O2 -pipe -MD -MP -c /usr/src/regress/lib/libc/ieeefp/except/except.c
> cc -o except except.o
> ./except fltdiv
>
> This kernel was running:
>
> OpenBSD 6.3-current (GENERIC.MP) #592: Mon May 7 10:07:12 MDT 2018
>     dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP
>
> I could break into ddb:
>
> Stopped at db_enter+0x4: popl %ebp
> ddb{0}> trace
> db_enter() at db_enter+0x4
> comintr(d577d000) at comintr+0x21e
> intr_handler(f58be8e4,d577c840) at intr_handler+0x30
> Xintr_ioapic3_untramp() at Xintr_ioapic3_untramp+0xd7
> --- interrupt ---
> pmap_tlb_shootwait() at pmap_tlb_shootwait+0x12
> pmap_do_remove_pae(d0d33ce0,f55f2000,f55f3000,0) at pmap_do_remove_pae+0x2ac
> pmap_remove(d0d33ce0,f55f2000,f55f3000) at pmap_remove+0x18
> uvm_unmap_kill_entry(d0d2d2b4,d4c810dc) at uvm_unmap_kill_entry+0xde
> uvm_unmap_remove(d0d2d2b4,f55f2000,f55f3000,f58bea00,0,1) at uvm_unmap_remove+0x194
> sys_kbind(d435dcf0,f58bea80,f58bea78) at sys_kbind+0x295
> syscall() at syscall+0x25e
> --- syscall (number -813868376) ---
> end of kernel
> 0x7d6558e8:
>
> CPU 0 is running clang, CPU 1 is running the except test script.
>
> ddb{0}> ps
>     PID     TID   PPID  UID  S       FLAGS  WAIT      COMMAND
>   92284  394442  70506    0  7         0x2            except
>  *47266  113041  37786   55  7         0x2            cc
>   37786  281652  35994   55  3    0x10008a  pause     sh
>   70506  372899  71391    0  3    0x10008a  pause     make
>   71391  488915  75345    0  3    0x10008a  pause     sh
>   75345  253329  29923    0  3    0x10008a  pause     make
>   29923   89609  68217    0  3    0x10008a  pause     sh
>   68217  294846  81420    0  3    0x10008a  pause     make
>   51311  445816  20823    0  2       0x491            perl
>   81420  149032  81906    0  3    0x10008a  pause     sh
>   81906  389989  44981    0  3    0x10008a  pause     make
>   24237   35914  94782    0  3    0x100082  piperd    gzip
>   94782  375463  44981    0  3    0x100082  piperd    pax
>   44981  114211  25893    0  3        0x82  piperd    perl
>   25893  239558   5387    0  3    0x10008a  pause     ksh
>    5387  100109  39691    0  3        0x92  select    sshd
>   65456  428886  57598    0  3    0x100083  kqread    tail
>   57598  364467  56435    0  3    0x10008b  pause     ksh
>   39040  394741  84200   55  2       0x482            perl
>   84200   57590  22769   55  3    0x10008a  pause     sh
>   22769  388112  71080   55  3    0x10008a  pause     make
>   71080  289240  55503   55  3    0x10008a  pause     sh
>   55503  177103  20823   55  3    0x10008a  pause     make
>   20823  473630  90353    0  3        0x93  wait      perl
>   35994  500360  35455   55  3        0x82  piperd    gmake
>   35455   82895  18413   55  3    0x10008a  pause     make
>   18413    9872   9766   55  3    0x10008a  pause     sh
>    9766   29157  60819   55  3    0x10008a  pause     make
>   60819  198028  51400   55  3    0x10008a  pause     sh
>   51400  455284      1   55  3    0x10008a  pause     make
>   90353  444304  56435    0  3    0x10008b  pause     ksh
>   56435  213296      1    0  2    0x100480            tmux
>   12943  273120  79318    0  3    0x100083  kqread    tmux
>   79318   90427  49332    0  3    0x10008b  pause     ksh
>   49332  480938  39691    0  3        0x92  select    sshd
>   79215  221858      1    0  2    0x100083            getty
>    5182   91398      1    0  3    0x100083  ttyin     getty
>   68061  353121      1    0  3    0x100083  ttyin     getty
>   61973  471346      1    0  3    0x100083  ttyin     getty
>   58677  314567      1    0  3    0x100083  ttyin     getty
>   26310   59684      1    0  3    0x100083  ttyin     getty
>   77772  266793      1    0  2    0x100498            cron
>   69017  469788      1   99  3    0x100090  poll      sndiod
>   67250  378711      1  110  3    0x100090  poll      sndiod
>    7419  486904  35256   95  3    0x100092  kqread    smtpd
>   87223  110989  35256  103  3    0x100092  kqread    smtpd
>   22973  257799  35256   95  3    0x100092  kqread    smtpd
>   22893  197212  35256   95  3    0x100092  kqread    smtpd
>   55776  302222  35256   95  3    0x100092  kqread    smtpd
>   67856  519997  35256   95  3    0x100092  kqread    smtpd
>   35256  194026      1    0  3    0x100080  kqread    smtpd
>   39691  482995      1    0  3        0x80  select    sshd
>   91848  227431      0    0  2     0x14600            acct
>   57929  439430      0    0  3     0x14280  nfsidl    nfsio
>   22984  278690      0    0  3     0x14280  nfsidl    nfsio
>   68247  280175      0    0  3     0x14280  nfsidl    nfsio
>   84145   68638      0    0  3     0x14280  nfsidl    nfsio
>   64212  518189      1    0  3    0x100080  poll      ntpd
>   55093  242273  10888   83  3    0x100092  poll      ntpd
>   10888  104846      1   83  2    0x100492            ntpd
>   49017    6336  81641   74  2    0x100492            pflogd
>   81641   88482      1    0  3        0x80  netio     pflogd
>   83731  475689  54758   73  2    0x100490            syslogd
>   54758  368953      1    0  3    0x100082  netio     syslogd
>   43387  397146      1   77  3    0x100090  poll      dhclient
>   32556  319307      1    0  3        0x80  poll      dhclient
>   52892  503177   8869  115  3    0x100092  kqread    slaacd
>   12458  116305   8869  115  3    0x100092  kqread    slaacd
>    8869  503520      1    0  3        0x80  kqread    slaacd
>   71440  285881      0    0  3     0x14200  bored     radeon-crtc
>    6819  122004      0    0  3     0x14200  bored     ttm_swap
>   95559  405938      0    0  2     0x14200            zerothread
>   19820  329414      0    0  3     0x14200  aiodoned  aiodoned
>   50141  351707      0    0  2     0x14200            update
>   82368  523939      0    0  3     0x14200  cleaner   cleaner
>   45425  519091      0    0  3     0x14200  reaper    reaper
>   42430  481629      0    0  3     0x14200  pgdaemon  pagedaemon
>    4558  298373      0    0  3     0x14200  bored     crynlk
>   87050  360681      0    0  3     0x14200  bored     crypto
>   41228  194479      0    0  3     0x14200  usbtsk    usbtask
>   67492   97981      0    0  3     0x14200  usbatsk   usbatsk
>   98064  318614      0    0  2     0x14200            sensors
>   49963  220000      0    0  3  0x40014200  acpi0     acpi0
>   66040  163919      0    0  3  0x40014200            idle1
>   36632  386505      0    0  2     0x14200            softnet
>   73380  151363      0    0  2     0x14200            systqmp
>   28485   26161      0    0  2     0x14200            systq
>   77840  491220      0    0  2  0x40014200            softclock
>     383   63150      0    0  3  0x40014200            idle0
>   56474  514734      0    0  3     0x14200  kmalloc   kmthread
>       1  392229      0    0  3        0x82  wait      init
>       0       0     -1    0  2     0x10200            swapper
>
> ddb{0}> show register
> ds          0x10
> es          0x10
> fs          0x20
> gs          0
> edi         0xd577d000  end+0x4980000
> esi         0xd57810b0  end+0x49840b0
> ebp         0xf58be894
> ebx         0xd04af8f9  i386_bus_space_io_read_multi_4+0x19
> edx         0x3f8
> ecx         0x8000000   __kernel_end_phys+0x7203000
> eax         0xd04af800  ami_refresh_sensors+0xe0
> eip         0xd02c7054  db_enter+0x4
> cs          0x50
> eflags      0x202
> esp         0xf58be894
> ss          0x10
> db_enter+0x4: popl %ebp
>
> When I tried to examine the other CPU, ddb locked up.
>
> ddb{0}> machine ddbcpu 1
>
> bluhm
>