On Wed, Jul 2, 2025 at 2:38 AM K R <[email protected]> wrote:
>
> On Tue, Jul 1, 2025 at 3:08 PM Martin Pieuchot <[email protected]> wrote:
> >
> > On 01/07/25(Tue) 14:29, K R wrote:
> > > On Tue, Jul 1, 2025 at 9:07 AM Claudio Jeker <[email protected]>
> > > wrote:
> > > >
> > > > On Tue, Jul 01, 2025 at 04:57:14AM -0300, K R wrote:
> > > > > On Mon, Jun 30, 2025 at 2:39 AM K R <[email protected]> wrote:
> > > > > >
> > > > > > On Fri, Jun 27, 2025 at 3:34 AM Martin Pieuchot
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > On 26/06/25(Thu) 11:02, K R wrote:
> > > > > > > > On Wed, Jun 25, 2025 at 1:30 PM K R <[email protected]>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > [...]
> > > > > > > > > > Hi Alexander,
> > > > > > > > > >
> > > > > > > > > > The good news: I can consistently reproduce the hang
> > > > > > > > > > problem. But the
> > > > > > > > > > bad news is that even with a WITNESS kernel and
> > > > > > > > > > kern.witness.watch=2
> > > > > > > > > > (or even 3) I don't see any message or kernel panic.
> > > > > > >
> > > > > > > Do you mind sharing your recipe to reproduce the hang?
> > > > >
> > > > > Another one. This time I've included output from 'show witness /b',
> > > > > below:
> > > > >
> > > > > pmap_tlb_shootwait: spun outStopped at db_enter+0x14: popq %rbp
> > > >
> > > > I think the number one question here is why did pmap_tlb_shootwait spin
> > > > out?
> > > > This is waiting for other CPUs to finish their work.
> > > > So first of all what value has tlb_shoot_wait (x /qx tlb_shoot_wait)
> > > > The probably also the values of tlb_shoot_first_pcid, tlb_shoot_addr1
> > > > and
> > > > tlb_shoot_addr2 are of interest. (use x /qx for the addrs and x /x for
> > > > first_pcid).
> > >
> > > Hi Claudio, please see below the output from the commands you
> > > requested. If you need something else, please let me know.
> >
> > Thanks, could you try the diff below and see if it helps?
>
> Hi Martin, thanks! I've just applied your patch, built a new kernel
> and now the machine is running the stress test. I'll keep you posted.
Another crash. Please see below:
pmap_tlb_shootwait: spun outStopped at db_enter+0x14: popq %rbp
ddb{1}> trace
db_enter() at db_enter+0x14
pmap_page_remove(fffffd81618e56f8) at pmap_page_remove+0x38e
uvm_anfree_list(fffffd9c9f929cb0,0) at uvm_anfree_list+0x63
amap_wipeout(fffffd9c9273d6f0) at amap_wipeout+0x198
uvm_unmap_detach(ffff800055e660f0,0) at uvm_unmap_detach+0xbe
sys_munmap(ffff800055db9750,ffff800055e661f0,ffff800055e66170) at sys_munmap+0x
185
syscall(ffff800055e661f0) at syscall+0x5f9
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x25ba4903ca0, count: -8
ddb{1}> show uvm
Current UVM status:
pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
32128633 VM pages: 51836 active, 176494 inactive, 1 wired, 12995723 free (296
7136 zero)
freemin=1070954, free-target=1427938, inactive-target=0, wired-max=10709544
faults=531662531, traps=528617067, intrs=1230847, ctxswitch=48166333 fpuswitc
h=0
softint=313110, syscalls=19991158, kmapent=11
fault counts:
noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
relocks=288463(4638), upgrades=0(0) anget(retries)=285483(0), amapcopy=3623
6106
neighbor anon/obj pg=36882/128185, gets(lock/unlock)=351322/293131
cases: anon=269766, anoncow=15717, obj=336363, prcopy=10291, przero=5310293
80
daemon and swap counts:
woke=0, revs=0, scans=0, obscans=0, anscans=0
busy=0, freed=0, reactivate=0, deactivate=0
pageouts=0, pending=0, nswget=0
nswapdev=1
swpages=16777883, swpginuse=0, swpgonly=0 paging=0
kernel pointers:
objs(kern)=0xffffffff82ba5810
ddb{1}> show all locks
Process 9957 (nfdump) thread 0xffff800055ded9e8 (463186)
exclusive rwlock amaplk r = 0 (0xfffffd8bd97d62a0)
exclusive rwlock vmmaplk r = 0 (0xfffffd9ca4496c68)
Process 10322 (nfdump) thread 0xffff800055db8a80 (106827)
exclusive rwlock amaplk r = 0 (0xfffffd8bac1cf148)
shared rwlock vmmaplk r = 0 (0xfffffd9ca82444c8)
Process 10322 (nfdump) thread 0xffff800055dec048 (377455)
exclusive kernel_lock &kernel_lock r = 0 (0xffffffff82ba7508)
exclusive rrwlock inode r = 0 (0xfffffd9cf3193e60)
Process 61123 (nfdump) thread 0xffff800055db9750 (48955)
exclusive rwlock amaplk r = 0 (0xfffffd8bd97d6450)
Process 61123 (nfdump) thread 0xffff800055dec2d8 (397827)
exclusive rwlock amaplk r = 0 (0xfffffd8bac1cff58)
shared rwlock vmmaplk r = 0 (0xfffffd9ca12a5120)
ddb{1}> show witness /b
Number of known direct relationships is 406
Lock order reversal between "&mp->mnt_lock"(rwlock) and "&ip->i_lock"(rrwlock)!
witness: incomplete path, depth 4
ddb{1}> ps
PID TID PPID UID S FLAGS WAIT COMMAND
81994 445896 78898 0 3 0x100083 kqread top
9957 160463 18893 0 7 0x3 nfdump
9957 395677 18893 0 3 0x4000083 fsleep nfdump
9957 204084 18893 0 3 0x4000003 vmmaplk nfdump
9957 56145 18893 0 3 0x4000083 fsleep nfdump
9957 318603 18893 0 3 0x4000083 fsleep nfdump
9957 254799 18893 0 3 0x4000083 fsleep nfdump
9957 8915 18893 0 3 0x4000083 fsleep nfdump
9957 463186 18893 0 7 0x4000003 nfdump
10322 161786 61442 0 7 0x3 nfdump
10322 238857 61442 0 3 0x4000083 fsleep nfdump
10322 106827 61442 0 7 0x4000003 nfdump
10322 301683 61442 0 3 0x4000083 fsleep nfdump
10322 504174 61442 0 3 0x4000083 fsleep nfdump
10322 3469 61442 0 3 0x4000083 fsleep nfdump
10322 374393 61442 0 3 0x4000083 fsleep nfdump
10322 377455 61442 0 7 0x4000003 nfdump
61123 226290 28815 0 3 0x83 fsleep nfdump
61123 488365 28815 0 3 0x4000083 fsleep nfdump
*61123 48955 28815 0 7 0x4000003 nfdump
61123 119112 28815 0 3 0x4000083 fsleep nfdump
61123 521289 28815 0 3 0x4000083 fsleep nfdump
61123 217110 28815 0 3 0x4000083 fsleep nfdump
61123 33442 28815 0 3 0x4000083 fsleep nfdump
61123 397827 28815 0 7 0x4000003 nfdump
18893 493617 1 0 3 0x10008b sigsusp sh
61442 378246 1 0 3 0x10008b sigsusp sh
28815 86747 1 0 3 0x10008b sigsusp sh
78898 518168 73998 0 3 0x10008b sigsusp ksh
73998 78599 1 0 3 0x100080 kqread tmux
88247 481547 1 0 3 0x100083 ttyin getty
2851 350752 1 0 3 0x100083 ttyin getty
95636 97892 1 0 3 0x100083 ttyin getty
99165 336734 1 0 3 0x100083 ttyin getty
94509 344932 1 0 3 0x100083 ttyin getty
27762 274695 1 0 3 0x100083 ttyin ksh
17715 49168 1 0 3 0x100098 kqread cron
69026 332831 1 99 3 0x1100090 kqread sndiod
54694 478450 1 110 3 0x100090 kqread sndiod
5697 443271 97255 95 3 0x1100092 kqread smtpd
46243 99327 97255 103 3 0x1100092 kqread smtpd
50084 224069 97255 95 3 0x1100092 kqread smtpd
86000 123418 97255 95 3 0x100092 kqread smtpd
82343 450981 97255 95 3 0x1100092 kqread smtpd
24636 20127 97255 95 3 0x1100092 kqread smtpd
97255 111651 1 0 3 0x100080 kqread smtpd
12541 37192 1 0 3 0x88 kqread sshd
85367 250626 1 0 3 0x100080 kqread ntpd
28096 515227 14026 83 3 0x100092 kqread ntpd
14026 502374 1 83 3 0x1100092 kqread ntpd
86221 70314 72159 74 3 0x1100092 bpf pflogd
72159 268073 1 0 3 0x80 sbwait pflogd
6108 37249 51475 73 3 0x1100090 kqread syslogd
51475 59727 1 0 3 0x100082 sbwait syslogd
41019 133599 1 0 3 0x100080 kqread resolvd
97128 82772 95469 77 3 0x100092 kqread dhcpleased
53784 299579 95469 77 3 0x100092 kqread dhcpleased
95469 522301 1 0 3 0x80 kqread dhcpleased
40382 114795 49808 115 3 0x100092 kqread slaacd
16896 174980 49808 115 3 0x100092 kqread slaacd
49808 29152 1 0 3 0x100080 kqread slaacd
41856 209048 0 0 3 0x14200 bored smr
32903 448281 0 0 3 0x14200 pgzero zerothread
83463 270619 0 0 3 0x14200 aiodoned aiodoned
98998 117726 0 0 3 0x14200 syncer update
89280 387876 0 0 3 0x14200 cleaner cleaner
96217 481684 0 0 3 0x14200 reaper reaper
57847 323601 0 0 3 0x14200 pgdaemon pagedaemon
44359 303835 0 0 3 0x14200 bored wsdisplay0
88760 328401 0 0 3 0x14200 usbtsk usbtask
33892 238971 0 0 3 0x14200 usbatsk usbatsk
40440 389398 0 0 3 0x40014200 acpi0 acpi0
46014 489943 0 0 7 0x40014200 idle31
95730 109730 0 0 7 0x40014200 idle30
69686 366444 0 0 7 0x40014200 idle29
95754 517295 0 0 7 0x40014200 idle28
69388 237366 0 0 7 0x40014200 idle27
97978 394450 0 0 7 0x40014200 idle26
36311 201255 0 0 7 0x40014200 idle25
20892 262557 0 0 7 0x40014200 idle24
82220 30818 0 0 7 0x40014200 idle23
14077 250183 0 0 7 0x40014200 idle22
80410 225236 0 0 7 0x40014200 idle21
87632 27892 0 0 7 0x40014200 idle20
74401 120036 0 0 7 0x40014200 idle19
90877 181432 0 0 7 0x40014200 idle18
86406 145349 0 0 7 0x40014200 idle17
19938 148411 0 0 7 0x40014200 idle16
43568 369660 0 0 7 0x40014200 idle15
88265 368457 0 0 7 0x40014200 idle14
39599 71270 0 0 3 0x40014200 idle13
58064 267069 0 0 7 0x40014200 idle12
9278 103115 0 0 7 0x40014200 idle11
35091 353089 0 0 3 0x40014200 idle10
65872 256306 0 0 7 0x40014200 idle9
1552 514975 0 0 3 0x40014200 idle8
85383 240017 0 0 7 0x40014200 idle7
85335 105423 0 0 3 0x40014200 idle6
75978 17921 0 0 7 0x40014200 idle5
11959 522673 0 0 3 0x40014200 idle4
51461 450362 0 0 7 0x40014200 idle3
47181 512214 0 0 7 0x40014200 idle2
75561 241987 0 0 3 0x40014200 idle1
44257 7285 0 0 3 0x14200 bored sensors
67558 43722 0 0 3 0x14200 bored softnet3
33902 372321 0 0 3 0x14200 bored softnet2
89790 128317 0 0 3 0x14200 bored softnet1
13715 273891 0 0 3 0x14200 bored softnet0
42976 459414 0 0 3 0x14200 bored systqmp
31099 120209 0 0 3 0x14200 bored systq
99963 372420 0 0 3 0x14200 tmoslp softclockmp
3258 343907 0 0 3 0x40014200 tmoslp softclock
36969 310856 0 0 3 0x40014200 idle0
1 511358 0 0 3 0x82 wait init
0 0 -1 0 3 0x10200 scheduler swapper
>
> Thanks again,
> --Kor
>
> >
> > > > [...]
> > > > The else case of the above if block does pmap_tlb_shootwait() but then
> > > > after that a 2nd pmap_tlb_shootwait() is issued. Not sure if this is
> > > > safe
> > > > in all cases. Especially the first pmap_tlb_shootwait() is done with the
> > > > pm_mtx locked.
> >
> > I believe there's an issue with holding the pm_mtx.
> >
> > Index: arch/amd64/amd64/pmap.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/amd64/amd64/pmap.c,v
> > diff -u -p -r1.180 pmap.c
> > --- arch/amd64/amd64/pmap.c 19 Jun 2025 12:01:08 -0000 1.180
> > +++ arch/amd64/amd64/pmap.c 1 Jul 2025 18:06:28 -0000
> > @@ -2947,7 +2947,6 @@ enter_now:
> > if (nocache && (opte & PG_N) == 0) /* XXX impossible? */
> > wbinvd_on_all_cpus();
> > pmap_tlb_shootpage(pmap, va, shootself);
> > - pmap_tlb_shootwait();
> > PTE_BASE[pl1_i(va)] = npte;
> > }
> >
> >
> >
> >