On Wed, Jul 24, 2019 at 10:48:25PM +0200, Alexander Bluhm wrote:
> On Wed, Jul 24, 2019 at 08:59:44PM +0200, Alexander Bluhm wrote:
> > The reaper on CPU 0 does a NULL dereference when removing the page.
> > On CPU 1 zerothread is waiting for the kernel lock.  CPU 2 and 3 are
> > idle.
> >
> > uvm_fault(0xfffffd8240760cc8, 0x7f827ea48908, 0, 2) -> e
> > kernel: page fault trap, code=0
> > Stopped at      pmap_page_remove+0x210: xchgq   %rax,0(%rcx,%rdx,1)
>
> Forgot to mention, that was C source line pmap.c:1878
>
>         opte = pmap_pte_set(&PTE_BASE[pl1_i(pve->pv_va)], 0);
>
> > I will update the kernel and see if the panic is reproducible.
>
> It is reproducible.
>
> ddb{3}> x/s version
> version:        OpenBSD 6.5-current (GENERIC.MP) #139: Wed Jul 24 05:11:28
> MDT 2
> 019\012
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> \012
>
> ddb{3}> show panic
> kernel page fault
> uvm_fault(0xfffffd823efc7998, 0x7f8444c11f08, 0, 1) -> e
> pmap_enter(fffffd823e1ce3f8,889823e1000,5f3c2000,3,22) at pmap_enter+0x1d6
> end trace frame: 0xffff80002210ed30, count: 0
>
> Now it happens in pmap.c:2624
>
>         opte = PTE_BASE[pl1_i(va)];     /* old PTE */
>
> Something in the PTE_BASE array is not mapped.
>
I wrote a quick program to calculate what address this would be (thinking
maybe we had an overflow or something), but it does indeed match the
faulting address above (0x7f8444c11f08) for the VA 0x889823e1000. This
address (0x7f8444c11f08) is in the PTE range, so it looks like the page
table page backing it was never allocated, or possibly was double-freed.
A double free would also match the comment in the previous email.

If this happens again, it might be interesting to see which pages around
that one are mapped. For example, in this particular instance, check
whether 0x7f8444c10000 or 0x7f8444c12000 is mapped. ddb's 'x' command can
do that (see whether you get another fault or some data). Maybe the data
in those neighboring pages will provide a hint (although that's a long
shot).

-ml

> ddb{3}> trace
> pmap_enter(fffffd823e1ce3f8,889823e1000,5f3c2000,3,22) at pmap_enter+0x1d6
> uvm_fault(fffffd823efc7998,889823e1000,0,2) at uvm_fault+0xa2a
> pageflttrap() at pageflttrap+0x145
> usertrap(ffff80002210ee20) at usertrap+0x1e3
> recall_trap(6,dfdfdfdfdfdfdfdf,0,6,1000,8890b6fc7c0) at recall_trap+0x8
> end of kernel
> end trace frame: 0x888fdfc9330, count: -5
>
> Note that on June 11th I reported a similar trace in pmap to bugs@
> when ld caused a crash.
>
> ddb{3}> ps
>     PID     TID   PPID  UID  S       FLAGS  WAIT      COMMAND
>   76368  342680   5059    0  2         0x2            malloc_duel
>   76368  101339   5059    0  7   0x4000002            malloc_duel
>   76368  514296   5059    0  3   0x4000082  fsleep    malloc_duel
>  *76368  384915   5059    0  7   0x4000002            malloc_duel
>   76368  221830   5059    0  7   0x4000002            malloc_duel
>   76368  361827   5059    0  7   0x4000002            malloc_duel
>   76368  480274   5059    0  3   0x4000082  fsleep    malloc_duel
>   76368  468117   5059    0  3   0x4000082  fsleep    malloc_duel
>   76368  461971   5059    0  3   0x4000082  fsleep    malloc_duel
>   76368  266728   5059    0  2   0x4000002            malloc_duel
>   76368   82327   5059    0  2   0x4000002            malloc_duel
>    5059  194815   4702    0  3    0x10008a  pause     make
>    4702  434789  57398    0  3    0x10008a  pause     sh
>   57398  272052  80135    0  3    0x10008a  pause     make
>   80135   83438  74843    0  3    0x10008a  pause     sh
>   74843  269959  24644    0  3    0x10008a  pause     make
>   71213   91038  31378    0  3    0x100082  piperd    gzip
>   31378  297755  24644    0  3    0x100082  piperd    pax
>   24644  139228  73204    0  3        0x82  piperd    perl
>   73204  241400   3907    0  3    0x10008a  pause     ksh
>    3907  427314  77842    0  3        0x92  select    sshd
>   49732  259852      1    0  3    0x100083  ttyin     getty
>   58444  180559      1    0  3    0x100083  ttyin     getty
>   30659  289121      1    0  3    0x100083  ttyin     getty
>    9656  108850      1    0  3    0x100083  ttyin     getty
>   24203   10241      1    0  3    0x100083  ttyin     getty
>   65063  251469      1    0  3    0x100083  ttyin     getty
>   16142  523320      1    0  3    0x100098  poll      cron
>   90805    3316      0    0  3     0x14280  nfsidl    nfsio
>   11202  322177      0    0  3     0x14280  nfsidl    nfsio
>   73491  331359      0    0  3     0x14280  nfsidl    nfsio
>   37841  249018      0    0  3     0x14280  nfsidl    nfsio
>    4136  428500      1   99  3    0x100090  poll      sndiod
>   12112  519438      1  110  3    0x100090  poll      sndiod
>   49306   97767    137   95  3    0x100092  kqread    smtpd
>   70869  189393    137  103  3    0x100092  kqread    smtpd
>   79867  131344    137   95  3    0x100092  kqread    smtpd
>   66859  375509    137   95  3    0x100092  kqread    smtpd
>   22396   48018    137   95  3    0x100092  kqread    smtpd
>   16604   93317    137   95  3    0x100092  kqread    smtpd
>     137  452544      1    0  3    0x100080  kqread    smtpd
>   77842  219221      1    0  3        0x80  select    sshd
>   88298  318549      0    0  3     0x14200  acct      acct
>    7436  211089      1    0  3    0x100080  poll      ntpd
>   15596  214430  72873   83  3    0x100092  poll      ntpd
>   72873  423080      1   83  3    0x100092  poll      ntpd
>     639  455748   5843   74  3    0x100092  bpf       pflogd
>    5843  152563      1    0  3        0x80  netio     pflogd
>   49089   65344  96782   73  3    0x100090  kqread    syslogd
>   96782  134250      1    0  3    0x100082  netio     syslogd
>   15309   57931      1   77  3    0x100090  poll      dhclient
>   92131  300080      1    0  3        0x80  poll      dhclient
>     440  434925  45137  115  3    0x100092  kqread    slaacd
>   23230  157398  45137  115  3    0x100092  kqread    slaacd
>   45137  283018      1    0  3    0x100080  kqread    slaacd
>   11751  424885      0    0  3     0x14200  pgzero    zerothread
>   94669  233757      0    0  3     0x14200  aiodoned  aiodoned
>   39044  189625      0    0  3     0x14200  syncer    update
>   11265  246421      0    0  3     0x14200  cleaner   cleaner
>   86967  386950      0    0  3     0x14200  reaper    reaper
>   48511  221734      0    0  3     0x14200  pgdaemon  pagedaemon
>   27362  255648      0    0  3     0x14200  bored     crynlk
>   58949  107875      0    0  3     0x14200  bored     crypto
>   88305  317139      0    0  3     0x14200  bored     sensors
>   62804  248570      0    0  3     0x14200  usbtsk    usbtask
>     717  253829      0    0  3     0x14200  usbatsk   usbatsk
>   48070  263826      0    0  3  0x40014200  acpi0     acpi0
>   65386  442770      0    0  3  0x40014200            idle3
>   33089  148765      0    0  3  0x40014200            idle2
>   65055  498669      0    0  3  0x40014200            idle1
>   10578  506553      0    0  3     0x14200  bored     softnet
>   70559   53653      0    0  3     0x14200  bored     systqmp
>    6788     104      0    0  3     0x14200  bored     systq
>   23919  173929      0    0  3  0x40014200  bored     softclock
>   87424  241507      0    0  3  0x40014200            idle0
>   44349  256295      0    0  3     0x14200  bored     smr
>       1  488173      0    0  3        0x82  wait      init
>       0       0     -1    0  3     0x10200  scheduler swapper
>
> bluhm
>