> Date: Tue, 25 Mar 2025 18:59:46 +0000
> From: Miod Vallat <[email protected]>
>
> > However, on amd64, with the diff applied the kernel faults when writing
> > to curproc. In the trace below statclock+0x108 corresponds to
> > tu_enter(&p->p_tu) in statclock().
>
> I have tried this and it fails even earlier for me.
>
> The uvm_map_protect() call in kern_exec.c will now end up invoking
> pmap_protect(), which is an inline function ending up in
> pmap_write_protect(pmap_kernel(), va, va + PAGE_SIZE).
>
> In my case, va = 0xffff.8000.4210.c000 which is in kernel space.
> However, at pmap_write_protect+0x213, which is the pmap_pte_clearbits()
> macro expansion here in the loop:
>
> 	for (/*null */; spte < epte ; spte++) {
> 		if (!pmap_valid_entry(*spte))
> 			continue;
> 		pmap_pte_clearbits(spte, clear);
> 		pmap_pte_setbits(spte, set);
> 	}
>
> we end up with spte == 0x7fffe.c000.8000, which is BELOW the kernel (and
> *spte == 0x464c457f == the ELF signature). Therefore the attempt to flip
> bits at this bogus address faults, pcb_onfault is (correctly) not set, and
> kpageflttrap() panics.
>
> Now if you look at the beginning of pmap_write_protect(), it does this:
>
> 	/* should be ok, but just in case ... */
> 	sva &= PG_FRAME;
> 	eva &= PG_FRAME;
>
> and I'm afraid I don't understand this. My understanding is that
> PG_FRAME is a mask that is supposed to be applied to physical
> addresses, not virtual addresses!
Indeed. That code seems to be inherited from i386, where it isn't the
right thing to do either, but doesn't do any actual harm.
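For what it's worth, with the values as I read them (PG_FRAME ==
0x000ffffffffff000 and PAGE_MASK == 0xfff), the two masks behave quite
differently on a kernel VA:

	0xffff.8000.4210.c000 & PG_FRAME   == 0x000f.8000.4210.c000  (bits 48-63 lost)
	0xffff.8000.4210.c000 & ~PAGE_MASK == 0xffff.8000.4210.c000  (only the page offset stripped)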
> Because of this, my initial page address, known as sva, gets
> "normalized" from 0xffff.8000.4210.c000 to 0x000f.8000.4210.c000, which
> is now LOWER than VM_MIN_KERNEL_ADDRESS and will not sign-extend
> correctly.
>
> Is the PG_FRAME masking really only intended to mask the low-order
> bits, and should it use ~PAGE_MASK instead?
Maybe. But something needs to be done to handle the VA hole. So
something like:
	sva = VA_SIGN_POS(sva);
	eva = VA_SIGN_POS(eva);
might work instead and ...
> In addition to this, the computation of `blockend' in the main loop of
> that routine will clear high-order bits (in my case, to
> 0x0000.8000.4220.0000), and because it assumes blockend > va to make
> progress at every iteration, this will actually become an infinite loop
> which will corrupt memory until it faults or you get tired of waiting
> for it to complete.
... fix this endless loop. But we have to pass the real VA to
pmap_tlb_shootrange(). So that wouldn't work either.
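To spell the loop problem out with your numbers: as far as I can see the
per-block boundary is computed along the lines of

	blockend = (va & L2_FRAME) + NBPD_L2;	/* next 2MB boundary */
	if (blockend > eva)
		blockend = eva;

and since L2_FRAME only keeps bits below the sign extension, a va of
0xffff.8000.4210.c000 yields exactly the 0x0000.8000.4220.0000 you quote,
which is below va, so `va = blockend' never gets any closer to eva.
VA_SIGN_POS()-normalized addresses would make that arithmetic consistent
again, but then the range later handed to pmap_tlb_shootrange() is no
longer the real kernel range.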
> This STRONGLY hints that this routine has never been used on
> pmap_kernel() addresses until now.
I guess we stopped swapping out kernel stacks long before amd64 was a
thing?
> Can anyone with some amd64 MMU knowledge confirm this analysis and
> do the required work to make that routine cope with non-userland
> addresses?
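In case it helps whoever picks this up: the direction I would try first
(completely untested, and assuming VA_SIGN_MASK is the mask that
VA_SIGN_POS() strips) is to keep the real, sign-extended addresses
throughout, only drop the page offset at the top, and put the high bits
back after the L2_FRAME rounding, so that pmap_tlb_shootrange() still
sees the real range:

	/* untested sketch, not a diff */
	sva &= ~PAGE_MASK;
	eva &= ~PAGE_MASK;

	for (va = sva; va < eva; va = blockend) {
		/*
		 * L2_FRAME drops bits 48-63; restore them from va so
		 * blockend stays above va for kernel addresses.
		 */
		blockend = (va & L2_FRAME) + NBPD_L2;
		blockend |= va & VA_SIGN_MASK;
		if (blockend > eva)
			blockend = eva;

		/* existing PTE walk over [va, blockend) unchanged */
	}

This glosses over wraparound in the very last block of the address space
and assumes the range never straddles the VA hole, which I would expect
to hold for anything uvm hands us.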