:I have tried to understand the following code in vm_map_lookup() without
:much success:
:
:        if (fault_type & VM_PROT_OVERRIDE_WRITE)
:                prot = entry->max_protection;
:        else
:                prot = entry->protection;
:         ........
:
:        if (entry->wired_count && (fault_type & VM_PROT_WRITE) &&
:                        (entry->eflags & MAP_ENTRY_COW) &&
:                        (fault_typea & VM_PROT_OVERRIDE_WRITE) == 0) {
:                        RETURN(KERN_PROTECTION_FAILURE);
:        }
:
:At first, it seems to me that if you want to write a COW page, you must
:have OVERRIDE_WRITE set.

    The VM_PROT_OVERRIDE_WRITE flag is only used for user-wired pages, so
    it does not affect 'normal' page handling.   Look carefully at the
    vm_fault() code (vm/vm_fault.c line 212):  the lookup only occurs
    with VM_PROT_OVERRIDE_WRITE set if the normal lookup fails and the
    user has wired the page.

    So if a normal lookup fails and this is a user-wired page, we try 
    the lookup again with VM_PROT_OVERRIDE_WRITE, presumably to handle
    a faked copy-on-write fault for the debugger.  This results in the 
    following:

    First, we temporarily increase the protections to make the page *appear*
    writeable.  Note: only 'appear' writeable, not actually be writeable.

        if (fault_type & VM_PROT_OVERRIDE_WRITE)
                prot = entry->max_protection;
        else
                prot = entry->protection;

    Next we strip off only the fault bits that we care about.  Note that
    we have already adjusted 'prot' based on the VM_PROT_OVERRIDE_WRITE
    flag so 'prot' is probably writeable.  We will thus fall through this
    conditional:
  
        fault_type &= (VM_PROT_READ|VM_PROT_WRITE|VM_PROT_EXECUTE);
        if ((fault_type & prot) != fault_type) {
                        RETURN(KERN_PROTECTION_FAILURE);
        }

    If this is part of a user wire and we have a write fault and the
    page is copy-on-write, *AND* VM_PROT_OVERRIDE_WRITE was not set,
    we return a failure.  This is, in fact, the failure that is returned
    when the vm_fault code initially attempts to do the lookup before
    vm_fault falls through and makes a second attempt with 
    VM_PROT_OVERRIDE_WRITE.

        if (entry->wired_count && (fault_type & VM_PROT_WRITE) &&
                        (entry->eflags & MAP_ENTRY_COW) &&
                        (fault_typea & VM_PROT_OVERRIDE_WRITE) == 0) {
                        RETURN(KERN_PROTECTION_FAILURE);
        }

    Now that we've gotten past this code we revert the protection bits
    if the page is a user-wire, since it was the user wire (indirectly,
    anyway) that caused the protections to be increased in the first
    place.  We drop entry->max_protection and go back to
    entry->protection.  Essentially, we make the page (probably)
    read-only again.

        *wired = (entry->wired_count != 0);
        if (*wired)
                prot = fault_type = entry->protection;

    ... but we've already gotten past the conditionals that can cause
    a failure to be returned, so the code that follows will *still* do
    the copy-on-write for the debugger.
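
    The sequence above can be modelled in user space.  What follows is
    only a sketch, not the kernel code:  struct sketch_entry,
    sketch_lookup() and sketch_fault() are hypothetical names, the
    numeric flag values are placeholders rather than the values in the
    kernel headers, and the fault type used for the retry is an
    assumption.  It models only the protection checks quoted above and
    the retry in the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values for this sketch; not copied from kernel headers. */
#define VM_PROT_READ            0x01
#define VM_PROT_WRITE           0x02
#define VM_PROT_EXECUTE         0x04
#define VM_PROT_OVERRIDE_WRITE  0x08
#define MAP_ENTRY_COW           0x01
#define KERN_SUCCESS            0
#define KERN_PROTECTION_FAILURE 2

/* Hypothetical, stripped-down stand-in for a map entry. */
struct sketch_entry {
	int protection;		/* current protection */
	int max_protection;	/* ceiling on protection */
	int wired_count;	/* non-zero if wired */
	int eflags;		/* MAP_ENTRY_COW or 0 */
};

/* Models only the protection checks discussed above. */
static int
sketch_lookup(const struct sketch_entry *entry, int fault_typea,
    int *out_prot, int *wired)
{
	int fault_type = fault_typea;
	int prot;

	assert(entry != NULL && out_prot != NULL && wired != NULL);

	/* With the override flag, check against max_protection instead. */
	if (fault_type & VM_PROT_OVERRIDE_WRITE)
		prot = entry->max_protection;
	else
		prot = entry->protection;

	/* Keep only the access bits and compare them against 'prot'. */
	fault_type &= (VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE);
	if ((fault_type & prot) != fault_type)
		return (KERN_PROTECTION_FAILURE);

	/* Wired + write + COW fails unless the override flag was passed. */
	if (entry->wired_count && (fault_type & VM_PROT_WRITE) &&
	    (entry->eflags & MAP_ENTRY_COW) &&
	    (fault_typea & VM_PROT_OVERRIDE_WRITE) == 0)
		return (KERN_PROTECTION_FAILURE);

	/* Past the checks: revert to the entry's real protection. */
	*wired = (entry->wired_count != 0);
	if (*wired)
		prot = fault_type = entry->protection;

	*out_prot = prot;
	return (KERN_SUCCESS);
}

/*
 * Models the caller: retry with the override flag only when the first
 * lookup fails with a protection failure during a user wire.  The
 * 'user_wire' argument and the retry fault type are assumptions.
 */
static int
sketch_fault(const struct sketch_entry *entry, int fault_type,
    int user_wire, int *out_prot, int *wired)
{
	int rv;

	rv = sketch_lookup(entry, fault_type, out_prot, wired);
	if (rv != KERN_PROTECTION_FAILURE || !user_wire)
		return (rv);
	return (sketch_lookup(entry,
	    VM_PROT_READ | VM_PROT_WRITE | VM_PROT_OVERRIDE_WRITE,
	    out_prot, wired));
}
```

    In this model a write lookup on a wired, copy-on-write,
    read/execute entry fails on the first pass and succeeds on the
    retry, yet the protection handed back is still entry->protection.
    A write lookup on an unwired copy-on-write entry that is already
    writeable succeeds without the override flag, which is why the
    flag does not matter for 'normal' page handling.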

:But later I find that when wired_count is non zero, we are actually
:simulating a page fault, not a real one.
:Anyway, I do not know how the above code (1) prevents a debugger from
:writing binary code, (2) forces a COW when a debugger writes other
:data.
:
:I also have some questions on wiring a page:
:
:(1)  According to the man pages of mlock(2), a wired page can still
:cause protection-violation faults.
:But in the same vm_map_lookup(), we have the following code:
:
:        if (*wired)
:                prot = fault_type = entry->protection;
:
:and the comment says "get it for all possible accesses".  As I understand
:it, we wire a page by simulating a page fault (no matter whether it is
:the kernel or the user who is wiring the page).

    I'm pretty sure this piece is simply reverting the mess that the
    copy-on-write stuff does for the debugger.  entry->protection is what
    we normally want to use.

    The debugger copy-on-write junk is there so the debugger can modify a
    program's TEXT area but the program itself *cannot* modify its own TEXT
    area.  It's a big mess and I don't fully understand how the structures
    are faked up to handle the case.

:(2)  Can the kernel wire a page of a user process without that user's
:request (by calling mlock)?
:
:Any help is appreciated.

    Yes.  The kernel can wire a page.  It usually busies the page for the
    duration, however, so vm_fault will block on the page and then retry
    without actually noticing that the page has been wired.  I'm probably
    not entirely correct here;  John may be able to say more about it.

                                        -Matt
                                        Matthew Dillon 
                                        <dil...@backplane.com>

