Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On 12/05, Borislav Petkov wrote: > > On Thu, Dec 05, 2013 at 06:23:55PM +0100, Oleg Nesterov wrote: > > This is almost off-topic, but I am wondering if (in the long term) we > > can avoid this "insert the bp into every mm" altogether. > > > > Instead, uprobe_write_opcode() should only unmap this page and set > > Ok, sorry if I'm completely off base here but have you guys tried > unmapping the page from all other VMs, This is what I meant, but we can't simply clear this pte, > and causing all > the VMs to refault why? it would be better to install the page on demand. > patching you'd probably need to cause the #PF handler to "loop" until > patching is complete though. We can't do this, but I do not think we need to block #PF handler. However, somehow the #PF handler should know that it should install the patched page owned by uprobes. That is why I talked about SWP_UPROBE_ENTRY (or something similar) But again, in any case this is not trivial. And perhaps I misundestood you... If you actually want to cause all the VMs to refault, then why we can't unmap + refault every mm like the patch I sent does? Just in case, note that we can't share the same page anyway without more complications. Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On Thu, Dec 05, 2013 at 06:23:55PM +0100, Oleg Nesterov wrote: > This is almost off-topic, but I am wondering if (in the long term) we > can avoid this "insert the bp into every mm" altogether. > > Instead, uprobe_write_opcode() should only unmap this page and set Ok, sorry if I'm completely off base here but have you guys tried unmapping the page from all other VMs, patching it and causing all the VMs to refault it thereby getting the updated content? During the patching you'd probably need to cause the #PF handler to "loop" until patching is complete though. I don't know whether that is even doable/makes sense - just a dumb idea... -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On 12/04, Oleg Nesterov wrote: > > And/Or. Are you still saying that on x86 (and powerpc?) we do not need > these pte games at all and uprobe_write_opcode() can simply do: > > /* Break the mapping unless the page is already anonymous */ > ret = get_user_pages(NULL, mm, vaddr, 1, 1, 1, &page, &vma); > > // this page can't be swapped out due to page_freeze_refs() > copy_to_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE); > > set_page_dirty_lock(page); This is almost off-topic, but I am wondering if (in the long term) we can avoid this "insert the bp into every mm" altogether. Instead, uprobe_write_opcode() should only unmap this page and set SWP_UPROBE_ENTRY. The probed application should install the page with breakpoints itself on demand during the fault, do_swap_page() ->non_swap_entry() can do this. This also allows to share the patched page, uprobes should read it at uprobe_register() time, make a copy, add the breakpoint, and keep it for do_swap_page(). Perhaps. Not sure this is really possible, and of course this is not that simple, and the filtering complicates this even more. And we should ensure that the application itself or ptracer can't write to this page (or even cow it), but this doesn't look too hard. Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On Wed, 2013-12-04 at 09:15 -0800, Linus Torvalds wrote: > On Wed, Dec 4, 2013 at 8:54 AM, H. Peter Anvin wrote: > > > > That is why I talk about the atomic instruction word... most (but not > > *all*) architectures have a fundamental minimum unit of instructions > > which is aligned and can be atomically written. Typically this is 1, 2, > > or 4 bytes. > > Note that it's not just about the "atomically written", it's also > about the guarantee that it's atomically *read*. > > x86 can certainly atomically write a 4-byte instruction too, it's just > that there's no guarantee - even if the instruction is aligned etc - > that the actual instruction decoding always ends up reading it that > way. It might re-read an instruction after encountering a prefix byte > etc etc. So even if it's all properly aligned, the reading side might > do something odd. The ARM architecture has similar issues. Even though the instruction size is mostly fixed, the architecture specification itself only guarantees a very tiny subset of instructions are safe to modify whilst there may be concurrent execution of that instruction. I'm quoting a discussion from a while ago: http://lkml.org/lkml/2012/12/10/346 -- Tixy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On 12/04/2013 09:15 AM, Linus Torvalds wrote: > On Wed, Dec 4, 2013 at 8:54 AM, H. Peter Anvin wrote: >> >> That is why I talk about the atomic instruction word... most (but not >> *all*) architectures have a fundamental minimum unit of instructions >> which is aligned and can be atomically written. Typically this is 1, 2, >> or 4 bytes. > > Note that it's not just about the "atomically written", it's also > about the guarantee that it's atomically *read*. > > x86 can certainly atomically write a 4-byte instruction too, it's just > that there's no guarantee - even if the instruction is aligned etc - > that the actual instruction decoding always ends up reading it that > way. It might re-read an instruction after encountering a prefix byte > etc etc. So even if it's all properly aligned, the reading side might > do something odd. > True, at least in theory, but the atomic instruction quantum on x86 is a byte. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
Peter, Linus, I got lost. So what do you finally think about this change? Please see v2 below: - update the comment above gup(write, force) - add flush_icache_page() before set_pte_at() (nop on x86 and powerpc) - fix the returned value, and with this change it seems to work although I do not trust my testing. I am attaching the code: int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t opcode) { struct page *page; struct vm_area_struct *vma; pte_t *ptep, entry; spinlock_t *ptlp; int ret; /* Read the page with vaddr into memory */ ret = get_user_pages(NULL, mm, vaddr, 1, 0, 1, &page, NULL); if (ret < 0) return ret; ret = verify_opcode(page, vaddr, &opcode); if (ret < 0) goto put; retry: put_page(page); /* Break the mapping unless the page is already anonymous */ ret = get_user_pages(NULL, mm, vaddr, 1, 1, 1, &page, &vma); if (ret <= 0) goto put; ptep = page_check_address(page, mm, vaddr, &ptlp, 0); if (!ptep) goto retry; /* Unmap this page to ensure that nobody can execute it */ flush_cache_page(vma, vaddr, pte_pfn(*ptep)); entry = ptep_clear_flush(vma, vaddr, ptep); /* Nobody can fault in this page, modify it */ copy_to_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE); /* Restore the old mapping */ entry = pte_mkdirty(entry); flush_icache_page(vma, page); set_pte_at(mm, vaddr, ptep, entry); update_mmu_cache(vma, vaddr, ptep); pte_unmap_unlock(ptep, ptlp); ret = 0; put: put_page(page); return ret; } And/Or. Are you still saying that on x86 (and powerpc?) we do not need these pte games at all and uprobe_write_opcode() can simply do: /* Break the mapping unless the page is already anonymous */ ret = get_user_pages(NULL, mm, vaddr, 1, 1, 1, &page, &vma); // this page can't be swapped out due to page_freeze_refs() copy_to_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE); set_page_dirty_lock(page); Just in case... I have no idea if this matters or not, but uprobe_write_opcode() is also used to restore the original insn which can even cross the page. We still write a single (1st) byte in this case, the byte which was previously replaced by int3. Oleg. --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -114,65 +114,6 @@ static loff_t vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr) } /** - * __replace_page - replace page in vma by new page. - * based on replace_page in mm/ksm.c - * - * @vma: vma that holds the pte pointing to page - * @addr: address the old @page is mapped at - * @page: the cowed page we are replacing by kpage - * @kpage:the modified page we replace page by - * - * Returns 0 on success, -EFAULT on failure. - */ -static int __replace_page(struct vm_area_struct *vma, unsigned long addr, - struct page *page, struct page *kpage) -{ - struct mm_struct *mm = vma->vm_mm; - spinlock_t *ptl; - pte_t *ptep; - int err; - /* For mmu_notifiers */ - const unsigned long mmun_start = addr; - const unsigned long mmun_end = addr + PAGE_SIZE; - - /* For try_to_free_swap() and munlock_vma_page() below */ - lock_page(page); - - mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end); - err = -EAGAIN; - ptep = page_check_address(page, mm, addr, &ptl, 0); - if (!ptep) - goto unlock; - - get_page(kpage); - page_add_new_anon_rmap(kpage, vma, addr); - - if (!PageAnon(page)) { - dec_mm_counter(mm, MM_FILEPAGES); - inc_mm_counter(mm, MM_ANONPAGES); - } - - flush_cache_page(vma, addr, pte_pfn(*ptep)); - ptep_clear_flush(vma, addr, ptep); - set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot)); - - page_remove_rmap(page); - if (!page_mapped(page)) - try_to_free_swap(page); - pte_unmap_unlock(ptep, ptl); - - if (vma->vm_flags & VM_LOCKED) - munlock_vma_page(page); - put_page(page); - - err = 0; - unlock: - mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end); - unlock_page(page); - return err; -} - -/** * is_swbp_insn - check if instruction is breakpoin
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On Wed, Dec 4, 2013 at 8:54 AM, H. Peter Anvin wrote: > > That is why I talk about the atomic instruction word... most (but not > *all*) architectures have a fundamental minimum unit of instructions > which is aligned and can be atomically written. Typically this is 1, 2, > or 4 bytes. Note that it's not just about the "atomically written", it's also about the guarantee that it's atomically *read*. x86 can certainly atomically write a 4-byte instruction too, it's just that there's no guarantee - even if the instruction is aligned etc - that the actual instruction decoding always ends up reading it that way. It might re-read an instruction after encountering a prefix byte etc etc. So even if it's all properly aligned, the reading side might do something odd. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On 12/04/2013 08:48 AM, Oleg Nesterov wrote: > On 12/04, H. Peter Anvin wrote: >> >> On 12/04/2013 03:11 AM, Oleg Nesterov wrote: >>> >>> It is still not clear to me if we can simply change a single byte on >>> x86 or not, but at least on powerpc we need to update 4 bytes. Perhaps >>> we can conditionalize these pte games later. >>> >> >> But 4 aligned bytes can be written as a single transaction. > > Ah yes, this is true. > That is why I talk about the atomic instruction word... most (but not *all*) architectures have a fundamental minimum unit of instructions which is aligned and can be atomically written. Typically this is 1, 2, or 4 bytes. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On 12/04, H. Peter Anvin wrote: > > On 12/04/2013 03:11 AM, Oleg Nesterov wrote: > > > > It is still not clear to me if we can simply change a single byte on > > x86 or not, but at least on powerpc we need to update 4 bytes. Perhaps > > we can conditionalize these pte games later. > > > > But 4 aligned bytes can be written as a single transaction. Ah yes, this is true. Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On 12/04/2013 03:11 AM, Oleg Nesterov wrote: > > It is still not clear to me if we can simply change a single byte on > x86 or not, but at least on powerpc we need to update 4 bytes. Perhaps > we can conditionalize these pte games later. > But 4 aligned bytes can be written as a single transaction. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On 12/03, H. Peter Anvin wrote: > > On 12/03/2013 02:01 PM, Linus Torvalds wrote: > > On Tue, Dec 3, 2013 at 12:54 PM, Oleg Nesterov wrote: > >> > >> So do you think the patch I sent is wrong? Why? > > > > I think the TLB shootdown should guarantee that it's ok on other > > CPU's, since that's basically what we do on mmap. > > > > I think that is true for other CPUs; however, there are definitely CPUs > out there (which Linux supports) for which you have to synchronize the I > and D sides "manually" after writing code through memory, at least > through the CPU. That is at least one reason why MIPS has a > cacheflush() system call, for example. OK, probably (with or without the patch I sent) uprobe_write_opcode() needs flush_icache_page(). Altough it is nop on x86 and powerpc (architectures we currently support). But I still can't understand your "There is no architecture-independent way to make code globally visible". If this is true, then how, say, do_swap_page() can work? So I still think the patch should work (I'll add flush_icache_page). > > But looking closer at this, I think I see why the old code did what it > > did. I think it's breaking shared mmap pages on purpose rather than > > dirtying them. Which is probably the right thing to do. > > In other words, treating them as MAP_PRIVATE? Wouldn't it be better to > throw an error if we can't honor the semantics of the mapping that we > are using? Yes, uprobes never writes to MAP_SHARED vmas. Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On 12/03, Linus Torvalds wrote: > > On Tue, Dec 3, 2013 at 12:54 PM, Oleg Nesterov wrote: > > > > So do you think the patch I sent is wrong? Why? > > I think the TLB shootdown should guarantee that it's ok on other > CPU's, since that's basically what we do on mmap. OK, thanks. I'll resend this patch. It is still not clear to me if we can simply change a single byte on x86 or not, but at least on powerpc we need to update 4 bytes. Perhaps we can conditionalize these pte games later. > But looking closer at this, I think I see why the old code did what it > did. I think it's breaking shared mmap pages on purpose rather than > dirtying them. Which is probably the right thing to do. Ah, no. uprobes never writes to the shared pages. (hmm, it seems that VM_SHARED check is buggy, but this is offtopic). Otherwise this patch would be very wrong. Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On 12/03/2013 02:01 PM, Linus Torvalds wrote: > On Tue, Dec 3, 2013 at 12:54 PM, Oleg Nesterov wrote: >> >> So do you think the patch I sent is wrong? Why? > > I think the TLB shootdown should guarantee that it's ok on other > CPU's, since that's basically what we do on mmap. > I think that is true for other CPUs; however, there are definitely CPUs out there (which Linux supports) for which you have to synchronize the I and D sides "manually" after writing code through memory, at least through the CPU. That is at least one reason why MIPS has a cacheflush() system call, for example. > But looking closer at this, I think I see why the old code did what it > did. I think it's breaking shared mmap pages on purpose rather than > dirtying them. Which is probably the right thing to do. > In other words, treating them as MAP_PRIVATE? Wouldn't it be better to throw an error if we can't honor the semantics of the mapping that we are using? -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On 12/03/2013 12:54 PM, Oleg Nesterov wrote: > On 12/03, H. Peter Anvin wrote: >> >> There is no architecture-independent way to make code globally visible. > > Then I probably misunderstood "globally". Or something else. > > So do you think the patch I sent is wrong? Why? > It is wrong in the sense that without an architecture-specific synchronization at the end it is not guaranteed to work. It will work fine on x86 and presumably any other architecture where icache-dcache coherency is enforced in hardware (I am *guessing* that includes all or most SMP platforms, but it is definitely *not* true on all UP platforms.) -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On Tue, Dec 3, 2013 at 12:54 PM, Oleg Nesterov wrote: > > So do you think the patch I sent is wrong? Why? I think the TLB shootdown should guarantee that it's ok on other CPU's, since that's basically what we do on mmap. But looking closer at this, I think I see why the old code did what it did. I think it's breaking shared mmap pages on purpose rather than dirtying them. Which is probably the right thing to do. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On 12/03, H. Peter Anvin wrote: > > There is no architecture-independent way to make code globally visible. Then I probably misunderstood "globally". Or something else. So do you think the patch I sent is wrong? Why? I am totally confused. Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
There is no architecture-independent way to make code globally visible. Oleg Nesterov wrote: >On 12/03, H. Peter Anvin wrote: >> >> On 12/03/2013 12:01 PM, Oleg Nesterov wrote: >> > >> >> followed by whatever synchronization necessary to make it globally >visible. >> > >> > Could you explain what this synchronization should actually do ? >> > >> >> That is architecture-dependent. On x86 it requires each CPU on which >> this code is visible to be IPId, on others it involves complex icache >> flushing protocols. > >Hmm. And why this would be better than arch-independent code I sent? > >IOW. Could you please comment the patch I sent? > >Oleg. -- Sent from my mobile phone. Please pardon brevity and lack of formatting. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On 12/03, H. Peter Anvin wrote: > > On 12/03/2013 12:01 PM, Oleg Nesterov wrote: > > > >> followed by whatever synchronization necessary to make it globally visible. > > > > Could you explain what this synchronization should actually do ? > > > > That is architecture-dependent. On x86 it requires each CPU on which > this code is visible to be IPId, on others it involves complex icache > flushing protocols. Hmm. And why this would be better than arch-independent code I sent? IOW. Could you please comment the patch I sent? Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On 12/03/2013 12:01 PM, Oleg Nesterov wrote: > >> followed by whatever synchronization necessary to make it globally visible. > > Could you explain what this synchronization should actually do ? > That is architecture-dependent. On x86 it requires each CPU on which this code is visible to be IPId, on others it involves complex icache flushing protocols. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On 12/03, H. Peter Anvin wrote: > > I guess it would have to be checked, but I would be *highly* surprised > if UPROBE_SWBP_INSN_SIZE ever[1] could be anything than the fundamental > instruction quantum, which means it should never be able to wrap a page, Yes, it can't. > but *also* should mean it should be able to just be put_user()'d put_user() obviously can't work, we need access_remote_vm-like code. > followed by whatever synchronization necessary to make it globally visible. Could you explain what this synchronization should actually do ? Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On 12/03, Linus Torvalds wrote: > > On Tue, Dec 3, 2013 at 10:49 AM, Oleg Nesterov wrote: > > > > See the patch below. For review only > > Looks completely broken. Where do you guarantee that it's just a single page? Yes, it is always a single page on all supported architectures. This is even documented. I believe that "NOTE:" comment above uprobe_write_opcode() tries to say this but I guess this comment should be cleanuped. And note also /* uprobe_write_opcode() assumes we don't cross page boundary */ BUG_ON((uprobe->offset & ~PAGE_MASK) + UPROBE_SWBP_INSN_SIZE > PAGE_SIZE); in prepare_uprobe(). > Yes, on x86, UPROBE_SWBP_INSN_SIZE is a single byte. And powerpc checks addr & 3 to ensure it doesn't cross the page. > frankly, on x86, exactly *because* it's a single byte, I don't > understand why we don't just write the damn thing with a single > "put_user()", and stop with all the idiotic games. Well, put_user() obviously can't work, mm != current->mm. So we need get_user_pages() at least. > No need to > invalidate caches, even, because if you overwrite the first byte of an > instruction, it all "just works". I can't comment this, I do not know how the hardware actually works. > Either the instruction decoding gets > the old one, or it gets the new one. Funny that. I have asked why access_process_vm() can't work when I saw the initial version of uprobes patches. I was told this can't work (even on x86). And I was told that this idiotic games were suggested by someone named Linus Torvalds ;) > And on non-x86, UPROBE_SWBP_INSN_SIZE is not necessarily 1, so it > could cross a page boundary. Yes. If we support such an architecture we should obviously update uprobe_write_opcode() accordingly. Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On 12/03/2013 11:00 AM, Linus Torvalds wrote: > > Yes, on x86, UPROBE_SWBP_INSN_SIZE is a single byte. But quite > frankly, on x86, exactly *because* it's a single byte, I don't > understand why we don't just write the damn thing with a single > "put_user()", and stop with all the idiotic games. No need to > invalidate caches, even, because if you overwrite the first byte of an > instruction, it all "just works". Either the instruction decoding gets > the old one, or it gets the new one. We already rely on that for the > kernel bp instruction replacement. > > And on non-x86, UPROBE_SWBP_INSN_SIZE is not necessarily 1, so it > could cross a page boundary. Yes, many architectures will have > alignment constraints, but I don't see this testing it. > > Whatever. I think that code is bad, and you should feel bad. But hey, > I think it was pretty bad before too. > I guess it would have to be checked, but I would be *highly* surprised if UPROBE_SWBP_INSN_SIZE ever[1] could be anything than the fundamental instruction quantum, which means it should never be able to wrap a page, but *also* should mean it should be able to just be put_user()'d followed by whatever synchronization necessary to make it globally visible. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On Tue, Dec 3, 2013 at 10:49 AM, Oleg Nesterov wrote: > > See the patch below. For review only Looks completely broken. Where do you guarantee that it's just a single page? Yes, on x86, UPROBE_SWBP_INSN_SIZE is a single byte. But quite frankly, on x86, exactly *because* it's a single byte, I don't understand why we don't just write the damn thing with a single "put_user()", and stop with all the idiotic games. No need to invalidate caches, even, because if you overwrite the first byte of an instruction, it all "just works". Either the instruction decoding gets the old one, or it gets the new one. We already rely on that for the kernel bp instruction replacement. And on non-x86, UPROBE_SWBP_INSN_SIZE is not necessarily 1, so it could cross a page boundary. Yes, many architectures will have alignment constraints, but I don't see this testing it. Whatever. I think that code is bad, and you should feel bad. But hey, I think it was pretty bad before too. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
On 11/29, Linus Torvalds wrote: > > On Fri, Nov 29, 2013 at 3:24 PM, Jiri Kosina wrote: > > > > Do you think this'd be faster than the int3-based aproach? > > Unlikely to be faster, but perhaps more robust and more portable. Maybe. Can't we at least change uprobe_write_opcode() and kill the nontrivial __replace_page() logic? See the patch below. For review only, I can't understand if it needs mmu_notifier_invalidate* and set_pte_notify(), and of course I can easily miss something else. Currently uprobe_write_opcode() always allocs/installs the new page, even if the previous uprobe_write_opcode() has already created the cowed anonymous page. With this patch uprobe_write_opcode() calls gup(write, force), then invalidates pte, then modifies the page, and restores the old pte. The patch is hardly readable, this is how the code looks: int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t opcode) { struct page *page; struct vm_area_struct *vma; pte_t *ptep, entry; spinlock_t *ptlp; int ret; /* Read the page with vaddr into memory */ ret = get_user_pages(NULL, mm, vaddr, 1, 0, 1, &page, NULL); if (ret <= 0) return ret; ret = verify_opcode(page, vaddr, &opcode); if (ret <= 0) goto put; retry: put_page(page); /* COW this page if not writable */ ret = get_user_pages(NULL, mm, vaddr, 1, 1, 1, &page, &vma); if (ret <= 0) goto put; ptep = page_check_address(page, mm, vaddr, &ptlp, 0); if (!ptep) goto retry; /* Unmap this page to ensure that nobody can execute it */ flush_cache_page(vma, vaddr, pte_pfn(*ptep)); entry = ptep_clear_flush(vma, vaddr, ptep); /* Nobody can fault in this page, modify it */ copy_to_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE); /* Restore the old mapping */ entry = pte_mkdirty(entry); set_pte_at(mm, vaddr, ptep, entry); update_mmu_cache(vma, vaddr, ptep); pte_unmap_unlock(ptep, ptlp); put: put_page(page); return ret; } you can safely ignore the fist get_user_pages() and verify_opcode(). I'll appretiate any review, thanks in advance. Oleg. --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -114,65 +114,6 @@ static loff_t vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr) } /** - * __replace_page - replace page in vma by new page. - * based on replace_page in mm/ksm.c - * - * @vma: vma that holds the pte pointing to page - * @addr: address the old @page is mapped at - * @page: the cowed page we are replacing by kpage - * @kpage:the modified page we replace page by - * - * Returns 0 on success, -EFAULT on failure. - */ -static int __replace_page(struct vm_area_struct *vma, unsigned long addr, - struct page *page, struct page *kpage) -{ - struct mm_struct *mm = vma->vm_mm; - spinlock_t *ptl; - pte_t *ptep; - int err; - /* For mmu_notifiers */ - const unsigned long mmun_start = addr; - const unsigned long mmun_end = addr + PAGE_SIZE; - - /* For try_to_free_swap() and munlock_vma_page() below */ - lock_page(page); - - mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end); - err = -EAGAIN; - ptep = page_check_address(page, mm, addr, &ptl, 0); - if (!ptep) - goto unlock; - - get_page(kpage); - page_add_new_anon_rmap(kpage, vma, addr); - - if (!PageAnon(page)) { - dec_mm_counter(mm, MM_FILEPAGES); - inc_mm_counter(mm, MM_ANONPAGES); - } - - flush_cache_page(vma, addr, pte_pfn(*ptep)); - ptep_clear_flush(vma, addr, ptep); - set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot)); - - page_remove_rmap(page); - if (!page_mapped(page)) - try_to_free_swap(page); - pte_unmap_unlock(ptep, ptl); - - if (vma->vm_flags & VM_LOCKED) - munlock_vma_page(page); - put_page(page); - - err = 0; - unlock: - mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end); - unlock_page(page); - return err; -} - -/** * is_swbp_insn - check if instruction is breakpoint instruction. * @insn: instruction to be checked. * Default implementation of is_swbp_insn @@ -264,43 +205,46 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,