Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-05 Thread Oleg Nesterov
On 12/05, Borislav Petkov wrote:
>
> On Thu, Dec 05, 2013 at 06:23:55PM +0100, Oleg Nesterov wrote:
> > This is almost off-topic, but I am wondering if (in the long term) we
> > can avoid this "insert the bp into every mm" altogether.
> >
> > Instead, uprobe_write_opcode() should only unmap this page and set
>
> Ok, sorry if I'm completely off base here but have you guys tried
> unmapping the page from all other VMs,

This is what I meant, but we can't simply clear this pte,

> and causing all
> the VMs to refault

why? it would be better to install the page on demand.

> patching you'd probably need to cause the #PF handler to "loop" until
> patching is complete though.

We can't do this, but I do not think we need to block #PF handler.

However, somehow the #PF handler should know that it should install
the patched page owned by uprobes. That is why I talked about
SWP_UPROBE_ENTRY (or something similar)

But again, in any case this is not trivial.

And perhaps I misundestood you... If you actually want to cause
all the VMs to refault, then why we can't unmap + refault every
mm like the patch I sent does? Just in case, note that we can't
share the same page anyway without more complications.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-05 Thread Borislav Petkov
On Thu, Dec 05, 2013 at 06:23:55PM +0100, Oleg Nesterov wrote:
> This is almost off-topic, but I am wondering if (in the long term) we
> can avoid this "insert the bp into every mm" altogether.
>
> Instead, uprobe_write_opcode() should only unmap this page and set

Ok, sorry if I'm completely off base here but have you guys tried
unmapping the page from all other VMs, patching it and causing all
the VMs to refault it thereby getting the updated content? During the
patching you'd probably need to cause the #PF handler to "loop" until
patching is complete though.

I don't know whether that is even doable/makes sense - just a dumb
idea...

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-05 Thread Oleg Nesterov
On 12/04, Oleg Nesterov wrote:
>
> And/Or. Are you still saying that on x86 (and powerpc?) we do not need
> these pte games at all and uprobe_write_opcode() can simply do:
>
>   /* Break the mapping unless the page is already anonymous */
>   ret = get_user_pages(NULL, mm, vaddr, 1, 1, 1, , );
>
>   // this page can't be swapped out due to page_freeze_refs()
>   copy_to_page(page, vaddr, , UPROBE_SWBP_INSN_SIZE);
>   
>   set_page_dirty_lock(page);

This is almost off-topic, but I am wondering if (in the long term)
we can avoid this "insert the bp into every mm" altogether.

Instead, uprobe_write_opcode() should only unmap this page and set
SWP_UPROBE_ENTRY. The probed application should install the page
with breakpoints itself on demand during the fault, do_swap_page()
->non_swap_entry() can do this.

This also allows to share the patched page, uprobes should read it
at uprobe_register() time, make a copy, add the breakpoint, and keep
it for do_swap_page().

Perhaps. Not sure this is really possible, and of course this is not
that simple, and the filtering complicates this even more. And we
should ensure that the application itself or ptracer can't write to
this page (or even cow it), but this doesn't look too hard.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-05 Thread Jon Medhurst (Tixy)
On Wed, 2013-12-04 at 09:15 -0800, Linus Torvalds wrote:
> On Wed, Dec 4, 2013 at 8:54 AM, H. Peter Anvin  wrote:
> >
> > That is why I talk about the atomic instruction word... most (but not
> > *all*) architectures have a fundamental minimum unit of instructions
> > which is aligned and can be atomically written.  Typically this is 1, 2,
> > or 4 bytes.
> 
> Note that it's not just about the "atomically written", it's also
> about the guarantee that it's atomically *read*.
> 
> x86 can certainly atomically write a 4-byte instruction too, it's just
> that there's no guarantee - even if the instruction is aligned etc -
> that the actual instruction decoding always ends up reading it that
> way. It might re-read an instruction after encountering a prefix byte
> etc etc. So even if it's all properly aligned, the reading side might
> do something odd.

The ARM architecture has similar issues. Even though the instruction
size is mostly fixed, the architecture specification itself only
guarantees a very tiny subset of instructions are safe to modify whilst
there may be concurrent execution of that instruction. I'm quoting a
discussion from a while ago: http://lkml.org/lkml/2012/12/10/346

-- 
Tixy



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-05 Thread Jon Medhurst (Tixy)
On Wed, 2013-12-04 at 09:15 -0800, Linus Torvalds wrote:
 On Wed, Dec 4, 2013 at 8:54 AM, H. Peter Anvin h...@zytor.com wrote:
 
  That is why I talk about the atomic instruction word... most (but not
  *all*) architectures have a fundamental minimum unit of instructions
  which is aligned and can be atomically written.  Typically this is 1, 2,
  or 4 bytes.
 
 Note that it's not just about the atomically written, it's also
 about the guarantee that it's atomically *read*.
 
 x86 can certainly atomically write a 4-byte instruction too, it's just
 that there's no guarantee - even if the instruction is aligned etc -
 that the actual instruction decoding always ends up reading it that
 way. It might re-read an instruction after encountering a prefix byte
 etc etc. So even if it's all properly aligned, the reading side might
 do something odd.

The ARM architecture has similar issues. Even though the instruction
size is mostly fixed, the architecture specification itself only
guarantees a very tiny subset of instructions are safe to modify whilst
there may be concurrent execution of that instruction. I'm quoting a
discussion from a while ago: http://lkml.org/lkml/2012/12/10/346

-- 
Tixy



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-05 Thread Oleg Nesterov
On 12/04, Oleg Nesterov wrote:

 And/Or. Are you still saying that on x86 (and powerpc?) we do not need
 these pte games at all and uprobe_write_opcode() can simply do:

   /* Break the mapping unless the page is already anonymous */
   ret = get_user_pages(NULL, mm, vaddr, 1, 1, 1, page, vma);

   // this page can't be swapped out due to page_freeze_refs()
   copy_to_page(page, vaddr, opcode, UPROBE_SWBP_INSN_SIZE);
   
   set_page_dirty_lock(page);

This is almost off-topic, but I am wondering if (in the long term)
we can avoid this insert the bp into every mm altogether.

Instead, uprobe_write_opcode() should only unmap this page and set
SWP_UPROBE_ENTRY. The probed application should install the page
with breakpoints itself on demand during the fault, do_swap_page()
-non_swap_entry() can do this.

This also allows to share the patched page, uprobes should read it
at uprobe_register() time, make a copy, add the breakpoint, and keep
it for do_swap_page().

Perhaps. Not sure this is really possible, and of course this is not
that simple, and the filtering complicates this even more. And we
should ensure that the application itself or ptracer can't write to
this page (or even cow it), but this doesn't look too hard.

Oleg.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-05 Thread Borislav Petkov
On Thu, Dec 05, 2013 at 06:23:55PM +0100, Oleg Nesterov wrote:
 This is almost off-topic, but I am wondering if (in the long term) we
 can avoid this insert the bp into every mm altogether.

 Instead, uprobe_write_opcode() should only unmap this page and set

Ok, sorry if I'm completely off base here but have you guys tried
unmapping the page from all other VMs, patching it and causing all
the VMs to refault it thereby getting the updated content? During the
patching you'd probably need to cause the #PF handler to loop until
patching is complete though.

I don't know whether that is even doable/makes sense - just a dumb
idea...

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-05 Thread Oleg Nesterov
On 12/05, Borislav Petkov wrote:

 On Thu, Dec 05, 2013 at 06:23:55PM +0100, Oleg Nesterov wrote:
  This is almost off-topic, but I am wondering if (in the long term) we
  can avoid this insert the bp into every mm altogether.
 
  Instead, uprobe_write_opcode() should only unmap this page and set

 Ok, sorry if I'm completely off base here but have you guys tried
 unmapping the page from all other VMs,

This is what I meant, but we can't simply clear this pte,

 and causing all
 the VMs to refault

why? it would be better to install the page on demand.

 patching you'd probably need to cause the #PF handler to loop until
 patching is complete though.

We can't do this, but I do not think we need to block #PF handler.

However, somehow the #PF handler should know that it should install
the patched page owned by uprobes. That is why I talked about
SWP_UPROBE_ENTRY (or something similar)

But again, in any case this is not trivial.

And perhaps I misundestood you... If you actually want to cause
all the VMs to refault, then why we can't unmap + refault every
mm like the patch I sent does? Just in case, note that we can't
share the same page anyway without more complications.

Oleg.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-04 Thread H. Peter Anvin
On 12/04/2013 09:15 AM, Linus Torvalds wrote:
> On Wed, Dec 4, 2013 at 8:54 AM, H. Peter Anvin  wrote:
>>
>> That is why I talk about the atomic instruction word... most (but not
>> *all*) architectures have a fundamental minimum unit of instructions
>> which is aligned and can be atomically written.  Typically this is 1, 2,
>> or 4 bytes.
> 
> Note that it's not just about the "atomically written", it's also
> about the guarantee that it's atomically *read*.
> 
> x86 can certainly atomically write a 4-byte instruction too, it's just
> that there's no guarantee - even if the instruction is aligned etc -
> that the actual instruction decoding always ends up reading it that
> way. It might re-read an instruction after encountering a prefix byte
> etc etc. So even if it's all properly aligned, the reading side might
> do something odd.
> 

True, at least in theory, but the atomic instruction quantum on x86 is a
byte.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-04 Thread Oleg Nesterov
Peter, Linus, I got lost.

So what do you finally think about this change? Please see v2 below:

- update the comment above gup(write, force)

- add flush_icache_page() before set_pte_at() (nop on x86
  and powerpc)

- fix the returned value, and with this change it seems
  to work although I do not trust my testing.

I am attaching the code:

int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
uprobe_opcode_t opcode)
{
struct page *page;
struct vm_area_struct *vma;
pte_t *ptep, entry;
spinlock_t *ptlp;
int ret;

/* Read the page with vaddr into memory */
ret = get_user_pages(NULL, mm, vaddr, 1, 0, 1, , NULL);
if (ret < 0)
return ret;

ret = verify_opcode(page, vaddr, );
if (ret < 0)
goto put;

 retry:
put_page(page);
/* Break the mapping unless the page is already anonymous */
ret = get_user_pages(NULL, mm, vaddr, 1, 1, 1, , );
if (ret <= 0)
goto put;

ptep = page_check_address(page, mm, vaddr, , 0);
if (!ptep)
goto retry;

/* Unmap this page to ensure that nobody can execute it */
flush_cache_page(vma, vaddr, pte_pfn(*ptep));
entry = ptep_clear_flush(vma, vaddr, ptep);

/* Nobody can fault in this page, modify it */
copy_to_page(page, vaddr, , UPROBE_SWBP_INSN_SIZE);

/* Restore the old mapping */
entry = pte_mkdirty(entry);
flush_icache_page(vma, page);
set_pte_at(mm, vaddr, ptep, entry);
update_mmu_cache(vma, vaddr, ptep);
pte_unmap_unlock(ptep, ptlp);
ret = 0;
 put:
put_page(page);
return ret;
}

And/Or. Are you still saying that on x86 (and powerpc?) we do not need
these pte games at all and uprobe_write_opcode() can simply do:

/* Break the mapping unless the page is already anonymous */
ret = get_user_pages(NULL, mm, vaddr, 1, 1, 1, , );

// this page can't be swapped out due to page_freeze_refs()
copy_to_page(page, vaddr, , UPROBE_SWBP_INSN_SIZE);

set_page_dirty_lock(page);

Just in case... I have no idea if this matters or not, but
uprobe_write_opcode() is also used to restore the original insn which
can even cross the page. We still write a single (1st) byte in this
case, the byte which was previously replaced by int3.

Oleg.


--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -114,65 +114,6 @@ static loff_t vaddr_to_offset(struct vm_area_struct *vma, 
unsigned long vaddr)
 }
 
 /**
- * __replace_page - replace page in vma by new page.
- * based on replace_page in mm/ksm.c
- *
- * @vma:  vma that holds the pte pointing to page
- * @addr: address the old @page is mapped at
- * @page: the cowed page we are replacing by kpage
- * @kpage:the modified page we replace page by
- *
- * Returns 0 on success, -EFAULT on failure.
- */
-static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
-   struct page *page, struct page *kpage)
-{
-   struct mm_struct *mm = vma->vm_mm;
-   spinlock_t *ptl;
-   pte_t *ptep;
-   int err;
-   /* For mmu_notifiers */
-   const unsigned long mmun_start = addr;
-   const unsigned long mmun_end   = addr + PAGE_SIZE;
-
-   /* For try_to_free_swap() and munlock_vma_page() below */
-   lock_page(page);
-
-   mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
-   err = -EAGAIN;
-   ptep = page_check_address(page, mm, addr, , 0);
-   if (!ptep)
-   goto unlock;
-
-   get_page(kpage);
-   page_add_new_anon_rmap(kpage, vma, addr);
-
-   if (!PageAnon(page)) {
-   dec_mm_counter(mm, MM_FILEPAGES);
-   inc_mm_counter(mm, MM_ANONPAGES);
-   }
-
-   flush_cache_page(vma, addr, pte_pfn(*ptep));
-   ptep_clear_flush(vma, addr, ptep);
-   set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot));
-
-   page_remove_rmap(page);
-   if (!page_mapped(page))
-   try_to_free_swap(page);
-   pte_unmap_unlock(ptep, ptl);
-
-   if (vma->vm_flags & VM_LOCKED)
-   munlock_vma_page(page);
-   put_page(page);
-
-   err = 0;
- unlock:
-   mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
-   unlock_page(page);
-   return err;
-}
-
-/**
  * is_swbp_insn - check if instruction is breakpoint instruction.
  * @insn: instruction to be checked.

Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-04 Thread Linus Torvalds
On Wed, Dec 4, 2013 at 8:54 AM, H. Peter Anvin  wrote:
>
> That is why I talk about the atomic instruction word... most (but not
> *all*) architectures have a fundamental minimum unit of instructions
> which is aligned and can be atomically written.  Typically this is 1, 2,
> or 4 bytes.

Note that it's not just about the "atomically written", it's also
about the guarantee that it's atomically *read*.

x86 can certainly atomically write a 4-byte instruction too, it's just
that there's no guarantee - even if the instruction is aligned etc -
that the actual instruction decoding always ends up reading it that
way. It might re-read an instruction after encountering a prefix byte
etc etc. So even if it's all properly aligned, the reading side might
do something odd.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-04 Thread H. Peter Anvin
On 12/04/2013 08:48 AM, Oleg Nesterov wrote:
> On 12/04, H. Peter Anvin wrote:
>>
>> On 12/04/2013 03:11 AM, Oleg Nesterov wrote:
>>>
>>> It is still not clear to me if we can simply change a single byte on
>>> x86 or not, but at least on powerpc we need to update 4 bytes. Perhaps
>>> we can conditionalize these pte games later.
>>>
>>
>> But 4 aligned bytes can be written as a single transaction.
> 
> Ah yes, this is true.
> 

That is why I talk about the atomic instruction word... most (but not
*all*) architectures have a fundamental minimum unit of instructions
which is aligned and can be atomically written.  Typically this is 1, 2,
or 4 bytes.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-04 Thread Oleg Nesterov
On 12/04, H. Peter Anvin wrote:
>
> On 12/04/2013 03:11 AM, Oleg Nesterov wrote:
> >
> > It is still not clear to me if we can simply change a single byte on
> > x86 or not, but at least on powerpc we need to update 4 bytes. Perhaps
> > we can conditionalize these pte games later.
> >
>
> But 4 aligned bytes can be written as a single transaction.

Ah yes, this is true.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-04 Thread H. Peter Anvin
On 12/04/2013 03:11 AM, Oleg Nesterov wrote:
> 
> It is still not clear to me if we can simply change a single byte on
> x86 or not, but at least on powerpc we need to update 4 bytes. Perhaps
> we can conditionalize these pte games later.
> 

But 4 aligned bytes can be written as a single transaction.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-04 Thread Oleg Nesterov
On 12/03, H. Peter Anvin wrote:
>
> On 12/03/2013 02:01 PM, Linus Torvalds wrote:
> > On Tue, Dec 3, 2013 at 12:54 PM, Oleg Nesterov  wrote:
> >>
> >> So do you think the patch I sent is wrong? Why?
> >
> > I think the TLB shootdown should guarantee that it's ok on other
> > CPU's, since that's basically what we do on mmap.
> >
>
> I think that is true for other CPUs; however, there are definitely CPUs
> out there (which Linux supports) for which you have to synchronize the I
> and D sides "manually" after writing code through memory, at least
> through the CPU.  That is at least one reason why MIPS has a
> cacheflush() system call, for example.

OK, probably (with or without the patch I sent) uprobe_write_opcode() needs
flush_icache_page(). Altough it is nop on x86 and powerpc (architectures
we currently support).

But I still can't understand your "There is no architecture-independent
way to make code globally visible". If this is true, then how, say,
do_swap_page() can work?

So I still think the patch should work (I'll add flush_icache_page).

> > But looking closer at this, I think I see why the old code did what it
> > did. I think it's breaking shared mmap pages on purpose rather than
> > dirtying them. Which is probably the right thing to do.
>
> In other words, treating them as MAP_PRIVATE?  Wouldn't it be better to
> throw an error if we can't honor the semantics of the mapping that we
> are using?

Yes, uprobes never writes to MAP_SHARED vmas.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-04 Thread Oleg Nesterov
On 12/03, Linus Torvalds wrote:
>
> On Tue, Dec 3, 2013 at 12:54 PM, Oleg Nesterov  wrote:
> >
> > So do you think the patch I sent is wrong? Why?
>
> I think the TLB shootdown should guarantee that it's ok on other
> CPU's, since that's basically what we do on mmap.

OK, thanks. I'll resend this patch.

It is still not clear to me if we can simply change a single byte on
x86 or not, but at least on powerpc we need to update 4 bytes. Perhaps
we can conditionalize these pte games later.

> But looking closer at this, I think I see why the old code did what it
> did. I think it's breaking shared mmap pages on purpose rather than
> dirtying them. Which is probably the right thing to do.

Ah, no. uprobes never writes to the shared pages. (hmm, it seems that
VM_SHARED check is buggy, but this is offtopic). Otherwise this patch
would be very wrong.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-04 Thread Oleg Nesterov
On 12/03, Linus Torvalds wrote:

 On Tue, Dec 3, 2013 at 12:54 PM, Oleg Nesterov o...@redhat.com wrote:
 
  So do you think the patch I sent is wrong? Why?

 I think the TLB shootdown should guarantee that it's ok on other
 CPU's, since that's basically what we do on mmap.

OK, thanks. I'll resend this patch.

It is still not clear to me if we can simply change a single byte on
x86 or not, but at least on powerpc we need to update 4 bytes. Perhaps
we can conditionalize these pte games later.

 But looking closer at this, I think I see why the old code did what it
 did. I think it's breaking shared mmap pages on purpose rather than
 dirtying them. Which is probably the right thing to do.

Ah, no. uprobes never writes to the shared pages. (hmm, it seems that
VM_SHARED check is buggy, but this is offtopic). Otherwise this patch
would be very wrong.

Oleg.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-04 Thread Oleg Nesterov
On 12/03, H. Peter Anvin wrote:

 On 12/03/2013 02:01 PM, Linus Torvalds wrote:
  On Tue, Dec 3, 2013 at 12:54 PM, Oleg Nesterov o...@redhat.com wrote:
 
  So do you think the patch I sent is wrong? Why?
 
  I think the TLB shootdown should guarantee that it's ok on other
  CPU's, since that's basically what we do on mmap.
 

 I think that is true for other CPUs; however, there are definitely CPUs
 out there (which Linux supports) for which you have to synchronize the I
 and D sides manually after writing code through memory, at least
 through the CPU.  That is at least one reason why MIPS has a
 cacheflush() system call, for example.

OK, probably (with or without the patch I sent) uprobe_write_opcode() needs
flush_icache_page(). Altough it is nop on x86 and powerpc (architectures
we currently support).

But I still can't understand your There is no architecture-independent
way to make code globally visible. If this is true, then how, say,
do_swap_page() can work?

So I still think the patch should work (I'll add flush_icache_page).

  But looking closer at this, I think I see why the old code did what it
  did. I think it's breaking shared mmap pages on purpose rather than
  dirtying them. Which is probably the right thing to do.

 In other words, treating them as MAP_PRIVATE?  Wouldn't it be better to
 throw an error if we can't honor the semantics of the mapping that we
 are using?

Yes, uprobes never writes to MAP_SHARED vmas.

Oleg.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-04 Thread H. Peter Anvin
On 12/04/2013 03:11 AM, Oleg Nesterov wrote:
 
 It is still not clear to me if we can simply change a single byte on
 x86 or not, but at least on powerpc we need to update 4 bytes. Perhaps
 we can conditionalize these pte games later.
 

But 4 aligned bytes can be written as a single transaction.

-hpa

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-04 Thread Oleg Nesterov
On 12/04, H. Peter Anvin wrote:

 On 12/04/2013 03:11 AM, Oleg Nesterov wrote:
 
  It is still not clear to me if we can simply change a single byte on
  x86 or not, but at least on powerpc we need to update 4 bytes. Perhaps
  we can conditionalize these pte games later.
 

 But 4 aligned bytes can be written as a single transaction.

Ah yes, this is true.

Oleg.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-04 Thread H. Peter Anvin
On 12/04/2013 08:48 AM, Oleg Nesterov wrote:
 On 12/04, H. Peter Anvin wrote:

 On 12/04/2013 03:11 AM, Oleg Nesterov wrote:

 It is still not clear to me if we can simply change a single byte on
 x86 or not, but at least on powerpc we need to update 4 bytes. Perhaps
 we can conditionalize these pte games later.


 But 4 aligned bytes can be written as a single transaction.
 
 Ah yes, this is true.
 

That is why I talk about the atomic instruction word... most (but not
*all*) architectures have a fundamental minimum unit of instructions
which is aligned and can be atomically written.  Typically this is 1, 2,
or 4 bytes.

-hpa

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-04 Thread Linus Torvalds
On Wed, Dec 4, 2013 at 8:54 AM, H. Peter Anvin h...@zytor.com wrote:

 That is why I talk about the atomic instruction word... most (but not
 *all*) architectures have a fundamental minimum unit of instructions
 which is aligned and can be atomically written.  Typically this is 1, 2,
 or 4 bytes.

Note that it's not just about the atomically written, it's also
about the guarantee that it's atomically *read*.

x86 can certainly atomically write a 4-byte instruction too, it's just
that there's no guarantee - even if the instruction is aligned etc -
that the actual instruction decoding always ends up reading it that
way. It might re-read an instruction after encountering a prefix byte
etc etc. So even if it's all properly aligned, the reading side might
do something odd.

Linus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-04 Thread Oleg Nesterov
Peter, Linus, I got lost.

So what do you finally think about this change? Please see v2 below:

- update the comment above gup(write, force)

- add flush_icache_page() before set_pte_at() (nop on x86
  and powerpc)

- fix the returned value, and with this change it seems
  to work although I do not trust my testing.

I am attaching the code:

int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
uprobe_opcode_t opcode)
{
struct page *page;
struct vm_area_struct *vma;
pte_t *ptep, entry;
spinlock_t *ptlp;
int ret;

/* Read the page with vaddr into memory */
ret = get_user_pages(NULL, mm, vaddr, 1, 0, 1, page, NULL);
if (ret  0)
return ret;

ret = verify_opcode(page, vaddr, opcode);
if (ret  0)
goto put;

 retry:
put_page(page);
/* Break the mapping unless the page is already anonymous */
ret = get_user_pages(NULL, mm, vaddr, 1, 1, 1, page, vma);
if (ret = 0)
goto put;

ptep = page_check_address(page, mm, vaddr, ptlp, 0);
if (!ptep)
goto retry;

/* Unmap this page to ensure that nobody can execute it */
flush_cache_page(vma, vaddr, pte_pfn(*ptep));
entry = ptep_clear_flush(vma, vaddr, ptep);

/* Nobody can fault in this page, modify it */
copy_to_page(page, vaddr, opcode, UPROBE_SWBP_INSN_SIZE);

/* Restore the old mapping */
entry = pte_mkdirty(entry);
flush_icache_page(vma, page);
set_pte_at(mm, vaddr, ptep, entry);
update_mmu_cache(vma, vaddr, ptep);
pte_unmap_unlock(ptep, ptlp);
ret = 0;
 put:
put_page(page);
return ret;
}

And/Or. Are you still saying that on x86 (and powerpc?) we do not need
these pte games at all and uprobe_write_opcode() can simply do:

/* Break the mapping unless the page is already anonymous */
ret = get_user_pages(NULL, mm, vaddr, 1, 1, 1, page, vma);

// this page can't be swapped out due to page_freeze_refs()
copy_to_page(page, vaddr, opcode, UPROBE_SWBP_INSN_SIZE);

set_page_dirty_lock(page);

Just in case... I have no idea if this matters or not, but
uprobe_write_opcode() is also used to restore the original insn which
can even cross the page. We still write a single (1st) byte in this
case, the byte which was previously replaced by int3.

Oleg.


--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -114,65 +114,6 @@ static loff_t vaddr_to_offset(struct vm_area_struct *vma, 
unsigned long vaddr)
 }
 
 /**
- * __replace_page - replace page in vma by new page.
- * based on replace_page in mm/ksm.c
- *
- * @vma:  vma that holds the pte pointing to page
- * @addr: address the old @page is mapped at
- * @page: the cowed page we are replacing by kpage
- * @kpage:the modified page we replace page by
- *
- * Returns 0 on success, -EFAULT on failure.
- */
-static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
-   struct page *page, struct page *kpage)
-{
-   struct mm_struct *mm = vma-vm_mm;
-   spinlock_t *ptl;
-   pte_t *ptep;
-   int err;
-   /* For mmu_notifiers */
-   const unsigned long mmun_start = addr;
-   const unsigned long mmun_end   = addr + PAGE_SIZE;
-
-   /* For try_to_free_swap() and munlock_vma_page() below */
-   lock_page(page);
-
-   mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
-   err = -EAGAIN;
-   ptep = page_check_address(page, mm, addr, ptl, 0);
-   if (!ptep)
-   goto unlock;
-
-   get_page(kpage);
-   page_add_new_anon_rmap(kpage, vma, addr);
-
-   if (!PageAnon(page)) {
-   dec_mm_counter(mm, MM_FILEPAGES);
-   inc_mm_counter(mm, MM_ANONPAGES);
-   }
-
-   flush_cache_page(vma, addr, pte_pfn(*ptep));
-   ptep_clear_flush(vma, addr, ptep);
-   set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma-vm_page_prot));
-
-   page_remove_rmap(page);
-   if (!page_mapped(page))
-   try_to_free_swap(page);
-   pte_unmap_unlock(ptep, ptl);
-
-   if (vma-vm_flags  VM_LOCKED)
-   munlock_vma_page(page);
-   put_page(page);
-
-   err = 0;
- unlock:
-   mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
-   unlock_page(page);
-   return err;
-}
-
-/**
  * is_swbp_insn - check if instruction is breakpoint instruction.
  

Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-04 Thread H. Peter Anvin
On 12/04/2013 09:15 AM, Linus Torvalds wrote:
 On Wed, Dec 4, 2013 at 8:54 AM, H. Peter Anvin h...@zytor.com wrote:

 That is why I talk about the atomic instruction word... most (but not
 *all*) architectures have a fundamental minimum unit of instructions
 which is aligned and can be atomically written.  Typically this is 1, 2,
 or 4 bytes.
 
 Note that it's not just about the atomically written, it's also
 about the guarantee that it's atomically *read*.
 
 x86 can certainly atomically write a 4-byte instruction too, it's just
 that there's no guarantee - even if the instruction is aligned etc -
 that the actual instruction decoding always ends up reading it that
 way. It might re-read an instruction after encountering a prefix byte
 etc etc. So even if it's all properly aligned, the reading side might
 do something odd.
 

True, at least in theory, but the atomic instruction quantum on x86 is a
byte.

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread H. Peter Anvin
On 12/03/2013 02:01 PM, Linus Torvalds wrote:
> On Tue, Dec 3, 2013 at 12:54 PM, Oleg Nesterov  wrote:
>>
>> So do you think the patch I sent is wrong? Why?
> 
> I think the TLB shootdown should guarantee that it's ok on other
> CPU's, since that's basically what we do on mmap.
> 

I think that is true for other CPUs; however, there are definitely CPUs
out there (which Linux supports) for which you have to synchronize the I
and D sides "manually" after writing code through memory, at least
through the CPU.  That is at least one reason why MIPS has a
cacheflush() system call, for example.

> But looking closer at this, I think I see why the old code did what it
> did. I think it's breaking shared mmap pages on purpose rather than
> dirtying them. Which is probably the right thing to do.
> 

In other words, treating them as MAP_PRIVATE?  Wouldn't it be better to
throw an error if we can't honor the semantics of the mapping that we
are using?

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread H. Peter Anvin
On 12/03/2013 12:54 PM, Oleg Nesterov wrote:
> On 12/03, H. Peter Anvin wrote:
>>
>> There is no architecture-independent way to make code globally visible.
> 
> Then I probably misunderstood "globally". Or something else.
> 
> So do you think the patch I sent is wrong? Why?
> 

It is wrong in the sense that without an architecture-specific
synchronization at the end it is not guaranteed to work.  It will work
fine on x86 and presumably any other architecture where icache-dcache
coherency is enforced in hardware (I am *guessing* that includes all or
most SMP platforms, but it is definitely *not* true on all UP platforms.)

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread Linus Torvalds
On Tue, Dec 3, 2013 at 12:54 PM, Oleg Nesterov  wrote:
>
> So do you think the patch I sent is wrong? Why?

I think the TLB shootdown should guarantee that it's ok on other
CPU's, since that's basically what we do on mmap.

But looking closer at this, I think I see why the old code did what it
did. I think it's breaking shared mmap pages on purpose rather than
dirtying them. Which is probably the right thing to do.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread Oleg Nesterov
On 12/03, H. Peter Anvin wrote:
>
> There is no architecture-independent way to make code globally visible.

Then I probably misunderstood "globally". Or something else.

So do you think the patch I sent is wrong? Why?

I am totally confused.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread H. Peter Anvin
There is no architecture-independent way to make code globally visible.

Oleg Nesterov  wrote:
>On 12/03, H. Peter Anvin wrote:
>>
>> On 12/03/2013 12:01 PM, Oleg Nesterov wrote:
>> >
>> >> followed by whatever synchronization necessary to make it globally
>visible.
>> >
>> > Could you explain what this synchronization should actually do ?
>> >
>>
>> That is architecture-dependent. On x86 it requires each CPU on which
>> this code is visible to be IPId, on others it involves complex icache
>> flushing protocols.
>
>Hmm. And why this would be better than arch-independent code I sent?
>
>IOW. Could you please comment the patch I sent?
>
>Oleg.

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread Oleg Nesterov
On 12/03, H. Peter Anvin wrote:
>
> On 12/03/2013 12:01 PM, Oleg Nesterov wrote:
> >
> >> followed by whatever synchronization necessary to make it globally visible.
> >
> > Could you explain what this synchronization should actually do ?
> >
>
> That is architecture-dependent. On x86 it requires each CPU on which
> this code is visible to be IPId, on others it involves complex icache
> flushing protocols.

Hmm. And why this would be better than arch-independent code I sent?

IOW. Could you please comment the patch I sent?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread H. Peter Anvin
On 12/03/2013 12:01 PM, Oleg Nesterov wrote:
> 
>> followed by whatever synchronization necessary to make it globally visible.
> 
> Could you explain what this synchronization should actually do ?
> 

That is architecture-dependent.  On x86 it requires each CPU on which
this code is visible to be IPId, on others it involves complex icache
flushing protocols.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread Oleg Nesterov
On 12/03, H. Peter Anvin wrote:
>
> I guess it would have to be checked, but I would be *highly* surprised
> if UPROBE_SWBP_INSN_SIZE ever[1] could be anything than the fundamental
> instruction quantum, which means it should never be able to wrap a page,

Yes, it can't.

> but *also* should mean it should be able to just be put_user()'d

put_user() obviously can't work, we need access_remote_vm-like code.

> followed by whatever synchronization necessary to make it globally visible.

Could you explain what this synchronization should actually do ?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread Oleg Nesterov
On 12/03, Linus Torvalds wrote:
>
> On Tue, Dec 3, 2013 at 10:49 AM, Oleg Nesterov  wrote:
> >
> > See the patch below. For review only
>
> Looks completely broken. Where do you guarantee that it's just a single page?

Yes, it is always a single page on all supported architectures.

This is even documented. I believe that "NOTE:" comment above
uprobe_write_opcode() tries to say this but I guess this comment
should be cleanuped.

And note also

/* uprobe_write_opcode() assumes we don't cross page boundary */
BUG_ON((uprobe->offset & ~PAGE_MASK) +
UPROBE_SWBP_INSN_SIZE > PAGE_SIZE);

in prepare_uprobe().

> Yes, on x86, UPROBE_SWBP_INSN_SIZE is a single byte.

And powerpc checks addr & 3 to ensure it doesn't cross the page.

> frankly, on x86, exactly *because* it's a single byte, I don't
> understand why we don't just write the damn thing with a single
> "put_user()", and stop with all the idiotic games.

Well, put_user() obviously can't work, mm != current->mm.
So we need get_user_pages() at least.

> No need to
> invalidate caches, even, because if you overwrite the first byte of an
> instruction, it all "just works".

I can't comment this, I do not know how the hardware actually works.

> Either the instruction decoding gets
> the old one, or it gets the new one.

Funny that.

I have asked why access_process_vm() can't work when I saw the initial
version of uprobes patches. I was told this can't work (even on x86).

And I was told that this idiotic games were suggested by someone
named Linus Torvalds ;)

> And on non-x86, UPROBE_SWBP_INSN_SIZE is not necessarily 1, so it
> could cross a page boundary.

Yes. If we support such an architecture we should obviously update
uprobe_write_opcode() accordingly.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread H. Peter Anvin
On 12/03/2013 11:00 AM, Linus Torvalds wrote:
> 
> Yes, on x86, UPROBE_SWBP_INSN_SIZE is a single byte. But quite
> frankly, on x86, exactly *because* it's a single byte, I don't
> understand why we don't just write the damn thing with a single
> "put_user()", and stop with all the idiotic games. No need to
> invalidate caches, even, because if you overwrite the first byte of an
> instruction, it all "just works". Either the instruction decoding gets
> the old one, or it gets the new one. We already rely on that for the
> kernel bp instruction replacement.
> 
> And on non-x86, UPROBE_SWBP_INSN_SIZE is not necessarily 1, so it
> could cross a page boundary. Yes, many architectures will have
> alignment constraints, but I don't see this testing it.
> 
> Whatever. I think that code is bad, and you should feel bad. But hey,
> I think it was pretty bad before too.
> 

I guess it would have to be checked, but I would be *highly* surprised
if UPROBE_SWBP_INSN_SIZE ever[1] could be anything than the fundamental
instruction quantum, which means it should never be able to wrap a page,
but *also* should mean it should be able to just be put_user()'d
followed by whatever synchronization necessary to make it globally visible.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread Linus Torvalds
On Tue, Dec 3, 2013 at 10:49 AM, Oleg Nesterov  wrote:
>
> See the patch below. For review only

Looks completely broken. Where do you guarantee that it's just a single page?

Yes, on x86, UPROBE_SWBP_INSN_SIZE is a single byte. But quite
frankly, on x86, exactly *because* it's a single byte, I don't
understand why we don't just write the damn thing with a single
"put_user()", and stop with all the idiotic games. No need to
invalidate caches, even, because if you overwrite the first byte of an
instruction, it all "just works". Either the instruction decoding gets
the old one, or it gets the new one. We already rely on that for the
kernel bp instruction replacement.

And on non-x86, UPROBE_SWBP_INSN_SIZE is not necessarily 1, so it
could cross a page boundary. Yes, many architectures will have
alignment constraints, but I don't see this testing it.

Whatever. I think that code is bad, and you should feel bad. But hey,
I think it was pretty bad before too.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread Linus Torvalds
On Tue, Dec 3, 2013 at 10:49 AM, Oleg Nesterov o...@redhat.com wrote:

 See the patch below. For review only

Looks completely broken. Where do you guarantee that it's just a single page?

Yes, on x86, UPROBE_SWBP_INSN_SIZE is a single byte. But quite
frankly, on x86, exactly *because* it's a single byte, I don't
understand why we don't just write the damn thing with a single
put_user(), and stop with all the idiotic games. No need to
invalidate caches, even, because if you overwrite the first byte of an
instruction, it all just works. Either the instruction decoding gets
the old one, or it gets the new one. We already rely on that for the
kernel bp instruction replacement.

And on non-x86, UPROBE_SWBP_INSN_SIZE is not necessarily 1, so it
could cross a page boundary. Yes, many architectures will have
alignment constraints, but I don't see this testing it.

Whatever. I think that code is bad, and you should feel bad. But hey,
I think it was pretty bad before too.

Linus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread H. Peter Anvin
On 12/03/2013 11:00 AM, Linus Torvalds wrote:
 
 Yes, on x86, UPROBE_SWBP_INSN_SIZE is a single byte. But quite
 frankly, on x86, exactly *because* it's a single byte, I don't
 understand why we don't just write the damn thing with a single
 put_user(), and stop with all the idiotic games. No need to
 invalidate caches, even, because if you overwrite the first byte of an
 instruction, it all just works. Either the instruction decoding gets
 the old one, or it gets the new one. We already rely on that for the
 kernel bp instruction replacement.
 
 And on non-x86, UPROBE_SWBP_INSN_SIZE is not necessarily 1, so it
 could cross a page boundary. Yes, many architectures will have
 alignment constraints, but I don't see this testing it.
 
 Whatever. I think that code is bad, and you should feel bad. But hey,
 I think it was pretty bad before too.
 

I guess it would have to be checked, but I would be *highly* surprised
if UPROBE_SWBP_INSN_SIZE ever[1] could be anything than the fundamental
instruction quantum, which means it should never be able to wrap a page,
but *also* should mean it should be able to just be put_user()'d
followed by whatever synchronization necessary to make it globally visible.

-hpa

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread Oleg Nesterov
On 12/03, Linus Torvalds wrote:

 On Tue, Dec 3, 2013 at 10:49 AM, Oleg Nesterov o...@redhat.com wrote:
 
  See the patch below. For review only

 Looks completely broken. Where do you guarantee that it's just a single page?

Yes, it is always a single page on all supported architectures.

This is even documented. I believe that NOTE: comment above
uprobe_write_opcode() tries to say this but I guess this comment
should be cleanuped.

And note also

/* uprobe_write_opcode() assumes we don't cross page boundary */
BUG_ON((uprobe-offset  ~PAGE_MASK) +
UPROBE_SWBP_INSN_SIZE  PAGE_SIZE);

in prepare_uprobe().

 Yes, on x86, UPROBE_SWBP_INSN_SIZE is a single byte.

And powerpc checks addr  3 to ensure it doesn't cross the page.

 frankly, on x86, exactly *because* it's a single byte, I don't
 understand why we don't just write the damn thing with a single
 put_user(), and stop with all the idiotic games.

Well, put_user() obviously can't work, mm != current-mm.
So we need get_user_pages() at least.

 No need to
 invalidate caches, even, because if you overwrite the first byte of an
 instruction, it all just works.

I can't comment this, I do not know how the hardware actually works.

 Either the instruction decoding gets
 the old one, or it gets the new one.

Funny that.

I have asked why access_process_vm() can't work when I saw the initial
version of uprobes patches. I was told this can't work (even on x86).

And I was told that this idiotic games were suggested by someone
named Linus Torvalds ;)

 And on non-x86, UPROBE_SWBP_INSN_SIZE is not necessarily 1, so it
 could cross a page boundary.

Yes. If we support such an architecture we should obviously update
uprobe_write_opcode() accordingly.

Oleg.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread Oleg Nesterov
On 12/03, H. Peter Anvin wrote:

 I guess it would have to be checked, but I would be *highly* surprised
 if UPROBE_SWBP_INSN_SIZE ever[1] could be anything than the fundamental
 instruction quantum, which means it should never be able to wrap a page,

Yes, it can't.

 but *also* should mean it should be able to just be put_user()'d

put_user() obviously can't work, we need access_remote_vm-like code.

 followed by whatever synchronization necessary to make it globally visible.

Could you explain what this synchronization should actually do ?

Oleg.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread H. Peter Anvin
On 12/03/2013 12:01 PM, Oleg Nesterov wrote:
 
 followed by whatever synchronization necessary to make it globally visible.
 
 Could you explain what this synchronization should actually do ?
 

That is architecture-dependent.  On x86 it requires each CPU on which
this code is visible to be IPId, on others it involves complex icache
flushing protocols.

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread Oleg Nesterov
On 12/03, H. Peter Anvin wrote:

 On 12/03/2013 12:01 PM, Oleg Nesterov wrote:
 
  followed by whatever synchronization necessary to make it globally visible.
 
  Could you explain what this synchronization should actually do ?
 

 That is architecture-dependent. On x86 it requires each CPU on which
 this code is visible to be IPId, on others it involves complex icache
 flushing protocols.

Hmm. And why this would be better than arch-independent code I sent?

IOW. Could you please comment the patch I sent?

Oleg.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread H. Peter Anvin
There is no architecture-independent way to make code globally visible.

Oleg Nesterov o...@redhat.com wrote:
On 12/03, H. Peter Anvin wrote:

 On 12/03/2013 12:01 PM, Oleg Nesterov wrote:
 
  followed by whatever synchronization necessary to make it globally
visible.
 
  Could you explain what this synchronization should actually do ?
 

 That is architecture-dependent. On x86 it requires each CPU on which
 this code is visible to be IPId, on others it involves complex icache
 flushing protocols.

Hmm. And why this would be better than arch-independent code I sent?

IOW. Could you please comment the patch I sent?

Oleg.

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread Oleg Nesterov
On 12/03, H. Peter Anvin wrote:

 There is no architecture-independent way to make code globally visible.

Then I probably misunderstood globally. Or something else.

So do you think the patch I sent is wrong? Why?

I am totally confused.

Oleg.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread Linus Torvalds
On Tue, Dec 3, 2013 at 12:54 PM, Oleg Nesterov o...@redhat.com wrote:

 So do you think the patch I sent is wrong? Why?

I think the TLB shootdown should guarantee that it's ok on other
CPU's, since that's basically what we do on mmap.

But looking closer at this, I think I see why the old code did what it
did. I think it's breaking shared mmap pages on purpose rather than
dirtying them. Which is probably the right thing to do.

  Linus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread H. Peter Anvin
On 12/03/2013 12:54 PM, Oleg Nesterov wrote:
 On 12/03, H. Peter Anvin wrote:

 There is no architecture-independent way to make code globally visible.
 
 Then I probably misunderstood globally. Or something else.
 
 So do you think the patch I sent is wrong? Why?
 

It is wrong in the sense that without an architecture-specific
synchronization at the end it is not guaranteed to work.  It will work
fine on x86 and presumably any other architecture where icache-dcache
coherency is enforced in hardware (I am *guessing* that includes all or
most SMP platforms, but it is definitely *not* true on all UP platforms.)

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly

2013-12-03 Thread H. Peter Anvin
On 12/03/2013 02:01 PM, Linus Torvalds wrote:
 On Tue, Dec 3, 2013 at 12:54 PM, Oleg Nesterov o...@redhat.com wrote:

 So do you think the patch I sent is wrong? Why?
 
 I think the TLB shootdown should guarantee that it's ok on other
 CPU's, since that's basically what we do on mmap.
 

I think that is true for other CPUs; however, there are definitely CPUs
out there (which Linux supports) for which you have to synchronize the I
and D sides manually after writing code through memory, at least
through the CPU.  That is at least one reason why MIPS has a
cacheflush() system call, for example.

 But looking closer at this, I think I see why the old code did what it
 did. I think it's breaking shared mmap pages on purpose rather than
 dirtying them. Which is probably the right thing to do.
 

In other words, treating them as MAP_PRIVATE?  Wouldn't it be better to
throw an error if we can't honor the semantics of the mapping that we
are using?

-hpa

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/