Re: [PATCH] uprobes: fix scratch register selection for rip-relative fixups

2014-05-05 Thread Jim Keniston
On Fri, 2014-05-02 at 17:04 +0200, Denys Vlasenko wrote:
> On 05/02/2014 02:48 AM, Jim Keniston wrote:
> > On Thu, 2014-05-01 at 19:09 +0200, Denys Vlasenko wrote:
> >> +#define VEX2_(insn)   X86_VEX_V((insn)->vex_prefix.bytes[1])
> >> +#define VEX3_(insn)   X86_VEX_V((insn)->vex_prefix.bytes[2])
> > 
> > I disclaim any knowledge about VEX* stuff.
> ...
> > skipped this next part...
> > 
> >> +  /* Similar treatment for VEX3 prefix */
> >> +  /* TODO: add XOP/EVEX treatment when insn decoder supports them */
> >> +  if (insn->vex_prefix.nbytes == 3) {
> >> +  /*
> >> +   * vex2:     c5    rvvvvLpp   (has no b bit)
> >> +   * vex3/xop: c4/8f rxbmmmmm wvvvvLpp
> >> +   * evex:     62    rxbR00mm.wvvvv1pp.zllBVaaa
> >> +   *   (evex will need setting of both b and x since
> >> +   *   in non-sib encoding evex.x is 4th bit of MODRM.rm)
> >> +   * Setting VEX3.b (setting because it has inverted meaning):
> >> +   */
> >> +  cursor = auprobe->insn + insn_offset_vex_prefix(insn) + 1;
> >> +  *cursor |= 0x20;
> >>}
> > 
> > ... resumed reviewing here.
> 
> I do realize that most people reading this code won't be
> aficionados of memorizing x86 insn encoding quirks.
> 
> To help them, I added the comment which spells out bit layout
> of vex/evex/xop prefixes (see above). Does it help?
> 
...

Well, yes.  I read your comments in conjunction with the "VEX Prefix"
wikipedia page and figured out what you're doing, and it looks right.  I
would definitely leave the comments in; otherwise there's a chance that
even you won't remember all this stuff 6 months from now.
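For readers following along without the kernel tree, the two one-byte fixups under review here can be sketched in isolation. This is a hedged userspace illustration, not the kernel code; the helper names are made up for this sketch. REX has the layout 0100wrxb, so clearing REX.b is an AND; in a 3-byte VEX prefix the r/x/b bits are stored inverted, so "clearing" VEX3.b is an OR of bit 5:

```c
#include <assert.h>
#include <stdint.h>

/* REX layout is 0100wrxb: bit 0 is REX.b, the extension of MODRM.rm.
 * Clearing it makes the insn reference a low-numbered (0-7) register. */
static uint8_t clear_rex_b(uint8_t rex)
{
	return rex & 0xfe;
}

/* Byte 1 of a VEX3 prefix is rxbmmmmm, with r/x/b stored inverted,
 * so logically clearing VEX3.b means setting bit 5 (0x20). */
static uint8_t clear_vex3_b(uint8_t vex3_byte1)
{
	return vex3_byte1 | 0x20;
}
```

The inversion is why the kernel comment says "setting because it has inverted meaning".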

Jim

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] uprobes: fix scratch register selection for rip-relative fixups

2014-05-02 Thread Denys Vlasenko
On 05/02/2014 02:48 AM, Jim Keniston wrote:
> On Thu, 2014-05-01 at 19:09 +0200, Denys Vlasenko wrote:
>> +#define VEX2_(insn) X86_VEX_V((insn)->vex_prefix.bytes[1])
>> +#define VEX3_(insn) X86_VEX_V((insn)->vex_prefix.bytes[2])
> 
> I disclaim any knowledge about VEX* stuff.
...
> skipped this next part...
> 
>> +/* Similar treatment for VEX3 prefix */
>> +/* TODO: add XOP/EVEX treatment when insn decoder supports them */
>> +if (insn->vex_prefix.nbytes == 3) {
>> +/*
>> + * vex2:     c5    rvvvvLpp   (has no b bit)
>> + * vex3/xop: c4/8f rxbmmmmm wvvvvLpp
>> + * evex:     62    rxbR00mm.wvvvv1pp.zllBVaaa
>> + *   (evex will need setting of both b and x since
>> + *   in non-sib encoding evex.x is 4th bit of MODRM.rm)
>> + * Setting VEX3.b (setting because it has inverted meaning):
>> + */
>> +cursor = auprobe->insn + insn_offset_vex_prefix(insn) + 1;
>> +*cursor |= 0x20;
>>  }
> 
> ... resumed reviewing here.

I do realize that most people reading this code won't be
aficionados of memorizing x86 insn encoding quirks.

To help them, I added the comment which spells out bit layout
of vex/evex/xop prefixes (see above). Does it help?

> VEX again.  Covering my ears and humming...
> 
>> +if (insn->vex_prefix.nbytes == 2) {
>> +reg2 = VEX2_(insn) ^ 0xf;
>> +}
>> +if (insn->vex_prefix.nbytes == 3) {
>> +reg2 = VEX3_(insn) ^ 0xf;
>> +}
>> +/* TODO: add XOP, EVEX vvvv reading */
>> +

In fact there is room for improvement in this part:
it's better to ignore most-significant bit of this field
(32-bit mode strikes back).

>> +reg2 = 3; /* BX */
>> +if (can_use_regs & UPROBE_FIX_RIP_DI)
>> +reg2 = 7;
>> +if (can_use_regs & UPROBE_FIX_RIP_SI)
>> +reg2 = 6;
> 
> It seems more natural to code this as:
>   if (can_use_regs & UPROBE_FIX_RIP_SI)
>   reg2 = 6;
>   else if (can_use_regs & UPROBE_FIX_RIP_DI)
>   reg2 = 7;
>   else
>   reg2 = 3; /* BX */
> ... which is pretty much how you do it in scratch_reg().

Okay.
I can also get rid of "can_use_regs" variable altogether then!
Sending new version in a jiffy.
-- 
vda
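Denys' "32-bit mode strikes back" point above can be made concrete: VEX.vvvv is stored inverted in bits 6:3, and in 32-bit mode its most-significant bit must be ignored since only 8 GPRs exist. Combined with Jim's suggested if/else chain (SI, then DI, then BX), a hedged standalone sketch looks like this; the function names and parameters are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdint.h>

/* Read VEX.vvvv from the byte that carries it (bits 6:3, inverted).
 * In 32-bit mode the MSB of vvvv is ignored, per the discussion above. */
static int vex_vvvv(uint8_t vex_byte, int is_64bit)
{
	int v = (~vex_byte >> 3) & 0xf;
	if (!is_64bit)
		v &= 0x7;
	return v;
}

/* Pick the first of SI (6), DI (7), BX (3) that the insn doesn't use,
 * mirroring Jim's suggested if/else ordering. */
static int pick_scratch(int uses_si, int uses_di)
{
	if (!uses_si)
		return 6;	/* SI */
	if (!uses_di)
		return 7;	/* DI */
	return 3;		/* BX */
}
```

The masking step is exactly the improvement Denys mentions: without it, a 64-bit-only vvvv value could spuriously rule out a usable 32-bit scratch register.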




Re: [PATCH] uprobes: fix scratch register selection for rip-relative fixups

2014-05-01 Thread Jim Keniston
On Thu, 2014-05-01 at 19:09 +0200, Denys Vlasenko wrote:
> Before this patch, instructions such as div, mul,
> shifts with count in CL, cmpxchg are mishandled.
> 
> This patch adds vex prefix handling. In particular,
> it avoids colliding with register operand encoded
> in vex.vvvv field.
> 
> Since we need to avoid two possible register operands,
> the selection of scratch register needs to be from at least
> three registers.
> 
> After looking through a lot of CPU docs, it looks like
> the safest choice is SI,DI,BX. Selecting BX needs care
> to not collide with implicit use of BX by cmpxchg8b.

I skipped reviewing some of this because I don't know about those parts
of the x86 architecture.  What I saw and understood looks like it'll
work, although I have a couple of suggestions there.  See below.

Thanks for your diligence and creativity on this.

Jim

> 
> Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
> CC: Jim Keniston <jkeni...@us.ibm.com>
> CC: Masami Hiramatsu <masami.hiramatsu...@hitachi.com>
> CC: Srikar Dronamraju <sri...@linux.vnet.ibm.com>
> CC: Ingo Molnar <mi...@kernel.org>
> CC: Oleg Nesterov <o...@redhat.com>
> ---
>  arch/x86/kernel/uprobes.c | 154 
> ++
>  1 file changed, 116 insertions(+), 38 deletions(-)
> 
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index c41bb4b..6cb53d1 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -41,8 +41,10 @@
>  /* Instruction will modify TF, don't change it */
>  #define UPROBE_FIX_SETF  0x04
> 
> -#define UPROBE_FIX_RIP_AX0x08
> -#define UPROBE_FIX_RIP_CX0x10
> +#define UPROBE_FIX_RIP_SI0x08
> +#define UPROBE_FIX_RIP_DI0x10
> +#define UPROBE_FIX_RIP_BX0x20
> +#define UPROBE_FIX_RIP_ALL   0x38

I agree with Oleg's suggestion about how to define UPROBE_FIX_RIP_ALL.

> 
>  #define  UPROBE_TRAP_NR  UINT_MAX
> 
> @@ -51,6 +53,8 @@
>  #define OPCODE2(insn)((insn)->opcode.bytes[1])
>  #define OPCODE3(insn)((insn)->opcode.bytes[2])
>  #define MODRM_REG(insn)  X86_MODRM_REG((insn)->modrm.value)
> +#define VEX2_(insn)  X86_VEX_V((insn)->vex_prefix.bytes[1])
> +#define VEX3_(insn)  X86_VEX_V((insn)->vex_prefix.bytes[2])

I disclaim any knowledge about VEX* stuff.

> 
>  #define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, 
> bf)\
>   (((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) |   \
> @@ -274,63 +278,137 @@ static inline bool is_64bit_mm(struct mm_struct *mm)
>  static void riprel_analyze(struct arch_uprobe *auprobe, struct insn *insn)
>  {
>   u8 *cursor;
> + u8 can_use_regs;
>   u8 reg;
> + int reg2;
> 
>   if (!insn_rip_relative(insn))
>   return;
> 
>   /*
> -  * insn_rip_relative() would have decoded rex_prefix, modrm.
> +  * insn_rip_relative() would have decoded rex_prefix,
> +  * vex_prefix, modrm.
>* Clear REX.b bit (extension of MODRM.rm field):
> -  * we want to encode rax/rcx, not r8/r9.
> +  * we want to encode low numbered reg, not r8+.
>*/
>   if (insn->rex_prefix.nbytes) {
>   cursor = auprobe->insn + insn_offset_rex_prefix(insn);
> - *cursor &= 0xfe;/* Clearing REX.B bit */
> + /* REX byte has 0100wrxb layout, clearing REX.b bit */
> + *cursor &= 0xfe;
> + }

skipped this next part...

> + /* Similar treatment for VEX3 prefix */
> + /* TODO: add XOP/EVEX treatment when insn decoder supports them */
> + if (insn->vex_prefix.nbytes == 3) {
> + /*
> +  * vex2:     c5    rvvvvLpp   (has no b bit)
> +  * vex3/xop: c4/8f rxbmmmmm wvvvvLpp
> +  * evex:     62    rxbR00mm.wvvvv1pp.zllBVaaa
> +  *   (evex will need setting of both b and x since
> +  *   in non-sib encoding evex.x is 4th bit of MODRM.rm)
> +  * Setting VEX3.b (setting because it has inverted meaning):
> +  */
> + cursor = auprobe->insn + insn_offset_vex_prefix(insn) + 1;
> + *cursor |= 0x20;
>   }
> 

... resumed reviewing here.

>   /*
> +  * Convert from rip-relative addressing
> +  * to register-relative addressing via a scratch register.
> +  *
> +  * This is tricky since there are insns with modrm byte
> +  * which also use registers not encoded in modrm byte:
> +  * [i]div/[i]mul: implicitly use dx:ax
> +  * shift ops: implicitly use cx
> +  * cmpxchg: implicitly uses ax
> +  * cmpxchg8/16b: implicitly uses dx:ax and bx:cx
> +  *   Encoding: 0f c7/1 modrm
> +  *   The code below thinks that reg=1 (cx), chooses si as scratch.

I verified the above in the AMD manual, but cannot comment on this next
section...

> +  * mulx: implicitly uses dx: mulx r/m,r1,r2: r1:r2 = dx * r/m
> +  *   First appeared in Haswell (BMI2 insn). It is vex-encoded.
> +  *   Example where none of bx,cx,dx can be used as scratch reg:
> +  *   c4 e2 63 f6 0d disp32   mulx disp32(%rip),%ebx,%ecx
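The mulx example in the quoted comment can be checked mechanically: byte 2 of its VEX3 prefix is WvvvvLpp (vvvv inverted) and the modrm byte splits as mod/reg/rm. These are hypothetical decode helpers written for this thread, not kernel API, but they confirm that vvvv names %ebx, modrm.reg names %ecx, and the insn implicitly uses %edx — so BX, CX, DX are all off-limits as scratch:

```c
#include <assert.h>
#include <stdint.h>

/* Field extractors for the quoted example c4 e2 63 f6 0d (mulx).
 * VEX3 byte 2 is WvvvvLpp with vvvv stored inverted;
 * modrm is mod(7:6) reg(5:3) rm(2:0). */
static int vex3_vvvv(uint8_t byte2) { return (~byte2 >> 3) & 0xf; }
static int modrm_reg(uint8_t modrm) { return (modrm >> 3) & 0x7; }
static int modrm_rm(uint8_t modrm)  { return modrm & 0x7; }
static int modrm_mod(uint8_t modrm) { return modrm >> 6; }
```

With byte2 = 0x63 and modrm = 0x0d: vvvv = 3 (BX), reg = 1 (CX), and mod=0/rm=5 marks the rip-relative operand — matching the comment's claim that only SI/DI remain safe here.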

Re: [PATCH] uprobes: fix scratch register selection for rip-relative fixups

2014-05-01 Thread Oleg Nesterov
On 05/01, Denys Vlasenko wrote:
>
> Before this patch, instructions such as div, mul,
> shifts with count in CL, cmpxchg are mishandled.

Thanks. I'll try to read this patch tomorrow, but you do know that I can't
review (or even understand ;) the change in riprel_analyze().

As for other changes, perhaps we can cleanup them later somehow, but I am
not going to discuss this now. I agree that the fix should be as simple
as possible.

Only one nit,

> -#define UPROBE_FIX_RIP_AX0x08
> -#define UPROBE_FIX_RIP_CX0x10
> +#define UPROBE_FIX_RIP_SI0x08
> +#define UPROBE_FIX_RIP_DI0x10
> +#define UPROBE_FIX_RIP_BX0x20
> +#define UPROBE_FIX_RIP_ALL   0x38

OK, but please define UPROBE_FIX_RIP_ALL as

#define UPROBE_FIX_RIP_ALL  \
(UPROBE_FIX_RIP_SI | UPROBE_FIX_RIP_DI | UPROBE_FIX_RIP_BX)

Please do not bother to do this right now, let's wait for review.

Oleg.



[PATCH] uprobes: fix scratch register selection for rip-relative fixups

2014-05-01 Thread Denys Vlasenko
Before this patch, instructions such as div, mul,
shifts with count in CL, cmpxchg are mishandled.

This patch adds vex prefix handling. In particular,
it avoids colliding with register operand encoded
in vex.vvvv field.

Since we need to avoid two possible register operands,
the selection of scratch register needs to be from at least
three registers.

After looking through a lot of CPU docs, it looks like
the safest choice is SI,DI,BX. Selecting BX needs care
to not collide with implicit use of BX by cmpxchg8b.

Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
CC: Jim Keniston <jkeni...@us.ibm.com>
CC: Masami Hiramatsu <masami.hiramatsu...@hitachi.com>
CC: Srikar Dronamraju <sri...@linux.vnet.ibm.com>
CC: Ingo Molnar <mi...@kernel.org>
CC: Oleg Nesterov <o...@redhat.com>
---
 arch/x86/kernel/uprobes.c | 154 ++
 1 file changed, 116 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index c41bb4b..6cb53d1 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -41,8 +41,10 @@
 /* Instruction will modify TF, don't change it */
 #define UPROBE_FIX_SETF0x04
 
-#define UPROBE_FIX_RIP_AX  0x08
-#define UPROBE_FIX_RIP_CX  0x10
+#define UPROBE_FIX_RIP_SI  0x08
+#define UPROBE_FIX_RIP_DI  0x10
+#define UPROBE_FIX_RIP_BX  0x20
+#define UPROBE_FIX_RIP_ALL 0x38
 
 #defineUPROBE_TRAP_NR  UINT_MAX
 
@@ -51,6 +53,8 @@
 #define OPCODE2(insn)  ((insn)->opcode.bytes[1])
 #define OPCODE3(insn)  ((insn)->opcode.bytes[2])
 #define MODRM_REG(insn)X86_MODRM_REG((insn)->modrm.value)
+#define VEX2_(insn)X86_VEX_V((insn)->vex_prefix.bytes[1])
+#define VEX3_(insn)X86_VEX_V((insn)->vex_prefix.bytes[2])
 
 #define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\
(((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) |   \
@@ -274,63 +278,137 @@ static inline bool is_64bit_mm(struct mm_struct *mm)
 static void riprel_analyze(struct arch_uprobe *auprobe, struct insn *insn)
 {
u8 *cursor;
+   u8 can_use_regs;
u8 reg;
+   int reg2;
 
if (!insn_rip_relative(insn))
return;
 
/*
-* insn_rip_relative() would have decoded rex_prefix, modrm.
+* insn_rip_relative() would have decoded rex_prefix,
+* vex_prefix, modrm.
 * Clear REX.b bit (extension of MODRM.rm field):
-* we want to encode rax/rcx, not r8/r9.
+* we want to encode low numbered reg, not r8+.
 */
if (insn->rex_prefix.nbytes) {
cursor = auprobe->insn + insn_offset_rex_prefix(insn);
-   *cursor &= 0xfe;/* Clearing REX.B bit */
+   /* REX byte has 0100wrxb layout, clearing REX.b bit */
+   *cursor &= 0xfe;
+   }
+   /* Similar treatment for VEX3 prefix */
+   /* TODO: add XOP/EVEX treatment when insn decoder supports them */
+   if (insn->vex_prefix.nbytes == 3) {
+   /*
+* vex2:     c5    rvvvvLpp   (has no b bit)
+* vex3/xop: c4/8f rxbmmmmm wvvvvLpp
+* evex:     62    rxbR00mm.wvvvv1pp.zllBVaaa
+*   (evex will need setting of both b and x since
+*   in non-sib encoding evex.x is 4th bit of MODRM.rm)
+* Setting VEX3.b (setting because it has inverted meaning):
+*/
+   cursor = auprobe->insn + insn_offset_vex_prefix(insn) + 1;
+   *cursor |= 0x20;
}
 
/*
+* Convert from rip-relative addressing
+* to register-relative addressing via a scratch register.
+*
+* This is tricky since there are insns with modrm byte
+* which also use registers not encoded in modrm byte:
+* [i]div/[i]mul: implicitly use dx:ax
+* shift ops: implicitly use cx
+* cmpxchg: implicitly uses ax
+* cmpxchg8/16b: implicitly uses dx:ax and bx:cx
+*   Encoding: 0f c7/1 modrm
+*   The code below thinks that reg=1 (cx), chooses si as scratch.
+* mulx: implicitly uses dx: mulx r/m,r1,r2: r1:r2 = dx * r/m
+*   First appeared in Haswell (BMI2 insn). It is vex-encoded.
+*   Example where none of bx,cx,dx can be used as scratch reg:
+*   c4 e2 63 f6 0d disp32   mulx disp32(%rip),%ebx,%ecx
+* [v]pcmpistri: implicitly uses cx, xmm0
+* [v]pcmpistrm: implicitly uses xmm0
+* [v]pcmpestri: implicitly uses ax, dx, cx, xmm0
+* [v]pcmpestrm: implicitly uses ax, dx, xmm0
+*   Evil SSE4.2 string comparison ops from hell.
+* maskmovq/[v]maskmovdqu: implicitly uses (ds:rdi) as destination.
+*   Encoding: 0f f7 modrm, 66 0f f7 modrm, vex-encoded: c5 f9 f7 modrm.
+*   Store op1, byte-masked by op2 msb's in each byte, to (ds:rdi).
+*   AMD says it has no 3-operand form (vex.vvvv must be 1111)
+*   and that it can have only register operands, not mem
+  
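After the prefix fixups, the heart of the patch is rewriting the modrm byte itself: a rip-relative operand is encoded as mod=00, rm=101, and replacing rm with the chosen scratch register (keeping mod=00 and the reg field) turns disp32(%rip) into (%scratch). A hedged one-function sketch, not the kernel's implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Rewrite a rip-relative modrm ("00 reg 101") to reference a scratch
 * register instead ("00 reg scratch").  The probed task's scratch
 * register is then temporarily pointed at the intended address.
 * Illustrative helper; names and exact approach are assumptions. */
static uint8_t riprel_to_reg(uint8_t modrm, int scratch)
{
	/* keep the reg field (bits 5:3); mod=00, rm=scratch */
	return (modrm & 0x38) | (scratch & 0x7);
}
```

For example, "8b 05 disp32" (mov disp32(%rip),%eax) becomes "8b 06" (mov (%rsi),%eax) when SI is the scratch register.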
