Re: [PATCH v7 18/26] x86/insn-eval: Add support to resolve 16-bit addressing encodings

2017-06-15 Thread Ricardo Neri
On Wed, 2017-06-07 at 18:28 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:16AM -0700, Ricardo Neri wrote:
> > Tasks running in virtual-8086 mode or in protected mode with code
> > segment descriptors that specify 16-bit default address sizes via the
> > D bit will use 16-bit addressing form encodings as described in the Intel
> > 64 and IA-32 Architecture Software Developer's Manual Volume 2A Section
> > 2.1.5. 16-bit addressing encodings differ in several ways from the
> > 32-bit/64-bit addressing form encodings: ModRM.rm points to different
> > registers and, in some cases, effective addresses are indicated by the
> > addition of the value of two registers. Also, there is no support for SIB
> > bytes. Thus, a separate function is needed to parse this form of
> > addressing.
> > 
> > A couple of functions are introduced. get_reg_offset_16() obtains the
> > offset from the base of pt_regs of the registers indicated by the ModRM
> > byte of the address encoding. get_addr_ref_16() computes the linear
> > address indicated by the instructions using the value of the registers
> > given by ModRM as well as the base address of the segment.
> > 
> > Cc: Dave Hansen 
> > Cc: Adam Buchbinder 
> > Cc: Colin Ian King 
> > Cc: Lorenzo Stoakes 
> > Cc: Qiaowei Ren 
> > Cc: Arnaldo Carvalho de Melo 
> > Cc: Masami Hiramatsu 
> > Cc: Adrian Hunter 
> > Cc: Kees Cook 
> > Cc: Thomas Garnier 
> > Cc: Peter Zijlstra 
> > Cc: Borislav Petkov 
> > Cc: Dmitry Vyukov 
> > Cc: Ravi V. Shankar 
> > Cc: x...@kernel.org
> > Signed-off-by: Ricardo Neri 
> > ---
> >  arch/x86/lib/insn-eval.c | 155 
> > +++
> >  1 file changed, 155 insertions(+)
> > 
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index 9822061..928a662 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -431,6 +431,73 @@ static int get_reg_offset(struct insn *insn, struct 
> > pt_regs *regs,
> >  }
> >  
> >  /**
> > + * get_reg_offset_16 - Obtain offset of register indicated by instruction
> 
> Please end function names with parentheses.

I will correct.
> 
> > + * @insn:  Instruction structure containing ModRM and SiB bytes
> 
> s/SiB/SIB/g

I will correct.
> 
> > + * @regs:  Structure with register values as seen when entering kernel mode
> > + * @offs1: Offset of the first operand register
> > + * @offs2: Offset of the second opeand register, if applicable.
> > + *
> > + * Obtain the offset, in pt_regs, of the registers indicated by the ModRM 
> > byte
> > + * within insn. This function is to be used with 16-bit address encodings. 
> > The
> > + * offs1 and offs2 will be written with the offset of the two registers
> > + * indicated by the instruction. In cases where any of the registers is not
> > + * referenced by the instruction, the value will be set to -EDOM.
> > + *
> > + * Return: 0 on success, -EINVAL on failure.
> > + */
> > +static int get_reg_offset_16(struct insn *insn, struct pt_regs *regs,
> > +int *offs1, int *offs2)
> > +{
> > +   /* 16-bit addressing can use one or two registers */
> > +   static const int regoff1[] = {
> > +   offsetof(struct pt_regs, bx),
> > +   offsetof(struct pt_regs, bx),
> > +   offsetof(struct pt_regs, bp),
> > +   offsetof(struct pt_regs, bp),
> > +   offsetof(struct pt_regs, si),
> > +   offsetof(struct pt_regs, di),
> > +   offsetof(struct pt_regs, bp),
> > +   offsetof(struct pt_regs, bx),
> > +   };
> > +
> > +   static const int regoff2[] = {
> > +   offsetof(struct pt_regs, si),
> > +   offsetof(struct pt_regs, di),
> > +   offsetof(struct pt_regs, si),
> > +   offsetof(struct pt_regs, di),
> > +   -EDOM,
> > +   -EDOM,
> > +   -EDOM,
> > +   -EDOM,
> > +   };
> 
> You mean "Table 2-1. 16-Bit Addressing Forms with the ModR/M Byte" in
> the SDM, right?

Yes.
> 
> Please add a comment pointing to it here because it is not trivial to
> map that code to the documentation.

Sure, I will add a comment pointing to this table.

> 
> > +
> > +   if (!offs1 || !offs2)
> > +   return -EINVAL;
> > +
> > +   /* operand is a register, use the generic function */
> > +   if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> > +   *offs1 = insn_get_modrm_rm_off(insn, regs);
> > +   *offs2 = -EDOM;
> > +   return 0;
> > +   }
> > +
> > +   *offs1 = regoff1[X86_MODRM_RM(insn->modrm.value)];
> > +   *offs2 = regoff2[X86_MODRM_RM(insn->modrm.value)];
> > +
> > +   /*
> > +* If no displacement is 

Re: [PATCH v7 16/26] x86/insn-eval: Support both signed 32-bit and 64-bit effective addresses

2017-06-15 Thread Ricardo Neri
On Wed, 2017-06-07 at 17:49 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:14AM -0700, Ricardo Neri wrote:
> > @@ -697,18 +753,21 @@ void __user *insn_get_addr_ref(struct insn *insn, 
> > struct pt_regs *regs)
> >  {
> > unsigned long linear_addr, seg_base_addr, seg_limit;
> > long eff_addr, base, indx;
> > -   int addr_offset, base_offset, indx_offset;
> > +   int addr_offset, base_offset, indx_offset, addr_bytes;
> > insn_byte_t sib;
> >  
> > insn_get_modrm(insn);
> > insn_get_sib(insn);
> > sib = insn->sib.value;
> > +   addr_bytes = insn->addr_bytes;
> >  
> > if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> > addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> > if (addr_offset < 0)
> > goto out_err;
> > -   eff_addr = regs_get_register(regs, addr_offset);
> > +   eff_addr = get_mem_offset(regs, addr_offset, addr_bytes);
> > +   if (eff_addr == -1L)
> > +   goto out_err;
> > seg_base_addr = insn_get_seg_base(regs, insn, addr_offset);
> > if (seg_base_addr == -1L)
> > goto out_err;
> 
> This code here is too dense, it needs spacing for better readability.

I have spaced out in my upcoming version.

Thanks and BR,
Ricardo

--
To unsubscribe from this list: send the line "unsubscribe linux-msdos" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 14/26] x86/insn-eval: Indicate a 32-bit displacement if ModRM.mod is 0 and ModRM.rm is 5

2017-06-15 Thread Ricardo Neri
On Wed, 2017-06-07 at 15:15 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:12AM -0700, Ricardo Neri wrote:
> > Section 2.2.1.3 of the Intel 64 and IA-32 Architectures Software
> > Developer's Manual volume 2A states that when ModRM.mod is zero and
> > ModRM.rm is 101b, a 32-bit displacement follows the ModRM byte. This means
> > that none of the registers are used in the computation of the effective
> > address. A return value of -EDOM signals callers that they should not use
> > the value of registers when computing the effective address for the
> > instruction.
> > 
> > In IA-32e 64-bit mode (long mode), the effective address is given by the
> > 32-bit displacement plus the value of RIP of the next instruction.
> > In IA-32e compatibility mode (protected mode), only the displacement is
> > used.
> > 
> > The instruction decoder takes care of obtaining the displacement.
> 
> ...
> 
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index 693e5a8..4f600de 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -379,6 +379,12 @@ static int get_reg_offset(struct insn *insn, struct 
> > pt_regs *regs,
> > switch (type) {
> > case REG_TYPE_RM:
> > regno = X86_MODRM_RM(insn->modrm.value);
> 
> 
> < newline here.

Will add the new line.
> 
> > +   /*
> > +* ModRM.mod == 0 and ModRM.rm == 5 means a 32-bit displacement
> > +* follows the ModRM byte.
> > +*/
> > +   if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)
> > +   return -EDOM;
> > if (X86_REX_B(insn->rex_prefix.value))
> > regno += 8;
> > break;
> > @@ -730,9 +736,21 @@ void __user *insn_get_addr_ref(struct insn *insn, 
> > struct pt_regs *regs)
> > eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
> > } else {
> > addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> > -   if (addr_offset < 0)
> 
> ditto.

Will add the new line.
> 
> > +   /*
> > +* -EDOM means that we must ignore the address_offset.
> > +* In such a case, in 64-bit mode the effective address
> > +* relative to the RIP of the following instruction.
> > +*/
> > +   if (addr_offset == -EDOM) {
> > +   eff_addr = 0;
> > +   if (user_64bit_mode(regs))
> > +   eff_addr = (long)regs->ip +
> > +  insn->length;
> 
> Let that line stick out and write it balanced:
> 
> if (addr_offset == -EDOM) {
> if (user_64bit_mode(regs))
> eff_addr = (long)regs->ip + 
> insn->length;
> else
> eff_addr = 0;
> 
> should be easier parseable this way.

Will rewrite as you suggest.

Thanks and BR,
Ricardo

--
To unsubscribe from this list: send the line "unsubscribe linux-msdos" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 13/26] x86/insn-eval: Add function to get default params of code segment

2017-06-15 Thread Ricardo Neri
On Wed, 2017-06-07 at 14:59 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:11AM -0700, Ricardo Neri wrote:
> > This function returns the default values of the address and operand sizes
> > as specified in the segment descriptor. This information is determined
> > from the D and L bits. Hence, it can be used for both IA-32e 64-bit and
> > 32-bit legacy modes. For virtual-8086 mode, the default address and
> > operand sizes are always 2 bytes.
> > 
> > The D bit is only meaningful for code segments. Thus, these functions
> > always use the code segment selector contained in regs.
> > 
> > Cc: Dave Hansen 
> > Cc: Adam Buchbinder 
> > Cc: Colin Ian King 
> > Cc: Lorenzo Stoakes 
> > Cc: Qiaowei Ren 
> > Cc: Arnaldo Carvalho de Melo 
> > Cc: Masami Hiramatsu 
> > Cc: Adrian Hunter 
> > Cc: Kees Cook 
> > Cc: Thomas Garnier 
> > Cc: Peter Zijlstra 
> > Cc: Borislav Petkov 
> > Cc: Dmitry Vyukov 
> > Cc: Ravi V. Shankar 
> > Cc: x...@kernel.org
> > Signed-off-by: Ricardo Neri 
> > ---
> >  arch/x86/include/asm/insn-eval.h |  6 
> >  arch/x86/lib/insn-eval.c | 65 
> > 
> >  2 files changed, 71 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/insn-eval.h 
> > b/arch/x86/include/asm/insn-eval.h
> > index 7f3c7fe..9ed1c88 100644
> > --- a/arch/x86/include/asm/insn-eval.h
> > +++ b/arch/x86/include/asm/insn-eval.h
> > @@ -11,9 +11,15 @@
> >  #include 
> >  #include 
> >  
> > +struct insn_code_seg_defaults {
> 
> A whole struct for a function which gets called only once?
> 
> Bah, that's a bit too much, if you ask me.
> 
> So you're returning two small unsigned integers - i.e., you can just as
> well return a single u8 and put address and operand sizes in there:
> 
>   ret = oper_sz | addr_sz << 4;
> 
> No need for special structs for that.

OK. This makes sense. Perhaps I can use a couple of #define's to set and
get the the address and operand sizes in a single u8. This would make
the code more readable.

> 
> > +   unsigned char address_bytes;
> > +   unsigned char operand_bytes;
> > +};
> > +
> >  void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
> >  int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
> >  unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
> > int regoff);
> > +struct insn_code_seg_defaults insn_get_code_seg_defaults(struct pt_regs 
> > *regs);
> >  
> >  #endif /* _ASM_X86_INSN_EVAL_H */
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index c77ed80..693e5a8 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -603,6 +603,71 @@ static unsigned long get_seg_limit(struct pt_regs 
> > *regs, struct insn *insn,
> >  }
> >  
> >  /**
> > + * insn_get_code_seg_defaults() - Obtain code segment default parameters
> > + * @regs:  Structure with register values as seen when entering kernel mode
> > + *
> > + * Obtain the default parameters of the code segment: address and operand 
> > sizes.
> > + * The code segment is obtained from the selector contained in the CS 
> > register
> > + * in regs. In protected mode, the default address is determined by 
> > inspecting
> > + * the L and D bits of the segment descriptor. In virtual-8086 mode, the 
> > default
> > + * is always two bytes for both address and operand sizes.
> > + *
> > + * Return: A populated insn_code_seg_defaults structure on success. The
> > + * structure contains only zeros on failure.
> 
> s/failure/error/

Will correct.
> 
> > + */
> > +struct insn_code_seg_defaults insn_get_code_seg_defaults(struct pt_regs 
> > *regs)
> > +{
> > +   struct desc_struct *desc;
> > +   struct insn_code_seg_defaults defs;
> > +   unsigned short sel;
> > +   /*
> > +* The most significant byte of AR_TYPE_MASK determines whether a
> > +* segment contains data or code.
> > +*/
> > +   unsigned int type_mask = AR_TYPE_MASK & (1 << 11);
> > +
> > +   memset(, 0, sizeof(defs));
> > +
> > +   if (v8086_mode(regs)) {
> > +   defs.address_bytes = 2;
> > +   defs.operand_bytes = 2;
> > +   return defs;
> > +   }
> > +
> > +   sel = (unsigned short)regs->cs;
> > +
> > +   desc = get_desc(sel);
> > +   if (!desc)
> > +   return defs;
> > +
> > +   /* if data segment, return */
> > +   if (!(desc->b & type_mask))
> > +   return defs;
> 
> So you can simplify that into:
> 
>   /* A code segment? */
>   if (!(desc->b & BIT(11)))
>   return defs;
> 
> and remove that type_mask thing.

Alternatively, I can do desc->type & BIT(3) to avoid 

Re: [PATCH v7 10/26] x86/insn-eval: Add utility functions to get segment selector

2017-06-15 Thread Ricardo Neri
On Thu, 2017-06-15 at 11:37 -0700, Ricardo Neri wrote:
> > Yuck, didn't we talk about this already?
> 
> I am sorry Borislav. I thought you agreed that I could use the values
> of
> the segment override prefixes to identify the segment registers [1].

This time with the reference:
[1]. https://lkml.org/lkml/2017/5/5/377


--
To unsubscribe from this list: send the line "unsubscribe linux-msdos" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 10/26] x86/insn-eval: Add utility functions to get segment selector

2017-06-15 Thread Ricardo Neri
On Tue, 2017-05-30 at 12:35 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:08AM -0700, Ricardo Neri wrote:
> > When computing a linear address and segmentation is used, we need to know
> > the base address of the segment involved in the computation. In most of
> > the cases, the segment base address will be zero as in USER_DS/USER32_DS.
> > However, it may be possible that a user space program defines its own
> > segments via a local descriptor table. In such a case, the segment base
> > address may not be zero .Thus, the segment base address is needed to
> > calculate correctly the linear address.
> > 
> > The segment selector to be used when computing a linear address is
> > determined by either any of segment override prefixes in the
> > instruction or inferred from the registers involved in the computation of
> > the effective address; in that order. Also, there are cases when the
> > overrides shall be ignored (code segments are always selected by the CS
> > segment register; string instructions always use the ES segment register
> > along with the EDI register).
> > 
> > For clarity, this process can be split into two steps: resolving the
> > relevant segment register to use and, once known, read its value to
> > obtain the segment selector.
> > 
> > The method to obtain the segment selector depends on several factors. In
> > 32-bit builds, segment selectors are saved into the pt_regs structure
> > when switching to kernel mode. The same is also true for virtual-8086
> > mode. In 64-bit builds, segmentation is mostly ignored, except when
> > running a program in 32-bit legacy mode. In this case, CS and SS can be
> > obtained from pt_regs. DS, ES, FS and GS can be read directly from
> > the respective segment registers.
> > 
> > Lastly, the only two segment registers that are not ignored in long mode
> > are FS and GS. In these two cases, base addresses are obtained from the
> > respective MSRs.
> > 
> > Cc: Dave Hansen 
> > Cc: Adam Buchbinder 
> > Cc: Colin Ian King 
> > Cc: Lorenzo Stoakes 
> > Cc: Qiaowei Ren 
> > Cc: Arnaldo Carvalho de Melo 
> > Cc: Masami Hiramatsu 
> > Cc: Adrian Hunter 
> > Cc: Kees Cook 
> > Cc: Thomas Garnier 
> > Cc: Peter Zijlstra 
> > Cc: Borislav Petkov 
> > Cc: Dmitry Vyukov 
> > Cc: Ravi V. Shankar 
> > Cc: x...@kernel.org
> > Signed-off-by: Ricardo Neri 
> > ---
> >  arch/x86/lib/insn-eval.c | 256 
> > +++
> >  1 file changed, 256 insertions(+)
> > 
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index 1634762..0a496f4 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -9,6 +9,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  enum reg_type {
> > REG_TYPE_RM = 0,
> > @@ -33,6 +34,17 @@ enum string_instruction {
> > SCASW_SCASD = 0xaf,
> >  };
> >  
> > +enum segment_register {
> > +   SEG_REG_INVAL = -1,
> > +   SEG_REG_IGNORE = 0,
> > +   SEG_REG_CS = 0x23,
> > +   SEG_REG_SS = 0x36,
> > +   SEG_REG_DS = 0x3e,
> > +   SEG_REG_ES = 0x26,
> > +   SEG_REG_FS = 0x64,
> > +   SEG_REG_GS = 0x65,
> > +};
> 
> Yuck, didn't we talk about this already?

I am sorry Borislav. I thought you agreed that I could use the values of
the segment override prefixes to identify the segment registers [1].
> 
> Those are segment override prefixes so call them as such.
> 
> #define SEG_OVR_PFX_CS0x23
> #define SEG_OVR_PFX_SS0x36
> ...
> 
> and we already have those!
> 
> arch/x86/include/asm/inat.h:
> ...
> #define INAT_PFX_CS 5   /* 0x2E */
> #define INAT_PFX_DS 6   /* 0x3E */
> #define INAT_PFX_ES 7   /* 0x26 */
> #define INAT_PFX_FS 8   /* 0x64 */
> #define INAT_PFX_GS 9   /* 0x65 */
> #define INAT_PFX_SS 10  /* 0x36 */
> 
> well, kinda, they're numbers there and not the actual prefix values.

These numbers can 'translated' to the actual value of the prefixes via
inat_get_opcode_attribute(). In my next version I am planning to use
these function and reuse the aforementioned definitions.

> 
> And then there's:
> 
> arch/x86/kernel/uprobes.c::is_prefix_bad() which looks at some of those.
> 
> Please add your defines to inat.h

Will do.

> and make that function is_prefix_bad()
> use them instead of naked numbers. We need to pay attention to all those
> different things needing to look at insn opcodes and not let them go
> unwieldy by each defining and duplicating stuff.

I have implemented this change and will be part of my next version.
> 
> >  /**
> >   * is_string_instruction - Determine if instruction is a string