instruction-analysis API(s)

2009-02-04 Thread Jim Keniston
Hi, Roland.  Back in a conference call in December, we discussed
approaches to refactoring utrace-related code such as uprobes, to
make some of the services provided there more generally available.
In particular, you suggested an "instruction analysis" service that
various subsystems could exploit -- kprobes and uprobes/ubp at first,
and eventually perhaps gdb, perfmon, kvm, ftrace, and djprobes.

I decided to survey the kernel for subsystems that parse and/or analyze
CPU instructions.  I hoped to review the various approaches -- perhaps
finding one that's widely accepted -- and to evaluate possible clients
for our instruction-analysis service.

The results were discouraging, as summarized below.  I see no
promise of an architecture-agnostic instruction-analysis API.
Within each architecture, I think the best we could do would be an
(architecture-specific) instruction-parsing API.  (And even within
an architecture, different subsystems look at different aspects of
an instruction.)

Srikar Dronamraju and I are exploring two different approaches to an
x86 instruction-parsing service.  Since x86 kvm seems to have one of
the most systematic and thorough approaches, Srikar is prototyping a
generalization of kvm's x86_decode_insn() to make it support kprobes,
and eventually uprobes.  (Note that kvm does NOT appear to be a good
starting place on powerpc and s390.)  Approaching from the minimalist
side, I've implemented an x86 instruction-parsing API with just enough
smarts (so far) to support kprobes and uprobes.

We'd be interested to know whether these efforts are consistent
with what you have in mind.

See more details below.

Jim

Intro
-
"Instruction analysis" refers to the analysis of a CPU instruction
in the kernel or a user program.  Typically, the instruction must
be analyzed so that it can be properly emulated (in the case of
SSOL, by executing the same instruction at a different address),
or so a fault caused by the instruction can be properly handled.
There are other uses as well -- see below.

Possible Clients of an Instruction-Analysis Service
---
Where in the kernel is instruction analysis currently used?
- kprobes (arm, avr32, ia64, mn10300, powerpc, s390, sh, sparc, x86)
- uprobes (ia64, powerpc, s390, x86)
- hypervisors:
- kvm (ia64, powerpc, s390, x86)
- powerpc Cell Beat hypervisor
- floating-point unit emulation (arm, s390, sparc, x86)
- exception handling:
- page fault (powerpc, x86)
- illegal instruction (s390)
- unaligned trap (ia64)
- vm86 fault (x86)
- disassembly (powerpc, s390)
- powerpc: xmon, code patching (for crash dump?)
- ia64: emulation of brl instruction
- x86: alternative-instruction patching (replacing instructions that are
inappropriate for the CPU rev), fault injection
- djprobes (not in kernel, not sure of status)

Note: I looked in detail only at the architectures that implement
kprobes: arm, avr32, ia64, mn10300, powerpc, s390, sh, sparc, and x86.
And I note in passing that sh does a lot of instruction analysis --
as does mips -- but I skipped sh for now.

Note: Roland also listed gdb, perfmon, and ftrace as subsystems that
do instruction analysis.  I think that oprofile has also been suggested.
- I haven't investigated gdb, but I have no reason to think Roland is
wrong about it.
- I've looked briefly at the various components of perfmon[2] and
oprofile, but I don't see any instruction analysis per se; and the
perfmon/oprofile expert I asked (IBM's Carl Love) isn't aware of any.
- Similarly, I don't see instruction analysis per se in ftrace.

Prospects/Problems
--
What are the prospects for adapting these various subsystems to use
a common instruction-analysis service?  Typically, not very good.
Here are some of the problems:
- Different architectures have very different instruction-analysis
needs.
- Different architectures have very different instruction formats and
instruction attributes.  Consequently, the opportunities for common
code shared by multiple architectures are few.
- Different subsystems are interested in different instruction
attributes, and/or classify instructions differently.
- Some subsystems are interested in only certain instructions.
- Some subsystems, such as fault handlers, want to maximize efficiency
by examining as little of the instruction as possible; while others,
such as *probes, take a more leisurely approach (e.g., reading enough
bytes to capture the largest possible instruction, even if that means
faulting in a page).



Re: instruction-analysis API(s)

2009-02-06 Thread Masami Hiramatsu
Hi Jim,

I'm also interested in the instruction decoder.
If you don't mind, could we share the API specification?
I'd like to port djprobe on it.

Thanks!

Jim Keniston wrote:
> Hi, Roland.  Back in a conference call in December, we discussed
> approaches to refactoring utrace-related code such as uprobes, to
> make some of the services provided there more generally available.
> In particular, you suggested an "instruction analysis" service that
> various subsystems could exploit -- kprobes and uprobes/ubp at first,
> and eventually perhaps gdb, perfmon, kvm, ftrace, and djprobes.
> 
> I decided to survey the kernel for subsystems that parse and/or analyze
> CPU instructions.  I hoped to review the various approaches -- perhaps
> finding one that's widely accepted -- and to evaluate possible clients
> for our instruction-analysis service.
> 
> The results were discouraging, as summarized below.  I see no
> promise of an architecture-agnostic instruction-analysis API.
> Within each architecture, I think the best we could do would be an
> (architecture-specific) instruction-parsing API.  (And even within
> an architecture, different subsystems look at different aspects of
> an instruction.)
> 
> Srikar Dronamraju and I are exploring two different approaches to an
> x86 instruction-parsing service.  Since x86 kvm seems to have one of
> the most systematic and thorough approaches, Srikar is prototyping a
> generalization of kvm's x86_decode_insn() to make it support kprobes,
> and eventually uprobes.  (Note that kvm does NOT appear to be a good
> starting place on powerpc and s390.)  Approaching from the minimalist
> side, I've implemented an x86 instruction-parsing API with just enough
> smarts (so far) to support kprobes and uprobes.
> 
> We'd be interested to know whether these efforts are consistent
> with what you have in mind.
> 
> See more details below.
> 
> Jim
> 
> Intro
> -
> "Instruction analysis" refers to the analysis of a CPU instruction
> in the kernel or a user program.  Typically, the instruction must
> be analyzed so that it can be properly emulated (in the case of
> SSOL, by executing the same instruction at a different address),
> or so a fault caused by the instruction can be properly handled.
> There are other uses as well -- see below.
> 
> Possible Clients of an Instruction-Analysis Service
> ---
> Where in the kernel is instruction analysis currently used?
> - kprobes (arm, avr32, ia64, mn10300, powerpc, s390, sh, sparc, x86)
> - uprobes (ia64, powerpc, s390, x86)
> - hypervisors:
>   - kvm (ia64, powerpc, s390, x86)
>   - powerpc Cell Beat hypervisor
> - floating-point unit emulation (arm, s390, sparc, x86)
> - exception handling:
>   - page fault (powerpc, x86)
>   - illegal instruction (s390)
>   - unaligned trap (ia64)
>   - vm86 fault (x86)
> - disassembly (powerpc, s390)
> - powerpc: xmon, code patching (for crash dump?)
> - ia64: emulation of brl instruction
> - x86: alternative-instruction patching (replacing instructions that are
> inappropriate for the CPU rev), fault injection
> - djprobes (not in kernel, not sure of status)
> 
> Note: I looked in detail only at the architectures that implement
> kprobes: arm, avr32, ia64, mn10300, powerpc, s390, sh, sparc, and x86.
> And I note in passing that sh does a lot of instruction analysis --
> as does mips -- but I skipped sh for now.
> 
> Note: Roland also listed gdb, perfmon, and ftrace as subsystems that
> do instruction analysis.  I think that oprofile has also been suggested.
> - I haven't investigated gdb, but I have no reason to think Roland is
> wrong about it.
> - I've looked briefly at the various components of perfmon[2] and
> oprofile, but I don't see any instruction analysis per se; and the
> perfmon/oprofile expert I asked (IBM's Carl Love) isn't aware of any.
> - Similarly, I don't see instruction analysis per se in ftrace.
> 
> Prospects/Problems
> --
> What are the prospects for adapting these various subsystems to use
> a common instruction-analysis service?  Typically, not very good.
> Here are some of the problems:
> - Different architectures have very different instruction-analysis
> needs.
> - Different architectures have very different instruction formats and
> instruction attributes.  Consequently, the opportunities for common
> code shared by multiple architectures are few.
> - Different subsystems are interested in different instruction
> attributes, and/or classify instructions differently.
> - Some subsystems are interested in only certain instructions.
> - Some subsystems, such as fault handlers, want to maximize efficiency
> by examining as little of the instruction as possible; while others,
> such as *probes, take a more leisurely approach (e.g., reading enough
> bytes to capture the largest possible instruction, even if that means
> faulting in a page).
> 

-- 
Masami Hiramatsu

Software Engineer
Hitachi Com

Re: instruction-analysis API(s)

2009-02-06 Thread Jim Keniston
On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote:
> Hi Jim,
> 
> I'm also interested in the instruction decoder.
> If you don't mind, could we share the API specification?
> I'd like to port djprobe on it.

I'm enclosing the little x86 instruction-analysis protoype I hacked
together (insn_x86.*), along with a copy of systemtap's
runtime/uprobes2/uprobes_x86.c, which I modified to use it.

But again, we haven't really settled on an API.  For example, my x86
prototype doesn't collect all the info that kvm needs.  We're thinking
that adapting some existing code (like kvm in the x86 case) might be
more palatable to LKML.

Jim

> 
> Thanks!
> 
> Jim Keniston wrote:
> > Hi, Roland.  Back in a conference call in December, we discussed
> > approaches to refactoring utrace-related code such as uprobes, to
> > make some of the services provided there more generally available.
> > In particular, you suggested an "instruction analysis" service that
> > various subsystems could exploit -- kprobes and uprobes/ubp at first,
> > and eventually perhaps gdb, perfmon, kvm, ftrace, and djprobes.
> > 
...
> > Srikar Dronamraju and I are exploring two different approaches to an
> > x86 instruction-parsing service.  Since x86 kvm seems to have one of
> > the most systematic and thorough approaches, Srikar is prototyping a
> > generalization of kvm's x86_decode_insn() to make it support kprobes,
> > and eventually uprobes.  (Note that kvm does NOT appear to be a good
> > starting place on powerpc and s390.)  Approaching from the minimalist
> > side, I've implemented an x86 instruction-parsing API with just enough
> > smarts (so far) to support kprobes and uprobes.
> > 
> > We'd be interested to know whether these efforts are consistent
> > with what you have in mind.
> > 
> > See more details below.
> > 
> > Jim
...
/*
 * x86 instruction analysis
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 * Copyright (C) IBM Corporation, 2002, 2004, 2009
 */

#include 
// #include 
#include "insn_x86.h"

/**
 * insn_init() - initialize struct insn
 * @insn:	&struct insn to be initialized
 * @kaddr:	address (in kernel memory) of instruction (or copy thereof)
 * @x86_64:	true for 64-bit kernel or 64-bit app
 */
void insn_init(struct insn *insn, const u8 *kaddr, bool x86_64)
{
	memset(insn, 0, sizeof(*insn));
	insn->kaddr = kaddr;
	insn->next_byte = kaddr;
	insn->x86_64 = x86_64;
}
EXPORT_SYMBOL_GPL(insn_init);

/**
 * insn_get_prefixes - scan x86 instruction prefix bytes
 * @insn:	&struct insn containing instruction
 *
 * Populates the @insn->prefixes bitmap, and updates @insn->next_byte
 * to point to the (first) opcode.  No effect if @insn->prefixes.got
 * is already true.
 */
void insn_get_prefixes(struct insn *insn)
{
	u32 pfx;
	struct insn_field *prefixes = &insn->prefixes;
	if (prefixes->got)
		return;
	for (;; insn->next_byte++, prefixes->nbytes++) {
		u8 b = *(insn->next_byte);
#ifdef CONFIG_X86_64
		if ((b & 0xf0) == 0x40 && insn->x86_64) {
			prefixes->value |= X86_PFX_REX;
			prefixes->value |= (b & 0x0f) * X86_PFX_REX_BASE;
			/* REX prefix is always last. */
			break;
		}
#endif
		switch (b) {
		case 0x26:	pfx = X86_PFX_ES;	break;
		case 0x2E:	pfx = X86_PFX_CS;	break;
		case 0x36:	pfx = X86_PFX_SS;	break;
		case 0x3E:	pfx = X86_PFX_DS;	break;
		case 0x64:	pfx = X86_PFX_FS;	break;
		case 0x65:	pfx = X86_PFX_GS;	break;
		case 0x66:	pfx = X86_PFX_OPNDSZ;	break;
		case 0x67:	pfx = X86_PFX_ADDRSZ;	break;
		case 0xF0:	pfx = X86_PFX_LOCK;	break;
		case 0xF2:	pfx = X86_PFX_REPNE;	break;
		case 0xF3:	pfx = X86_PFX_REPE;	break;
		default:	pfx = 0x0;		break;
		}
		if (!pfx)
			break;
		prefixes->value |= pfx;
	}
	prefixes->got = true;
}
EXPORT_SYMBOL_GPL(insn_get_prefixes);

/**
 * insn_get_opcode - collect opcode(s)
 * @insn:	&struct insn containing instruction
 *
 * Populates @insn->opcode1 (and @insn->opcode2, if it's a 2-byte opcode)
 * and updates @insn->next_byte to point past the opcode byte(s).
 * If necessary, first collects any preceding (prefix) bytes.
 * Sets @insn->opcode.value = opcode1.  No effect if @insn->opcode.got
 * is already true.
 */
void insn_get_opcode(struct insn *insn)
{
	struct insn_field *opcode = &insn->opcode;
	if (opcode->got)
		return;
	if (!insn->prefixes.got)
		insn_get_prefixes(insn);
	insn->opcode1 = *insn->next_byte++;
	if (insn->opcod

Re: instruction-analysis API(s)

2009-02-06 Thread Masami Hiramatsu
Jim Keniston wrote:
[...]
> Possible Clients of an Instruction-Analysis Service
> ---
> Where in the kernel is instruction analysis currently used?

I think we also need to clarify why they need it(what information/action
they require), because it defines API.

> - kprobes (arm, avr32, ia64, mn10300, powerpc, s390, sh, sparc, x86)
> - uprobes (ia64, powerpc, s390, x86)
> - djprobes (not in kernel, not sure of status)

They need 'static' analysis of instructions to get below parameters.
 - length
 - attribute (prefixes, etc)
 - type (jump/accumulation/memory access/flag change, etc)

> - hypervisors:
>   - kvm (ia64, powerpc, s390, x86)
>   - powerpc Cell Beat hypervisor
> - floating-point unit emulation (arm, s390, sparc, x86)

They need 'dynamic' instruction emulation.

> - exception handling:
>   - page fault (powerpc, x86)
>   - illegal instruction (s390)
>   - unaligned trap (ia64)
>   - vm86 fault (x86)

Depends on the case, however, some of them just need instruction
type and length, and these should be done very quickly.
So, they need a light-weight and specialized analyzer/emulator.

> - disassembly (powerpc, s390)
> - powerpc: xmon, code patching (for crash dump?)

Maybe, static analysis is enough?

> - ia64: emulation of brl instruction

Dynamic emulation.

> - x86: alternative-instruction patching (replacing instructions that are
> inappropriate for the CPU rev), fault injection

Static analysis.

[...]
> Prospects/Problems
> --
> What are the prospects for adapting these various subsystems to use
> a common instruction-analysis service?  Typically, not very good.
> Here are some of the problems:
> - Different architectures have very different instruction-analysis
> needs.

IMHO, there are just need two types of interfaces: static analyzer
or dynamic emulator.

> - Different architectures have very different instruction formats and
> instruction attributes.  Consequently, the opportunities for common
> code shared by multiple architectures are few.
> - Different subsystems are interested in different instruction
> attributes, and/or classify instructions differently.
> - Some subsystems are interested in only certain instructions.

Indeed. I think we don't need to care all of them at the start point.
Just starting simply and evolving code on upstream is my recommendation.

> - Some subsystems, such as fault handlers, want to maximize efficiency
> by examining as little of the instruction as possible; while others,
> such as *probes, take a more leisurely approach (e.g., reading enough
> bytes to capture the largest possible instruction, even if that means
> faulting in a page).

Indeed. I think those efficiency-required subsystems are so arch-dependent
that we can (just) shares instruction bitmaps or provide special interfaces.

Thank you for your work!

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: instruction-analysis API(s)

2009-02-09 Thread Masami Hiramatsu
Jim Keniston wrote:
> On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote:
>> Hi Jim,
>>
>> I'm also interested in the instruction decoder.
>> If you don't mind, could we share the API specification?
>> I'd like to port djprobe on it.
> 
> I'm enclosing the little x86 instruction-analysis protoype I hacked
> together (insn_x86.*), along with a copy of systemtap's
> runtime/uprobes2/uprobes_x86.c, which I modified to use it.

Hmm, actually, djprobe needs both of the length and the type of
instructions, since it has to know how many bytes must be copied
and be replaced by a long jump.

> But again, we haven't really settled on an API.  For example, my x86
> prototype doesn't collect all the info that kvm needs.  We're thinking
> that adapting some existing code (like kvm in the x86 case) might be
> more palatable to LKML.

Sure, since kvm and emulators have to fetch the values of src/dst
for the emulation, they need actual register values. On the other hand,
the disasm/*probe have to analysis code before hitting, so they
don't know the actual value of the registers.

So, I think we should split x86_decode_insn() into 2 parts, static
analysis and emulation preparation.

For example:
1) analyzing code statically (x86_analyze_insn)
   - just decoding an instruction
   - this phase may consist of several sub-functions.

2) preparing emulation (x86_evaluate_insn)
   - evaluating src/dst based on current(vcpu) registers

3) executing emulation (x86_emulate_insn)
   - emulating an analyzed instruction

Thanks,

> 
> Jim
> 
>> Thanks!
>>
>> Jim Keniston wrote:
>>> Hi, Roland.  Back in a conference call in December, we discussed
>>> approaches to refactoring utrace-related code such as uprobes, to
>>> make some of the services provided there more generally available.
>>> In particular, you suggested an "instruction analysis" service that
>>> various subsystems could exploit -- kprobes and uprobes/ubp at first,
>>> and eventually perhaps gdb, perfmon, kvm, ftrace, and djprobes.
>>>
> ...
>>> Srikar Dronamraju and I are exploring two different approaches to an
>>> x86 instruction-parsing service.  Since x86 kvm seems to have one of
>>> the most systematic and thorough approaches, Srikar is prototyping a
>>> generalization of kvm's x86_decode_insn() to make it support kprobes,
>>> and eventually uprobes.  (Note that kvm does NOT appear to be a good
>>> starting place on powerpc and s390.)  Approaching from the minimalist
>>> side, I've implemented an x86 instruction-parsing API with just enough
>>> smarts (so far) to support kprobes and uprobes.
>>>
>>> We'd be interested to know whether these efforts are consistent
>>> with what you have in mind.
>>>
>>> See more details below.
>>>
>>> Jim
> ...
> 

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: instruction-analysis API(s)

2009-02-09 Thread Ananth N Mavinakayanahalli
On Mon, Feb 09, 2009 at 06:05:56PM -0500, Masami Hiramatsu wrote:
> Jim Keniston wrote:
> > On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote:
> >> Hi Jim,
> >>
> >> I'm also interested in the instruction decoder.
> >> If you don't mind, could we share the API specification?
> >> I'd like to port djprobe on it.
> > 
> > I'm enclosing the little x86 instruction-analysis protoype I hacked
> > together (insn_x86.*), along with a copy of systemtap's
> > runtime/uprobes2/uprobes_x86.c, which I modified to use it.
> 
> Hmm, actually, djprobe needs both of the length and the type of
> instructions, since it has to know how many bytes must be copied
> and be replaced by a long jump.
> 
> > But again, we haven't really settled on an API.  For example, my x86
> > prototype doesn't collect all the info that kvm needs.  We're thinking
> > that adapting some existing code (like kvm in the x86 case) might be
> > more palatable to LKML.
> 
> Sure, since kvm and emulators have to fetch the values of src/dst
> for the emulation, they need actual register values. On the other hand,
> the disasm/*probe have to analysis code before hitting, so they
> don't know the actual value of the registers.
> 
> So, I think we should split x86_decode_insn() into 2 parts, static
> analysis and emulation preparation.
> 
> For example:
> 1) analyzing code statically (x86_analyze_insn)
>- just decoding an instruction
>- this phase may consist of several sub-functions.
> 
> 2) preparing emulation (x86_evaluate_insn)
>- evaluating src/dst based on current(vcpu) registers
> 
> 3) executing emulation (x86_emulate_insn)
>- emulating an analyzed instruction

Right, that surely sounds like the way to go. However, we've been
cautioned that the instruction emulation area of the kvm code is very
performance sensitive. But, there is no harm in prototyping the above
and then worrying about any optimizations so there isn't a performance
issue -- in any case, I guess [ku]probes are very infrequent users of
this compared to KVM.

Ananth



Re: instruction-analysis API(s)

2009-02-25 Thread Jim Keniston
On Tue, 2009-02-10 at 10:12 +0530, Ananth N Mavinakayanahalli wrote:
> On Mon, Feb 09, 2009 at 06:05:56PM -0500, Masami Hiramatsu wrote:
> > Jim Keniston wrote:
> > > On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote:
> > >> Hi Jim,
> > >>
> > >> I'm also interested in the instruction decoder.
> > >> If you don't mind, could we share the API specification?
> > >> I'd like to port djprobe on it.
> > > 
> > > I'm enclosing the little x86 instruction-analysis protoype I hacked
> > > together (insn_x86.*), along with a copy of systemtap's
> > > runtime/uprobes2/uprobes_x86.c, which I modified to use it.
> > 
> > Hmm, actually, djprobe needs both of the length and the type of
> > instructions, since it has to know how many bytes must be copied
> > and be replaced by a long jump.
> > 
> > > But again, we haven't really settled on an API.  For example, my x86
> > > prototype doesn't collect all the info that kvm needs.  We're thinking
> > > that adapting some existing code (like kvm in the x86 case) might be
> > > more palatable to LKML.
> > 
> > Sure, since kvm and emulators have to fetch the values of src/dst
> > for the emulation, they need actual register values. On the other hand,
> > the disasm/*probe have to analysis code before hitting, so they
> > don't know the actual value of the registers.
> > 
> > So, I think we should split x86_decode_insn() into 2 parts, static
> > analysis and emulation preparation.
> > 
> > For example:
> > 1) analyzing code statically (x86_analyze_insn)
> >- just decoding an instruction
> >- this phase may consist of several sub-functions.
> > 
> > 2) preparing emulation (x86_evaluate_insn)
> >- evaluating src/dst based on current(vcpu) registers
> > 
> > 3) executing emulation (x86_emulate_insn)
> >- emulating an analyzed instruction
> 
> Right, that surely sounds like the way to go. However, we've been
> cautioned that the instruction emulation area of the kvm code is very
> performance sensitive. But, there is no harm in prototyping the above
> and then worrying about any optimizations so there isn't a performance
> issue -- in any case, I guess [ku]probes are very infrequent users of
> this compared to KVM.
> 
> Ananth

Hi, Masami.

Ananth, Srikar, Maneesh, and I talked about this last night.  While I
was on vacation, Srikar did further investigation into adapting x86
kvm's instuction analysis for more general use, and he's not optimistic.
For the short term, at least (i.e., between now and the Linux Foundation
Collaboration Summit in April), we're going to proceed based on the
prototype I developed.

As you noted, djprobes needs instruction lengths, and my prototype
doesn't provide that info.  (Uprobes computes instruction lengths for
rip-relative x86_64 instructions, but that's only a subset of what you
need.)  Are you interested in extending/enhancing my prototype to make
it useful for djprobes?  If so, I'd be happy to consult.

Thanks.
Jim



Re: instruction-analysis API(s)

2009-02-26 Thread Masami Hiramatsu
Jim Keniston wrote:
> Hi, Masami.
> 
> Ananth, Srikar, Maneesh, and I talked about this last night.  While I
> was on vacation, Srikar did further investigation into adapting x86
> kvm's instuction analysis for more general use, and he's not optimistic.
> For the short term, at least (i.e., between now and the Linux Foundation
> Collaboration Summit in April), we're going to proceed based on the
> prototype I developed.
> 
> As you noted, djprobes needs instruction lengths, and my prototype
> doesn't provide that info.  (Uprobes computes instruction lengths for
> rip-relative x86_64 instructions, but that's only a subset of what you
> need.)  Are you interested in extending/enhancing my prototype to make
> it useful for djprobes?  If so, I'd be happy to consult.

Hi Jim,

Thank you for considering djprobe.
Actually, I'm developing insn_get_length() based on your prototype and
porting djprobe on it. After tested code, I'd like to post the insn_x86 code.

Thank you,

> 
> Thanks.
> Jim
> 

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhira...@redhat.com




Re: instruction-analysis API(s)

2009-02-27 Thread Masami Hiramatsu
Jim Keniston wrote:
> On Tue, 2009-02-10 at 10:12 +0530, Ananth N Mavinakayanahalli wrote:
>> On Mon, Feb 09, 2009 at 06:05:56PM -0500, Masami Hiramatsu wrote:
>>> Jim Keniston wrote:
 On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote:
> Hi Jim,
>
> I'm also interested in the instruction decoder.
> If you don't mind, could we share the API specification?
> I'd like to port djprobe on it.
 I'm enclosing the little x86 instruction-analysis protoype I hacked
 together (insn_x86.*), along with a copy of systemtap's
 runtime/uprobes2/uprobes_x86.c, which I modified to use it.
>>> Hmm, actually, djprobe needs both of the length and the type of
>>> instructions, since it has to know how many bytes must be copied
>>> and be replaced by a long jump.
>>>
 But again, we haven't really settled on an API.  For example, my x86
 prototype doesn't collect all the info that kvm needs.  We're thinking
 that adapting some existing code (like kvm in the x86 case) might be
 more palatable to LKML.
>>> Sure, since kvm and emulators have to fetch the values of src/dst
>>> for the emulation, they need actual register values. On the other hand,
>>> the disasm/*probe have to analysis code before hitting, so they
>>> don't know the actual value of the registers.
>>>
>>> So, I think we should split x86_decode_insn() into 2 parts, static
>>> analysis and emulation preparation.
>>>
>>> For example:
>>> 1) analyzing code statically (x86_analyze_insn)
>>>- just decoding an instruction
>>>- this phase may consist of several sub-functions.
>>>
>>> 2) preparing emulation (x86_evaluate_insn)
>>>- evaluating src/dst based on current(vcpu) registers
>>>
>>> 3) executing emulation (x86_emulate_insn)
>>>- emulating an analyzed instruction
>> Right, that surely sounds like the way to go. However, we've been
>> cautioned that the instruction emulation area of the kvm code is very
>> performance sensitive. But, there is no harm in prototyping the above
>> and then worrying about any optimizations so there isn't a performance
>> issue -- in any case, I guess [ku]probes are very infrequent users of
>> this compared to KVM.
>>
>> Ananth
> 
> Hi, Masami.
> 
> Ananth, Srikar, Maneesh, and I talked about this last night.  While I
> was on vacation, Srikar did further investigation into adapting x86
> kvm's instuction analysis for more general use, and he's not optimistic.
> For the short term, at least (i.e., between now and the Linux Foundation
> Collaboration Summit in April), we're going to proceed based on the
> prototype I developed.
> 
> As you noted, djprobes needs instruction lengths, and my prototype
> doesn't provide that info.  (Uprobes computes instruction lengths for
> rip-relative x86_64 instructions, but that's only a subset of what you
> need.)  Are you interested in extending/enhancing my prototype to make
> it useful for djprobes?  If so, I'd be happy to consult.

Here are a patch against your code and an example code for
instruction length decoder.
Curiously, KVM's instruction decoder does not completely
cover all instructions(especially, Jcc/test...).
I had to refer Intel manuals.

Moreover, even with this patch, the decoder is incomplete.
- this doesn't cover 3bytes opcode yet.
- this doesn't decode sib, displacement and immediate.
- might have some bugs :-(


Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhira...@redhat.com

Index: insn_x86.h
===
--- insn_x86.h  (revision 1510)
+++ insn_x86.h  (working copy)
@@ -66,6 +66,10 @@
struct insn_field displacement;
struct insn_field immediate;
 
+   u8 op_bytes;
+   u8 ad_bytes;
+   u8 length;
+
const u8 *kaddr;/* kernel address of insn (copy) to analyze */
const u8 *next_byte;
bool x86_64;
@@ -75,6 +79,7 @@
 extern void insn_get_prefixes(struct insn *insn);
 extern void insn_get_opcode(struct insn *insn);
 extern void insn_get_modrm(struct insn *insn);
+extern void insn_get_length(struct insn *insn);
 
 #ifdef CONFIG_X86_64
 extern bool insn_rip_relative(struct insn *insn);
Index: insn_x86.c
===
--- insn_x86.c  (revision 1510)
+++ insn_x86.c  (working copy)
@@ -17,7 +17,7 @@
  *
  * Copyright (C) IBM Corporation, 2002, 2004, 2009
  */
-
+#include 
 #include 
 // #include 
 #include "insn_x86.h"
@@ -34,6 +34,11 @@
insn->kaddr = kaddr;
insn->next_byte = kaddr;
insn->x86_64 = x86_64;
+   insn->op_bytes = 4;
+   if (x86_64)
+   insn->ad_bytes = 8;
+   else
+   insn->ad_bytes = 4;
 }
 EXPORT_SYMBOL_GPL(insn_init);
 
@@ -79,10 +84,51 @@
break;
prefixes->value |= pfx;
}
+   if (prefixes->value & X86_PFX_OPNDSZ) {
+   /* oprand size 

Re: instruction-analysis API(s)

2009-03-03 Thread Jim Keniston
On Fri, 2009-02-27 at 16:20 -0500, Masami Hiramatsu wrote:
...
> 
> Here are a patch against your code and an example code for
> instruction length decoder.
> Curiously, KVM's instruction decoder does not completely
> cover all instructions(especially, Jcc/test...).
> I had to refer Intel manuals.
> 
> Moreover, even with this patch, the decoder is incomplete.
> - this doesn't cover 3bytes opcode yet.
> - this doesn't decode sib, displacement and immediate.
> - might have some bugs :-(
> 
> 
> Thank you,

Thanks for your work on this.  Comments below.

Jim

> 
> plain text document attachment (insn_x86.patch)
> Index: insn_x86.h
> ===
> --- insn_x86.h(revision 1510)
> +++ insn_x86.h(working copy)
> @@ -66,6 +66,10 @@
>   struct insn_field displacement;
>   struct insn_field immediate;
> 
> + u8 op_bytes;

I'd probably use opnd_bytes and addr_bytes here, for clarity.  (When I
first saw "op", I thought "opcode".)  Also, we should clarify that these
are the EFFECTIVE lengths, not the lengths of the immediate and
displacement fields in the instruction.

> + u8 ad_bytes;
> + u8 length;
> +
>   const u8 *kaddr;/* kernel address of insn (copy) to analyze */
>   const u8 *next_byte;
>   bool x86_64;
> @@ -75,6 +79,7 @@
>  extern void insn_get_prefixes(struct insn *insn);
>  extern void insn_get_opcode(struct insn *insn);
>  extern void insn_get_modrm(struct insn *insn);
> +extern void insn_get_length(struct insn *insn);
> 
>  #ifdef CONFIG_X86_64
>  extern bool insn_rip_relative(struct insn *insn);
> Index: insn_x86.c
> ===
> --- insn_x86.c(revision 1510)
> +++ insn_x86.c(working copy)
> @@ -17,7 +17,7 @@
>   *
>   * Copyright (C) IBM Corporation, 2002, 2004, 2009
>   */
> -
> +#include 
>  #include 
>  // #include 
>  #include "insn_x86.h"
> @@ -34,6 +34,11 @@
>   insn->kaddr = kaddr;
>   insn->next_byte = kaddr;
>   insn->x86_64 = x86_64;
> + insn->op_bytes = 4;
> + if (x86_64)
> + insn->ad_bytes = 8;
> + else
> + insn->ad_bytes = 4;
>  }
>  EXPORT_SYMBOL_GPL(insn_init);
> 
> @@ -79,10 +84,51 @@
>   break;
>   prefixes->value |= pfx;
>   }
> + if (prefixes->value & X86_PFX_OPNDSZ) {
> + /* oprand size switches 2/4 */
> + insn->op_bytes ^= 6;
> + }
> + if (prefixes->value & X86_PFX_ADDRSZ) {
> + /* address size switches 2/4 or 4/8 */
> +#ifdef CONFIG_X86_64
> + if (insn->x86_64)
> + insn->op_bytes ^= 12;
> + else
> +#endif
> + insn->op_bytes ^= 6;

This seems wrong.  You're checking the address-size prefix, but
adjusting the operand size.

> + }
> +#ifdef CONFIG_X86_64
> + if (prefixes->value & X86_PFX_REXW)
> + insn->op_bytes = 8;
> +#endif
>   prefixes->got = true;
>  }
>  EXPORT_SYMBOL_GPL(insn_get_prefixes);
> 
> +static bool __insn_is_stack(struct insn *insn)

It's not entirely clear to me what this function checks.  (A more
precise name might help.)  You have pushes, pops, and calls here, but
you also have some instructions that don't appear to affect the stack at
all.  And other push and pop instructions are missing.

> +{
> + u8 reg;
> + if (insn->opcode.nbytes == 2)
> + return 0;

The following are 2-byte pushes or pops: 0f-a0, 0f-a1, 0f-a8, and 0f-a9.

Also, since the return value is bool, I'd prefer to see true/false
rather than 1/0.

> +
> + switch(insn->opcode1) {
> + case 0x68:
> + case 0x6a:
> + case 0x9c:
> + case 0x9d:
> + case 0xc5:

0xc5 = lds.  Why lds?

In general, it'd be nice to add a comment showing the mnemonic next to
each hex value -- e.g.,
case 0x68: /* push */

> + case 0xe8:
> + return 1;
> + }

Other related instructions: 9a, 1f, 07, 17, 8f.

> + reg = ((*insn->next_byte) >> 3) & 7;
> + if ((insn->opcode1 & 0xf0) == 0x50 ||
> + (insn->opcode1 == 0x1a && reg == 0) ||

The above line doesn't seem right.  It catches things like
sbb (%rax),%al .

> + (insn->opcode1 == 0xff && (reg & 1) == 0 && reg != 0)) {

Looks like the interesting reg values are 2 (call), 3 (call), and 6
(push).

> + return 1;
> + }
> + return 0;
> +}
> +
>  /**
>   * insn_get_opcode - collect opcode(s)
>   * @insn:&struct insn containing instruction
> @@ -108,6 +154,8 @@
>   opcode->nbytes = 1;
>   opcode->value = insn->opcode1;
>   opcode->got = true;
> + if (insn->x86_64 && __insn_is_stack(insn))
> + insn->op_bytes = 8;
>  }
>  EXPORT_SYMBOL_GPL(insn_get_opcode);
> 
> @@ -208,3 +256,115 @@
>  }
>  EXPORT_SYMBOL_GPL(insn_rip_relative);
>  #endif
> +
> +/**
> + *
> + * insn_get_length() - Get the length of instruction
> + * @insn:&struct insn cont

Re: instruction-analysis API(s)

2009-03-04 Thread Masami Hiramatsu
Hi Jim,

Jim Keniston wrote:
> On Fri, 2009-02-27 at 16:20 -0500, Masami Hiramatsu wrote:
> ...
>> Here are a patch against your code and an example code for
>> instruction length decoder.
>> Curiously, KVM's instruction decoder does not completely
>> cover all instructions(especially, Jcc/test...).
>> I had to refer Intel manuals.
>>
>> Moreover, even with this patch, the decoder is incomplete.
>> - this doesn't cover 3bytes opcode yet.
>> - this doesn't decode sib, displacement and immediate.
>> - might have some bugs :-(
>>
>>
>> Thank you,
> 
> Thanks for your work on this.  Comments below.

Thank you very much for review!

Actually, that code was based on KVM code, so I also found many
opcodes were not supported.

> As I mentioned in private email, you or I should probably refactor this
> into:
> - insn_get_sib()
> - insn_get_displacement()
> - insn_get_immediate()
> - insn_get_length()

Agreed, these should be supported.

I also would like to change struct insn as below;

struct insn {
struct insn_field prefixes; /* prefixes.value is a bitmap */
struct insn_field opcode;   /* opcode.bytes[n] == opcode_n */
struct insn_field modrm;
struct insn_field sib;
struct insn_field displacement;
union {
struct insn_field immediate;
struct insn_field moffset1; /* for 64bit MOV */
struct insn_field immediate1;   /* for 64bit imm or off16/32 */
};
union {
struct insn_field moffset2; /* for 64bit MOV */
struct insn_field immediate2;   /* for 64bit imm or seg16 */
};

u8 opnd_bytes;
u8 addr_bytes;
u8 length;
bool x86_64;

const u8 *kaddr;/* kernel address of insn (copy) to analyze */
const u8 *next_byte;
};

opcode2 and opcode3 will be stored in opcode.value with opcode1.

Now, I'm updating my code. Would anyone also be working on it?

Thank you,

> 
> Jim
> 
>> plain text document attachment (insn_x86.patch)
>> Index: insn_x86.h
>> ===
>> --- insn_x86.h   (revision 1510)
>> +++ insn_x86.h   (working copy)
>> @@ -66,6 +66,10 @@
>>  struct insn_field displacement;
>>  struct insn_field immediate;
>>
>> +u8 op_bytes;
> 
> I'd probably use opnd_bytes and addr_bytes here, for clarity.  (When I
> first saw "op", I thought "opcode".)  Also, we should clarify that these
> are the EFFECTIVE lengths, not the lengths of the immediate and
> displacement fields in the instruction.
> 
>> +u8 ad_bytes;
>> +u8 length;
>> +
>>  const u8 *kaddr;/* kernel address of insn (copy) to analyze */
>>  const u8 *next_byte;
>>  bool x86_64;
>> @@ -75,6 +79,7 @@
>>  extern void insn_get_prefixes(struct insn *insn);
>>  extern void insn_get_opcode(struct insn *insn);
>>  extern void insn_get_modrm(struct insn *insn);
>> +extern void insn_get_length(struct insn *insn);
>>
>>  #ifdef CONFIG_X86_64
>>  extern bool insn_rip_relative(struct insn *insn);
>> Index: insn_x86.c
>> ===
>> --- insn_x86.c   (revision 1510)
>> +++ insn_x86.c   (working copy)
>> @@ -17,7 +17,7 @@
>>   *
>>   * Copyright (C) IBM Corporation, 2002, 2004, 2009
>>   */
>> -
>> +#include 
>>  #include 
>>  // #include 
>>  #include "insn_x86.h"
>> @@ -34,6 +34,11 @@
>>  insn->kaddr = kaddr;
>>  insn->next_byte = kaddr;
>>  insn->x86_64 = x86_64;
>> +insn->op_bytes = 4;
>> +if (x86_64)
>> +insn->ad_bytes = 8;
>> +else
>> +insn->ad_bytes = 4;
>>  }
>>  EXPORT_SYMBOL_GPL(insn_init);
>>
>> @@ -79,10 +84,51 @@
>>  break;
>>  prefixes->value |= pfx;
>>  }
>> +if (prefixes->value & X86_PFX_OPNDSZ) {
>> +/* oprand size switches 2/4 */
>> +insn->op_bytes ^= 6;
>> +}
>> +if (prefixes->value & X86_PFX_ADDRSZ) {
>> +/* address size switches 2/4 or 4/8 */
>> +#ifdef CONFIG_X86_64
>> +if (insn->x86_64)
>> +insn->op_bytes ^= 12;
>> +else
>> +#endif
>> +insn->op_bytes ^= 6;
> 
> This seems wrong.  You're checking the address-size prefix, but
> adjusting the operand size.
> 
>> +}
>> +#ifdef CONFIG_X86_64
>> +if (prefixes->value & X86_PFX_REXW)
>> +insn->op_bytes = 8;
>> +#endif
>>  prefixes->got = true;
>>  }
>>  EXPORT_SYMBOL_GPL(insn_get_prefixes);
>>
>> +static bool __insn_is_stack(struct insn *insn)
> 
> It's not entirely clear to me what this function checks.  (A more
> precise name might help.)  You have pushes, pops, and calls here, but
> you also have some instructions that don't appear to affect the stack at
> all.  And other push and pop instructions are missing.
> 
>> +{
>> +u8 reg;
>> +if (insn->opcode.nbytes == 2)
>> +return 0;
> 
> The following

Re: instruction-analysis API(s)

2009-03-05 Thread Masami Hiramatsu
Hi Jim and Sriker,

Here, I almost rewrote my patch.

Changelog:
- rewrite decoding logic based on Intel' manual.
- supoort insn_get_sib(),insn_get_displacement()
  and insn_get_immediate() too.
- support 3 bytes opcode and 64bit immediate.
- introduce some bitmaps.

Thank you,

Masami Hiramatsu wrote:
> Hi Jim,
> 
> Jim Keniston wrote:
>> On Fri, 2009-02-27 at 16:20 -0500, Masami Hiramatsu wrote:
>> ...
>>> Here are a patch against your code and an example code for
>>> instruction length decoder.
>>> Curiously, KVM's instruction decoder does not completely
>>> cover all instructions(especially, Jcc/test...).
>>> I had to refer Intel manuals.
>>>
>>> Moreover, even with this patch, the decoder is incomplete.
>>> - this doesn't cover 3bytes opcode yet.
>>> - this doesn't decode sib, displacement and immediate.
>>> - might have some bugs :-(
>>>
>>>
>>> Thank you,
>> Thanks for your work on this.  Comments below.
> 
> Thank you very much for review!
> 
> Actually, that code was based on KVM code, so I also found many
> opcodes were not supported.
> 
>> As I mentioned in private email, you or I should probably refactor this
>> into:
>> - insn_get_sib()
>> - insn_get_displacement()
>> - insn_get_immediate()
>> - insn_get_length()
> 
> Agreed, these should be supported.
> 
> I also would like to change struct insn as below;
> 
> struct insn {
> struct insn_field prefixes; /* prefixes.value is a bitmap */
> struct insn_field opcode;   /* opcode.bytes[n] == opcode_n */
> struct insn_field modrm;
> struct insn_field sib;
> struct insn_field displacement;
> union {
> struct insn_field immediate;
> struct insn_field moffset1; /* for 64bit MOV */
> struct insn_field immediate1;   /* for 64bit imm or off16/32 
> */
> };
> union {
> struct insn_field moffset2; /* for 64bit MOV */
> struct insn_field immediate2;   /* for 64bit imm or seg16 */
> };
> 
> u8 opnd_bytes;
> u8 addr_bytes;
> u8 length;
> bool x86_64;
> 
> const u8 *kaddr;/* kernel address of insn (copy) to analyze */
> const u8 *next_byte;
> };
> 
> opcode2 and opcode3 will be stored in opcode.value with opcode1.
> 
> Now, I'm updating my code. Would anyone also be working on it?
> 
> Thank you,
> 
>> Jim
>>
>>> plain text document attachment (insn_x86.patch)
>>> Index: insn_x86.h
>>> ===
>>> --- insn_x86.h  (revision 1510)
>>> +++ insn_x86.h  (working copy)
>>> @@ -66,6 +66,10 @@
>>> struct insn_field displacement;
>>> struct insn_field immediate;
>>>
>>> +   u8 op_bytes;
>> I'd probably use opnd_bytes and addr_bytes here, for clarity.  (When I
>> first saw "op", I thought "opcode".)  Also, we should clarify that these
>> are the EFFECTIVE lengths, not the lengths of the immediate and
>> displacement fields in the instruction.
>>
>>> +   u8 ad_bytes;
>>> +   u8 length;
>>> +
>>> const u8 *kaddr;/* kernel address of insn (copy) to analyze */
>>> const u8 *next_byte;
>>> bool x86_64;
>>> @@ -75,6 +79,7 @@
>>>  extern void insn_get_prefixes(struct insn *insn);
>>>  extern void insn_get_opcode(struct insn *insn);
>>>  extern void insn_get_modrm(struct insn *insn);
>>> +extern void insn_get_length(struct insn *insn);
>>>
>>>  #ifdef CONFIG_X86_64
>>>  extern bool insn_rip_relative(struct insn *insn);
>>> Index: insn_x86.c
>>> ===
>>> --- insn_x86.c  (revision 1510)
>>> +++ insn_x86.c  (working copy)
>>> @@ -17,7 +17,7 @@
>>>   *
>>>   * Copyright (C) IBM Corporation, 2002, 2004, 2009
>>>   */
>>> -
>>> +#include 
>>>  #include 
>>>  // #include 
>>>  #include "insn_x86.h"
>>> @@ -34,6 +34,11 @@
>>> insn->kaddr = kaddr;
>>> insn->next_byte = kaddr;
>>> insn->x86_64 = x86_64;
>>> +   insn->op_bytes = 4;
>>> +   if (x86_64)
>>> +   insn->ad_bytes = 8;
>>> +   else
>>> +   insn->ad_bytes = 4;
>>>  }
>>>  EXPORT_SYMBOL_GPL(insn_init);
>>>
>>> @@ -79,10 +84,51 @@
>>> break;
>>> prefixes->value |= pfx;
>>> }
>>> +   if (prefixes->value & X86_PFX_OPNDSZ) {
>>> +   /* oprand size switches 2/4 */
>>> +   insn->op_bytes ^= 6;
>>> +   }
>>> +   if (prefixes->value & X86_PFX_ADDRSZ) {
>>> +   /* address size switches 2/4 or 4/8 */
>>> +#ifdef CONFIG_X86_64
>>> +   if (insn->x86_64)
>>> +   insn->op_bytes ^= 12;
>>> +   else
>>> +#endif
>>> +   insn->op_bytes ^= 6;
>> This seems wrong.  You're checking the address-size prefix, but
>> adjusting the operand size.
>>
>>> +   }
>>> +#ifdef CONFIG_X86_64
>>> +   if (prefixes->value & X86_PFX_REXW)
>>> +   insn->op_bytes = 8;
>>> +#endif
>>> prefixes->got = true;
>>>  }
>>>  EXPORT_SYMBOL_GPL(insn_g

Re: instruction-analysis API(s)

2009-03-06 Thread Jim Keniston

Quoting Masami Hiramatsu :


Hi Jim and Sriker,

Here, I almost rewrote my patch.

Changelog:
- rewrite decoding logic based on Intel' manual.
- supoort insn_get_sib(),insn_get_displacement()
  and insn_get_immediate() too.
- support 3 bytes opcode and 64bit immediate.
- introduce some bitmaps.

Thank you,


Well, I didn't do much of a code review -- it looks like you addressed  
all my concerns -- but as I mentioned on IRC, I hacked together a test  
rig whereby you can disassemble a designated elf file (e.g., vmlinux,  
libc, libm) and then compare insn_get_length()'s results with  
objdump's results.  The comment in distill.awk shows how to use  
objdump, awk, and test_get_len together.


I also hacked up insn_x86.h and insn_x86.c to work in user space.   
Most of that is accomplished via insn_x86_user.h, but it certainly  
isn't necessary to do it that way.  In particular, __u8, __s8, __u16,  
etc. are versions of u8, s8, u16, etc. that can be used in both kernel  
and user code, so maybe we should switch to those.


I tested with vmlinux, libc, and libm on both an i686 system and an  
x86_64 system.  I found and fixed a few bugs.  Here are the ones that  
come to mind (all fixed):

- shrd/shld, which we discussed
- missing support for weird nops with modrm bytes (0f 1f ...).
- neglected to include the REX prefix in prefixes.nbytes
- missing static decl in an inline function in insn_x86.h

There are some other cases where insn_get_length() doesn't match up  
with the disassembly, but I don't consider them bugs:
- 0x9b is an instruction (fwait), but the disassembler treats it as a  
prefix.  For example 9b df ... can be disassembled as

fstsw ...   // wait, then store status word
or
fwait   // wait
fnstsw ...  // store status word without waiting
Perhaps it's relevant to investigate whether a single-step of 9b df  
... would execute just the fwait or the whole fstsw.  Anyway, this  
explains the "failures" of finit and fstsw that I mentioned to you.  I  
also saw this with fstcw and fclex.
- Illegal instruction sequences, such as an x86_64 instruction that  
starts with 0x40, or a misplaced 0x65 prefix.  Typically, we see these  
when disassembling data.  I just filtered out (via egrep) instructions  
whose disassembly starts with "rex" or includes "(bad)".


We could address the above by filtering them out in distill.awk or  
test_get_len.c.  I think we're clean otherwise.


There's a little more housecleaning to do -- e.g., adding Hitachi (?)  
copyright to IBM copyright, discarding insn_field_exists() and  
insn_extract_reg(), putting this all in git somewhere.  But not tonight.


Pull all the attached files into a directory and have a go -- e.g.,
$ make
$ objdump -d vmlinux | awk -f distill.awk | ./test_get_len [x86_64]

Jim

# Usage: objdump -d a.out | awk -f distill.awk | ./test_get_len
# Distills the disassembly as follows:
# - Removes all lines except the disassembled instructions.
# - For instructions that exceed 1 line (7 bytes), crams all the hex bytes
# into a single line.

BEGIN {
prev_addr = ""
prev_hex = ""
prev_mnemonic = ""
}

/^ *[0-9a-f]+:/ {
if (split($0, field, "\t") < 3) {
# This is a continuation of the same insn.
prev_hex = prev_hex field[2]
} else {
if (prev_addr != "")
printf "%s\t%s\t%s\n", prev_addr, prev_hex, 
prev_mnemonic
prev_addr = field[1]
prev_hex = field[2]
prev_mnemonic = field[3]
}
}

END {
if (prev_addr != "")
printf "%s\t%s\t%s\n", prev_addr, prev_hex, prev_mnemonic
}
/*
 * x86 instruction analysis
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 * Copyright (C) IBM Corporation, 2002, 2004, 2009
 */
#ifdef KERNEL
#include 
#include 
#else
#include 
#endif
// #include 
#include "insn_x86.h"

MODULE_LICENSE("GPL"); // for test

/**
 * insn_init() - initialize struct insn
 * @insn:   &struct insn to be initialized
 * @kaddr:  address (in kernel memory) of instruction (or copy thereof)
 * @x86_64: true for 64-bit kernel or 64-bit app
 */
void insn_init(struct insn *insn, const u8 *kaddr, bool x86_64)
{
memset(insn, 0, sizeof(*insn));
insn->kaddr = kaddr

Re: instruction-analysis API(s)

2009-03-10 Thread Masami Hiramatsu
Hi Jim,

Jim Keniston wrote:
> Quoting Masami Hiramatsu :
> 
>> Hi Jim and Sriker,
>>
>> Here, I almost rewrote my patch.
>>
>> Changelog:
>> - rewrite decoding logic based on Intel' manual.
>> - supoort insn_get_sib(),insn_get_displacement()
>>   and insn_get_immediate() too.
>> - support 3 bytes opcode and 64bit immediate.
>> - introduce some bitmaps.
>>
>> Thank you,
> 
> Well, I didn't do much of a code review -- it looks like you addressed
> all my concerns -- but as I mentioned on IRC, I hacked together a test
> rig whereby you can disassemble a designated elf file (e.g., vmlinux,
> libc, libm) and then compare insn_get_length()'s results with objdump's
> results.  The comment in distill.awk shows how to use objdump, awk, and
> test_get_len together.

Thank you for review and test!

> I also hacked up insn_x86.h and insn_x86.c to work in user space.  Most
> of that is accomplished via insn_x86_user.h, but it certainly isn't
> necessary to do it that way.  In particular, __u8, __s8, __u16, etc. are
> versions of u8, s8, u16, etc. that can be used in both kernel and user
> code, so maybe we should switch to those.
> 
> I tested with vmlinux, libc, and libm on both an i686 system and an
> x86_64 system.  I found and fixed a few bugs.  Here are the ones that
> come to mind (all fixed):
> - shrd/shld, which we discussed
> - missing support for weird nops with modrm bytes (0f 1f ...).
> - neglected to include the REX prefix in prefixes.nbytes
> - missing static decl in an inline function in insn_x86.h

Thank you for fixing  it.
BTW, it might have to support vm86 mode(especially, for user code).

> There are some other cases where insn_get_length() doesn't match up with
> the disassembly, but I don't consider them bugs:
> - 0x9b is an instruction (fwait), but the disassembler treats it as a
> prefix.  For example 9b df ... can be disassembled as
> fstsw ...// wait, then store status word
> or
> fwait// wait
> fnstsw ...// store status word without waiting
> Perhaps it's relevant to investigate whether a single-step of 9b df ...
> would execute just the fwait or the whole fstsw.  Anyway, this explains
> the "failures" of finit and fstsw that I mentioned to you.  I also saw
> this with fstcw and fclex.

FYI, there is a single wait/fwait instruction described at Intel software
developers manual vol.2B p.399.

> - Illegal instruction sequences, such as an x86_64 instruction that
> starts with 0x40, or a misplaced 0x65 prefix.  Typically, we see these
> when disassembling data.  I just filtered out (via egrep) instructions
> whose disassembly starts with "rex" or includes "(bad)".

Sure, I think insn_* should return -EINVAL or set insn.invalid = 1
if we found those invalid ops. E.g. kernel use BUG() macro, it adds
some raw numbers after ud2, in that case, those raw numbers might
be decoded as an illegal instruction.

> We could address the above by filtering them out in distill.awk or
> test_get_len.c.  I think we're clean otherwise.
> 
> There's a little more housecleaning to do -- e.g., adding Hitachi (?)
> copyright to IBM copyright, discarding insn_field_exists() and
> insn_extract_reg(), putting this all in git somewhere.  But not tonight.
> 
> Pull all the attached files into a directory and have a go -- e.g.,
> $ make
> $ objdump -d vmlinux | awk -f distill.awk | ./test_get_len [x86_64]
> 
> Jim
> 

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: instruction-analysis API(s)

2009-03-11 Thread Jim Keniston
On Tue, 2009-03-10 at 15:57 -0400, Masami Hiramatsu wrote:
> Hi Jim,
...
> > 
> > I tested with vmlinux, libc, and libm on both an i686 system and an
> > x86_64 system.
...
> 
> Thank you for fixing  it.
> BTW, it might have to support vm86 mode(especially, for user code).

I have a vague idea of what vm86 mode is, but I don't really understand
what the implications are for instruction analysis or probing.  My
understanding is that its use is rare (e.g., for DOS emulators), so it
hasn't been a requirement for uprobes so far.

> 
> > There are some other cases where insn_get_length() doesn't match up with
> > the disassembly, but I don't consider them bugs:
> > - 0x9b is an instruction (fwait), but the disassembler treats it as a
> > prefix.  For example 9b df ... can be disassembled as
> > fstsw ...// wait, then store status word
> > or
> > fwait// wait
> > fnstsw ...// store status word without waiting
> > Perhaps it's relevant to investigate whether a single-step of 9b df ...
> > would execute just the fwait or the whole fstsw.  Anyway, this explains
> > the "failures" of finit and fstsw that I mentioned to you.  I also saw
> > this with fstcw and fclex.
> 
> FYI, there is a single wait/fwait instruction described at Intel software
> developers manual vol.2B p.399.

Yes, I tried probing an fclex instruction -- which is really fwait +
fnclex -- and the single-step stopped after the fwait.  So our
instruction analysis is correct.  (Of course, I had to adjust uprobes
not to reject the 0x9b opcode -- need to check that in.  PR 5273 is
about this sort of thing.)

> 
> > - Illegal instruction sequences, such as an x86_64 instruction that
> > starts with 0x40, or a misplaced 0x65 prefix.  Typically, we see these
> > when disassembling data.  I just filtered out (via egrep) instructions
> > whose disassembly starts with "rex" or includes "(bad)".
> 
> Sure, I think insn_* should return -EINVAL or set insn.invalid = 1
> if we found those invalid ops. E.g. kernel use BUG() macro, it adds
> some raw numbers after ud2, in that case, those raw numbers might
> be decoded as an illegal instruction.

It could be useful to provide a function to determine whether the byte
sequence is a valid instruction, but I don't think we should make that
check by default.  Here are some reasons:

1. It costs execution time.  For some instructions, you have to examine
the prefixes and/or modrm byte as well as the opcode(s).

2. It takes time to code it 100% right.  In particular, mistakenly
rejecting a valid instruction can be a nuisance.

3. Intel and AMD may not completely agree on which instructions are
valid in which modes.  I've always consulted the AMD manuals, since
they're online and appear complete, but I'm not really sure whether what
they say applies without exception to (say) Pentium and EM64T.

4. kprobes and uprobes have gotten along fine without such a test.
(Uprobes's test is far from complete, and deliberately screens out some
valid instructions, such as sysenter, that we suspect may produce weird
results when single-stepped.)  The assumption is that the address
provided points to the first byte of a valid instruction.  Since on x86,
most random byte sequences look like some kind valid instruction,
catching obviously invalid sequences wouldn't buy us very much.


Jim