instruction-analysis API(s)
Hi, Roland. Back in a conference call in December, we discussed approaches to refactoring utrace-related code such as uprobes, to make some of the services provided there more generally available. In particular, you suggested an "instruction analysis" service that various subsystems could exploit -- kprobes and uprobes/ubp at first, and eventually perhaps gdb, perfmon, kvm, ftrace, and djprobes. I decided to survey the kernel for subsystems that parse and/or analyze CPU instructions. I hoped to review the various approaches -- perhaps finding one that's widely accepted -- and to evaluate possible clients for our instruction-analysis service. The results were discouraging, as summarized below. I see no promise of an architecture-agnostic instruction-analysis API. Within each architecture, I think the best we could do would be an (architecture-specific) instruction-parsing API. (And even within an architecture, different subsystems look at different aspects of an instruction.) Srikar Dronamraju and I are exploring two different approaches to an x86 instruction-parsing service. Since x86 kvm seems to have one of the most systematic and thorough approaches, Srikar is prototyping a generalization of kvm's x86_decode_insn() to make it support kprobes, and eventually uprobes. (Note that kvm does NOT appear to be a good starting place on powerpc and s390.) Approaching from the minimalist side, I've implemented an x86 instruction-parsing API with just enough smarts (so far) to support kprobes and uprobes. We'd be interested to know whether these efforts are consistent with what you have in mind. See more details below. Jim Intro - "Instruction analysis" refers to the analysis of a CPU instruction in the kernel or a user program. Typically, the instruction must be analyzed so that it can be properly emulated (in the case of SSOL, by executing the same instruction at a different address), or so a fault caused by the instruction can be properly handled. There are other uses as well -- see below. Possible Clients of an Instruction-Analysis Service --- Where in the kernel is instruction analysis currently used? - kprobes (arm, avr32, ia64, mn10300, powerpc, s390, sh, sparc, x86) - uprobes (ia64, powerpc, s390, x86) - hypervisors: - kvm (ia64, powerpc, s390, x86) - powerpc Cell Beat hypervisor - floating-point unit emulation (arm, s390, sparc, x86) - exception handling: - page fault (powerpc, x86) - illegal instruction (s390) - unaligned trap (ia64) - vm86 fault (x86) - disassembly (powerpc, s390) - powerpc: xmon, code patching (for crash dump?) - ia64: emulation of brl instruction - x86: alternative-instruction patching (replacing instructions that are inappropriate for the CPU rev), fault injection - djprobes (not in kernel, not sure of status) Note: I looked in detail only at the architectures that implement kprobes: arm, avr32, ia64, mn10300, powerpc, s390, sh, sparc, and x86. And I note in passing that sh does a lot of instruction analysis -- as does mips -- but I skipped sh for now. Note: Roland also listed gdb, perfmon, and ftrace as subsystems that do instruction analysis. I think that oprofile has also been suggested. - I haven't investigated gdb, but I have no reason to think Roland is wrong about it. - I've looked briefly at the various components of perfmon[2] and oprofile, but I don't see any instruction analysis per se; and the perfmon/oprofile expert I asked (IBM's Carl Love) isn't aware of any. - Similarly, I don't see instruction analysis per se in ftrace. Prospects/Problems -- What are the prospects for adapting these various subsystems to use a common instruction-analysis service? Typically, not very good. Here are some of the problems: - Different architectures have very different instruction-analysis needs. - Different architectures have very different instruction formats and instruction attributes. Consequently, the opportunities for common code shared by multiple architectures are few. - Different subsystems are interested in different instruction attributes, and/or classify instructions differently. - Some subsystems are interested in only certain instructions. - Some subsystems, such as fault handlers, want to maximize efficiency by examining as little of the instruction as possible; while others, such as *probes, take a more leisurely approach (e.g., reading enough bytes to capture the largest possible instruction, even if that means faulting in a page).
Re: instruction-analysis API(s)
Hi Jim, I'm also interested in the instruction decoder. If you don't mind, could we share the API specification? I'd like to port djprobe on it. Thanks! Jim Keniston wrote: > Hi, Roland. Back in a conference call in December, we discussed > approaches to refactoring utrace-related code such as uprobes, to > make some of the services provided there more generally available. > In particular, you suggested an "instruction analysis" service that > various subsystems could exploit -- kprobes and uprobes/ubp at first, > and eventually perhaps gdb, perfmon, kvm, ftrace, and djprobes. > > I decided to survey the kernel for subsystems that parse and/or analyze > CPU instructions. I hoped to review the various approaches -- perhaps > finding one that's widely accepted -- and to evaluate possible clients > for our instruction-analysis service. > > The results were discouraging, as summarized below. I see no > promise of an architecture-agnostic instruction-analysis API. > Within each architecture, I think the best we could do would be an > (architecture-specific) instruction-parsing API. (And even within > an architecture, different subsystems look at different aspects of > an instruction.) > > Srikar Dronamraju and I are exploring two different approaches to an > x86 instruction-parsing service. Since x86 kvm seems to have one of > the most systematic and thorough approaches, Srikar is prototyping a > generalization of kvm's x86_decode_insn() to make it support kprobes, > and eventually uprobes. (Note that kvm does NOT appear to be a good > starting place on powerpc and s390.) Approaching from the minimalist > side, I've implemented an x86 instruction-parsing API with just enough > smarts (so far) to support kprobes and uprobes. > > We'd be interested to know whether these efforts are consistent > with what you have in mind. > > See more details below. > > Jim > > Intro > - > "Instruction analysis" refers to the analysis of a CPU instruction > in the kernel or a user program. Typically, the instruction must > be analyzed so that it can be properly emulated (in the case of > SSOL, by executing the same instruction at a different address), > or so a fault caused by the instruction can be properly handled. > There are other uses as well -- see below. > > Possible Clients of an Instruction-Analysis Service > --- > Where in the kernel is instruction analysis currently used? > - kprobes (arm, avr32, ia64, mn10300, powerpc, s390, sh, sparc, x86) > - uprobes (ia64, powerpc, s390, x86) > - hypervisors: > - kvm (ia64, powerpc, s390, x86) > - powerpc Cell Beat hypervisor > - floating-point unit emulation (arm, s390, sparc, x86) > - exception handling: > - page fault (powerpc, x86) > - illegal instruction (s390) > - unaligned trap (ia64) > - vm86 fault (x86) > - disassembly (powerpc, s390) > - powerpc: xmon, code patching (for crash dump?) > - ia64: emulation of brl instruction > - x86: alternative-instruction patching (replacing instructions that are > inappropriate for the CPU rev), fault injection > - djprobes (not in kernel, not sure of status) > > Note: I looked in detail only at the architectures that implement > kprobes: arm, avr32, ia64, mn10300, powerpc, s390, sh, sparc, and x86. > And I note in passing that sh does a lot of instruction analysis -- > as does mips -- but I skipped sh for now. > > Note: Roland also listed gdb, perfmon, and ftrace as subsystems that > do instruction analysis. I think that oprofile has also been suggested. > - I haven't investigated gdb, but I have no reason to think Roland is > wrong about it. > - I've looked briefly at the various components of perfmon[2] and > oprofile, but I don't see any instruction analysis per se; and the > perfmon/oprofile expert I asked (IBM's Carl Love) isn't aware of any. > - Similarly, I don't see instruction analysis per se in ftrace. > > Prospects/Problems > -- > What are the prospects for adapting these various subsystems to use > a common instruction-analysis service? Typically, not very good. > Here are some of the problems: > - Different architectures have very different instruction-analysis > needs. > - Different architectures have very different instruction formats and > instruction attributes. Consequently, the opportunities for common > code shared by multiple architectures are few. > - Different subsystems are interested in different instruction > attributes, and/or classify instructions differently. > - Some subsystems are interested in only certain instructions. > - Some subsystems, such as fault handlers, want to maximize efficiency > by examining as little of the instruction as possible; while others, > such as *probes, take a more leisurely approach (e.g., reading enough > bytes to capture the largest possible instruction, even if that means > faulting in a page). > -- Masami Hiramatsu Software Engineer Hitachi Com
Re: instruction-analysis API(s)
On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote: > Hi Jim, > > I'm also interested in the instruction decoder. > If you don't mind, could we share the API specification? > I'd like to port djprobe on it. I'm enclosing the little x86 instruction-analysis protoype I hacked together (insn_x86.*), along with a copy of systemtap's runtime/uprobes2/uprobes_x86.c, which I modified to use it. But again, we haven't really settled on an API. For example, my x86 prototype doesn't collect all the info that kvm needs. We're thinking that adapting some existing code (like kvm in the x86 case) might be more palatable to LKML. Jim > > Thanks! > > Jim Keniston wrote: > > Hi, Roland. Back in a conference call in December, we discussed > > approaches to refactoring utrace-related code such as uprobes, to > > make some of the services provided there more generally available. > > In particular, you suggested an "instruction analysis" service that > > various subsystems could exploit -- kprobes and uprobes/ubp at first, > > and eventually perhaps gdb, perfmon, kvm, ftrace, and djprobes. > > ... > > Srikar Dronamraju and I are exploring two different approaches to an > > x86 instruction-parsing service. Since x86 kvm seems to have one of > > the most systematic and thorough approaches, Srikar is prototyping a > > generalization of kvm's x86_decode_insn() to make it support kprobes, > > and eventually uprobes. (Note that kvm does NOT appear to be a good > > starting place on powerpc and s390.) Approaching from the minimalist > > side, I've implemented an x86 instruction-parsing API with just enough > > smarts (so far) to support kprobes and uprobes. > > > > We'd be interested to know whether these efforts are consistent > > with what you have in mind. > > > > See more details below. > > > > Jim ... /* * x86 instruction analysis * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. * * Copyright (C) IBM Corporation, 2002, 2004, 2009 */ #include // #include #include "insn_x86.h" /** * insn_init() - initialize struct insn * @insn: &struct insn to be initialized * @kaddr: address (in kernel memory) of instruction (or copy thereof) * @x86_64: true for 64-bit kernel or 64-bit app */ void insn_init(struct insn *insn, const u8 *kaddr, bool x86_64) { memset(insn, 0, sizeof(*insn)); insn->kaddr = kaddr; insn->next_byte = kaddr; insn->x86_64 = x86_64; } EXPORT_SYMBOL_GPL(insn_init); /** * insn_get_prefixes - scan x86 instruction prefix bytes * @insn: &struct insn containing instruction * * Populates the @insn->prefixes bitmap, and updates @insn->next_byte * to point to the (first) opcode. No effect if @insn->prefixes.got * is already true. */ void insn_get_prefixes(struct insn *insn) { u32 pfx; struct insn_field *prefixes = &insn->prefixes; if (prefixes->got) return; for (;; insn->next_byte++, prefixes->nbytes++) { u8 b = *(insn->next_byte); #ifdef CONFIG_X86_64 if ((b & 0xf0) == 0x40 && insn->x86_64) { prefixes->value |= X86_PFX_REX; prefixes->value |= (b & 0x0f) * X86_PFX_REX_BASE; /* REX prefix is always last. */ break; } #endif switch (b) { case 0x26: pfx = X86_PFX_ES; break; case 0x2E: pfx = X86_PFX_CS; break; case 0x36: pfx = X86_PFX_SS; break; case 0x3E: pfx = X86_PFX_DS; break; case 0x64: pfx = X86_PFX_FS; break; case 0x65: pfx = X86_PFX_GS; break; case 0x66: pfx = X86_PFX_OPNDSZ; break; case 0x67: pfx = X86_PFX_ADDRSZ; break; case 0xF0: pfx = X86_PFX_LOCK; break; case 0xF2: pfx = X86_PFX_REPNE; break; case 0xF3: pfx = X86_PFX_REPE; break; default: pfx = 0x0; break; } if (!pfx) break; prefixes->value |= pfx; } prefixes->got = true; } EXPORT_SYMBOL_GPL(insn_get_prefixes); /** * insn_get_opcode - collect opcode(s) * @insn: &struct insn containing instruction * * Populates @insn->opcode1 (and @insn->opcode2, if it's a 2-byte opcode) * and updates @insn->next_byte to point past the opcode byte(s). * If necessary, first collects any preceding (prefix) bytes. * Sets @insn->opcode.value = opcode1. No effect if @insn->opcode.got * is already true. */ void insn_get_opcode(struct insn *insn) { struct insn_field *opcode = &insn->opcode; if (opcode->got) return; if (!insn->prefixes.got) insn_get_prefixes(insn); insn->opcode1 = *insn->next_byte++; if (insn->opcod
Re: instruction-analysis API(s)
Jim Keniston wrote: [...] > Possible Clients of an Instruction-Analysis Service > --- > Where in the kernel is instruction analysis currently used? I think we also need to clarify why they need it(what information/action they require), because it defines API. > - kprobes (arm, avr32, ia64, mn10300, powerpc, s390, sh, sparc, x86) > - uprobes (ia64, powerpc, s390, x86) > - djprobes (not in kernel, not sure of status) They need 'static' analysis of instructions to get below parameters. - length - attribute (prefixes, etc) - type (jump/accumulation/memory access/flag change, etc) > - hypervisors: > - kvm (ia64, powerpc, s390, x86) > - powerpc Cell Beat hypervisor > - floating-point unit emulation (arm, s390, sparc, x86) They need 'dynamic' instruction emulation. > - exception handling: > - page fault (powerpc, x86) > - illegal instruction (s390) > - unaligned trap (ia64) > - vm86 fault (x86) Depends on the case, however, some of them just need instruction type and length, and these should be done very quickly. So, they need a light-weight and specialized analyzer/emulator. > - disassembly (powerpc, s390) > - powerpc: xmon, code patching (for crash dump?) Maybe, static analysis is enough? > - ia64: emulation of brl instruction Dynamic emulation. > - x86: alternative-instruction patching (replacing instructions that are > inappropriate for the CPU rev), fault injection Static analysis. [...] > Prospects/Problems > -- > What are the prospects for adapting these various subsystems to use > a common instruction-analysis service? Typically, not very good. > Here are some of the problems: > - Different architectures have very different instruction-analysis > needs. IMHO, there are just need two types of interfaces: static analyzer or dynamic emulator. > - Different architectures have very different instruction formats and > instruction attributes. Consequently, the opportunities for common > code shared by multiple architectures are few. > - Different subsystems are interested in different instruction > attributes, and/or classify instructions differently. > - Some subsystems are interested in only certain instructions. Indeed. I think we don't need to care all of them at the start point. Just starting simply and evolving code on upstream is my recommendation. > - Some subsystems, such as fault handlers, want to maximize efficiency > by examining as little of the instruction as possible; while others, > such as *probes, take a more leisurely approach (e.g., reading enough > bytes to capture the largest possible instruction, even if that means > faulting in a page). Indeed. I think those efficiency-required subsystems are so arch-dependent that we can (just) shares instruction bitmaps or provide special interfaces. Thank you for your work! -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhira...@redhat.com
Re: instruction-analysis API(s)
Jim Keniston wrote: > On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote: >> Hi Jim, >> >> I'm also interested in the instruction decoder. >> If you don't mind, could we share the API specification? >> I'd like to port djprobe on it. > > I'm enclosing the little x86 instruction-analysis protoype I hacked > together (insn_x86.*), along with a copy of systemtap's > runtime/uprobes2/uprobes_x86.c, which I modified to use it. Hmm, actually, djprobe needs both of the length and the type of instructions, since it has to know how many bytes must be copied and be replaced by a long jump. > But again, we haven't really settled on an API. For example, my x86 > prototype doesn't collect all the info that kvm needs. We're thinking > that adapting some existing code (like kvm in the x86 case) might be > more palatable to LKML. Sure, since kvm and emulators have to fetch the values of src/dst for the emulation, they need actual register values. On the other hand, the disasm/*probe have to analysis code before hitting, so they don't know the actual value of the registers. So, I think we should split x86_decode_insn() into 2 parts, static analysis and emulation preparation. For example: 1) analyzing code statically (x86_analyze_insn) - just decoding an instruction - this phase may consist of several sub-functions. 2) preparing emulation (x86_evaluate_insn) - evaluating src/dst based on current(vcpu) registers 3) executing emulation (x86_emulate_insn) - emulating an analyzed instruction Thanks, > > Jim > >> Thanks! >> >> Jim Keniston wrote: >>> Hi, Roland. Back in a conference call in December, we discussed >>> approaches to refactoring utrace-related code such as uprobes, to >>> make some of the services provided there more generally available. >>> In particular, you suggested an "instruction analysis" service that >>> various subsystems could exploit -- kprobes and uprobes/ubp at first, >>> and eventually perhaps gdb, perfmon, kvm, ftrace, and djprobes. >>> > ... >>> Srikar Dronamraju and I are exploring two different approaches to an >>> x86 instruction-parsing service. Since x86 kvm seems to have one of >>> the most systematic and thorough approaches, Srikar is prototyping a >>> generalization of kvm's x86_decode_insn() to make it support kprobes, >>> and eventually uprobes. (Note that kvm does NOT appear to be a good >>> starting place on powerpc and s390.) Approaching from the minimalist >>> side, I've implemented an x86 instruction-parsing API with just enough >>> smarts (so far) to support kprobes and uprobes. >>> >>> We'd be interested to know whether these efforts are consistent >>> with what you have in mind. >>> >>> See more details below. >>> >>> Jim > ... > -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhira...@redhat.com
Re: instruction-analysis API(s)
On Mon, Feb 09, 2009 at 06:05:56PM -0500, Masami Hiramatsu wrote: > Jim Keniston wrote: > > On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote: > >> Hi Jim, > >> > >> I'm also interested in the instruction decoder. > >> If you don't mind, could we share the API specification? > >> I'd like to port djprobe on it. > > > > I'm enclosing the little x86 instruction-analysis protoype I hacked > > together (insn_x86.*), along with a copy of systemtap's > > runtime/uprobes2/uprobes_x86.c, which I modified to use it. > > Hmm, actually, djprobe needs both of the length and the type of > instructions, since it has to know how many bytes must be copied > and be replaced by a long jump. > > > But again, we haven't really settled on an API. For example, my x86 > > prototype doesn't collect all the info that kvm needs. We're thinking > > that adapting some existing code (like kvm in the x86 case) might be > > more palatable to LKML. > > Sure, since kvm and emulators have to fetch the values of src/dst > for the emulation, they need actual register values. On the other hand, > the disasm/*probe have to analysis code before hitting, so they > don't know the actual value of the registers. > > So, I think we should split x86_decode_insn() into 2 parts, static > analysis and emulation preparation. > > For example: > 1) analyzing code statically (x86_analyze_insn) >- just decoding an instruction >- this phase may consist of several sub-functions. > > 2) preparing emulation (x86_evaluate_insn) >- evaluating src/dst based on current(vcpu) registers > > 3) executing emulation (x86_emulate_insn) >- emulating an analyzed instruction Right, that surely sounds like the way to go. However, we've been cautioned that the instruction emulation area of the kvm code is very performance sensitive. But, there is no harm in prototyping the above and then worrying about any optimizations so there isn't a performance issue -- in any case, I guess [ku]probes are very infrequent users of this compared to KVM. Ananth
Re: instruction-analysis API(s)
On Tue, 2009-02-10 at 10:12 +0530, Ananth N Mavinakayanahalli wrote: > On Mon, Feb 09, 2009 at 06:05:56PM -0500, Masami Hiramatsu wrote: > > Jim Keniston wrote: > > > On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote: > > >> Hi Jim, > > >> > > >> I'm also interested in the instruction decoder. > > >> If you don't mind, could we share the API specification? > > >> I'd like to port djprobe on it. > > > > > > I'm enclosing the little x86 instruction-analysis protoype I hacked > > > together (insn_x86.*), along with a copy of systemtap's > > > runtime/uprobes2/uprobes_x86.c, which I modified to use it. > > > > Hmm, actually, djprobe needs both of the length and the type of > > instructions, since it has to know how many bytes must be copied > > and be replaced by a long jump. > > > > > But again, we haven't really settled on an API. For example, my x86 > > > prototype doesn't collect all the info that kvm needs. We're thinking > > > that adapting some existing code (like kvm in the x86 case) might be > > > more palatable to LKML. > > > > Sure, since kvm and emulators have to fetch the values of src/dst > > for the emulation, they need actual register values. On the other hand, > > the disasm/*probe have to analysis code before hitting, so they > > don't know the actual value of the registers. > > > > So, I think we should split x86_decode_insn() into 2 parts, static > > analysis and emulation preparation. > > > > For example: > > 1) analyzing code statically (x86_analyze_insn) > >- just decoding an instruction > >- this phase may consist of several sub-functions. > > > > 2) preparing emulation (x86_evaluate_insn) > >- evaluating src/dst based on current(vcpu) registers > > > > 3) executing emulation (x86_emulate_insn) > >- emulating an analyzed instruction > > Right, that surely sounds like the way to go. However, we've been > cautioned that the instruction emulation area of the kvm code is very > performance sensitive. But, there is no harm in prototyping the above > and then worrying about any optimizations so there isn't a performance > issue -- in any case, I guess [ku]probes are very infrequent users of > this compared to KVM. > > Ananth Hi, Masami. Ananth, Srikar, Maneesh, and I talked about this last night. While I was on vacation, Srikar did further investigation into adapting x86 kvm's instuction analysis for more general use, and he's not optimistic. For the short term, at least (i.e., between now and the Linux Foundation Collaboration Summit in April), we're going to proceed based on the prototype I developed. As you noted, djprobes needs instruction lengths, and my prototype doesn't provide that info. (Uprobes computes instruction lengths for rip-relative x86_64 instructions, but that's only a subset of what you need.) Are you interested in extending/enhancing my prototype to make it useful for djprobes? If so, I'd be happy to consult. Thanks. Jim
Re: instruction-analysis API(s)
Jim Keniston wrote: > Hi, Masami. > > Ananth, Srikar, Maneesh, and I talked about this last night. While I > was on vacation, Srikar did further investigation into adapting x86 > kvm's instuction analysis for more general use, and he's not optimistic. > For the short term, at least (i.e., between now and the Linux Foundation > Collaboration Summit in April), we're going to proceed based on the > prototype I developed. > > As you noted, djprobes needs instruction lengths, and my prototype > doesn't provide that info. (Uprobes computes instruction lengths for > rip-relative x86_64 instructions, but that's only a subset of what you > need.) Are you interested in extending/enhancing my prototype to make > it useful for djprobes? If so, I'd be happy to consult. Hi Jim, Thank you for considering djprobe. Actually, I'm developing insn_get_length() based on your prototype and porting djprobe on it. After tested code, I'd like to post the insn_x86 code. Thank you, > > Thanks. > Jim > -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhira...@redhat.com
Re: instruction-analysis API(s)
Jim Keniston wrote: > On Tue, 2009-02-10 at 10:12 +0530, Ananth N Mavinakayanahalli wrote: >> On Mon, Feb 09, 2009 at 06:05:56PM -0500, Masami Hiramatsu wrote: >>> Jim Keniston wrote: On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote: > Hi Jim, > > I'm also interested in the instruction decoder. > If you don't mind, could we share the API specification? > I'd like to port djprobe on it. I'm enclosing the little x86 instruction-analysis protoype I hacked together (insn_x86.*), along with a copy of systemtap's runtime/uprobes2/uprobes_x86.c, which I modified to use it. >>> Hmm, actually, djprobe needs both of the length and the type of >>> instructions, since it has to know how many bytes must be copied >>> and be replaced by a long jump. >>> But again, we haven't really settled on an API. For example, my x86 prototype doesn't collect all the info that kvm needs. We're thinking that adapting some existing code (like kvm in the x86 case) might be more palatable to LKML. >>> Sure, since kvm and emulators have to fetch the values of src/dst >>> for the emulation, they need actual register values. On the other hand, >>> the disasm/*probe have to analysis code before hitting, so they >>> don't know the actual value of the registers. >>> >>> So, I think we should split x86_decode_insn() into 2 parts, static >>> analysis and emulation preparation. >>> >>> For example: >>> 1) analyzing code statically (x86_analyze_insn) >>>- just decoding an instruction >>>- this phase may consist of several sub-functions. >>> >>> 2) preparing emulation (x86_evaluate_insn) >>>- evaluating src/dst based on current(vcpu) registers >>> >>> 3) executing emulation (x86_emulate_insn) >>>- emulating an analyzed instruction >> Right, that surely sounds like the way to go. However, we've been >> cautioned that the instruction emulation area of the kvm code is very >> performance sensitive. But, there is no harm in prototyping the above >> and then worrying about any optimizations so there isn't a performance >> issue -- in any case, I guess [ku]probes are very infrequent users of >> this compared to KVM. >> >> Ananth > > Hi, Masami. > > Ananth, Srikar, Maneesh, and I talked about this last night. While I > was on vacation, Srikar did further investigation into adapting x86 > kvm's instuction analysis for more general use, and he's not optimistic. > For the short term, at least (i.e., between now and the Linux Foundation > Collaboration Summit in April), we're going to proceed based on the > prototype I developed. > > As you noted, djprobes needs instruction lengths, and my prototype > doesn't provide that info. (Uprobes computes instruction lengths for > rip-relative x86_64 instructions, but that's only a subset of what you > need.) Are you interested in extending/enhancing my prototype to make > it useful for djprobes? If so, I'd be happy to consult. Here are a patch against your code and an example code for instruction length decoder. Curiously, KVM's instruction decoder does not completely cover all instructions(especially, Jcc/test...). I had to refer Intel manuals. Moreover, even with this patch, the decoder is incomplete. - this doesn't cover 3bytes opcode yet. - this doesn't decode sib, displacement and immediate. - might have some bugs :-( Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhira...@redhat.com Index: insn_x86.h === --- insn_x86.h (revision 1510) +++ insn_x86.h (working copy) @@ -66,6 +66,10 @@ struct insn_field displacement; struct insn_field immediate; + u8 op_bytes; + u8 ad_bytes; + u8 length; + const u8 *kaddr;/* kernel address of insn (copy) to analyze */ const u8 *next_byte; bool x86_64; @@ -75,6 +79,7 @@ extern void insn_get_prefixes(struct insn *insn); extern void insn_get_opcode(struct insn *insn); extern void insn_get_modrm(struct insn *insn); +extern void insn_get_length(struct insn *insn); #ifdef CONFIG_X86_64 extern bool insn_rip_relative(struct insn *insn); Index: insn_x86.c === --- insn_x86.c (revision 1510) +++ insn_x86.c (working copy) @@ -17,7 +17,7 @@ * * Copyright (C) IBM Corporation, 2002, 2004, 2009 */ - +#include #include // #include #include "insn_x86.h" @@ -34,6 +34,11 @@ insn->kaddr = kaddr; insn->next_byte = kaddr; insn->x86_64 = x86_64; + insn->op_bytes = 4; + if (x86_64) + insn->ad_bytes = 8; + else + insn->ad_bytes = 4; } EXPORT_SYMBOL_GPL(insn_init); @@ -79,10 +84,51 @@ break; prefixes->value |= pfx; } + if (prefixes->value & X86_PFX_OPNDSZ) { + /* oprand size
Re: instruction-analysis API(s)
On Fri, 2009-02-27 at 16:20 -0500, Masami Hiramatsu wrote: ... > > Here are a patch against your code and an example code for > instruction length decoder. > Curiously, KVM's instruction decoder does not completely > cover all instructions(especially, Jcc/test...). > I had to refer Intel manuals. > > Moreover, even with this patch, the decoder is incomplete. > - this doesn't cover 3bytes opcode yet. > - this doesn't decode sib, displacement and immediate. > - might have some bugs :-( > > > Thank you, Thanks for your work on this. Comments below. Jim > > plain text document attachment (insn_x86.patch) > Index: insn_x86.h > === > --- insn_x86.h(revision 1510) > +++ insn_x86.h(working copy) > @@ -66,6 +66,10 @@ > struct insn_field displacement; > struct insn_field immediate; > > + u8 op_bytes; I'd probably use opnd_bytes and addr_bytes here, for clarity. (When I first saw "op", I thought "opcode".) Also, we should clarify that these are the EFFECTIVE lengths, not the lengths of the immediate and displacement fields in the instruction. > + u8 ad_bytes; > + u8 length; > + > const u8 *kaddr;/* kernel address of insn (copy) to analyze */ > const u8 *next_byte; > bool x86_64; > @@ -75,6 +79,7 @@ > extern void insn_get_prefixes(struct insn *insn); > extern void insn_get_opcode(struct insn *insn); > extern void insn_get_modrm(struct insn *insn); > +extern void insn_get_length(struct insn *insn); > > #ifdef CONFIG_X86_64 > extern bool insn_rip_relative(struct insn *insn); > Index: insn_x86.c > === > --- insn_x86.c(revision 1510) > +++ insn_x86.c(working copy) > @@ -17,7 +17,7 @@ > * > * Copyright (C) IBM Corporation, 2002, 2004, 2009 > */ > - > +#include > #include > // #include > #include "insn_x86.h" > @@ -34,6 +34,11 @@ > insn->kaddr = kaddr; > insn->next_byte = kaddr; > insn->x86_64 = x86_64; > + insn->op_bytes = 4; > + if (x86_64) > + insn->ad_bytes = 8; > + else > + insn->ad_bytes = 4; > } > EXPORT_SYMBOL_GPL(insn_init); > > @@ -79,10 +84,51 @@ > break; > prefixes->value |= pfx; > } > + if (prefixes->value & X86_PFX_OPNDSZ) { > + /* oprand size switches 2/4 */ > + insn->op_bytes ^= 6; > + } > + if (prefixes->value & X86_PFX_ADDRSZ) { > + /* address size switches 2/4 or 4/8 */ > +#ifdef CONFIG_X86_64 > + if (insn->x86_64) > + insn->op_bytes ^= 12; > + else > +#endif > + insn->op_bytes ^= 6; This seems wrong. You're checking the address-size prefix, but adjusting the operand size. > + } > +#ifdef CONFIG_X86_64 > + if (prefixes->value & X86_PFX_REXW) > + insn->op_bytes = 8; > +#endif > prefixes->got = true; > } > EXPORT_SYMBOL_GPL(insn_get_prefixes); > > +static bool __insn_is_stack(struct insn *insn) It's not entirely clear to me what this function checks. (A more precise name might help.) You have pushes, pops, and calls here, but you also have some instructions that don't appear to affect the stack at all. And other push and pop instructions are missing. > +{ > + u8 reg; > + if (insn->opcode.nbytes == 2) > + return 0; The following are 2-byte pushes or pops: 0f-a0, 0f-a1, 0f-a8, and 0f-a9. Also, since the return value is bool, I'd prefer to see true/false rather than 1/0. > + > + switch(insn->opcode1) { > + case 0x68: > + case 0x6a: > + case 0x9c: > + case 0x9d: > + case 0xc5: 0xc5 = lds. Why lds? In general, it'd be nice to add a comment showing the mnemonic next to each hex value -- e.g., case 0x68: /* push */ > + case 0xe8: > + return 1; > + } Other related instructions: 9a, 1f, 07, 17, 8f. > + reg = ((*insn->next_byte) >> 3) & 7; > + if ((insn->opcode1 & 0xf0) == 0x50 || > + (insn->opcode1 == 0x1a && reg == 0) || The above line doesn't seem right. It catches things like sbb (%rax),%al . > + (insn->opcode1 == 0xff && (reg & 1) == 0 && reg != 0)) { Looks like the interesting reg values are 2 (call), 3 (call), and 6 (push). > + return 1; > + } > + return 0; > +} > + > /** > * insn_get_opcode - collect opcode(s) > * @insn:&struct insn containing instruction > @@ -108,6 +154,8 @@ > opcode->nbytes = 1; > opcode->value = insn->opcode1; > opcode->got = true; > + if (insn->x86_64 && __insn_is_stack(insn)) > + insn->op_bytes = 8; > } > EXPORT_SYMBOL_GPL(insn_get_opcode); > > @@ -208,3 +256,115 @@ > } > EXPORT_SYMBOL_GPL(insn_rip_relative); > #endif > + > +/** > + * > + * insn_get_length() - Get the length of instruction > + * @insn:&struct insn cont
Re: instruction-analysis API(s)
Hi Jim, Jim Keniston wrote: > On Fri, 2009-02-27 at 16:20 -0500, Masami Hiramatsu wrote: > ... >> Here are a patch against your code and an example code for >> instruction length decoder. >> Curiously, KVM's instruction decoder does not completely >> cover all instructions(especially, Jcc/test...). >> I had to refer Intel manuals. >> >> Moreover, even with this patch, the decoder is incomplete. >> - this doesn't cover 3bytes opcode yet. >> - this doesn't decode sib, displacement and immediate. >> - might have some bugs :-( >> >> >> Thank you, > > Thanks for your work on this. Comments below. Thank you very much for review! Actually, that code was based on KVM code, so I also found many opcodes were not supported. > As I mentioned in private email, you or I should probably refactor this > into: > - insn_get_sib() > - insn_get_displacement() > - insn_get_immediate() > - insn_get_length() Agreed, these should be supported. I also would like to change struct insn as below; struct insn { struct insn_field prefixes; /* prefixes.value is a bitmap */ struct insn_field opcode; /* opcode.bytes[n] == opcode_n */ struct insn_field modrm; struct insn_field sib; struct insn_field displacement; union { struct insn_field immediate; struct insn_field moffset1; /* for 64bit MOV */ struct insn_field immediate1; /* for 64bit imm or off16/32 */ }; union { struct insn_field moffset2; /* for 64bit MOV */ struct insn_field immediate2; /* for 64bit imm or seg16 */ }; u8 opnd_bytes; u8 addr_bytes; u8 length; bool x86_64; const u8 *kaddr;/* kernel address of insn (copy) to analyze */ const u8 *next_byte; }; opcode2 and opcode3 will be stored in opcode.value with opcode1. Now, I'm updating my code. Would anyone also be working on it? Thank you, > > Jim > >> plain text document attachment (insn_x86.patch) >> Index: insn_x86.h >> === >> --- insn_x86.h (revision 1510) >> +++ insn_x86.h (working copy) >> @@ -66,6 +66,10 @@ >> struct insn_field displacement; >> struct insn_field immediate; >> >> +u8 op_bytes; > > I'd probably use opnd_bytes and addr_bytes here, for clarity. (When I > first saw "op", I thought "opcode".) Also, we should clarify that these > are the EFFECTIVE lengths, not the lengths of the immediate and > displacement fields in the instruction. > >> +u8 ad_bytes; >> +u8 length; >> + >> const u8 *kaddr;/* kernel address of insn (copy) to analyze */ >> const u8 *next_byte; >> bool x86_64; >> @@ -75,6 +79,7 @@ >> extern void insn_get_prefixes(struct insn *insn); >> extern void insn_get_opcode(struct insn *insn); >> extern void insn_get_modrm(struct insn *insn); >> +extern void insn_get_length(struct insn *insn); >> >> #ifdef CONFIG_X86_64 >> extern bool insn_rip_relative(struct insn *insn); >> Index: insn_x86.c >> === >> --- insn_x86.c (revision 1510) >> +++ insn_x86.c (working copy) >> @@ -17,7 +17,7 @@ >> * >> * Copyright (C) IBM Corporation, 2002, 2004, 2009 >> */ >> - >> +#include >> #include >> // #include >> #include "insn_x86.h" >> @@ -34,6 +34,11 @@ >> insn->kaddr = kaddr; >> insn->next_byte = kaddr; >> insn->x86_64 = x86_64; >> +insn->op_bytes = 4; >> +if (x86_64) >> +insn->ad_bytes = 8; >> +else >> +insn->ad_bytes = 4; >> } >> EXPORT_SYMBOL_GPL(insn_init); >> >> @@ -79,10 +84,51 @@ >> break; >> prefixes->value |= pfx; >> } >> +if (prefixes->value & X86_PFX_OPNDSZ) { >> +/* oprand size switches 2/4 */ >> +insn->op_bytes ^= 6; >> +} >> +if (prefixes->value & X86_PFX_ADDRSZ) { >> +/* address size switches 2/4 or 4/8 */ >> +#ifdef CONFIG_X86_64 >> +if (insn->x86_64) >> +insn->op_bytes ^= 12; >> +else >> +#endif >> +insn->op_bytes ^= 6; > > This seems wrong. You're checking the address-size prefix, but > adjusting the operand size. > >> +} >> +#ifdef CONFIG_X86_64 >> +if (prefixes->value & X86_PFX_REXW) >> +insn->op_bytes = 8; >> +#endif >> prefixes->got = true; >> } >> EXPORT_SYMBOL_GPL(insn_get_prefixes); >> >> +static bool __insn_is_stack(struct insn *insn) > > It's not entirely clear to me what this function checks. (A more > precise name might help.) You have pushes, pops, and calls here, but > you also have some instructions that don't appear to affect the stack at > all. And other push and pop instructions are missing. > >> +{ >> +u8 reg; >> +if (insn->opcode.nbytes == 2) >> +return 0; > > The following
Re: instruction-analysis API(s)
Hi Jim and Sriker, Here, I almost rewrote my patch. Changelog: - rewrite decoding logic based on Intel' manual. - supoort insn_get_sib(),insn_get_displacement() and insn_get_immediate() too. - support 3 bytes opcode and 64bit immediate. - introduce some bitmaps. Thank you, Masami Hiramatsu wrote: > Hi Jim, > > Jim Keniston wrote: >> On Fri, 2009-02-27 at 16:20 -0500, Masami Hiramatsu wrote: >> ... >>> Here are a patch against your code and an example code for >>> instruction length decoder. >>> Curiously, KVM's instruction decoder does not completely >>> cover all instructions(especially, Jcc/test...). >>> I had to refer Intel manuals. >>> >>> Moreover, even with this patch, the decoder is incomplete. >>> - this doesn't cover 3bytes opcode yet. >>> - this doesn't decode sib, displacement and immediate. >>> - might have some bugs :-( >>> >>> >>> Thank you, >> Thanks for your work on this. Comments below. > > Thank you very much for review! > > Actually, that code was based on KVM code, so I also found many > opcodes were not supported. > >> As I mentioned in private email, you or I should probably refactor this >> into: >> - insn_get_sib() >> - insn_get_displacement() >> - insn_get_immediate() >> - insn_get_length() > > Agreed, these should be supported. > > I also would like to change struct insn as below; > > struct insn { > struct insn_field prefixes; /* prefixes.value is a bitmap */ > struct insn_field opcode; /* opcode.bytes[n] == opcode_n */ > struct insn_field modrm; > struct insn_field sib; > struct insn_field displacement; > union { > struct insn_field immediate; > struct insn_field moffset1; /* for 64bit MOV */ > struct insn_field immediate1; /* for 64bit imm or off16/32 > */ > }; > union { > struct insn_field moffset2; /* for 64bit MOV */ > struct insn_field immediate2; /* for 64bit imm or seg16 */ > }; > > u8 opnd_bytes; > u8 addr_bytes; > u8 length; > bool x86_64; > > const u8 *kaddr;/* kernel address of insn (copy) to analyze */ > const u8 *next_byte; > }; > > opcode2 and opcode3 will be stored in opcode.value with opcode1. > > Now, I'm updating my code. Would anyone also be working on it? > > Thank you, > >> Jim >> >>> plain text document attachment (insn_x86.patch) >>> Index: insn_x86.h >>> === >>> --- insn_x86.h (revision 1510) >>> +++ insn_x86.h (working copy) >>> @@ -66,6 +66,10 @@ >>> struct insn_field displacement; >>> struct insn_field immediate; >>> >>> + u8 op_bytes; >> I'd probably use opnd_bytes and addr_bytes here, for clarity. (When I >> first saw "op", I thought "opcode".) Also, we should clarify that these >> are the EFFECTIVE lengths, not the lengths of the immediate and >> displacement fields in the instruction. >> >>> + u8 ad_bytes; >>> + u8 length; >>> + >>> const u8 *kaddr;/* kernel address of insn (copy) to analyze */ >>> const u8 *next_byte; >>> bool x86_64; >>> @@ -75,6 +79,7 @@ >>> extern void insn_get_prefixes(struct insn *insn); >>> extern void insn_get_opcode(struct insn *insn); >>> extern void insn_get_modrm(struct insn *insn); >>> +extern void insn_get_length(struct insn *insn); >>> >>> #ifdef CONFIG_X86_64 >>> extern bool insn_rip_relative(struct insn *insn); >>> Index: insn_x86.c >>> === >>> --- insn_x86.c (revision 1510) >>> +++ insn_x86.c (working copy) >>> @@ -17,7 +17,7 @@ >>> * >>> * Copyright (C) IBM Corporation, 2002, 2004, 2009 >>> */ >>> - >>> +#include >>> #include >>> // #include >>> #include "insn_x86.h" >>> @@ -34,6 +34,11 @@ >>> insn->kaddr = kaddr; >>> insn->next_byte = kaddr; >>> insn->x86_64 = x86_64; >>> + insn->op_bytes = 4; >>> + if (x86_64) >>> + insn->ad_bytes = 8; >>> + else >>> + insn->ad_bytes = 4; >>> } >>> EXPORT_SYMBOL_GPL(insn_init); >>> >>> @@ -79,10 +84,51 @@ >>> break; >>> prefixes->value |= pfx; >>> } >>> + if (prefixes->value & X86_PFX_OPNDSZ) { >>> + /* oprand size switches 2/4 */ >>> + insn->op_bytes ^= 6; >>> + } >>> + if (prefixes->value & X86_PFX_ADDRSZ) { >>> + /* address size switches 2/4 or 4/8 */ >>> +#ifdef CONFIG_X86_64 >>> + if (insn->x86_64) >>> + insn->op_bytes ^= 12; >>> + else >>> +#endif >>> + insn->op_bytes ^= 6; >> This seems wrong. You're checking the address-size prefix, but >> adjusting the operand size. >> >>> + } >>> +#ifdef CONFIG_X86_64 >>> + if (prefixes->value & X86_PFX_REXW) >>> + insn->op_bytes = 8; >>> +#endif >>> prefixes->got = true; >>> } >>> EXPORT_SYMBOL_GPL(insn_g
Re: instruction-analysis API(s)
Quoting Masami Hiramatsu : Hi Jim and Sriker, Here, I almost rewrote my patch. Changelog: - rewrite decoding logic based on Intel' manual. - supoort insn_get_sib(),insn_get_displacement() and insn_get_immediate() too. - support 3 bytes opcode and 64bit immediate. - introduce some bitmaps. Thank you, Well, I didn't do much of a code review -- it looks like you addressed all my concerns -- but as I mentioned on IRC, I hacked together a test rig whereby you can disassemble a designated elf file (e.g., vmlinux, libc, libm) and then compare insn_get_length()'s results with objdump's results. The comment in distill.awk shows how to use objdump, awk, and test_get_len together. I also hacked up insn_x86.h and insn_x86.c to work in user space. Most of that is accomplished via insn_x86_user.h, but it certainly isn't necessary to do it that way. In particular, __u8, __s8, __u16, etc. are versions of u8, s8, u16, etc. that can be used in both kernel and user code, so maybe we should switch to those. I tested with vmlinux, libc, and libm on both an i686 system and an x86_64 system. I found and fixed a few bugs. Here are the ones that come to mind (all fixed): - shrd/shld, which we discussed - missing support for weird nops with modrm bytes (0f 1f ...). - neglected to include the REX prefix in prefixes.nbytes - missing static decl in an inline function in insn_x86.h There are some other cases where insn_get_length() doesn't match up with the disassembly, but I don't consider them bugs: - 0x9b is an instruction (fwait), but the disassembler treats it as a prefix. For example 9b df ... can be disassembled as fstsw ... // wait, then store status word or fwait // wait fnstsw ... // store status word without waiting Perhaps it's relevant to investigate whether a single-step of 9b df ... would execute just the fwait or the whole fstsw. Anyway, this explains the "failures" of finit and fstsw that I mentioned to you. I also saw this with fstcw and fclex. - Illegal instruction sequences, such as an x86_64 instruction that starts with 0x40, or a misplaced 0x65 prefix. Typically, we see these when disassembling data. I just filtered out (via egrep) instructions whose disassembly starts with "rex" or includes "(bad)". We could address the above by filtering them out in distill.awk or test_get_len.c. I think we're clean otherwise. There's a little more housecleaning to do -- e.g., adding Hitachi (?) copyright to IBM copyright, discarding insn_field_exists() and insn_extract_reg(), putting this all in git somewhere. But not tonight. Pull all the attached files into a directory and have a go -- e.g., $ make $ objdump -d vmlinux | awk -f distill.awk | ./test_get_len [x86_64] Jim # Usage: objdump -d a.out | awk -f distill.awk | ./test_get_len # Distills the disassembly as follows: # - Removes all lines except the disassembled instructions. # - For instructions that exceed 1 line (7 bytes), crams all the hex bytes # into a single line. BEGIN { prev_addr = "" prev_hex = "" prev_mnemonic = "" } /^ *[0-9a-f]+:/ { if (split($0, field, "\t") < 3) { # This is a continuation of the same insn. prev_hex = prev_hex field[2] } else { if (prev_addr != "") printf "%s\t%s\t%s\n", prev_addr, prev_hex, prev_mnemonic prev_addr = field[1] prev_hex = field[2] prev_mnemonic = field[3] } } END { if (prev_addr != "") printf "%s\t%s\t%s\n", prev_addr, prev_hex, prev_mnemonic } /* * x86 instruction analysis * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. * * Copyright (C) IBM Corporation, 2002, 2004, 2009 */ #ifdef KERNEL #include #include #else #include #endif // #include #include "insn_x86.h" MODULE_LICENSE("GPL"); // for test /** * insn_init() - initialize struct insn * @insn: &struct insn to be initialized * @kaddr: address (in kernel memory) of instruction (or copy thereof) * @x86_64: true for 64-bit kernel or 64-bit app */ void insn_init(struct insn *insn, const u8 *kaddr, bool x86_64) { memset(insn, 0, sizeof(*insn)); insn->kaddr = kaddr
Re: instruction-analysis API(s)
Hi Jim, Jim Keniston wrote: > Quoting Masami Hiramatsu : > >> Hi Jim and Sriker, >> >> Here, I almost rewrote my patch. >> >> Changelog: >> - rewrite decoding logic based on Intel' manual. >> - supoort insn_get_sib(),insn_get_displacement() >> and insn_get_immediate() too. >> - support 3 bytes opcode and 64bit immediate. >> - introduce some bitmaps. >> >> Thank you, > > Well, I didn't do much of a code review -- it looks like you addressed > all my concerns -- but as I mentioned on IRC, I hacked together a test > rig whereby you can disassemble a designated elf file (e.g., vmlinux, > libc, libm) and then compare insn_get_length()'s results with objdump's > results. The comment in distill.awk shows how to use objdump, awk, and > test_get_len together. Thank you for review and test! > I also hacked up insn_x86.h and insn_x86.c to work in user space. Most > of that is accomplished via insn_x86_user.h, but it certainly isn't > necessary to do it that way. In particular, __u8, __s8, __u16, etc. are > versions of u8, s8, u16, etc. that can be used in both kernel and user > code, so maybe we should switch to those. > > I tested with vmlinux, libc, and libm on both an i686 system and an > x86_64 system. I found and fixed a few bugs. Here are the ones that > come to mind (all fixed): > - shrd/shld, which we discussed > - missing support for weird nops with modrm bytes (0f 1f ...). > - neglected to include the REX prefix in prefixes.nbytes > - missing static decl in an inline function in insn_x86.h Thank you for fixing it. BTW, it might have to support vm86 mode(especially, for user code). > There are some other cases where insn_get_length() doesn't match up with > the disassembly, but I don't consider them bugs: > - 0x9b is an instruction (fwait), but the disassembler treats it as a > prefix. For example 9b df ... can be disassembled as > fstsw ...// wait, then store status word > or > fwait// wait > fnstsw ...// store status word without waiting > Perhaps it's relevant to investigate whether a single-step of 9b df ... > would execute just the fwait or the whole fstsw. Anyway, this explains > the "failures" of finit and fstsw that I mentioned to you. I also saw > this with fstcw and fclex. FYI, there is a single wait/fwait instruction described at Intel software developers manual vol.2B p.399. > - Illegal instruction sequences, such as an x86_64 instruction that > starts with 0x40, or a misplaced 0x65 prefix. Typically, we see these > when disassembling data. I just filtered out (via egrep) instructions > whose disassembly starts with "rex" or includes "(bad)". Sure, I think insn_* should return -EINVAL or set insn.invalid = 1 if we found those invalid ops. E.g. kernel use BUG() macro, it adds some raw numbers after ud2, in that case, those raw numbers might be decoded as an illegal instruction. > We could address the above by filtering them out in distill.awk or > test_get_len.c. I think we're clean otherwise. > > There's a little more housecleaning to do -- e.g., adding Hitachi (?) > copyright to IBM copyright, discarding insn_field_exists() and > insn_extract_reg(), putting this all in git somewhere. But not tonight. > > Pull all the attached files into a directory and have a go -- e.g., > $ make > $ objdump -d vmlinux | awk -f distill.awk | ./test_get_len [x86_64] > > Jim > -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhira...@redhat.com
Re: instruction-analysis API(s)
On Tue, 2009-03-10 at 15:57 -0400, Masami Hiramatsu wrote: > Hi Jim, ... > > > > I tested with vmlinux, libc, and libm on both an i686 system and an > > x86_64 system. ... > > Thank you for fixing it. > BTW, it might have to support vm86 mode(especially, for user code). I have a vague idea of what vm86 mode is, but I don't really understand what the implications are for instruction analysis or probing. My understanding is that its use is rare (e.g., for DOS emulators), so it hasn't been a requirement for uprobes so far. > > > There are some other cases where insn_get_length() doesn't match up with > > the disassembly, but I don't consider them bugs: > > - 0x9b is an instruction (fwait), but the disassembler treats it as a > > prefix. For example 9b df ... can be disassembled as > > fstsw ...// wait, then store status word > > or > > fwait// wait > > fnstsw ...// store status word without waiting > > Perhaps it's relevant to investigate whether a single-step of 9b df ... > > would execute just the fwait or the whole fstsw. Anyway, this explains > > the "failures" of finit and fstsw that I mentioned to you. I also saw > > this with fstcw and fclex. > > FYI, there is a single wait/fwait instruction described at Intel software > developers manual vol.2B p.399. Yes, I tried probing an fclex instruction -- which is really fwait + fnclex -- and the single-step stopped after the fwait. So our instruction analysis is correct. (Of course, I had to adjust uprobes not to reject the 0x9b opcode -- need to check that in. PR 5273 is about this sort of thing.) > > > - Illegal instruction sequences, such as an x86_64 instruction that > > starts with 0x40, or a misplaced 0x65 prefix. Typically, we see these > > when disassembling data. I just filtered out (via egrep) instructions > > whose disassembly starts with "rex" or includes "(bad)". > > Sure, I think insn_* should return -EINVAL or set insn.invalid = 1 > if we found those invalid ops. E.g. kernel use BUG() macro, it adds > some raw numbers after ud2, in that case, those raw numbers might > be decoded as an illegal instruction. It could be useful to provide a function to determine whether the byte sequence is a valid instruction, but I don't think we should make that check by default. Here are some reasons: 1. It costs execution time. For some instructions, you have to examine the prefixes and/or modrm byte as well as the opcode(s). 2. It takes time to code it 100% right. In particular, mistakenly rejecting a valid instruction can be a nuisance. 3. Intel and AMD may not completely agree on which instructions are valid in which modes. I've always consulted the AMD manuals, since they're online and appear complete, but I'm not really sure whether what they say applies without exception to (say) Pentium and EM64T. 4. kprobes and uprobes have gotten along fine without such a test. (Uprobes's test is far from complete, and deliberately screens out some valid instructions, such as sysenter, that we suspect may produce weird results when single-stepped.) The assumption is that the address provided points to the first byte of a valid instruction. Since on x86, most random byte sequences look like some kind valid instruction, catching obviously invalid sequences wouldn't buy us very much. Jim