Re: [RFC] [PATCH 0/7] UBP, XOL and Uprobes [ Summary of Comments and actions to be taken ]

2010-01-22 Thread Masami Hiramatsu
Peter Zijlstra wrote:
 On Fri, 2010-01-22 at 12:32 +0530, Srikar Dronamraju wrote:
 
 2. XOL vma vs Emulation vs Single Stepping Inline vs using Protection
 Rings.
  XOL VMA is an additional process address vma.  This is
  opposition to add an additional vma without user actually
  requesting for the same.

  XOL vma and single stepping inline are the two architecture
  independent implementations. While other implementations are
  more architecture specific. Single stepping inline wouldnt go
  well with multithreaded process.

  Even though XOL vma has its own issues, we will go with it since
  other implementations seem to have more complications.

  we would look forward to implementing boosters later. 
  Later on, if we come across another techniques with lesser
  side-effects than the XOL vma, we would switch to using them.
 
 How about modifying glibc to reserve like 64 bytes on the TLS structure
 it has and storing the ins and possible boost jmp there? Since each
 thread can only have a single trap at any one time that should be
 enough.

Hmm, it is a good idea. Well, we'll have a copy of original insn
in kernel, but it could be simpler than managing XOL vma. :-)

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-20 Thread Masami Hiramatsu
Frederic Weisbecker wrote:
 On Tue, Jan 19, 2010 at 09:47:45AM -0800, Jim Keniston wrote:
 Do you have plans for a variant 
 that's completely in userspace?

 I don't know of any such plans, but I'd be interested to read more of
 your thoughts here.  As I understand it, you've suggested replacing the
 probed instruction with a jump into an instrumentation vma (the XOL
 area, or something similar).  Masami has demonstrated -- through his
 djprobes enhancement to kprobes -- that this can be done for many x86
 instructions.

 What does the code in the jumped-to vma do?  Is the instrumentation code
 that corresponds to the uprobe handlers encoded in an ad hoc .so?
 
 
 Once the instrumentation is requested by a process that is not the
 instrumented one, this looks impossible to set a uprobe without a
 minimal voluntary collaboration from the instrumented process
 (events sent through IPC or whatever). So that looks too limited,
 this is not anymore a true dynamic uprobe.

Agreed. Since uprobe's handler must be running in kernel,
we need to jump into kernel space anyway. Booster (just skips
a single-stepping(trap) exception) may be useful for
improving uprobe performance.

And also as Andi said, using jump instead of int3 in userspace
has 2GB address space limitation. It's not a problem for kernel
inside, but a big problem in userspace.

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-18 Thread Masami Hiramatsu
Jim Keniston wrote:
 Not really.  For #3 (boosting), you need to know everything for #2,  
 plus be able to compute the length of each instruction -- which we can  
 now do for x86.  To emulate an instruction (#4), you need to replicate  
 what it does, side-effects and all.  The x86 instruction set seems to  
 be adding new floating-point instructions all the time, and I bet even  
 Masami doesn't know what they all do, but so far, they all seem to  
 adhere to the instruction-length rules encoded in Masami's instruction  
 decoder.

Actually, current x86 decoder doesn't support FP(x87) instructions.(even
it already supported AVX) But I think it's not so hard to add it.

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-18 Thread Masami Hiramatsu
Jim Keniston wrote:
 On Mon, 2010-01-18 at 10:58 -0500, Masami Hiramatsu wrote:
 Jim Keniston wrote:
 Not really.  For #3 (boosting), you need to know everything for #2,  
 plus be able to compute the length of each instruction -- which we can  
 now do for x86.  To emulate an instruction (#4), you need to replicate  
 what it does, side-effects and all.  The x86 instruction set seems to  
 be adding new floating-point instructions all the time, and I bet even  
 Masami doesn't know what they all do, but so far, they all seem to  
 adhere to the instruction-length rules encoded in Masami's instruction  
 decoder.

 Actually, current x86 decoder doesn't support FP(x87) instructions.(even
 it already supported AVX) But I think it's not so hard to add it.

 
 At one point I verified that it worked for all the x87 instructions in
 libm:
 https://www.redhat.com/archives/utrace-devel/2009-March/msg00031.html
 I'm pretty sure I tested mmx instructions as well.  But I guess this was
 before you rearranged the opcode tables.
 
 Yeah, it wouldn't be hard to add back in, at least for purposes of
 computing instruction lengths.

objdump -d /lib/libm.so.6  | awk -f arch/x86/tools/distill.awk | ./test_get_len 
Succeed: decoded and checked 37198 instructions

Hmm, yeah, that's already supported :-D.

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes

2010-01-13 Thread Masami Hiramatsu
Steven Rostedt wrote:
 On Tue, 2010-01-12 at 05:54 +0100, Frederic Weisbecker wrote:
 
 Now what if I want to launch ls and want to profile a function
 inside. What can I do with a trace event. I can't create the
 probe event based on a pid as I don't know it in advance.
 I could give it the ls cmdline and it manages to activate
 on the next ls launched. This is racy as another ls can
 be launched concurrently.
 
 You make a wrapper script:
 
 #!/bin/sh
 add probe to ls with pid $$
 exec $*
 
 I do this all the time to limit the function tracer to a specific app.
 
 #!/bin/sh
 echo $$  /debug/tracing/set_ftrace_pid
 echo function  /debug/tracing/current_tracer
 exec $*

I recommend you to add below line at the end of the script,
from my experience. :)

echo nop  /debug/tracing/current_tracer

Thank you,


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes

2010-01-13 Thread Masami Hiramatsu
Masami Hiramatsu wrote:
 Steven Rostedt wrote:
 On Tue, 2010-01-12 at 05:54 +0100, Frederic Weisbecker wrote:

 Now what if I want to launch ls and want to profile a function
 inside. What can I do with a trace event. I can't create the
 probe event based on a pid as I don't know it in advance.
 I could give it the ls cmdline and it manages to activate
 on the next ls launched. This is racy as another ls can
 be launched concurrently.

 You make a wrapper script:

 #!/bin/sh
 add probe to ls with pid $$
 exec $*

 I do this all the time to limit the function tracer to a specific app.

 #!/bin/sh
 echo $$  /debug/tracing/set_ftrace_pid
 echo function  /debug/tracing/current_tracer
 exec $*
 
 I recommend you to add below line at the end of the script,
 from my experience. :)
 
 echo nop  /debug/tracing/current_tracer

Oops, my bad, it doesn't work after exec...
But, it is very important to disable function tracer after
tracing target process.

So, perhaps, below script may work.

#!/bin/sh
(echo $BASHPID  /debug/tracing/set_ftrace_pid
 echo function  /debug/tracing/current_tracer
 exec $*)
echo nop  /debug/tracing/current_tracer

Thanks,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: instruction-analysis API(s)

2009-03-04 Thread Masami Hiramatsu
Hi Jim,

Jim Keniston wrote:
 On Fri, 2009-02-27 at 16:20 -0500, Masami Hiramatsu wrote:
 ...
 Here are a patch against your code and an example code for
 instruction length decoder.
 Curiously, KVM's instruction decoder does not completely
 cover all instructions(especially, Jcc/test...).
 I had to refer Intel manuals.

 Moreover, even with this patch, the decoder is incomplete.
 - this doesn't cover 3bytes opcode yet.
 - this doesn't decode sib, displacement and immediate.
 - might have some bugs :-(


 Thank you,
 
 Thanks for your work on this.  Comments below.

Thank you very much for review!

Actually, that code was based on KVM code, so I also found many
opcodes were not supported.

 As I mentioned in private email, you or I should probably refactor this
 into:
 - insn_get_sib()
 - insn_get_displacement()
 - insn_get_immediate()
 - insn_get_length()

Agreed, these should be supported.

I also would like to change struct insn as below;

struct insn {
struct insn_field prefixes; /* prefixes.value is a bitmap */
struct insn_field opcode;   /* opcode.bytes[n] == opcode_n */
struct insn_field modrm;
struct insn_field sib;
struct insn_field displacement;
union {
struct insn_field immediate;
struct insn_field moffset1; /* for 64bit MOV */
struct insn_field immediate1;   /* for 64bit imm or off16/32 */
};
union {
struct insn_field moffset2; /* for 64bit MOV */
struct insn_field immediate2;   /* for 64bit imm or seg16 */
};

u8 opnd_bytes;
u8 addr_bytes;
u8 length;
bool x86_64;

const u8 *kaddr;/* kernel address of insn (copy) to analyze */
const u8 *next_byte;
};

opcode2 and opcode3 will be stored in opcode.value with opcode1.

Now, I'm updating my code. Would anyone also be working on it?

Thank you,

 
 Jim
 
 plain text document attachment (insn_x86.patch)
 Index: insn_x86.h
 ===
 --- insn_x86.h   (revision 1510)
 +++ insn_x86.h   (working copy)
 @@ -66,6 +66,10 @@
  struct insn_field displacement;
  struct insn_field immediate;

 +u8 op_bytes;
 
 I'd probably use opnd_bytes and addr_bytes here, for clarity.  (When I
 first saw op, I thought opcode.)  Also, we should clarify that these
 are the EFFECTIVE lengths, not the lengths of the immediate and
 displacement fields in the instruction.
 
 +u8 ad_bytes;
 +u8 length;
 +
  const u8 *kaddr;/* kernel address of insn (copy) to analyze */
  const u8 *next_byte;
  bool x86_64;
 @@ -75,6 +79,7 @@
  extern void insn_get_prefixes(struct insn *insn);
  extern void insn_get_opcode(struct insn *insn);
  extern void insn_get_modrm(struct insn *insn);
 +extern void insn_get_length(struct insn *insn);

  #ifdef CONFIG_X86_64
  extern bool insn_rip_relative(struct insn *insn);
 Index: insn_x86.c
 ===
 --- insn_x86.c   (revision 1510)
 +++ insn_x86.c   (working copy)
 @@ -17,7 +17,7 @@
   *
   * Copyright (C) IBM Corporation, 2002, 2004, 2009
   */
 -
 +#include linux/module.h
  #include linux/string.h
  // #include asm/insn.h
  #include insn_x86.h
 @@ -34,6 +34,11 @@
  insn-kaddr = kaddr;
  insn-next_byte = kaddr;
  insn-x86_64 = x86_64;
 +insn-op_bytes = 4;
 +if (x86_64)
 +insn-ad_bytes = 8;
 +else
 +insn-ad_bytes = 4;
  }
  EXPORT_SYMBOL_GPL(insn_init);

 @@ -79,10 +84,51 @@
  break;
  prefixes-value |= pfx;
  }
 +if (prefixes-value  X86_PFX_OPNDSZ) {
 +/* oprand size switches 2/4 */
 +insn-op_bytes ^= 6;
 +}
 +if (prefixes-value  X86_PFX_ADDRSZ) {
 +/* address size switches 2/4 or 4/8 */
 +#ifdef CONFIG_X86_64
 +if (insn-x86_64)
 +insn-op_bytes ^= 12;
 +else
 +#endif
 +insn-op_bytes ^= 6;
 
 This seems wrong.  You're checking the address-size prefix, but
 adjusting the operand size.
 
 +}
 +#ifdef CONFIG_X86_64
 +if (prefixes-value  X86_PFX_REXW)
 +insn-op_bytes = 8;
 +#endif
  prefixes-got = true;
  }
  EXPORT_SYMBOL_GPL(insn_get_prefixes);

 +static bool __insn_is_stack(struct insn *insn)
 
 It's not entirely clear to me what this function checks.  (A more
 precise name might help.)  You have pushes, pops, and calls here, but
 you also have some instructions that don't appear to affect the stack at
 all.  And other push and pop instructions are missing.
 
 +{
 +u8 reg;
 +if (insn-opcode.nbytes == 2)
 +return 0;
 
 The following are 2-byte pushes or pops: 0f-a0, 0f-a1, 0f-a8, and 0f-a9.
 
 Also, since the return value is bool, I'd prefer to see true/false
 rather than 1/0.
 
 +
 +switch(insn-opcode1) {
 +case 0x68

Re: instruction-analysis API(s)

2009-02-26 Thread Masami Hiramatsu
Jim Keniston wrote:
 Hi, Masami.
 
 Ananth, Srikar, Maneesh, and I talked about this last night.  While I
 was on vacation, Srikar did further investigation into adapting x86
 kvm's instuction analysis for more general use, and he's not optimistic.
 For the short term, at least (i.e., between now and the Linux Foundation
 Collaboration Summit in April), we're going to proceed based on the
 prototype I developed.
 
 As you noted, djprobes needs instruction lengths, and my prototype
 doesn't provide that info.  (Uprobes computes instruction lengths for
 rip-relative x86_64 instructions, but that's only a subset of what you
 need.)  Are you interested in extending/enhancing my prototype to make
 it useful for djprobes?  If so, I'd be happy to consult.

Hi Jim,

Thank you for considering djprobe.
Actually, I'm developing insn_get_length() based on your prototype and
porting djprobe on it. After tested code, I'd like to post the insn_x86 code.

Thank you,

 
 Thanks.
 Jim
 

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhira...@redhat.com




Re: global tracing

2008-08-06 Thread Masami Hiramatsu
Hi,

Roland McGrath wrote:
 Actually, this point is where I'm stuck on these weeks.

 If we add marker or tracepoint to trace every syscalls,
 we might have to put it in the tracehook or audit and set
 TIF_SYSCALL_TRACE for every process, or put tracepoint
 in the syscall entrance/exit asm-code and check another
 flag. Since latter adds additional flag-checking in fast-path,
 I think it is not acceptable.
 
 I agree completely that it would be wrong to do any new arch work for this,
 especially assembly hacking.  Certainly piggy-backing on the existing
 TIF_SYSCALL_TRACE or TIF_SYSCALL_AUDIT in some fashion is the way to go.
 
 If you don't need complete user register access at your tracepoint, then
 the audit path is an option.  I suspect you do, and so TIF_SYSCALL_TRACE
 is what to use.  Then you can put tracepoints in tracehook_report_syscall_*.

Actually, I did it and found it is not simple to hook audit syscall.
It seems that audit flag is not synchronously cleared/set on processes with
audit_context.
I think tracehook is better and simpler way to do that. But there is still
some audit-related problem when I set TIF_SYSCALL_TRACE flag on every process,
and I'm investigating that.

Maybe I need to improve syscall audit.

 It's straightforward to write a loop to set TIF_SYSCALL_TRACE on every
 task.  The only wrinkle is dealing with clearing the flag correctly.  You
 don't need a loop, because it can be cleared lazily by each thread when it
 gets into the slow path and finds it has no reason to be there.  This is
 not very hard.  It only requires adding a few lines in the utrace code to
 check your global-syscall-trace flag in deciding when to clear
 TIF_SYSCALL_TRACE.

That's a good idea. I'll check that.

 This would be unlike a plain tracepoint only in that you have to make this
 explicit call to switch it on and off.  (Maybe this could be rolled into
 the tracepoint probe registration API.)

Sure, even though, we can enable it when initializing tracepoint-marker
conversion module.

 I'm not at all arguing against having utrace global tracing to provide you
 this feature instead.  (I already raised the pros/cons about that generally
 and that discussion can continue.)  But this is how you'd do it sensibly
 with tracepoints IMHO.  (The details I just described are not much different
 from what utrace global tracing would have for handling TIF_SYSCALL_TRACE.)

I agree that.
I think if I can set TIF_SYSCALL_TRACE on each process safely, it can work
with utrace global tracing too.
In that case, I can move to utrace global tracing feature.

Thank you,

 
 
 Thanks,
 Roland

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [EMAIL PROTECTED]