from:"Ananth N Mavinakayanahalli"

Re: gdbstub initial code

2010-07-12 Thread Ananth N Mavinakayanahalli

On Mon, Jul 12, 2010 at 08:37:29PM +0200, Oleg Nesterov wrote:
 Hello.
 
 Please see the attachment. Don't take this code seriously, this is
 the early prototype and everything should be rewritten. It barely
 uses utrace, only to stop the target.
 
   (gdb) file /path/to/binary
   (gdb) target extended-remote /proc/ugdb
   (gdb) attach PID
   (gdb) disassemble _start
   (gdb) bt
   (gdb) info registers
   (gdb) info threads
   (gdb) detach
 
 This seems to work, but I had to export access_process_vm().
 
 Currently it only attaches to the single thread. vCont or ^C doesn't
 work.
 
 I still can't understand what utrace_xxx_pid() buys us, and I still
 think that utrace_prepare_examine() can't protect the task even for
 regset calls.

IMHO, if this is the start of another stab at getting utrace in the
upstream kernel, you may want to consider Linus' problem statement
for utrace -- infrastructure that will allow strace and gdb on the
same thread at the same time.

OTOH, the Tom Tromey alluded on lkml that if kernel provides
infrastructure that allows for breakpoint assistance and 'displaced'
stepping, with a suitable user interface, preferably a ptrace variant
that is fd based, they'll migrate gdb to using the interface. (The bp
assistance and displaced stepping can be provided with extensions to
the current uprobes set under review).

In any case, its good to see a restart of this effort. Though there
has been support for gdbstub in the past, an overwhelming majority
of people would like to see a 'user interface', be it ptrace2 or
PTRACE_ATTACH2 or whatever it needs to be called.

Given these requirements, and given that the 'new' uprobes is close to
-tip, would it be more useful to pursue an alternate syscall approach
rather than gdbstub?

Ananth

Re: linux-next: add utrace tree

2010-01-29 Thread Ananth N Mavinakayanahalli

On Fri, Jan 29, 2010 at 10:11:16AM +0100, Ingo Molnar wrote:
 
 * Ananth N Mavinakayanahalli ana...@in.ibm.com wrote:
 
  On Fri, Jan 29, 2010 at 08:39:07AM +0100, Ingo Molnar wrote:
  
  ...
  
   When we merged kprobes ~10 years ago we made the (rather bad) mistake of 
   merging a raw, opaque facility and leaving 'the rest' up to some other 
   entity. 
   IBM kprobes hackers vanished the day the original kprobes code went 
   upstream 
   and the high level entity never truly materialized in-kernel, for nearly 
   a 
   decade!
  
  I don't know what you are referring to here... Kprobes was merged in 2.6.9 
  (~August 2004 -- less than 6 years ago). [...]
 
 Ok, 6 years then :-)

  [...] Since then, we did work on ports to powerpc and s390. We implemented 
  kretprobes. We made it much scalable using RCU; we did the powerpc booster 
  to skip single-step when possible, not to mention various bug fixes over 
  the 
  years.
 
 Except it had no real in-kernel user.

Not that I want to rebut you Ingo, but there were in-kernel users since 2006
(net/ipv4/tcp_probe.c) :-)

Aside, I am also glad that we have more flexibility with the perf
integration.

Ananth

Re: linux-next: add utrace tree

2010-01-28 Thread Ananth N Mavinakayanahalli

On Thu, Jan 28, 2010 at 09:55:02AM +0100, Ingo Molnar wrote:

...

 Lets compare the two cases via a drawing. Your current uprobes submission 
 does:
 
  [kernel]  do probe thing single-step trap
^| ^  |
|v |  v
  [user] INT3XOL-ins  next ins-stream
 
  ( add the need for serialization to make sure the whole single-step thing 
does not get out of sync with reality. )
 
 And emulator approach would do:
 
  [kernel]  emul-demux-fastpath, do probe thing
^ |
| v
  [user] INT3 next ins-stream
 
 far simpler conceptually, and faster as well, because it's one kernel entry.

Ingo,

Yes, conceptually, emulation is simpler. In fact, it may even be the
right thing to do from a housekeeping POV if gdb were enabled to use
breakpoint assistance in the kernel. However... emulation is not
easy. Just quoting Peter Anvin:

 On the more general rule of interpretation: I'm really concerned about
 having a bunch of partially-capable x86 interpreters all over the
 kernel.  x86 is *hard* to emulate, and it will only get harder as the
 architecture evolves.

   -hpa

Yes, I know you suggested we start with a small subset.

We already have an implementation of instruction emulation in kernel for
x86 and powerpc, but its too KVM centric. If there is a generic
emulation layer, we would use it.

There are conflicting opinions for either case; complicated as it is,
the XOL scheme works and, to a large extent, it is easily extendable to
other architectures compared to the emulation approach. Uprobes can be
made to use emulation when possible/available, but I don't think this
should be gating decision for the initial implementation of the feature.

Ananth

Re: linux-next: add utrace tree

2010-01-27 Thread Ananth N Mavinakayanahalli

On Wed, Jan 27, 2010 at 11:55:16AM +0100, Peter Zijlstra wrote:
 On Wed, 2010-01-27 at 02:43 -0800, Linus Torvalds wrote:
  
  On Wed, 27 Jan 2010, Peter Zijlstra wrote:
   
   Right, so you're going to love uprobes, which does exactly that. The
   current proposal is overwriting the target instruction with an INT3 and
   injecting an extra vma into the target process's address space
   containing the original instruction(s) and possible jumps back to the
   old code stream.
  
  Just out of interest, how does it handle the threading issue?
  
  Last I saw, at least some CPU people were _very_ nervous about overwriting 
  instructions if another CPU might be just about to execute them.
  
  Even the overwrite only the first byte with 'int3' made them go umm, I 
  need to talk to some core CPU people to see if that's ok. They mumble 
  about possible CPU errata, I$ coherency, instruction retry etc.
  
  I realize kprobes does this very thing, but kprobes is esoteric stuff and 
  doesn't have much choice. In user space, you _could_ do the modification 
  on a different physical page and then just switch the page table entry 
  instead, and not get into the whole D$/I$ coherency thing at all.
 
 Right, so there's two aspects:
 
  1) concurrency when inserting the probe
  2) concurrency when hitting the probe
 
 1) used to be dealt with by using utrace to stop all threads in the
 process and then writing the instruction. I suggested to CoW the page,
 modify the instruction, set the pagetable and flush tlbs at full speed
 -- the very thing you suggest here.
 
 2) so traditionally (and the intel arch manual describes this) is to
 replace the instruction, single step it, and write the probe back. This
 is racy for multi-threading. The current uprobes stuff solves this by
 doing single-step-out-of-line (XOL).
 
 XOL injects a new vma into the target process and puts the old
 instruction there, then it single steps on the new location, leaving the
 original site with INT3.
 
 This doesn't work for things like RIP relative instructions, so uprobes
 considers them un-probable.

Probing RIP-relative instructions work just fine; there are fixups that
take care of it.

 Also, I myself really object to inserting a vma in a running process,
 its like a land-lord, sure he has the key but he won't come in an poke
 through your things.
 
 The alternative is to place the instruction in TLS or stack space, since
 each thread can only have a single trap at a time, you only need space
 for 1 instruction (plus a possible jump out to the original site). There
 is the 'problem' of marking the TLS/stack executable when being probed.
 
 Then there is the whole emulation angle, the uprobes people basically
 say its too much effort to write a x86 emulator.

We don't need to write one. I don't know how easy it is to make the kvm
emulator less kvm-centric (vcpus, kvm_context, etc). Avi?

Ananth

Re: linux-next: add utrace tree

2010-01-27 Thread Ananth N Mavinakayanahalli

On Wed, Jan 27, 2010 at 12:08:31PM +0100, Peter Zijlstra wrote:
 On Wed, 2010-01-27 at 16:35 +0530, Ananth N Mavinakayanahalli wrote:
  Probing RIP-relative instructions work just fine; there are fixups that
  take care of it. 
 
 Ah my bad then, it was my understanding you simply bailed on those.
 
 Just for my information, how large are the replacement sequences?

The RIP relative instruction is transformed into indirect addressing
mode using a scratch register.

For details http://marc.info/?l=linux-kernelm=126401936114639w=2. 

Ananth

Re: linux-next: add utrace tree

2010-01-26 Thread Ananth N Mavinakayanahalli

On Mon, Jan 25, 2010 at 01:41:57PM -0800, Linus Torvalds wrote:
 
 
 On Mon, 25 Jan 2010, Tom Tromey wrote:

...

  * Support displaced stepping in the kernel; I think this would improve
performance when debugging in non-stop mode.
 
 Don't we already do that at least on x86? Just doing a single-step should 
 work on an instruction even if it has a breakpoint on it, because we set 
 the TF bit.
 
 Or maybe I'm not understanding what displaced stepping means to you.

If Tom is referring to supporting single-stepping out of line, ie., not
putting back the original instruction at the bp location, yes, we
already support it on various architectures for kernel breakpoints,
through the kprobes infrastructure.

For userspace, there are more complications to take care of. We are
reworking a prototype based on community comments (see the long UBP/XOL
thread on lkml from a few days ago). Hopefully the userspace breakpoint
assistance layer will be generic enough for gdb to also take advantage
of, though the interface details need to be hashed out.

Ananth

Re: linux-next: add utrace tree

2010-01-24 Thread Ananth N Mavinakayanahalli

On Sat, Jan 23, 2010 at 12:23:33PM +0100, Ingo Molnar wrote:
 
 * Kyle Moffett k...@moffetthome.net wrote:
 
  On Fri, Jan 22, 2010 at 19:22, Linus Torvalds
  torva...@linux-foundation.org wrote:
 
...

 In that sense it might be better to fix/enhance ptrace, if there's interest. 
 I've written a handful of ptrace extensions in the past (none of them went 
 upstream tho), it can be done in a useful manner and the code is pretty 
 hackable. There are basic problems left to be solved: for example why is 
 there 
 still no 'memory block copy' call, why are we _still_ limited to one word per 
 system call PTRACE_PEEK* memory copies? It's ridiculous. SparcLinux has 
 PTRACE_WRITE*/READ* support that implements this, but none of the other 
 architectures have it so it's essentially unused.
 
 Or another possible direction would be to extend the perf events syscall with 
 interception capabilities. It's far more performant at extracting application 
 state without scheduling than any ptrace method - and interception/injection 
 would be a natural next step - if there's interest.

This certainly is now a chicken and egg problem. Everybody agrees that
Linux needs something better than ptrace; legacy ptrace will continue to
live, so will utilities written to it (strace, etc).

But should that limit what Linux can offer? What's the way out?

- Enhance ptrace: At least one ptrace maintainer (Roland) had publically
  stated he doesn't prefer enhancing legacy ptrace -- that its already a
  beast to maintain, and adding more complexity to it does it no good.

- Extend perf; would perf then use utrace underneath? Or would one have
  to redo some of what utrace already does for thread level control?

- Give utrace a syscall and make it the primary way for users to
  interact with the layer. There are benefits to this if there is
  agreement on the utrace layer itself, maybe with less fexibility than
  what it currently offers? If yes, what should it look like?

Any new debug facility will have to incorporate some or most learnings
from what utrace tried to address. It would be sad to just dump utrace
and redo everything from scratch or band-aid existing interfaces.

Ananth

Re: linux-next: add utrace tree

2010-01-21 Thread Ananth N Mavinakayanahalli

On Thu, Jan 21, 2010 at 05:28:42PM -0800, Linus Torvalds wrote:
 
 
 On Thu, 21 Jan 2010, Andrew Morton wrote:
  
  ptrace is a nasty, complex part of the kernel which has a long history
  of problems, but it's all been pretty quiet in there for the the past few
  years.
 
 More importantly, we're not ever going to get rid of it. 

FWIW, Oleg's implementation of ptrace over utrace is 100% compatible
with legacy ptrace; gdb testsuite indicates that
(http://lkml.org/lkml/2009/12/21/98).

Ananth

Re: [RFC] [PATCH 0/7] UBP, XOL and Uprobes [ Summary of Comments and actions to be taken ]

2010-01-21 Thread Ananth N Mavinakayanahalli

On Fri, Jan 22, 2010 at 12:32:32PM +0530, Srikar Dronamraju wrote:
 Here is a summary of the Comments and actions that need to be taken for
 the current uprobes patchset. Please let me know if I missed or
 misunderstood any of your comments.  
 
 1. Uprobes depends on trap signal.
   Uprobes depends on trap signal rather than hooking to the global
 die notifier. It was suggested that we hook to the global die notifier.
 
   In the next version of patches, Uprobes will use the global die
   notifier and look at the per-task count of the probes in use to
   see if it has to be consumed.
 
   However this would reduce the ability of uprobe handlers to
   sleep. Since we are dealing with userspace, sleeping in handlers
   would have been a good feature. We are looking at ways to get
   around this limitation.

We could set a TIF_ flag in the notifier to indicate a breakpoint hit
and process it in task context before the task heads into userspace.

Ananth

Re: linux-next: add utrace tree

2010-01-19 Thread Ananth N Mavinakayanahalli

On Wed, Jan 20, 2010 at 06:49:50AM +0100, Ingo Molnar wrote:

Ingo,

 Note, i'm not yet convinced that this (and the rest: uprobes and systemtap, 
 etc.) can go uptream in its present form.

Agreed, uprobes is still not upstream ready -- it was an RFC. We are
working through the comments there to get it ready for merger.

 IMHO the far more important thing to address beyond formalities and workflow 
 cleanliness are the (many) technical observations and objections offered by 
 Peter Zijstra on lkml. Not just the git history but also the abstractions and 
 concepts are messy and should be reworked IMO, and also good and working perf 
 events integration should be achieved, etc.

I think Oleg addressed most of Peter's concerns on utrace when the
ptrace/utrace patchset was reposted.

Perf integration with uprobes will be done and discussions have started
with Masami and Frederic. There are a couple of fundamental technical
aspects (XOL vma vs. emulation; breakpoint insertion through CoW and not
through quiesce) that need resolution.

 The fact that there's a well established upstream workflow for 
 instrumentation 
 patches, which is being routed around by the utrace/uprobes/systemtap code 
 here is not a good sign in terms of reaching a good upstream solution. Lets 
 hope it works out well though.

Agreed.

On the other hand, having ptrace/utrace in the -next tree will give it a
lot more testing, while any outstanding technical issues are being addressed.

Stephen,
To exercise ptrace/utrace, it would be very useful if you pulled in

git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git branch 
utrace-ptrace

instead of 'master'.

Thanks,
Ananth

Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-18 Thread Ananth N Mavinakayanahalli

On Mon, Jan 18, 2010 at 02:13:25PM +0200, Pekka Enberg wrote:
 Hi Avi,
 
 On Mon, 2010-01-18 at 14:01 +0200, Avi Kivity wrote:
  Maybe you place no value on uprobes.  But people who debug userspace
  likely will see a reason.
 
 On 01/18/2010 02:06 PM, Peter Zijlstra wrote:
  I do see value in uprobes, I just don't like it mucking about with the
  address space. Nor does it appear required.
 
 On Mon, Jan 18, 2010 at 2:09 PM, Avi Kivity a...@redhat.com wrote:
  Well, the alternatives are very unappealing.  Emulation and single-stepping
  are going to be very slow compared to a couple of jumps.
 
 So how big chunks of the address space are we talking here for uprobes?

As Srikar mentioned, the least we start with is 1 page. Though you can
have as many probes as you want, there are certain optimizations we can
do, depending on the most common usecases.

For eg., if you'd consider the start of a routine to be the most
commonly traced location, most routines in a binary would generally
start with the same instruction (say push %ebp), and we can refcount a
slot with that instruction to be used for all probes of the same
instruction.

Ananth

Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-18 Thread Ananth N Mavinakayanahalli

On Mon, Jan 18, 2010 at 06:52:32PM +0200, Avi Kivity wrote:
 On 01/18/2010 05:43 PM, Ananth N Mavinakayanahalli wrote:

 Well, the alternatives are very unappealing.  Emulation and single-stepping
 are going to be very slow compared to a couple of jumps.

 So how big chunks of the address space are we talking here for uprobes?
  
 As Srikar mentioned, the least we start with is 1 page. Though you can
 have as many probes as you want, there are certain optimizations we can
 do, depending on the most common usecases.

 For eg., if you'd consider the start of a routine to be the most
 commonly traced location, most routines in a binary would generally
 start with the same instruction (say push %ebp), and we can refcount a
 slot with that instruction to be used for all probes of the same
 instruction.


 But then you can't follow the instruction with a jump back to the code...

Right. This will work only for the non boosted case where single-stepping
is mandatory. I guess the tradeoff is vma space and speed.

Ananth

Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-15 Thread Ananth N Mavinakayanahalli

On Fri, Jan 15, 2010 at 10:03:48AM +0100, Peter Zijlstra wrote:
 On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote:
  
   discussed elsewhere. 
 
 Thanks for the pointer...

:-)

Peter,
I think Jim was referring to
http://sources.redhat.com/ml/systemtap/2007-q1/msg00571.html

Ananth

Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

2010-01-15 Thread Ananth N Mavinakayanahalli

On Fri, Jan 15, 2010 at 11:13:32AM +0100, Peter Zijlstra wrote:
 On Fri, 2010-01-15 at 15:40 +0530, Ananth N Mavinakayanahalli wrote:
 
  Ideas?
 
 emulate the one instruction?

In kernel? Generically? Don't think its that easy for userspace --
you have the full gamut of instructions to emulate (fp, vector, etc);
further, the instruction could itself cause a page fault and the like.

Re: [RFC] [PATCH 4/7] Uprobes Implementation

2010-01-12 Thread Ananth N Mavinakayanahalli

On Tue, Jan 12, 2010 at 06:36:00AM +0100, Frederic Weisbecker wrote:
 On Mon, Jan 11, 2010 at 05:55:53PM +0530, Srikar Dronamraju wrote:
  +static const struct utrace_engine_ops uprobe_utrace_ops = {
  +   .report_quiesce = uprobe_report_quiesce,
  +   .report_signal = uprobe_report_signal,
  +   .report_exit = uprobe_report_exit,
  +   .report_clone = uprobe_report_clone,
  +   .report_exec = uprobe_report_exec
  +};
 
 
 So, as stated before, uprobe seems to handle too much standalone
 policies such as freeing on exec, always inherit on clone and never
 on fork. Such rules should be decided from uprobe clients not
 from uprobe itself and that makes it not enough flexible to
 be usable for now.

The freeing on exec is only housekeeping of uprobe data structures. And
probepoints are inherited only on CLONE_THREAD and not otherwise, simply
since the existing probes can be hit in the new thread's context. Not
sure what other policy you are hinting at.

 For example if we want it to be usable by perf, we have two ways:
 
 - a trace event. Unfortunately, like I explained in a previous
   mail, this doesn't seem to be a suitable interface for this
   particular case.
 
 - a performance monitoring unit, with the existing unified interface
   struct pmu, usable by perf.
 
 
 Typically, to use it with perf toward a pmu, perf tools need to
 create a uprobe on perf process and activate its hook on the next exec.
 Thereafter, it's up to perf to decide if we inherit through clone
 and fork.

As mentioned above, the inheritance is only for threads. It should be
fairly easy to inherit probes on fork, and that can be made a perf based
policy decision.

 Here I fear utrace and perf are going to collide.

Utrace does not mandate any of the above concerns you've mentioned.
Utrace just provides callbacks at the said events and uprobes can be
tweaked to accommodate perf's requirements as possible, as feasible.

 See how could be the final struct pmu (we need to extend it
 to support utrace):
 
 struct pmu {
   enable() - called we schedule in a context where we want
 a uprobe to be active. Called very often
 disable() - the above opposite
 
 /* Not yet existing callbacks */
 
 hook_task() - called when a process is created which
we want to activate our hook
would be typically called once on
exec if we have set enable_on_exec
and also on clone()/fork()
if we want to inherit.
 }
 
 
 The above hook_task (could be divided in more precise callback events
 like hook_on_exec, hook_on_clone, etc...) would be needed by perf
 to drive correctly utrace and this is going to collide with utrace
 callbacks that notify execs and forks.
 
 Probably utrace can be kept for all the utrace breakpoint signal
 handling an so. But I guess the rest can be implemented on top
 of a struct pmu and driven by perf like we did with hardware
 breakpoints re-implementation.
 
 Just an idea.

Well, I wonder if perf can ride on utrace's callbacks for the
hook_task() for the clone/fork cases?

Ananth

Re: [PATCH 6/7] implement utrace-ptrace

2009-12-21 Thread Ananth N Mavinakayanahalli

On Mon, Dec 21, 2009 at 07:18:37PM +0530, Ananth N Mavinakayanahalli wrote:
 On Fri, Dec 18, 2009 at 02:11:40AM +0100, Oleg Nesterov wrote:
  The patch adds the new file, kernel/ptrace-utrace.c, which contains
  the new implementation of ptrace over utrace.
  
  This file is not compiled until we have CONFIG_UTRACE option, will be
  added by the next utrace core patch.
  
  It's supposed to be an invisible implementation change, nothing should
  change to userland when CONFIG_UTRACE is enabled.
  
  Signed-off-by: Roland McGrath rol...@redhat.com
  Signed-off-by: Oleg Nesterov o...@redhat.com
  ---
 
 Oleg,
 
 ptrace/utrace performs better with the ptrace-tests [1] testsuite (1 failure
 vs. 3 with vanilla ptrace [1]). The gdb testsuite also has no
 regressions. In fact, the results on the gdb testsuite are identical.

Forgot to mention the tests were on powerpc.

Re: [PATCH 6/7] implement utrace-ptrace

2009-12-21 Thread Ananth N Mavinakayanahalli

On Fri, Dec 18, 2009 at 02:11:40AM +0100, Oleg Nesterov wrote:
 The patch adds the new file, kernel/ptrace-utrace.c, which contains
 the new implementation of ptrace over utrace.
 
 This file is not compiled until we have CONFIG_UTRACE option, will be
 added by the next utrace core patch.
 
 It's supposed to be an invisible implementation change, nothing should
 change to userland when CONFIG_UTRACE is enabled.
 
 Signed-off-by: Roland McGrath rol...@redhat.com
 Signed-off-by: Oleg Nesterov o...@redhat.com
 ---

Oleg,

ptrace/utrace performs better with the ptrace-tests [1] testsuite (1 failure
vs. 3 with vanilla ptrace [1]). The gdb testsuite also has no
regressions. In fact, the results on the gdb testsuite are identical.

Ananth

[1] http://sourceware.org/systemtap/wiki/utrace/tests
[2] detach-stopped and stopped-attach-transparency are the additional
failures. syscall-reset fails in both cases.

Re: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set -ops = utrace_detached_ops lockless)

2009-12-08 Thread Ananth N Mavinakayanahalli

On Mon, Dec 07, 2009 at 01:43:27PM +0100, Oleg Nesterov wrote:
 On 12/06, CAI Qian wrote:
 
 Ananth, could you please confirm once again that step-jump-cont (from
 ptrace-tests testsuite) not fail on your machine? If yes, please tell
 me the version of glibc/gcc. Is PTRACE_GETREGS defined on your machine?

Hi Oleg,

It works for me on a Fedora 12 machine.

[ana...@mjs22lp1 ptrace-tests]$ gcc --version
gcc (GCC) 4.4.2 20091027 (Red Hat 4.4.2-7)

[ana...@mjs22lp1 ptrace-tests]$ rpm -qa |grep glibc
glibc-common-2.11-2.ppc
glibc-2.11-2.ppc64
glibc-devel-2.11-2.ppc
glibc-static-2.11-2.ppc
glibc-2.11-2.ppc
glibc-devel-2.11-2.ppc64
glibc-headers-2.11-2.ppc

And yes, PTRACE_GETREGS is defined in /usr/include/asm/ptrace.h

Ananth

Re: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set -ops = utrace_detached_ops lockless)

2009-12-08 Thread Ananth N Mavinakayanahalli

On Mon, Dec 07, 2009 at 07:05:40PM +0100, Oleg Nesterov wrote:
 On 12/07, Oleg Nesterov wrote:
 
  On 12/07, Jan Kratochvil wrote:
  
   On Mon, 07 Dec 2009 15:24:51 +0100, Oleg Nesterov wrote:
But. raise_sigusr2 is not equal to the actual address of 
raise_sigusr2(),
this value points to the thunk (I do not know the correct English 
term)
  
   ppc64 calls it function descriptor (GDB
   ppc64_linux_convert_from_func_ptr_addr):
  For PPC64, a function descriptor is a TOC entry,
 
  Thanks Jan.
 
   in a data section,
 
  Yes!
 
  Now I can't understand how this test-case could ever work on ppc.
  step-jump-cont does:
 
  regs-nip = raise_sigusr2;  --- points to data section
  ptrace(PTRACE_CONT);
 
  of course, the tracee gets SIGSEGV, this section is not executable.
 
 Hmm. Looks like, powerpc means a lot of different hardware, and
 _PAGE_EXEC may be 0. I didn't notice this when I quickly grepped
 arch/powerpc/
 
 IOW, perhaps on some machines r implies x ?
 
 Is yes, this can explain why the results differ on different
 machines.

Well, powerpc 32-bit adheres to the SVR4 ABI, while powerpc 64-bit uses
the PPC64-ELF ABI (http://refspecs.linuxfoundation.org/ELF/ppc64/). The
64bit ABI uses function descriptors and the 'func_name' is the data
address, while the '.func_name' is the text address. (See
handle_rt_signal64 in arch/powerpc/kernel/signal_64.c and
kprobe_lookup_name in arch/powerpc/include/asm/kprobes.h.

Ananth

[QUERY] signal_struct-count/live

2009-11-27 Thread Ananth N Mavinakayanahalli

Oleg,

I am confused as to why we need two atomics count and live in signal_struct.

report_death() uses -live as the group_dead indicator, while there are
places (like the scheduler) which uses -count as the nr_threads
indicator.

I tried git blame to see if it remembers why, but the addition predates
2.6.12 and so it does not know.

Could you please shed some light on this?

Ananth

Re: [QUERY] signal_struct-count/live

2009-11-27 Thread Ananth N Mavinakayanahalli

On Fri, Nov 27, 2009 at 04:15:21PM +0100, Oleg Nesterov wrote:
 On 11/27, Ananth N Mavinakayanahalli wrote:
 
  I am confused as to why we need two atomics count and live in signal_struct.
 
  report_death() uses -live as the group_dead indicator,
 
 report_death? Perhaps you meant do_exit() ?

Right, do_exit() and that is what is picked up by
tracehook_report_death(), and in turn by report_death().

  while there are
  places (like the scheduler) which uses -count as the nr_threads
  indicator.
 
  I tried git blame to see if it remembers why, but the addition predates
  2.6.12 and so it does not know.
 
  Could you please shed some light on this?
 
 In short: signal-count must die. I was going to do this a long ago
 but never had the time. See also 4ab6c08336535f8c8e42cf45d7adeda882eff06e
 commit, this is the first step.
 
 Last time I did the grepping almost any usage of signal-count is
 not right. For example, __exit_signal() is correct, but it doesn't
 need to use -count.
 
 Except: it is needed for things like get_nr_threads() in proc.
 
 In short: never use signal-count ;)

Thanks for the clarification Oleg.

Ananth

Re: powerpc: fork stepping (Was: [RFC, PATCH 0/14] utrace/ptrace)

2009-11-27 Thread Ananth N Mavinakayanahalli

On Fri, Nov 27, 2009 at 06:46:27PM +0100, Veaceslav Falico wrote:
 On Thu, Nov 26, 2009 at 11:37:03PM +0100, Oleg Nesterov wrote:
 
  Could you look at this
 
ptrace-copy_process-should-disable-stepping.patch
http://marc.info/?l=linux-mm-commitsm=125789789322573
 
  patch? It is not clear to me how we can modify the test-case to
  verify it fixes the original problem for powerpc.
 
 I modified the test-case, it confirms that
 ptrace-copy_process-should-disable-stepping.patch fixes the
 problem with TIF_SINGLESTEP copied by fork() on powerpc.
 
 Probably we need a similar fix for step-fork.c in ptrace-tests.
 
 Modified the original testcase to call fork via syscall(__NR_fork),
 to avoid the looping inside libc's fork() on powerpc.
 The parent singlesteps until he sees that the child has forked, after
 that the parent PTRACE_CONTs until the child exits.

Thanks Veaceslav. This works:

Index: ptrace-tests/tests/step-fork.c
===
--- ptrace-tests.orig/tests/step-fork.c
+++ ptrace-tests/tests/step-fork.c
@@ -29,6 +29,7 @@
 #include unistd.h
 #include sys/wait.h
 #include string.h
+#include sys/syscall.h
 #include signal.h

 #ifndef PTRACE_SINGLESTEP
@@ -78,7 +79,7 @@ main (int argc, char **argv)
sigprocmask (SIG_BLOCK, mask, NULL);
ptrace (PTRACE_TRACEME);
raise (SIGUSR1);
-   if (fork () == 0)
+   if (syscall(__NR_fork) == 0)
  {
read (-1, NULL, 0);
_exit (22);

Oleg,
With the above patch applied, syscall-reset is the only failure I see on
powerpc:

errno 14 (Bad address)
syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location
()) == 38' failed.
unexpected child status 67f
FAIL: syscall-reset
...

1 of 40 tests failed
(11 tests were not run)
Please report to utrace-devel@redhat.com


Ananth

Re: powerpc: fork stepping (Was: [RFC, PATCH 0/14] utrace/ptrace)

2009-11-26 Thread Ananth N Mavinakayanahalli

On Thu, Nov 26, 2009 at 03:50:51PM +0100, Oleg Nesterov wrote:
 I changed the subject. This bug has nothing to do with utrace,
 the kernel fails with or without these changes.
 
 On 11/26, Ananth N Mavinakayanahalli wrote:
 
  On Wed, Nov 25, 2009 at 04:40:52PM +0100, Oleg Nesterov wrote:
   On 11/25, Ananth N Mavinakayanahalli wrote:
   
step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
/bin/sh: line 5: 24803 Aborted ${dir}$tst
FAIL: step-fork
  
   This is expected. Should be fixed by
  
 ptrace-copy_process-should-disable-stepping.patch
  
   in -mm tree. (I am attaching this patch below just in case)
   I din't mention this patch in this series because this bug
   is ortogonal to utrace/ptrace.
 
  The patch doesn't seem to fix the issue on powerpc:
 
  step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
  /bin/sh: line 5: 17325 Aborted ${dir}$tst
  FAIL: step-fork
 
 Good to know, thanks again Ananth.
 
 I'll take a look. Since I know nothing about powerpc, I can't
 promise the quick fix ;)
 
 The bug was found by code inspection, but the fix is not trivial
 because it depends on arch/, and it turns out the arch-independent
 fix in
 
   ptrace-copy_process-should-disable-stepping.patch
   http://marc.info/?l=linux-mm-commitsm=125789789322573
 
 doesn't work.
 
 Ananth, could you please run the test-case from the changelog
 below ? I do not really expect this can help, but just in case.

Right, it doesn't help :-(

GDB shows that the parent is forever struck at wait().

Ananth

Re: [RFC,PATCH 0/14] utrace/ptrace

2009-11-25 Thread Ananth N Mavinakayanahalli

On Tue, Nov 24, 2009 at 09:01:27PM +0100, Oleg Nesterov wrote:
 Hello.
 
 This is the new iteration of Roland's utrace patch, this time
 with rewrite-ptrace-via-utrace + cleanups in utrace core.
 
 1-7 are already in -mm tree, I am sending them to simplify the
 review.
 
 8-12 don not change the behaviour, simple preparations.
 
 13-14 add utrace-ptrace and utrace

Oleg,

I ran the ptrace-tests testsuite [1] on powerpc on the vanilla ptrace
and then with ptrace/utrace. The results for ptrace/utrace look better
:-)

All tests are 'make check xcheck'.

Ananth

[1] 
cvs -d :pserver:anoncvs:anon...@sources.redhat.com:/cvs/systemtap co ptrace-tests

-
Vanilla ptrace:

PASS: ptrace-on-job-control-stopped
PASS: attach-wait-on-stopped
PASS: detach-can-signal
PASS: attach-into-signal
PASS: attach-sigcont-wait
PASS: sa-resethand-on-cont-signal
PASS: ptrace_cont-defeats-sigblock
PASS: ptrace-cont-sigstop-detach
PASS: ptrace_event_clone
PASS: tif-syscall-trace-after-detach
PASS: event-exit-proc-maps
PASS: event-exit-proc-environ
SKIP: x86_64-ia32-gs
SKIP: x86_64-gsbase
PASS: powerpc-altivec
PASS: peekpokeusr
PASS: watchpoint
PASS: block-step
PASS: step-jump-cont
SKIP: step-jump-cont-strict
PASS: ppc-dabr-race
PASS: signal-loss
PASS: step-into-handler
SKIP: user-area-access
PASS: user-regs-peekpoke
PASS: erestartsys
SKIP: erestart-debugger
SKIP: step-to-breakpoint
errno 14 (Bad address)
syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location ()) == 
38' failed.
unexpected child status 67f
FAIL: syscall-reset
PASS: reparent-zombie
PASS: step-simple
SKIP: step-through-sigret
PASS: stop-attach-then-wait
FAIL: detach-stopped
PASS: detach-stopped-rhel5
PASS: clone-multi-ptrace
PASS: clone-ptrace
PASS: o_tracevfork
PASS: o_tracevforkdone
PASS: detach-parting-signal
PASS: detach-sigkill-race
PASS: waitpid-double-report
PASS: o_tracevfork-parent
PASS: stopped-detach-sleeping
FAIL: stopped-attach-transparency
SKIP: erestartsys-trap
SKIP: highmem-debugger
PASS: sigint-before-syscall-exit
SKIP: syscall-from-clone
step-from-clone: step-from-clone.c:195: main: Assertion `(status  8) == 5' 
failed.
step-from-clone: step-from-clone.c:119: handler_fail: Assertion `0' failed.
/bin/sh: line 5: 19825 Aborted ${dir}$tst
FAIL: step-from-clone
step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
/bin/sh: line 5: 19832 Aborted ${dir}$tst
FAIL: step-fork

5 of 41 tests failed
(10 tests were not run)
Please report to utrace-devel@redhat.com

make[3]: *** [check-TESTS] Error 1
make[3]: Leaving directory `/home/ananth/ptrace-tests/tests'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `/home/ananth/ptrace-tests/tests'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/ananth/ptrace-tests/tests'
make: *** [check-recursive] Error 1

-
ptrace over utrace:

PASS: ptrace-on-job-control-stopped
PASS: attach-wait-on-stopped
PASS: detach-can-signal
PASS: attach-into-signal
PASS: attach-sigcont-wait
PASS: sa-resethand-on-cont-signal
PASS: ptrace_cont-defeats-sigblock
PASS: ptrace-cont-sigstop-detach
PASS: ptrace_event_clone
PASS: tif-syscall-trace-after-detach
PASS: event-exit-proc-maps
PASS: event-exit-proc-environ
SKIP: x86_64-ia32-gs
SKIP: x86_64-gsbase
PASS: powerpc-altivec
PASS: peekpokeusr
PASS: watchpoint
PASS: block-step
PASS: step-jump-cont
SKIP: step-jump-cont-strict
PASS: ppc-dabr-race
PASS: signal-loss
PASS: step-into-handler
SKIP: user-area-access
PASS: user-regs-peekpoke
PASS: erestartsys
SKIP: erestart-debugger
SKIP: step-to-breakpoint
errno 14 (Bad address)
syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location ()) == 
38' failed.
unexpected child status 67f
FAIL: syscall-reset
PASS: reparent-zombie
PASS: step-simple
SKIP: step-through-sigret
PASS: stop-attach-then-wait
PASS: detach-stopped
PASS: detach-stopped-rhel5
PASS: clone-multi-ptrace
PASS: clone-ptrace
PASS: o_tracevfork
PASS: o_tracevforkdone
PASS: detach-parting-signal
PASS: detach-sigkill-race
PASS: waitpid-double-report
PASS: o_tracevfork-parent
PASS: stopped-detach-sleeping
PASS: stopped-attach-transparency
SKIP: erestartsys-trap
SKIP: highmem-debugger
PASS: sigint-before-syscall-exit
SKIP: syscall-from-clone
SKIP: step-from-clone
step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
/bin/sh: line 5: 24803 Aborted ${dir}$tst
FAIL: step-fork

2 of 40 tests failed
(11 tests were not run)
Please report to utrace-devel@redhat.com

make[3]: *** [check-TESTS] Error 1
make[3]: Leaving directory `/home/ananth/ptrace-tests/tests'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `/home/ananth/ptrace-tests/tests'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/ananth/ptrace-tests/tests'
make: *** [check-recursive] Error 1

GDB Testsuite Results on POWERPC

2009-11-25 Thread Ananth N Mavinakayanahalli

Hi,

Here is the summary of GDB testsuite runs on a vanilla kernel and one
with ptrace over utrace on a powerpc machine:

Vanilla ptrace:
=== gdb Summary ===

# of expected passes13970
# of unexpected failures52
# of unexpected successes   2
# of expected failures  40
# of untested testcases 8
# of unresolved testcases   125
# of unsupported tests  55

runtest completed at Wed Nov 25 14:02:52 2009

Ptrace over utrace:
=== gdb Summary ===

# of expected passes13970
# of unexpected failures52
# of unexpected successes   2
# of expected failures  40
# of untested testcases 8
# of unresolved testcases   125
# of unsupported tests  55

runtest completed at Wed Nov 25 14:21:52 2009

Essentially, there is *no* change in any of the numbers with and without
ptrace over utrace.

Regards,
Ananth

Re: [RFC,PATCH 0/14] utrace/ptrace

2009-11-25 Thread Ananth N Mavinakayanahalli

On Wed, Nov 25, 2009 at 04:40:52PM +0100, Oleg Nesterov wrote:
 On 11/25, Ananth N Mavinakayanahalli wrote:
 
  I ran the ptrace-tests testsuite [1] on powerpc on the vanilla ptrace
  and then with ptrace/utrace. The results for ptrace/utrace look better
  :-)
 
 Great! thanks a lot Ananth for doing this.
 
 ptrace-utrace still fails 2 tests,
 
  FAIL: syscall-reset
 
 I'll take a look later. Since unpatched kernel fails this test too
 I am not going to worry right now. I think this is ppc specific, x86
 passes this test.
 
  step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
  /bin/sh: line 5: 24803 Aborted ${dir}$tst
  FAIL: step-fork
 
 This is expected. Should be fixed by
 
   ptrace-copy_process-should-disable-stepping.patch
 
 in -mm tree. (I am attaching this patch below just in case)
 I din't mention this patch in this series because this bug
 is ortogonal to utrace/ptrace.

Oleg,

The patch doesn't seem to fix the issue on powerpc:

step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
/bin/sh: line 5: 17325 Aborted ${dir}$tst
FAIL: step-fork

Ananth

Re: [PATCH 1-13] utrace-ptrace V1, for internal review

2009-11-24 Thread Ananth N Mavinakayanahalli

On Tue, Nov 24, 2009 at 04:26:57PM +0100, Oleg Nesterov wrote:
 On 11/24, Ananth N Mavinakayanahalli wrote:
 
  ...
  step-jump-cont: step-jump-cont.c:140: pokeuser: Assertion `l == 0'
  failed.
  /bin/sh: line 4:  9070 Aborted ${dir}$tst
  FAIL: step-jump-cont
 
  errno 14 (Bad address)
  syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location
  ()) == 38' failed.
  unexpected child status 67f
  FAIL: syscall-reset
 
 Ananth, thanks a lot.

Hi Oleg,

 Could you please verify that unpatched kernel passes these tests?

There is one additional failure [1] on an unpatched kernel -- so the
ptrace/utrace is better :-)

 Also, could you run make xcheck to do more testing?

The output I provided was for a 'make check xcheck' run.

Ananth

[1]
step-jump-cont: step-jump-cont.c:140: pokeuser: Assertion `l == 0'
failed.
/bin/sh: line 4: 16407 Aborted ${dir}$tst
FAIL: step-jump-cont

...

errno 14 (Bad address)
syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location
()) == 38' failed.
unexpected child status 67f
FAIL: syscall-reset

...

FAIL: detach-stopped

...


3 of 27 tests failed
(11 tests were not run)
Please report to utrace-devel@redhat.com

make[3]: *** [check-TESTS] Error 1
make[3]: Leaving directory `/home/ananth/utrace/ptrace-tests/tests'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `/home/ananth/utrace/ptrace-tests/tests'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/ananth/utrace/ptrace-tests/tests'
make: *** [check-recursive] Error 1

Re: PTRACE_EVENT_SIGTRAP

2009-10-30 Thread Ananth N Mavinakayanahalli

On Thu, Oct 29, 2009 at 08:41:58PM +0100, Oleg Nesterov wrote:
 On 10/27, Roland McGrath wrote:

 @@ -495,8 +494,8 @@ static u32 ptrace_report_signal(u32 acti
   context-siginfo = NULL;
 
   if (resume != UTRACE_RESUME) {
 - set_stop_code(context, PTRACE_EVENT_SIGTRAP);
 - return UTRACE_STOP | UTRACE_SIGNAL_IGN;
 + info-si_signo = SIGTRAO;

Surely you meant SIGTRAP here, right? :-)

Re: utrace-indirect branch

2009-10-28 Thread Ananth N Mavinakayanahalli

On Wed, Oct 28, 2009 at 06:55:32PM -0700, Roland McGrath wrote:
 Please take a look at the patch below and tell me what you think.  This
 is the new(ish) utrace-indirect branch (not to be confused with what's
 now old/utrace-indirect).  I first made it a while ago, but I don't
 recall if I ever mentioned it.  This compiles and looks right, but I
 have not done any testing at all.
 
 Unlike the old code that freaked upstream reviewers all the way out,
 this has very simple lifetime rules for struct utrace.  Once allocated,
 it lives until task_struct is freed.  The utrace_task_alloc() logic
 covers the only race (at first attach), and that seems pretty easy
 to understand and be confident in.

First glance, this looks much simpler and easier to understand compared
to the earlier rcu implementation.

...

 + if (!utrace) {
 + if (unlikely(!utrace_task_alloc(target)))
 + return ERR_PTR(-ENOMEM);
 + utrace = target-utrace;

utrace = task_utrace_struct(target);

 + }
 +
   engine = kmem_cache_alloc(utrace_engine_cachep, GFP_KERNEL);
   if (unlikely(!engine))
   return ERR_PTR(-ENOMEM);

Ananth

[PATCH] (trivial) Use set_stop_code() in ptrace_report_signal()

2009-10-13 Thread Ananth N Mavinakayanahalli

Trivial patch - looks like this was opencoded rather than using the helper

Use set_stop_code() wherever possible.

Signed-off-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
---
 kernel/ptrace.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: ptrace-13oct/kernel/ptrace.c
===
--- ptrace-13oct.orig/kernel/ptrace.c
+++ ptrace-13oct/kernel/ptrace.c
@@ -331,8 +331,7 @@ static u32 ptrace_report_signal(u32 acti
context-siginfo = NULL;
 
if (resume != UTRACE_RESUME) {
-   context-stop_code = (PTRACE_EVENT_SIGTRAP  8) | 
SIGTRAP;
-
+   set_stop_code(context, PTRACE_EVENT_SIGTRAP);
return UTRACE_STOP | UTRACE_SIGNAL_IGN;
}

[PATCH] (cosmetic) rename ptrace_context()

2009-10-13 Thread Ananth N Mavinakayanahalli

Rename ptrace_context() to get_ptrace_context() and use it for better
code readability. (I know it doesn't take a ref or anything; feel free to
ignore)

Signed-off-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
---
 kernel/ptrace.c |   40 +---
 1 file changed, 21 insertions(+), 19 deletions(-)

Index: ptrace-13oct/kernel/ptrace.c
===
--- ptrace-13oct.orig/kernel/ptrace.c
+++ ptrace-13oct/kernel/ptrace.c
@@ -59,7 +59,7 @@ static inline void set_stop_code(struct 
 }
 
 static inline struct ptrace_context *
-ptrace_context(struct utrace_engine *engine)
+get_ptrace_context(struct utrace_engine *engine)
 {
return engine-data;
 }
@@ -124,7 +124,7 @@ static u32 ptrace_report_exit(enum utrac
  struct task_struct *task,
  long orig_code, long *code)
 {
-   struct ptrace_context *context = ptrace_context(engine);
+   struct ptrace_context *context = get_ptrace_context(engine);
 
WARN_ON(ptrace_event_pending(context));
 
@@ -171,7 +171,7 @@ static u32 ptrace_report_clone(enum utra
   unsigned long clone_flags,
   struct task_struct *child)
 {
-   struct ptrace_context *context = ptrace_context(engine);
+   struct ptrace_context *context = get_ptrace_context(engine);
int event, ret = UTRACE_RESUME;
 
WARN_ON(ptrace_event_pending(context));
@@ -227,7 +227,7 @@ static u32 ptrace_report_syscall_entry(u
   struct task_struct *task,
   struct pt_regs *regs)
 {
-   struct ptrace_context *context = ptrace_context(engine);
+   struct ptrace_context *context = get_ptrace_context(engine);
 
WARN_ON(ptrace_event_pending(context));
 
@@ -241,7 +241,7 @@ static u32 ptrace_report_syscall_exit(en
  struct task_struct *task,
  struct pt_regs *regs)
 {
-   struct ptrace_context *context = ptrace_context(engine);
+   struct ptrace_context *context = get_ptrace_context(engine);
 
if (ptrace_event_pending(context))
return UTRACE_STOP;
@@ -258,7 +258,7 @@ static u32 ptrace_report_exec(enum utrac
  const struct linux_binprm *bprm,
  struct pt_regs *regs)
 {
-   struct ptrace_context *context = ptrace_context(engine);
+   struct ptrace_context *context = get_ptrace_context(engine);
 
WARN_ON(ptrace_event_pending(context));
 
@@ -316,7 +316,7 @@ static u32 ptrace_report_signal(u32 acti
const struct k_sigaction *orig_ka,
struct k_sigaction *return_ka)
 {
-   struct ptrace_context *context = ptrace_context(engine);
+   struct ptrace_context *context = get_ptrace_context(engine);
enum utrace_resume_action resume = context-resume;
 
if (ptrace_event_pending(context)) {
@@ -367,7 +367,7 @@ static u32 ptrace_report_quiesce(u32 act
 struct task_struct *task,
 unsigned long event)
 {
-   struct ptrace_context *context = ptrace_context(engine);
+   struct ptrace_context *context = get_ptrace_context(engine);
 
if (ptrace_event_pending(context))
return UTRACE_STOP;
@@ -395,7 +395,7 @@ static inline int __ptrace_set_options(s
struct utrace_engine *engine,
unsigned long options)
 {
-   struct ptrace_context *context = ptrace_context(engine);
+   struct ptrace_context *context = get_ptrace_context(engine);
/*
 * We need QUIESCE for resume handling, CLONE to check
 * for CLONE_PTRACE, other events are always reported.
@@ -867,7 +867,7 @@ void ptrace_notify_stop(struct task_stru
return;
}
 
-   context = ptrace_context(engine);
+   context = get_ptrace_context(engine);
if (WARN_ON(!ptrace_event_pending(context)))
return;
do_ptrace_notify_stop(context, tracee);
@@ -896,7 +896,7 @@ static void do_ptrace_resume(struct utra
enum utrace_resume_action action,
long request, long data)
 {
-   struct ptrace_context *context = ptrace_context(engine);
+   struct ptrace_context *context = get_ptrace_context(engine);
 
switch (get_stop_event(context)) {
case 0:
@@ -994,12 +994,14 @@ int ptrace_request(struct task_struct *c
   long addr, long data)
 {
struct utrace_engine *engine = ptrace_lookup_engine(child);
+   struct ptrace_context *context;
siginfo_t siginfo;
int ret;
 
if (unlikely(IS_ERR(engine)))
return -ESRCH;
 
+   context

Re: Questions about utrace

2009-09-07 Thread Ananth N Mavinakayanahalli

On Sat, Sep 05, 2009 at 06:01:58PM +0300, Ali Polatel wrote:
 Hey everyone,
 I've been writing a ptrace based sandboxing tool, called sydbox¹, and I
 want to explain about some of my bad experiences with ptrace and whether
 utrace will fix these deficiencies.
 
 First of all ptrace() makes it rather hard writing portable code that
 will work for every architecture. You have to find out registry numbers
 for every architecture. It would be really nice if there were a common
 interface like utrace_get_syscall(), utrace_set_syscall() that
 works for every architecture supported.

Have you looked at using syscall_get_(nr/error/arguments/return_value)
et al in arch/arch/include/asm/syscall.h

 Basically sydbox intercepts some system calls and checks their
 arguments. Using ptrace, however, it's not possible to stop the children
 only at the entry of system calls we are interested in but we have to
 stop them at _every_ system call and check for the system call number.
 Because of this threaded applications run very slowly under sydbox as we
 have to stop them at every sched_yield(). I know that utrace will split
 PTRACE_SYSCALL into two calls SYSCALL_ENTRY and SYSCALL_EXIT and this is
 really cool but maybe the ability to stop the children only at system
 calls the caller is interested in is a better idea.

Utrace engine callbacks happen in the traced task context and as such,
are extremely fast. For the case you have, it isn't difficult for your
ops-report_syscall_entry to either do a UTRACE_STOP/UTRACE_RESUME
depending on what syscall_get_nr() returned.

Aside, have you looked at seccomp? The utrace/seccomp branch on the
utrace git tree has some work Roland had done earlier to make seccomp
use utrace.

Ananth

Re: [RFC, PATCH] teach utrace to destroy engine-data

2009-09-04 Thread Ananth N Mavinakayanahalli

On Fri, Sep 04, 2009 at 02:55:29AM -0700, Roland McGrath wrote:
  Simple answer. Because I do not know how to implement this. At least now.
  I tried to think of this more, but I don't see how to make the first steps.
  
  (Yes, to be honest, this looks like unnecessary complication to me, I have
   to admit. But this is not the reason.)
 
 Well, I am befuddled.  I explained the model and it seems obvious to me how
 you'd go about it, the only nontrivial corner being coping with the MT exec
 reap case.  But I won't let our mutual befuddlement stand in the way of
 making immediate progress the way you like best.
 
  Agreed, but nobody else cares ;)
 
 Ananth, Srikar, Jim, Frank?  (Beuller?)  We do have some people around
 here with some experience dealing with the API.  They made me add the
 engine ref counts in the first place, for Pete's sake.  They've got to
 have some opinions!

I am not sure if all of what I say here makes any sense as I am still in
the midst of understanding the discussion threads between the two of you
(admittedly, its one hairy piece of the kernel). However, here goes:

I prefer Roland's ops-release approach. However, is there any chance
that -ops is no longer valid at the time __utrace_release_engine() is
called? For the ptrace case at least, it'll always be valid, so there
are no issues.

The concept of how utrace engines use refcnts initially befuddled me.
The entity that does a utrace_attach_*, without it having to explicitly
ask for a utrace_engine_get(), already owns a reference. That was
counter intuitive. (That's the reason for a series of Srikar's
utrace_engine_put() fixes too, I guess).

That apart, the problem of freeing -data is similar to what Prasad is
trying to grapple in hardware watchpoints and its ptrace interaction. I
guess its a problem unique to ptrace(?). The issue there, is that when
ptrace did a register_user_hardware_watchpoint(), and that thread later
died/exec'd, ptrace may (will?) not actually unregister the watchpoint,
and the hwbkpt layer is left to fend for itself in cleaning up the
struct hw_breakpoint in flush_thread_hardware_watchpoint(). Now, you
could have non-ptrace users of the interface (who may have passed just
a reference to a statically allocated struct hw_breakpoint) in which case,
freeing the structure in the hwbkpt layer is just wrong. Right now, the
way Prasad has fixed it is to have a check if the
bp-triggered == ptrace_triggered, in which case, free up the structure.

Yes, there is a kind of layering violation here -- hwbkpt now knows
about ptrace_triggered. But I am not sure if there is a better way of
fixing it.

Ananth

ptrace/utrace branch/tree?

2009-08-12 Thread Ananth N Mavinakayanahalli

Oleg,

From conversations between you and Roland, it looks like you're
reworking quite a bit of the ptrace/utrace code. Wondering if you are
hosting your changes in a git tree/branch?

Regards,
Ananth

[RFC][PATCH 0/3] Utrace based non-disruptive application core dumps - V1

2009-07-30 Thread Ananth N Mavinakayanahalli

This series is an RFC for utrace-based non-disruptive application core
dumper. Per Roland McGrath, this is possibly one of the common use-cases
for utrace.

This is the first foray of mine into the hairy core dump land.
Admittedly, I may have missed some (not so) subtle issues that need
careful consideration.

The current implementation uses a /proc trigger. Every process gains a new
/proc/pid/gen_core 'write-only' file. Echoing a value of '1' will
cause the core-dump to be generated. As with fatal core-dumps, the core
file will be saved in the same directory as the location from where the
application was started. The core dump has a format core.pid.timestamp,
where the timestamp is the value of ktime_get() at the time of calling
the binfmt-core_dump().

The current implementation uses a completion(). However, this precludes
the possibility of a process syncrhonously asking for its own dump
(trying this usecase causes the application to hang indefinitely). I am
told this is a much needed feature that Linux lacks.

Awaiting comments...

Regards,
Ananth

[PATCH 1/3] Create the /proc trigger

2009-07-30 Thread Ananth N Mavinakayanahalli

Create the /proc/pid/gen-core file that is the trigger for non-disruptive
application core dumps. Writing any value other than 1 to this file is
invalid.

Signed-off-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
---
 fs/proc/Makefile   |1 
 fs/proc/base.c |3 ++
 fs/proc/internal.h |3 ++
 fs/proc/proc_gencore.c |   65 +
 4 files changed, 72 insertions(+)

Index: utrace-13jul/fs/proc/Makefile
===
--- utrace-13jul.orig/fs/proc/Makefile
+++ utrace-13jul/fs/proc/Makefile
@@ -26,3 +26,4 @@ proc-$(CONFIG_PROC_VMCORE)+= vmcore.o
 proc-$(CONFIG_PROC_DEVICETREE) += proc_devtree.o
 proc-$(CONFIG_PRINTK)  += kmsg.o
 proc-$(CONFIG_PROC_PAGE_MONITOR)   += page.o
+proc-$(CONFIG_UTRACE)  += proc_gencore.o
Index: utrace-13jul/fs/proc/base.c
===
--- utrace-13jul.orig/fs/proc/base.c
+++ utrace-13jul/fs/proc/base.c
@@ -2558,6 +2558,9 @@ static const struct pid_entry tgid_base_
 #if defined(USE_ELF_CORE_DUMP)  defined(CONFIG_ELF_CORE)
REG(coredump_filter, S_IRUGO|S_IWUSR, 
proc_coredump_filter_operations),
 #endif
+#ifdef CONFIG_UTRACE
+   REG(gen_core, S_IWUSR, proc_gen_core_operations),
+#endif
 #ifdef CONFIG_TASK_IO_ACCOUNTING
INF(io,   S_IRUGO, proc_tgid_io_accounting),
 #endif
Index: utrace-13jul/fs/proc/internal.h
===
--- utrace-13jul.orig/fs/proc/internal.h
+++ utrace-13jul/fs/proc/internal.h
@@ -60,6 +60,9 @@ extern const struct file_operations proc
 extern const struct file_operations proc_pagemap_operations;
 extern const struct file_operations proc_net_operations;
 extern const struct inode_operations proc_net_inode_operations;
+#ifdef CONFIG_UTRACE
+extern const struct file_operations proc_gen_core_operations;
+#endif /* CONFIG_UTRACE */
 
 void free_proc_entry(struct proc_dir_entry *de);
 
Index: utrace-13jul/fs/proc/proc_gencore.c
===
--- /dev/null
+++ utrace-13jul/fs/proc/proc_gencore.c
@@ -0,0 +1,65 @@
+/*
+ * Non-disruptive application core dump
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2009
+ */
+
+#include linux/seq_file.h
+#include linux/utrace.h
+#include internal.h
+
+#include asm/uaccess.h
+
+static ssize_t gen_core_write(struct file *file, const char __user *buf,
+   size_t count, loff_t *ppos)
+{
+   struct task_struct *task;
+   char buffer[PROC_NUMBUF], *end;
+   unsigned int val;
+   int ret;
+
+   ret = -EFAULT;
+   memset(buffer, 0, sizeof(buffer));
+   if (count  sizeof(buffer) - 1)
+   count = sizeof(buffer) - 1;
+   if (copy_from_user(buffer, buf, count))
+   goto out_no_task;
+
+   ret = -EINVAL;
+   val = (unsigned int)simple_strtoul(buffer, end, 0);
+   if (*end == '\n')
+   end++;
+   if ((end - buffer == 0) || (val == 0) || (val  1))
+   goto out_no_task;
+
+   ret = -ESRCH;
+   task = get_proc_task(file-f_dentry-d_inode);
+   if (!task)
+   goto out_no_task;
+
+   ret = end - buffer;
+
+   /* TODO: call core dumper */
+
+   put_task_struct(task);
+out_no_task:
+   return ret;
+}
+
+const struct file_operations proc_gen_core_operations = {
+   .write  = gen_core_write,
+};

[PATCH] Clarify UTRACE_ATTACH_EXCLUSIVE a bit more

2009-04-27 Thread Ananth N Mavinakayanahalli

More than one user has hit the -EEXIST problem when using
utrace_attach_task and UTRACE_ATTACH_EXCLUSIVE without
UTRACE_ATTACH_MATCH_DATA|_OPS. Document that a bit more.

Signed-off-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
---
 kernel/utrace.c |4 
 1 file changed, 4 insertions(+)

Index: utrace-28apr/kernel/utrace.c
===
--- utrace-28apr.orig/kernel/utrace.c
+++ utrace-28apr/kernel/utrace.c
@@ -214,6 +214,10 @@ static int utrace_add_engine(struct task
  *
  * UTRACE_ATTACH_MATCH_OPS: Only consider engines matching @ops.
  * UTRACE_ATTACH_MATCH_DATA: Only consider engines matching @data.
+ *
+ * Using UTRACE_ATTACH_EXCLUSIVE without either of UTRACE_ATTACH_MATCH_OPS
+ * or UTRACE_ATTACH_MATCH_DATA will result in a match for any existing
+ * engine for the task, causing an -%EEXIST return.
  */
 struct utrace_engine *utrace_attach_task(
struct task_struct *target, int flags,

Re: [PATCH 3/3] utrace-based ftrace process engine, v2

2009-03-24 Thread Ananth N Mavinakayanahalli

On Mon, Mar 23, 2009 at 10:54:09PM -0700, Andrew Morton wrote:
 On Tue, 24 Mar 2009 10:59:26 +0530 Ananth N Mavinakayanahalli 
 ana...@in.ibm.com wrote:
 
  On Sat, Mar 21, 2009 at 05:04:22AM -0700, Andrew Morton wrote:
   On Sat, 21 Mar 2009 07:51:41 -0400 Frank Ch. Eigler f...@redhat.com 
   wrote:
On Sat, Mar 21, 2009 at 04:19:54AM -0700, Andrew Morton wrote:
   
   I have strong memories of being traumatised by reading the uprobes code. 
  
  That was a long time ago wasn't it? :-)
  
  That approach was a carry over from an implementation from dprobes that
  used readdir hooks. Yes, that was not the most elegant approach, as such
  has long been shelved.
  
   What's the story on all of that nowadays?
  
  Utrace makes implementing uprobes more cleaner. We have a prototype that
  implements uprobes over utrace. Its per process, doesn't use any
  in-kernel hooks, etc. It currently has a kprobes like interface (needs a
  kernel module), but it shouldn't be difficult to adapt it to use
  utrace's user interfaces (syscalls?) when those come around. The current
  generation of uprobes that has all the bells and whistles can be found at
  http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=tree;f=runtime/uprobes2
  
  However, there are aspects of the current uprobes that can be useful to
  any other userspace tracer: instruction analysis, breakpoint insertion
  and removal, single-stepping support. With these layered on top of
  utrace, building userspace debug/trace tools that depend on utrace
  should be easier, outside of ptrace.
  
  Work is currently on to factor these layers out. The intention is to
  upstream all the bits required for userspace tracing once utrace gets
  in, along with an easy to use interface for userspace developers
  (a /proc or /debugfs interface?) -- one should be able to use it on
  its own or with SystemTap, whatever they prefer. Details are still hazy
  at the moment.
  
  But, utrace is the foundation to do all of that.
  
 
 The sticking point was uprobes's modification of live pagecache.  We said
 ick, COW the pages and you said too expensive.  And there things
 remained.
 
 Is that all going to happen again?

No. All modifications are via access_process_vm().

Ananth

Re: [PATCH 3/3] utrace-based ftrace process engine, v2

2009-03-23 Thread Ananth N Mavinakayanahalli

On Sat, Mar 21, 2009 at 05:04:22AM -0700, Andrew Morton wrote:
 On Sat, 21 Mar 2009 07:51:41 -0400 Frank Ch. Eigler f...@redhat.com wrote:
  On Sat, Mar 21, 2009 at 04:19:54AM -0700, Andrew Morton wrote:
 
 I have strong memories of being traumatised by reading the uprobes code. 

That was a long time ago wasn't it? :-)

That approach was a carry over from an implementation from dprobes that
used readdir hooks. Yes, that was not the most elegant approach, as such
has long been shelved.

 What's the story on all of that nowadays?

Utrace makes implementing uprobes more cleaner. We have a prototype that
implements uprobes over utrace. Its per process, doesn't use any
in-kernel hooks, etc. It currently has a kprobes like interface (needs a
kernel module), but it shouldn't be difficult to adapt it to use
utrace's user interfaces (syscalls?) when those come around. The current
generation of uprobes that has all the bells and whistles can be found at
http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=tree;f=runtime/uprobes2

However, there are aspects of the current uprobes that can be useful to
any other userspace tracer: instruction analysis, breakpoint insertion
and removal, single-stepping support. With these layered on top of
utrace, building userspace debug/trace tools that depend on utrace
should be easier, outside of ptrace.

Work is currently on to factor these layers out. The intention is to
upstream all the bits required for userspace tracing once utrace gets
in, along with an easy to use interface for userspace developers
(a /proc or /debugfs interface?) -- one should be able to use it on
its own or with SystemTap, whatever they prefer. Details are still hazy
at the moment.

But, utrace is the foundation to do all of that.

Ananth

[BUG] utrace_attach_task() never returns when called from the report_clone callback

2009-03-06 Thread Ananth N Mavinakayanahalli

Roland,

With the current utrace/master tree, I am seeing that utrace_attach_task()
never returns when invoked from the clone callback. The same module
works fine with prior utrace (rcu as well as with my embed version).

The testcase is simple:
a. attach an engine to attachstop-mt (from the gdb testsuite) _before_ it
   calls pthread_create.
b. Watch for CLONE_THREAD and try to attach a utrace engine to the new
   thread. The utrace_attach_task() call never returns.

If the utrace module is unloaded, the kernel barfs with the following
innocuous information:

BUG: unable to handle kernel paging request at fdff
IP: [a012009a] 0xa012009a
PGD 203067 PUD 204067 PMD 0 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/pci:01/:01:01.1/irq
CPU 6 
Modules linked in: big list
[last unloaded: utrace_quiesce_threads]
Pid: 6203, comm: attachstop-mt Not tainted 2.6.29-rc7-ut #1 eserver
xSeries 366-[88632RA]-
RIP: 0010:[a012009a]  [a012009a] 0xa012009a
RSP: 0018:8801d34ebe10  EFLAGS: 00010246
RAX: fdff RBX: 8801f11a36c0 RCX: c100
RDX:  RSI: 8801dd0507f8 RDI: 88022daf4500
RBP: fff4 R08: 8801d34ea000 R09: 88022f2596a0
R10: 8800280b1600 R11: 0018 R12: 8801d34f1860
R13: 8802210dd300 R14: 8801dd07e2c0 R15: 003d0f00
FS:  7f58c8d286e0() GS:88022f18e5c0()
knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: fdff CR3: 0002029bd000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process attachstop-mt (pid: 6203, threadinfo 8801d34ea000, task
8801d3512440)
Stack:
 003d0f00 8801d34f1860 8802210dd300 8801d3512440
 8801d34ebe70 a012028d 8801dd050618 8801d35129e0
 8801d35129d8 80260480  8801d34f1860
Call Trace:
 [80260480] ? utrace_report_clone+0x95/0xfc
 [80239120] ? do_fork+0x20b/0x2f3
 [804a4035] ? do_page_fault+0x3c7/0x74e
 [8020c243] ? stub_clone+0x13/0x20
 [8020bedb] ? system_call_fastpath+0x16/0x1b
Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
RIP  [a012009a] 0xa012009a
 RSP 8801d34ebe10
CR2: fdff
---[ end trace 96bb7eb644ab73a4 ]---

I have verified that the earlier version of utrace works just fine.

In the earlier case, the engine would go directly on to the attached
list if the calling task was the creator of the new thread. This has
changed with the new implementation.

I haven't yet zeroed in on what exact change caused this problem.

Ananth

Re: [BUG] utrace_attach_task() never returns when called from the report_clone callback

2009-03-06 Thread Ananth N Mavinakayanahalli

On Fri, Mar 06, 2009 at 12:52:34PM -0800, Roland McGrath wrote:
  With the current utrace/master tree, I am seeing that utrace_attach_task()
  never returns when invoked from the clone callback. The same module
  works fine with prior utrace (rcu as well as with my embed version).
 
 I changed the utrace_attach_delay() logic recently.  That is probably it.

Right, reverting dd30e86355e fixes the problem.

 Please post your test case.

Here it is -- does nothing much really :) I used this module in
conjunction with attachstop_mt with an engine attaching to it before the
pthread_create().

---
#include linux/module.h
#include linux/utrace.h
#include linux/err.h

MODULE_DESCRIPTION(Utrace tests);
MODULE_LICENSE(GPL);

static int target_pid;
module_param_named(pid, target_pid, int, 0);

/* Structure for all threads of a process having the same utrace ops */
struct proc_utrace {
struct task_struct *tgid_task;

/* list of task_utrace structs */
struct list_head list;
unsigned int num_threads;
};

struct task_utrace {
struct list_head list;
struct task_struct *task;
/* TODO: Get rid of this and use MATCHING_OPS on task? */
struct utrace_engine *engine;
};

static const struct utrace_engine_ops ut_ops;

static struct task_utrace *get_task_ut(struct task_struct *task,
struct proc_utrace *proc_ut)
{
struct task_utrace *task_ut, *temp;

list_for_each_entry_safe(task_ut, temp, proc_ut-list, list) {
if (task_ut-task == task)
return task_ut;
}
return NULL;
}

static int cleanup_proc_ut(struct proc_utrace *proc_ut)
{
int ret = 0;
struct task_utrace *task_ut, *temp;

printk(KERN_INFO Cleanup_proc_ut\n);
if (proc_ut == NULL)
return 0;

if (list_empty(proc_ut-list))
goto out;

/* walk proc_ut-list and free task_ut */
list_for_each_entry_safe(task_ut, temp, proc_ut-list, list) {
if (task_ut-engine) {
printk(KERN_INFO Calling detach for %d\n,
task_pid_nr(task_ut-task));
ret = utrace_control(task_ut-task,
task_ut-engine, UTRACE_DETACH);
if (ret)
printk(KERN_INFO utrace_detach returned %d\n,
ret);
printk(KERN_INFO Detached engine for %d\n,
task_pid_nr(task_ut-task));
}
list_del(task_ut-list);
kfree(task_ut);
}
out:
kfree(proc_ut);
return ret;
}

static int setup_task_ut(struct task_struct *t, struct proc_utrace *proc_ut)
{
struct task_utrace *task_ut;
int ret = 0;

if (!t || !proc_ut)
return -EINVAL;

printk(KERN_INFO setup_task_ut: attaching for task %d\n,
task_pid_nr(t));
task_ut = kzalloc(sizeof(*task_ut), GFP_KERNEL);
if (!task_ut)
return -ENOMEM;

INIT_LIST_HEAD(task_ut-list);
task_ut-task = t;
list_add_tail(task_ut-list, proc_ut-list);

/*
 * The utrace engine's *data will point to proc_ut.
 */
printk(KERN_INFO Before utrace_attach_task: %d\n, task_pid_nr(t));
task_ut-engine = utrace_attach_task(t, UTRACE_ATTACH_CREATE,
ut_ops, proc_ut);
printk(KERN_INFO After utrace_attach_task: %d, engine = %p\n,
task_pid_nr(t), task_ut-engine);
if (IS_ERR(task_ut-engine)) {
printk(KERN_ERR utrace_attach_task returned %d\n,
(int)PTR_ERR(task_ut-engine));
task_ut-engine = NULL;
ret = -ESRCH;
goto out;
}
printk(KERN_INFO utrace_attach_task: SUCCESS! - engine = %p\n,
task_ut-engine);
if (utrace_set_events(t, task_ut-engine,
UTRACE_EVENT(QUIESCE) | UTRACE_EVENT(CLONE) |
UTRACE_EVENT(EXIT))) {
ret = -ESRCH;
}
proc_ut-num_threads++;
out:
return ret;
}

static u32 ut_quiesce(enum utrace_resume_action action,
struct utrace_engine *engine,
struct task_struct *task, unsigned long event)
{
printk(KERN_INFO In quiesce callback: tid = %d\n, task_pid_nr(task));
return UTRACE_RESUME;
}

/* clone handler -- handle thread spawns and forks */
static u32 ut_clone(enum utrace_resume_action action,
struct utrace_engine *engine,
struct task_struct *parent, unsigned long clone_flags,
struct task_struct *child)
{
struct proc_utrace *proc_ut = (struct proc_utrace *)engine-data;

Re: [BUG] utrace_attach_task() never returns when called from the report_clone callback

2009-03-06 Thread Ananth N Mavinakayanahalli

On Fri, Mar 06, 2009 at 12:52:34PM -0800, Roland McGrath wrote:
  With the current utrace/master tree, I am seeing that utrace_attach_task()
  never returns when invoked from the clone callback. The same module
  works fine with prior utrace (rcu as well as with my embed version).
 
 I changed the utrace_attach_delay() logic recently.  That is probably it.
 Please post your test case.

The issue is that target-real_parent == current-real_parent and not
current on a CLONE_THREAD|CLONE_PARENT. So we keep looping in the
do-while.

Ananth

Re: [PATCH] Embed struct utrace in task_struct - V2

2009-02-22 Thread Ananth N Mavinakayanahalli

On Wed, Jan 21, 2009 at 11:58:25AM +0530, Ananth N Mavinakayanahalli wrote:
 On Mon, Jan 19, 2009 at 03:20:31PM -0800, Roland McGrath wrote:
  Thanks for working on this, Ananth.  (Btw, it's embed.)
  
  I think it would be less disruptive (and materially no different)
  to leave utrace_flags as it is.  That field is the one (and only)
  that is used in hot paths (or used anywhere outside utrace.c).
  It might in future get moved around to stay in a cache-hot part
  of task_struct, for example.
  
  The long comment above struct utrace is really all about implementation
  details inside utrace.c and I don't think you should move that commentary
  to the header file.  Instead, put a comment saying that the contents of
  struct utrace and their use is entirely private to kernel/utrace.c and it
  is only defined in the header to make its size known for struct task_struct
  layout (and init_task.h).
  
  I committed some cosmetic changes that will make for a little less flutter
  in your patch.
 
 Here is V2 of the patch. Tested and works fine. Same two tests on
 powerpc fail, all tests pass on x86, while there are some occurances of
 the ptrace.c WARN_ON.
 
 Roland,
 I've tried to tweak the comments appropriately. Please feel free to
 modify them as you consider fit.

Roland,

Any updates on this and the utrace upstream integration front?

Ananth

Utrace in -next tree?

2008-10-17 Thread Ananth N Mavinakayanahalli

Roland,

What are your thoughts of getting utrace git tree into linux-next?
That way, utrace will have more extensive visibility and testing.

Ananth

[PATCH] Fix spin_unlock order in utrace_stop

2008-09-02 Thread Ananth N Mavinakayanahalli

From: Ananth N Mavinakayanahalli [EMAIL PROTECTED]

utrace_stop() seems to get the spin_unlock sequence inverted in one of the
unlikely branches. Fix it.

Signed-off-by: Ananth N Mavinakayanahalli [EMAIL PROTECTED]
---
 kernel/utrace.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: utrace-2sep/kernel/utrace.c
===
--- utrace-2sep.orig/kernel/utrace.c
+++ utrace-2sep/kernel/utrace.c
@@ -482,8 +482,8 @@ static bool utrace_stop(struct task_stru
spin_lock_irq(task-sighand-siglock);
 
if (unlikely(sigismember(task-pending.signal, SIGKILL))) {
-   spin_unlock(utrace-lock);
spin_unlock_irq(task-sighand-siglock);
+   spin_unlock(utrace-lock);
return true;
}

BUG: Sleeping function called from invalid context...

2008-08-20 Thread Ananth N Mavinakayanahalli

Roland,

Occasionally, I see:

attached to 1717 = 0xde089000
utrace_control: -EINPROGRESS
pid 1717 reports quiesced to 0xde089000
BUG: sleeping function called from invalid context at kernel/sched.c:5428
in_atomic():0, irqs_disabled():1
no locks held by uttest/1717.
irq event stamp: 1396
hardirqs last  enabled at (1395): [c0438309] trace_hardirqs_on+0xb/0xd
hardirqs last disabled at (1396): [c04d2648] trace_hardirqs_off_thunk+0xc/0x10
softirqs last  enabled at (994): [c0420091] __do_softirq+0x97/0x9e
softirqs last disabled at (987): [c04056c5] do_softirq+0x59/0xa8
Pid: 1717, comm: uttest Not tainted 2.6.27-rc3 #1
 [c04178dd] __might_sleep+0xb3/0xb8
 [c041891f] __cond_resched+0x12/0x3e
 [c05dbee8] _cond_resched+0x27/0x32
 [c04437ef] finish_callback+0x37/0xb4
 [c04438b0] start_callback+0x44/0x85
 [c0444587] utrace_resume+0xa8/0x106
 [c0403315] do_notify_resume+0x630/0x654
 [c04375db] ? trace_hardirqs_off+0xb/0xd
 [c043755d] ? trace_hardirqs_off_caller+0x14/0x87
 [c04375db] ? trace_hardirqs_off+0xb/0xd
 [c040736a] ? native_sched_clock+0x71/0x83
 [c0436fd8] ? lock_release_holdtime+0x2b/0x12e
 [c04a392d] ? dnotify_parent+0x5c/0x61
 [c0473288] ? fsnotify_access+0x54/0x5f
 [c04fc958] ? tty_read+0x0/0xa9
 [c0473f4f] ? vfs_read+0xa6/0xdf
 [c0403ab6] work_notifysig+0x13/0x19
 ===
pid 1717 reports quiesced to 0xde089000
autodetach 0xde089000 for 1717 (by 1717)
cannot find PID 1717

The utrace module is a rehased version of test-quiesce.c from the ntrace
tarball to work with the new interfaces. The report_quiesce handler just
returns UTRACE_RESUME.

This is on an x86_32 machine with the latest utrace git and the config
has CONFIG_SMP=n and CONFIG_PREEMPT_VOLUNTARY=y.

Ananth

Re: BUG: Sleeping function called from invalid context...

2008-08-20 Thread Ananth N Mavinakayanahalli

On Wed, Aug 20, 2008 at 11:48:00AM -0700, Roland McGrath wrote:
 Thanks for the report, Ananth.
 
 Ah!  The i386 will enter do_notify_resume() with interrupts disabled.
 Other machines don't do this (x86-64 and powerpc64, anyway).  It is often
 harmless, because if TIF_SIGPENDING is set, we'll first enter
 get_signal_to_deliver() and implicitly reenable interrupts by virtue of
 spin_lock_irq + spin_unlock_irq.  If we don't enter that path, we get
 straight to utrace_notify_resume() with interrupts still disabled.
 
 I've merged the following fix in.  
 Please verify that it works well for you.

The patch does fix the problem. Thanks Roland!

Ananth

Re: RCU, reference counts

2008-08-05 Thread Ananth N Mavinakayanahalli

On Tue, Aug 05, 2008 at 07:20:42PM -0700, Roland McGrath wrote:
  Yes!
 [...]
  What is the use case for a utrace client to do a utrace_engine_get/put()?
  Wouldn't it be more robust if utrace implicitly handles refcounts as
  you've detailed below?
 
 If the only operations that affect this count are implicit, then I assume
 you must mean those are attach and detach.  What today's utrace has now is
 an implicit count: it's 1 when attached, and 0 when not.  It's perfectly
 robust as described.  Since it's implicit, you have to observe all those
 squirrelly rules about synchronizing with UTRACE_DETACH that I've been
 talking about.
 
 If there is no utrace_engine_put, then what difference are you saying an
 enthusiastic Yes! to?  Explain what you have in mind that would be
 somehow different from what we already had, but have no utrace_engine_put.

My initial opinion was that you were moving away from RCU to also rid
the task_struct-utrace assertion failure, which IIRC from some of the
investigations at the time, were mostly for RCU lifetime reasons.

Ananth

Re: asynchronous detach

2008-08-02 Thread Ananth N Mavinakayanahalli

On Thu, Jul 31, 2008 at 03:38:02PM -0700, Roland McGrath wrote:
 
...

  I think this can be a good model to use for non-perturbing quiesce for
  cases as breakpoint insertion and removal or any applicaton text
  modification, where one needs to ensure no thread is executing in the
  vicinity of the said modification. So, even on an -EINPROGRESS on a
  utrace_control, if the thread is in kernel, manipulating application
  text is still safe, right?
 
 For this particular case, maybe safe enough.  I wouldn't like to
 encourage thinking of it as safe in general.  Whatever the kernel is
 doing might be touching user memory, or unmapping it or whatever.  For
 breakpoint insertion/removal, you are pretty much assuming nothing else
 would ever touch the text anyway.  You still need an interlock against
 munmap text + mmap reusing addrs in quick succession.  I suppose you
 could hold mmap_sem or something.
 
  Not sure if a similar model can be used to address the teardown races
  problem.
 
 Hmm.  I mentioned before having a non-synchronizing record of the
 callback in progress.  This lets UTRACE_DETACH return -EINPROGRESS,
 which is after the horse has left the barn, as it were.  There is a
 related extension of this idea I had vaguely in mind but didn't mention.
 A utrace_set_events call that disables event bits could also return
 -EINPROGRESS when some callback might be in progress.  As with detach,
 that means a zero return tells you all is well.  Unlike detach, if you
 get -EINPROGRESS from utrace_set_events, you still have the engine.

So, the intention is that -EINPROGRESS can be returned only on a
utrace_set_events(task, engine, 0), right?

Actually, I think it would also be useful to just have a
utrace_clear_events_sync() that encapsulates the above call. I can see
use for it outside of an asynchronous detach, for cases when one needs
to just turn tracing events off on a thread. (It is definitely not
recommended for debugger/probes usage with breakpoints already set in
user code, but would be useful for someone doing non-invasive syscall
and process lifetime tracing).
 
 If your callback never blocks, that is enough to know that it's safe to
 free your data, unload your module, etc.  (Or if your callback can block
 but what you want to synchronize with is just what it does before it
 blocks.  That's when you don't care about being able to unload the
 module, just juggling your own data structures.)
 
 For the general case of some unknown callback code that might include
 blocking, you could instead do:
 
   ret = utrace_set_events(task, engine, 0);
   while (ret == -EINPROGRESS) {
   ret = utrace_prepare_examine(task, engine, exam);
   if (ret == -EAGAIN)
   yield();
   ret = utrace_set_events(task, engine, 0);
   if (ret == -EINPROGRESS)
   yield();
   }
 
 That is: when it is blocked, check again if a callback still might be in
 progress, and loop.  Assuming your callback is well-behaved and only
 ever blocks for a short time (like kmalloc), this will return soon.

This can cetainly work!

 yield(); should actually be schedule_timeout_interruptible(1); probably,
 and it should be able to return -ERESTARTSYS.
 
 If this is a good plan, then we should encapsulate it in a new call
 utrace_set_events_sync() or something like that.  So then the safe
 detach procedure would be just:
 
   ret = utrace_set_events_sync(task, engine, UTRACE_EVENT(REAP));
   if (!ret)
   ret = utrace_control(task, engine, UTRACE_DETACH);
 
 This can only return zero or -ESRCH.  Zero says that all callbacks are
 definitely finished.  -ESRCH says that your report_reap callback is
 definitely happening (right now, modulo preemption).
 
 We could encapsulate that into utrace_detach_sync() too.

Yes, this does take care of the thread about to die case too.

Ananth

Re: crash-suspend teardown races

2008-07-30 Thread Ananth N Mavinakayanahalli

On Wed, Jul 30, 2008 at 04:19:44PM -0500, David Smith wrote:

...

 @@ -197,14 +224,33 @@ static void __exit exit_crash_suspend(vo
  error, t-pid);
   }
   else {
 - int ret = utrace_control(t, engine, UTRACE_DETACH);
 + int ret = utrace_set_events(t, engine,
 + SHUTDOWN_EVENTS);
   WARN_ON(ret);
 +
 + ret = utrace_control(t, engine, UTRACE_STOP);
 + if (ret == -EINPROGRESS) {
 + ret = utrace_control(t, engine,
 +  UTRACE_INTERRUPT);
 + WARN_ON(ret);
 + }
 + else {
 + WARN_ON(ret);
 + }
   ++n;
   }
   }
   rcu_read_unlock();
 
 - printk(detached from %d threads, unloading\n, n);
 + printk(KERN_INFO requested %d threads to stop\n, n);
 +
 + if (wait_event_timeout(crash_suspend_wq,
 +(atomic_read(attach_count) = 0),
 +60 * HZ) = 0) {
 + printk(KERN_ERR gave up waiting.\n);
 + }
 + printk(KERN_INFO unloading, %d threads left to stop\n,
 +atomic_read(attach_count));
  }
 
  module_init(init_crash_suspend);

Thinking aloud, utrace_control(UTRACE_STOP) returns -EINPROGRESS for
threads not yet stopped
a. possibly still in userspace, yet to pass through a quiesce safe point
b. blocked in the kernel on a syscall or an exception.

Would task_current_syscall() help here? On a -EINPROGRESS return, one
can check for a 0 return from task_current_syscall() which'd mean the
thread is in the kernel. Wouldn't that mean that the thread will
eventually do a report_quiesce?

I think this can be a good model to use for non-perturbing quiesce for
cases as breakpoint insertion and removal or any applicaton text
modification, where one needs to ensure no thread is executing in the
vicinity of the said modification. So, even on an -EINPROGRESS on a
utrace_control, if the thread is in kernel, manipulating application
text is still safe, right?

Not sure if a similar model can be used to address the teardown races
problem.

Thoughts?

Ananth

Re: utrace status

2008-07-08 Thread Ananth N Mavinakayanahalli

Roland,

Here is a minor fix for powerpc syscalls.h.

---
From: Ananth N Mavinakayanahalli [EMAIL PROTECTED]

Remove return 0 from static inline void syscall_set_arguments()
in powerpc/syscalls.h

---
 include/asm-powerpc/syscall.h |1 -
 1 file changed, 1 deletion(-)

Index: kernel-utrace-8jul/include/asm-powerpc/syscall.h
===
--- kernel-utrace-8jul.orig/include/asm-powerpc/syscall.h
+++ kernel-utrace-8jul/include/asm-powerpc/syscall.h
@@ -67,7 +67,6 @@ static inline void syscall_set_arguments
 {
BUG_ON(i + n  6);
memcpy(regs-gpr[3 + i], args, n * sizeof(args[0]));
-   return 0;
 }
 
 #endif /* _ASM_SYSCALL_H */

Re: utrace status

2008-07-08 Thread Ananth N Mavinakayanahalli

On Wed, Jul 02, 2008 at 08:02:36PM -0700, Roland McGrath wrote:
 
 The status of the ptrace cooperation code looks OK but it is not
 very well tested.  I've really only tested on x86-64 lately.  It
 passed most of the ptrace-tests, but I haven't done anything big
 like gdb.

A simple update to ensure ptrace-utrace co-operation patch builds on
powerpc to aid in testing/debug. Srini's task_pt_regs() patch is also
required for the build to be successful.

Signed-off-by: Ananth N Mavinakayanahalli [EMAIL PROTECTED]
---
 kernel/ptrace.c |   25 +
 1 file changed, 25 insertions(+)

Index: kernel-utrace-8jul/kernel/ptrace.c
===
--- kernel-utrace-8jul.orig/kernel/ptrace.c
+++ kernel-utrace-8jul/kernel/ptrace.c
@@ -230,10 +230,17 @@ static u32 utrace_ptrace_report(u32 acti
 
task-exit_code = code;
 
+#ifdef CONFIG_X86
printk(XXX %d ptrace_report act %x (%x) code %x ax %ld orig_ax %ld\n,
   task-pid, action, ptrace_resume_action(task),
   task-exit_code,
   task_pt_regs(task)-ax, task_pt_regs(task)-orig_ax);
+#elif defined(CONFIG_PPC)
+   printk(XXX %d ptrace_report act %x (%x) code %x gpr3 %ld orig_gpr3 
%ld\n,
+  task-pid, action, ptrace_resume_action(task),
+  task-exit_code,
+  task_pt_regs(task)-gpr[3], task_pt_regs(task)-orig_gpr3);
+#endif
WARN_ON(1);
 
return action | UTRACE_STOP;
@@ -340,8 +347,13 @@ static u32 ptrace_report_syscall(enum ut
int code = SIGTRAP;
if (task-ptrace  PT_TRACESYSGOOD)
code |= 0x80;
+#ifdef CONFIG_X86
printk(XXX %d syscall code %x ax %ld orig_ax %ld\n,
   task-pid, task-exit_code, regs-ax, regs-orig_ax);
+#elif defined(CONFIG_PPC)
+   printk(XXX %d syscall code %x gpr3 %ld orig_gpr3 %ld\n,
+  task-pid, task-exit_code, regs-gpr[3], regs-orig_gpr3);
+#endif
WARN_ON(1);
return utrace_ptrace_report(0, task, code);
 }
@@ -398,9 +410,15 @@ static u32 ptrace_report_signal(u32 acti
const struct k_sigaction *orig_ka,
struct k_sigaction *return_ka)
 {
+#ifdef CONFIG_X86
printk(XXX %d signal %d act %x (%x) code %x ax %ld orig_ax %ld\n,
   task-pid, info-si_signo, action, ptrace_resume_action(task),
   task-exit_code, regs-ax, regs-orig_ax);
+#elif defined(CONFIG_PPC)
+   printk(XXX %d signal %d act %x (%x) code %x gpr3 %ld orig_gpr3 %ld\n,
+  task-pid, info-si_signo, action, ptrace_resume_action(task),
+  task-exit_code, regs-gpr[3], regs-orig_gpr3);
+#endif
WARN_ON(1);
 
if (info-si_signo == 0 ||
@@ -418,10 +436,17 @@ static u32 ptrace_report_quiesce(u32 act
 struct task_struct *task,
 unsigned long event)
 {
+#ifdef CONFIG_X86
printk(XXX %d quiesce %lx act %x (%x) code %x ax %ld orig_ax %ld\n,
   task-pid, event, action, ptrace_resume_action(task),
   task-exit_code,
   task_pt_regs(task)-ax, task_pt_regs(task)-orig_ax);
+#elif defined(CONFIG_PPC)
+   printk(XXX %d quiesce %lx act %x (%x) code %x gpr3 %ld orig_gpr3 
%ld\n,
+  task-pid, event, action, ptrace_resume_action(task),
+  task-exit_code,
+  task_pt_regs(task)-gpr[3], task_pt_regs(task)-orig_gpr3);
+#endif
WARN_ON(1);
task-last_siginfo = NULL;
return ptrace_resume_action(task);

Re: user_regset is in!

2008-02-07 Thread Ananth N Mavinakayanahalli

On Wed, Jan 30, 2008 at 12:22:58PM -0800, Roland McGrath wrote:
 The generic and x86 code for user_regset went into Linus's kernel tree today,
 destined for the 2.6.25 release.  I'm very grateful to Ingo Molnar, who helped
 this happen via the x86.git tree.  I've also had some positive feedback from
 the powerpc maintainer, and expect the powerpc user_regset code to go in
 upstream soon as well.

Paul Mackerras has pulled in the regset changes. Should be in Linus'
tree tomorrow.

Ananth

Re: incremental arch work

2008-01-22 Thread Ananth N Mavinakayanahalli

On Tue, Nov 20, 2007 at 08:29:19PM -0800, Roland McGrath wrote:
 Here are the steps I have in mind.  I think this work should be pretty well
 clear to merge upstream without much controversy.  Basically, this is the
 arch parts now done in the tracehook and regset patches, with a little
 sugar.  Several of these steps can be done in parallel and merged upstream
 independently.

...

 Once upstream arch code has merged all the steps above, there will be no
 more arch changes--or very nearly none, anyway--required to work with a
 later merging of utrace or something else like it.  I've thought about
 ways to be more incremental about the core changes, too.  But if we can
 get the ball rolling with the arch changes and get a majority of upstream
 arch trees converted over, that will be a first big win.

Now that code related to most of the steps you outlined in this thread are
scheduled to be pushed upstream (thanks a ton for the work!), from a
(!ptrace) utrace client point of view, the two major remaining components
would be the tracehooks and the engine infrastructure.

We have quite a bit of code in the uprobes codebase that would be
of interest to the larger utrace-client community. These include:

- Breakpoint assistance (including single-stepping out of line)
- Quiescing all threads of a process.

From a utrace-client (and long term maintenance perspective), we'd like to
factor these out. (It'd also make the uprobes codebase much leaner).

- What'd be your suggestions on that front? Do we just base these off of
  the current utrace engine infrastructure?
- Are the tracehooks and engines the next targets upstream? Possibly
  2.6.26?
- Do you have any changes in mind from a utrace-client perspective that
  we need to be cognizant of?

Please advise.

Ananth

54 matches

Mail list logo