Re: linux-next: add utrace tree

2010-01-23 Thread Alexey Dobriyan
On Sat, Jan 23, 2010 at 2:22 AM, Linus Torvalds
torva...@linux-foundation.org wrote:
 This is why when somebody brought up you could do a seccomp-like thing on
 top of utrace that my reaction was and is just totally negative. It shows
 all the wrong kinds of tying things together.

To be honest, seccomp-via-utrace should just be removed before it gains users.
It entered the tree because it was very small and simple.
If rewritten, it no longer is small and simple because of whole kernel/utrace.c.



Re: linux-next: add utrace tree

2010-01-23 Thread Alan Cox
 The killer app for this will be the ability to delete thousands of
 lines of code from GDB, strace, and all the various other tools that
 have to painfully work around the major interface gotchas of ptrace(),
 while at the same time making their handling of complex processes much
 more robust.

Years ago (and it really must be years ago because this was about the
time I started hacking on Linux stuff !) there was a proposal to extract
and sanitize the arch specific stuff in binutils and in gdb etc into
sensible libraries that could be used by other apps.

What I don't understand is why that doesn't solve 99% of your problem.
ptrace is not perfect but most of the real ptrace limitations actually
come about because either the CPU can't do something or because the
supporting logic would be too expensive - things like having extra
private debugger pages.

Yes ptrace needs a lot of icky support code, but it's already been
written...

Alan



Re: linux-next: add utrace tree

2010-01-23 Thread Ingo Molnar

* Kyle Moffett k...@moffetthome.net wrote:

 On Fri, Jan 22, 2010 at 19:22, Linus Torvalds
 torva...@linux-foundation.org wrote:
  There are cases where we really _want_ to have common code. We want to
  have a common VFS interface because we want to show _one_ interface to
  user space across a gazillion different filesystems. We want to have a
  common driver layer (as far as possible) because - again - we expose a
  metric shitload of drivers, and we want to have one unified interface to
  them.
 
 So... Everybody agrees that ptrace() is horrible and a royal pain to use, 
 let alone use correctly and without bugs.  Everybody also agrees that 
 ptrace() needs to stay around for a long time to avoid breaking all the 
 existing users.
 
 Now how do we get from here to a moderately portable API for interrogating, 
 controlling, and intercepting process state? Essentially it would need to 
 support all of the things that a powerful debugger would want to do, 
 including modifying registers and memory, substituting syscall return 
 values, etc.  I believe that utrace is the kernel side of that API.

The problem is, utrace does not do that really.

What utrace does is that it provides an opaque set of APIs for unspecified and 
out of tree _kernel_ modules (such as systemtap). It doesn't support any 
'application' per se. It basically removes the kernel's freedom to shape its 
own interaction with debug applications.

If utrace was a 'better ptrace' syscall, where the syscall itself is the goal 
of the hookery, it would all be rather different. People could argue about 
_that_ interface (and the hooks would be a pure kernel internal 
implementational detail - not an interface specification), and once people 
agree about that ABI and there's enough application momentum behind it, the 
hooks are really not that opaque anymore - they are for that ABI and not more.

Note that it's still a _big_ hurdle: it's hard to agree on a new syscall and 
it's hard to get 'application momentum' behind it. Special Linux system calls 
have a checkered past, they tend to not be used by much of anything, and thus 
they tend to be a breeding ground of bugs, maintenance complexity and 
security problems. Lack of attention is never good.

In that sense it might be better to fix/enhance ptrace, if there's interest. 
I've written a handful of ptrace extensions in the past (none of them went 
upstream tho), it can be done in a useful manner and the code is pretty 
hackable. There are basic problems left to be solved: for example why is there 
still no 'memory block copy' call, why are we _still_ limited to one word per 
system call PTRACE_PEEK* memory copies? It's ridiculous. SparcLinux has 
PTRACE_WRITE*/READ* support that implements this, but none of the other 
architectures have it so it's essentially unused.
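
To make the one-word-per-call limitation concrete, here is a minimal sketch in C of what a debugger has to do today to copy a block out of a tracee. Only ptrace(PTRACE_PEEKDATA, ...) is a real interface; the names word_reader, ptrace_word_reader, local_word_reader and copy_block are hypothetical, and local_word_reader is just a stand-in that reads our own memory so the loop can be exercised without a tracee. Note that the last read always fetches a full word, so it may touch a few bytes past the requested range.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Reads one machine word at `addr`; returns 0 on success, -1 on failure. */
typedef int (*word_reader)(void *ctx, uintptr_t addr, long *out);

/* Real reader: one ptrace() system call per word. ctx points at the tracee pid. */
int ptrace_word_reader(void *ctx, uintptr_t addr, long *out)
{
    pid_t pid = *(const pid_t *)ctx;
    long word;

    /* PEEKDATA returns the data itself, so errno is the only error signal. */
    errno = 0;
    word = ptrace(PTRACE_PEEKDATA, pid, (void *)addr, NULL);
    if (word == -1 && errno != 0)
        return -1;
    *out = word;
    return 0;
}

/* Hypothetical stand-in reader: fetches a word from our own address space. */
int local_word_reader(void *ctx, uintptr_t addr, long *out)
{
    (void)ctx;
    memcpy(out, (const void *)addr, sizeof *out);
    return 0;
}

/*
 * Copy `len` bytes starting at `addr` into `dst`, one word per reader call.
 * Returns the number of reader calls made, or -1 if a read failed.
 */
long copy_block(word_reader rd, void *ctx, uintptr_t addr, void *dst, size_t len)
{
    unsigned char *out = dst;
    long calls = 0;

    assert(rd != NULL && dst != NULL);
    while (len > 0) {
        long word;
        size_t n = len < sizeof word ? len : sizeof word;

        if (rd(ctx, addr, &word) != 0)
            return -1;
        calls++;
        memcpy(out, &word, n);   /* keep only the bytes still wanted */
        out += n;
        addr += n;
        len -= n;
    }
    return calls;
}
```

With 8-byte words, copying a single 4096-byte page through ptrace_word_reader costs 512 system calls, which is the overhead a block-copy request would remove.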

Or another possible direction would be to extend the perf events syscall with 
interception capabilities. It's far more performant at extracting application 
state without scheduling than any ptrace method - and interception/injection 
would be a natural next step - if there's interest.

Thanks,

Ingo



Re: linux-next: add utrace tree

2010-01-23 Thread Frank Ch. Eigler
Hi -

mingo wrote:
 [...]
  Now how do we get from here to a moderately portable API for interrogating, 
  controlling, and intercepting process state? Essentially it would need to 
  support all of the things that a powerful debugger would want to do, 
  including modifying registers and memory, substituting syscall return 
  values, etc.  I believe that utrace is the kernel side of that API.
 
 The problem is, utrace does not do that really.

In fact, it is exactly designed for that.

 What utrace does is that it provides an opaque set of APIs for
 unspecified and out of tree _kernel_ modules (such as systemtap). It
 doesn't support any 'application' per se. It basically removes the
 kernel's freedom to shape its own interaction with debug
 applications.

This claim is hard to take any more seriously than emoting that the
blockio layer is opaque because device drivers remove freedom for
the kernel to shape its interaction with hardware.  If you have any
*real evidence* about how any present user of utrace misuses that
capability, or interferes with the kernel's freedom, show us please.


- FChE



Re: linux-next: add utrace tree

2010-01-23 Thread Frank Ch. Eigler
Hi -

On Sat, Jan 23, 2010 at 11:01:21AM +, Alan Cox wrote:
 [...]
 What I don't understand is why [libgdb?] doesn't solve 99% of your problem.
 ptrace is not perfect but most of the real ptrace limitations actually
 come about because either the CPU can't do something or because the
 supporting logic would be too expensive - things like having extra
 private debugger pages.

At least one reason is that ptrace is single-usage-only, so for
example you cannot concurrently debug and strace the same program.
OTOH, utrace is designed to permit clean nesting/sharing semantics for
concurrent debugger-type tools operating on the same processes.
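
The single-tracer restriction can be observed directly. The following is a minimal sketch, assuming a Linux system where a parent is allowed to trace its own child; the function name second_attach_is_refused is made up for illustration. The second attach is issued by the same parent that is already the tracer, so that security policies such as Yama do not get in the way and the refusal comes from the "already being traced" check alone.

```c
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if a second attach to an already-traced task is refused with
 * EPERM, 0 if it is not, and -1 if the experiment could not be set up.
 */
int second_attach_is_refused(void)
{
    int status;
    int refused;
    long rc;
    pid_t tracee = fork();

    if (tracee < 0)
        return -1;
    if (tracee == 0) {
        /* Tracee: volunteer to be traced by the parent, then stop. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(2);
        raise(SIGSTOP);
        _exit(0);
    }

    /* First tracer (this process): wait until the tracee has stopped. */
    if (waitpid(tracee, &status, 0) != tracee || !WIFSTOPPED(status))
        return -1;   /* e.g. PTRACE_TRACEME was denied and the child exited */

    /* Second attach to the same task: expected to fail with EPERM. */
    errno = 0;
    rc = ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
    refused = (rc == -1 && errno == EPERM);

    /* Clean up the stopped tracee. */
    kill(tracee, SIGKILL);
    waitpid(tracee, NULL, 0);
    return refused ? 1 : 0;
}
```

This is why running strace on a process that is already under gdb fails: the second tool's attach request gets the same EPERM.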

- FChE



Re: linux-next: add utrace tree

2010-01-23 Thread Arnaldo Carvalho de Melo
Em Sat, Jan 23, 2010 at 11:01:21AM +, Alan Cox escreveu:
 Years ago (and it really must be years ago because this was about the
 time I started hacking on Linux stuff !) there was a proposal to extract
 and sanitize the arch specific stuff in binutils and in gdb etc into
 sensible libraries that could be used by other apps.

Hallelujah if it had happened at that time, but sadly... :-(
 
- Arnaldo



Re: linux-next: add utrace tree

2010-01-23 Thread tytso
On Sat, Jan 23, 2010 at 06:47:29AM -0500, Frank Ch. Eigler wrote:
  What utrace does is that it provides an opaque set of APIs for
  unspecified and out of tree _kernel_ modules (such as systemtap). It
  doesn't support any 'application' per se. It basically removes the
  kernel's freedom to shape its own interaction with debug
  applications.
 
 This claim is hard to take any more seriously than emoting that the
 blockio layer is opaque because device drivers remove freedom for
 the kernel to shape its interaction with hardware.  If you have any
 *real evidence* about how any present user of utrace misuses that
 capability, or interferes with the kernel's freedom, show us please.

The fundamental point Ingo is trying to make (and which you don't
seem to understand) is that utrace doesn't export a syscall (which is
an ABI that we are willing to promise will be stable), but rather a
set of kernel APIs (which we never promise to be stable), and the fact
that there will be out-of-tree programs that are going to be trying to
depend on that interface (much like SystemTap does today when it
creates kernel modules) is something that is considered on par with
Nvidia trying to ship proprietary video drivers.

(OK, maybe not *quite* as evil as Nvidia because at least SystemTap is
open source, but the bottom line is that enabling out-of-tree modules
isn't considered a good thing, and if we know in advance that there
are out-of-tree modules, there is a strong tendency to want to nip
those in the bud.)

The reason why I avoid Nvidia hardware like the plague is because I
work on bleeding-edge kernels, and even though companies like Nvidia
and Broadcom try very hard to keep up with released upstream kernels,
#1, there is always the concern of what happens if they decide to
change that policy, and #2, invariably something will break during the
-rc1 or -rc2 stage, and then my laptop is useless for running bleeding
edge kernels.  It's one of the reasons why many kernel developers gave
up on SystemTap, because it's not something that can be trusted to be
there, and the fault lies not in our changing the APIs but in
SystemTap depending on APIs that were never guaranteed to be stable
in the first place.

If you want to try to slide utrace in, such that we're able to ignore
the fact that there will be this external house that will be built on
quicksand, pointing at how nice the external house will be isn't going
to be helpful.  Nor is pointing at the ability that other people will
be able to build other really nice houses on the aforementioned
quicksand (i.e., out-of-tree kernel modules that depend on kernel
APIs).

A simple code cleanup argument is not carrying the day (Look!  We
can clean up the ptrace code!).  It's going to have to be a **really**
cool piece of in-tree kernel functionality that provides a killer
feature (in Linus's words), enough so that people are willing to
overlook the fact that there's this monster external out-of-tree
project that wants to depend on APIs that may not be stable, and
which, even if the developers don't grump at us, users will grump at
us when we change APIs that we had never guaranteed would be stable,
and then SystemTap breaks.

This is probably why Ingo invited you to think about ways of doing
some kind of safe in-kernel bytecode approach.  That has the advantage
of doing away with external kernel modules, with all of their many
downsides: their dependency on unstable kernel APIs, the fact that many
financial customers have security policies that prohibit C compilers
on production machines, the inherent security risk of allowing
external random kernel modules to be delivered and loaded into a
system, etc.

  - Ted



Re: linux-next: add utrace tree

2010-01-23 Thread Linus Torvalds


On Sat, 23 Jan 2010, Kyle Moffett wrote:
 
 Now how do we get from here to a moderately portable API for
 interrogating, controlling, and intercepting process state?

Umm? ptrace?

It's not _pretty_, but it's a hell of a lot more portable than utrace is 
ever going to be. Yes, the details differ between OS's (and between 
architectures), but let's face it, things like register state probing are 
_never_ going to be portable across different architectures simply because 
the register state isn't the same.

 The killer app for this will be the ability to delete thousands of
 lines of code from GDB, strace, and all the various other tools that
 have to painfully work around the major interface gotchas of ptrace(),
 while at the same time making their handling of complex processes much
 more robust.

No. There is absolutely _no_ reason to believe that gdb et al would ever 
delete the ptrace interfaces anyway. 

That really is my point. Adding a new interface, when an old and crufty 
(but working) interface is inevitably going to be around anyway - and is 
inevitably always going to have portability issues - is STUPID.

Let's take strace, for example.

Yes, ptrace() is crufty, but have you actually looked at strace source 
code? The problem isn't really a crufty interface to read registers etc, 
the bigger problem for strace is that different architectures and OS's 
have different system call argument rules, different ways to read/write 
system call numbers yadda yadda yadda.
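
As an illustration of those per-architecture differences, here is a minimal sketch in C of decoding a system call from a register dump on two Linux architectures. It assumes glibc's struct user_regs_struct from <sys/user.h>, as filled in by PTRACE_GETREGS on x86-64 or PTRACE_GETREGSET on arm64; struct syscall_info and decode_syscall are made-up names, and a real tool needs one such branch per supported architecture and ABI.

```c
#include <stddef.h>
#include <sys/user.h>

/* Architecture-neutral view of a system call (hypothetical type). */
struct syscall_info {
    long nr;                     /* system call number */
    unsigned long long args[6];  /* up to six arguments */
};

/* Pull the syscall number and arguments out of an arch-specific register dump. */
void decode_syscall(const struct user_regs_struct *r, struct syscall_info *out)
{
#if defined(__x86_64__)
    /* x86-64: number in orig_rax, arguments in rdi, rsi, rdx, r10, r8, r9. */
    out->nr = (long)r->orig_rax;
    out->args[0] = r->rdi;
    out->args[1] = r->rsi;
    out->args[2] = r->rdx;
    out->args[3] = r->r10;
    out->args[4] = r->r8;
    out->args[5] = r->r9;
#elif defined(__aarch64__)
    /* arm64: number in x8, arguments in x0..x5. */
    size_t i;

    out->nr = (long)r->regs[8];
    for (i = 0; i < 6; i++)
        out->args[i] = r->regs[i];
#else
#error "this sketch only covers x86-64 and arm64"
#endif
}
```

None of this branching comes from ptrace itself; it follows from the hardware and the syscall ABI, so any replacement interface would need the same table.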

Take a look at strace sources some day. Moving away from ptrace on Linux 
(even if you decided that you don't care about old versions of the kernel 
that don't know anything else) would simplify ABSOLUTELY NOTHING.

Really. Quite the reverse, I suspect. The Solaris and FreeBSD support uses 
ptrace too, afaik, so you'd just be confusing the issue.

And the fact is, strace would still end up supporting ptrace anyway, just 
so that you could run it on old kernels.

So the whole "making a new utrace interface would simplify things" is 
simply a total lie. The fact that ptrace is a bit of an odd interface IN 
NO WAY means that any other interface would end up being appreciably 
simpler.

It would just result in _more_ code in strace, and more confusion.

Linus