from:"Roland McGrath"

Re: [PATCH 1/4] ptrace: temporary revert the recent ptrace/jobctl rework

2011-06-21 Thread Roland McGrath

 OK, so here's my (hacky) idea:
 (1) Forget ptrace-via-utrace.  Have utrace be a separate thing.  This
 way the recent ptrace changes won't matter.
 (2) But, what about ptrace co-existing well with utrace?  Make them
 mutually exclusive - a ptraced-process can't be utraced and a
 utraced-process can't be ptraced.

We had this situation before for a while.  It has the substantial downside
that e.g. you cannot do any system-wide systemtap tracing without making
all strace and gdb use impossible.

 Assuming the above is a semi-reasonable idea, it might be a lot less
 work than updating the ptrace-via-utrace code to handle the new ptrace
 changes.

That's for Oleg to say.  (Sorry, Oleg. ;-)


Thanks,
Roland

RE: utrace support on ARM

2011-04-22 Thread Roland McGrath

 I will see if I can interest people in TI and Linaro. I will need a good 
 story... ;-)

It is kernel port modernization work that nearly every other platform has
done by now.

Re: [PATCH 0/5] ptrace-utrace: fix exit_ptrace() vs ptrace_report_signal() races

2010-12-10 Thread Roland McGrath

I've merged these patches to the utrace-ptrace branch, now merged up to
2.6.37-rc5, and also the 2.6.34 and 2.6.35 backport branches.  The 2.6.33
backport branch is no longer being maintained.  I didn't update the 2.6.36
backport branch and probably won't maintain it unless some Fedora release
starts using that kernel version (rawhide is already on 2.6.37-rc5 now).


Thanks,
Roland

Re: Utrace .report_jctl serialization issue ?

2010-11-11 Thread Roland McGrath

 Is a utrace engine with .report_jctl enabled suppose to handle
 do_notify_parent_cldstop(current, notify) processing for the last
 stopping task ?  Or should it muck with task-ptrace to force
 tracehook_notify_jctl() to return a non-zero value ?

I can't figure out exactly how to construe this as a question about the
utrace API.  The documented API is that each thread gets a report_jctl
callback, and the @notify argument is zero in all threads but one.

 I ask because I have a simple multi-threaded process with a utrace engine
 attached to the process group leader; .report_jctl is enabled.

Do you mean the thread_group leader of one process in the process group?
Or do you mean multiple utrace engines, one per thread, all in the process
whose pid==pgid (that's what process group leader means in POSIX)?

 If I SIGTSTP the process, occasionally control is not returned to the
 shell.

That sounds like a kernel bug.  There should be nothing your report_jctl
callback could do (assuming it doesn't send more signals itself) that
affects the normal job control semantics.

The kernels that are appropriate to discuss here are upstream kernels with
the current utrace patches applied (i.e. what you get in the current
utrace-ptrace git branch), or the most recent Fedora kernels that should
include that same code.  The code in question might well be the same in
RHEL6 kernels, but RHEL issues need to be addressed through the proper RHEL
support channels.  What we will help you with here is the current utrace
development code.

Perhaps Oleg and/or I will get time soon to look into this issue.
Chances are better if you provide a test case in the form of a simple
utrace module and a test scenario using it.


Thanks,
Roland

Re: utrace unwanted traps

2010-10-19 Thread Roland McGrath

 Probably, I am not sure. I can't recall the previous discussions. 

Me either (except hazily the one notion I mentioned), but that's why
we have mailing list archives.

 Afaics, this was introduced by 26fefca955cc7c3c83014be2e10b773209d32eea
 utrace: sticky resume action. Before that patch the stopped tracee
 could only remember the pending UTRACE_REPORT or UTRACE_INTERRUPT.

Correct.

  But that's how it used to be, and then we changed it.
  So it merits digging up our old discussion threads that led to doing that
  and see what all we had in mind then.
 
 I tried to grep my mail archives and google, but failed to find anything
 related.

Based on the date of the commit you just mentioned, I browsed in:
https://www.redhat.com/archives/utrace-devel/2009-October/thread.html
and saw:
https://www.redhat.com/archives/utrace-devel/2009-October/msg00287.html
which is the day before that commit.  I didn't read through all that to
refamiliarize myself with all the details.


Thanks,
Roland

Re: [PATCH] utrace_barrier(detached_engine) must not sping without checking -reporting

2010-10-19 Thread Roland McGrath

 Yes. But, with this patch, in this case
 utrace_barrier()-get_utrace_lock(attached = false) returns success
 and then we check utrace-reporting == engine.

I guess that's OK if -reporting == engine is always set when any kind of
callback might be in progress.

 (Hmm. Probably utrace-reap = T case needs more attention,
  utrace_maybe_reap() doesn't set utrace-reporting).

That would be a problem.

 But, unfortunately, this signal_pending() check assumes the caller can
 always handle this ERESTARTSYS. But this is not true, one example is
 the exiting debugger doing exit_ptrace().

I understand it's a problem.  Perhaps we need to have a flag argument or
(probably better) a new utrace_barrier_uninterruptible call.  But, for
this particular situation with exit_ptrace, it could be kludged around.
The caller is never going to dequeue another signal once it's as far
into death as exit_ptrace anyway.  So, it could do a loop of:

ret = utrace_barrier(task, engine);
if (ret == -ERESTARTSYS) {
clear_thread_flag(TIF_SIGPENDING);
continue;
}

Not that I'm suggesting this isn't a dismal kludge.  But it seems like
it would close the hole we have today in practice.


Thanks,
Roland

Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines

2010-10-14 Thread Roland McGrath

 The 1st patch I sent initially:
 
   --- kstub/kernel/utrace.c~10_utrace_reset_should_clear_ss   
 2010-09-20 03:55:12.0 +0200
   +++ kstub/kernel/utrace.c   2010-09-20 21:33:47.0 +0200
   @@ -709,6 +709,7 @@ static bool utrace_reset(struct task_str
*/
   utrace-resume = UTRACE_RESUME;
   utrace-signal_handler = 0;
   +   user_disable_single_step(task);
   }

   /*
 
 It clears TIF_SINGLESTEP when the last engine detaches.

This we didn't want because utrace_reset can be called when the tracee is
not stopped, and it's not safe to call user_*_step then (with the one
caveat that the SIGKILL race is OK).

 The 2nd one which changes utrace_control(DETACH),
 
   @@ -1139,7 +1139,9 @@ int utrace_control(struct task_struct *t
   target-utrace_flags = ~ENGINE_STOP;
   mark_engine_detached(engine);
   reset = reset || utrace_do_stop(target, utrace);
   -   if (!reset) {
   +   if (reset) {
   +   user_disable_single_step(target);
   +   } else {
   /*
* As in utrace_set_events(), this barrier 
 ensures
* that our engine-flags changes have hit 
 before we
 
 It doesn't cover the case when the TIF_SINGLESTEP tracee detaches
 itself, but a) currently the are no such engines, and b) it looks
 more clean.

If we're trying to meet the advertised API--how it's documented now is
the only reasonable thing if we are actually fixing things properly at
all--we do need to cover the case where the only engine to have
requested single-step earlier then later returns UTRACE_DETACH from a
callback.  That is, if that engine could actually know that the tracee
never reached user mode between its use of UTRACE_SINGLESTEP and its
later UTRACE_DETACH.

 However, I am starting to think that if you prefer this change we
 have a better option. Instead of duplicating UTRACE_RESUME logic,
 perhaps the patch below makes more sense?

Hmm, that does seem like it might be pretty reasonable.  If your
engine had ever permitted the tracee to get back to user mode after
requesting UTRACE_*STEP, then it's just desserts to cause that SIGTRAP
regardless of whether other engines might have kept it stopped in
actuality.  

But, there might be plausible corners where an engine really could know
reliably (without outside knowledge of what other engines might be
doing) that the tracee can't really have been back to user mode yet.
For example, if it checked signal_pending(tracee) before it used
UTRACE_*STEP, then it could know for sure that its report_signal
callback will have been called before it ever got to user mode, so if
that callback returns UTRACE_DETACH, it better not be left with stepping
enabled.  In that particular case, it shouldn't be a problem, since
after its report_signal we will always hit finish_resume_report later.
But we should think about if there are any holes of a similar nature.

 Please tell me which fix you prefer? Or just commit the fix you
 like more.

I'm only going to merge a specific commit of yours that you have tested.


Thanks,
Roland

Re: [PATCH] utrace_barrier(detached_engine) must not sping without checking -reporting

2010-10-13 Thread Roland McGrath

Please feel free to ignore this if it's no longer relevant to your work.
I'm trying to catch up on the backlog of replies I owe you.  I chose
this one as the arbitrary cut-off before which I think I have already
neglected things too long to still matter.

 If engine is detached (has utrace_detached_ops), utrace_barrier(engine)
 spins until engine-ops becomes NULL. This is just wrong.

Yes, this happens because get_utrace_lock returns -ERESTARTSYS and
utrace_barrier checks for this and loops.  I agree these long-spin
scenarios would be wrong.

The reason it tries to wait for fully detached state is that after
utrace_control(task,engine,UTRACE_DETACH), @task could still be in the
middle of a callback for @engine.

 Suppose that utrace_control(DETACH) returns -EINPROGRESS, now we should
 call utrace_barrier(). However, it is possible that -EINPROGRESS means
 we raced with sys_sleep(A_LOT) doing report_syscall_entry(). 

Right.  Perhaps utrace_barrier could do some different variant of the
(utrace-reporting != engine) check.

 Change get_utrace_lock() to succeed if the caller is utrace_barrier()
 and ops == utrace_detached_ops. I do not see any reason why this case
 should be special from utrace_barrier's pov. It can just check
 -reporting and return 0 or do another iteration.
[...]
 Also, it is not clear why utrace_barrier() needs utrace-lock,
 except to ensure it is safe to dereference target/utrace.

Well, wouldn't that be reason enough?  The comment in utrace_barrier
talks about needing the lock.  This corresponds to the comment in the
UTRACE_DETACH case of finish_callback_report.  Do you think those
comments are inaccurate about what's required?

 Note: we should also reconsider() utrace_barrier()-signal_pending() check.

IMHO it is badly wrong to have utrace_barrier do an uninterruptible wait
(even moreso since it's really a spin).  If a buggy callback gets stuck
blocking or spinning and fails to return promptly, then you wedge any
debugger thread trying to synchronize with it via utrace_barrier.  If
you can't even interrupt that debugger thread, then there will really be
no chance to recover from the deadlock.


Thanks,
Roland

Re: gdbstub initial code, v8 ptrace

2010-10-13 Thread Roland McGrath

 But. Suppose we have to attached engines. The first engine gets
 UTRACE_SIGNAL_REPORT and returns UTRACE_STOP | UTRACE_SIGNAL_IGN.

Right, that's what it would do.  I see, you're saying that the
report.result passed on to the next engine would appear like it had
been a real signal that the previous engine decided to ignore (or
whose sa_handler==SIG_IGN or default action was to ignore).  Hmm.

Well, it's also distinguished by having orig_ka==NULL in the callback.
Any real signal has a non-null orig_ka argument.

 or yes, it returns UTRACE_SIGNAL_DELIVER after gdb does signal XX.
 
 Now. The second engine gets UTRACE_SIGNAL_DELIVER or UTRACE_SIGNAL_IGN,
 not UTRACE_SIGNAL_REPORT.

At least in the UTRACE_SIGNAL_DELIVER case, that's really the true
thing for it to see.  The previous engine decided to do a signal
delivery (as directed by return_ka), so the next engine needs to see
this to know what the prevailing state of play is now.

 That is why ugdb_signal_report(UTRACE_SIGNAL_REPORT) tries to return
 UTRACE_STOP | utrace_signal_action(action) to not change report-result
 (passed to the next tracee) inside the reporting loop.

Well, it *can* do that.  If UTRACE_SIGNAL_REPORT (or anything else
random) is the ultimate report.result from the whole callback loop, we
treat it just like UTRACE_SIGNAL_IGN.

 The worst case is UTRACE_SIGNAL_HANDLER. Single-stepping should know
 about this case. This means that any engine should always return
 UTRACE_resume_action | UTRACE_SIGNAL_HANDLER.

I see.  This is indeed the only way for any engine to know that it's
getting the UTRACE_SIGNAL_HANDLER specific notification rather than
some random fallout of someone else's UTRACE_REPORT request or whatnot.

   Probably we can check orig_ka != NULL and treat orig_ka == NULL as
   UTRACE_SIGNAL_REPORT. Hopefully this can't be confused with
   UTRACE_SIGNAL_HANDLER.
 
  I'm not really sure what you mean here.
 
 If -report_signal(orig_ka) was calles with orig_ka == NULL, then,
 whatever utrace_signal_action(action) we see, originally it was
 either UTRACE_SIGNAL_HANDLER or UTRACE_SIGNAL_REPORT but another engine
 returned, say, UTRACE_SIGNAL_DELIVER/UTRACE_SIGNAL_IGN to deliver/stop.

Right.  But this is in fact just the same for either
UTRACE_SIGNAL_REPORT or UTRACE_SIGNAL_HANDLER.

   Not sure about UTRACE_SIGNAL_HOLD, but this is very unlikely race.
 
  You probably don't want to use that, but I'm not entirely sure.  ptrace
  doesn't use it, and the signal interception model is pretty much the same.
 
 Yes, agreed. But there is one corner case. Until we generalize
 utrace_stop()-ptrace_notify_stop() hook, the tracee can be reported as
 stopped to gdb, but before it actually stops it can recieve a signal.

I don't follow this.  If we're stopping, then we don't receive
(dequeue) any other signal until we get resumed.  That should be fine
from gdb's point of view.  The next signal is just pending for later.
The signal that we just reported was a) in fact reported to gdb and
b) is still sitting in the siginfo_t on stack and the engine can
examine/modify that before letting the thread resume.  (The kerneldoc
paragraph about @report_signal in utrace.h makes this guarantee.)


Thanks,
Roland

Re: Tracing with utrace, some questions

2010-10-10 Thread Roland McGrath

 1. From what I've read in the DocBook pages I've figured out that I have
 to have two report entries. One for syscall_entry and one for
 syscall_exit. On syscall_entry I should use syscall_get_nr() and check
 if this number is equal to __NR_socket and return UTRACE_SYSCALL_ABORT
 in that case and on syscall_exit, I need to call syscall_rollback() to
 rollback the registers if utrace_syscall_action(action) returns
 UTRACE_SYSCALL_ABORT. Is this correct?

The report_syscall_entry hook is the only one you need to prevent the
system call from running.  If it returns UTRACE_SYSCALL_ABORT, then the
system call will fail with the ENOSYS error code.  You only need to use a
report_syscall_exit hook if you want the registers the user process sees
after attempting the system call to be different from that.

 2. First I've read the documentation from Roland's page and figured out
 it's out of date. report_syscall_entry callback used to have a struct
 task_struct argument but now it doesn't. How should I get a struct
 task_struct to pass to syscall_get_nr? Am I supposed to keep a
 reference to the struct pid I used in init_module() and use
 pid_task(pid, PIDTYPE_PID) or should I use find_get_pid() just as I used
 in the init_module()?

Those callbacks are made in the affected thread itself.  
So you can just use the magic variable 'current'.

I've updated the web copy of the documentation.  If you install the
kernel-doc rpm on a Fedora system, that contains the version of this same
documentation that goes with the kernel you have.

 3. In the report_syscall_exit callback, is the struct pt_regs argument
 there so that the user can directly pass it to syscall_rollback() or do
 I have to save the registers I had in report_syscall_entry() callback
 and use them instead?

You can just pass that pointer directly.  In fact, the same pointer to
'struct pt_regs' will be passed to both callbacks.  This is always the same
pointer that 'task_pt_regs(current)' would return, just made quicker to
access by having the argument to the callbacks.

 4. __NR_socket is available on some architectures and it's implemented
 on top of __NR_socketcall on others. I'm running this example on x86_64.
 How should my module figure out which mode the target process is running
 in? I suppose this is related to the CS register. 

There are several ways to go about this, depending on how arch-specific you
want to make it.  On x86-64 it's possible for 32-bit processes to make
64-bit system calls and vice versa, though normal user programs will never
actually do this.  Perhaps the easiest and cleanest way, that both covers
this arcane possibility and also works on other architectures, is to call
is_compat_task() (under #ifdef CONFIG_COMPAT).

 Having figured that how am I supposed to include system call numbers from
 both architectures and use __NR_socketcall for 32bit mode and __NR_socket
 for 64bit?

Unfortunately, there is no clean way to refer to the 32-bit syscall numbers
in code inside the 64-bit kernel.  socketcall is not one of the few listed
in asm/ia32_unistd.h, so off hand I think you'd just have to hard-code the
i386 __NR_socketcall value in your module.

 5. Is there any project in the outer-space that does something like
 this, sandboxing or monitoring system calls, from which I can learn
 more?

Renzo Davoli's umview/kmview is just such an animal.
See http://wiki.virtualsquare.org for details.


Thanks,
Roland

Re: gdbstub initial code, v11

2010-09-23 Thread Roland McGrath

  The ones I'm talking about are Z2/Z3 for (data) watchpoints.
 
 Ah, OK, thanks. I'll try to understand how this works.

In theory these will map to uses of the hw_breakpoint interface.


Thanks,
Roland

Re: gdbstub initial code, v11

2010-09-23 Thread Roland McGrath

 I think it would be good to implement a feature that shows how this
 approach is an improvement over the current state of gdb+ptrace or
 gdb+gdbserver.
 
 Exactly what feature this should be... I don't know :-)
 I would imagine something performance-related.

My vague notion was that we'd get it working well enough to have basic
parity with native or gdbserver on some realish test cases, and then just
look at the protocol interaction log to see cases where we could reduce
round-trips between gdb and the stub.  Some of those are bound to entail
protocol extensions and gdb changes to exploit them.  Off hand, the Z cases
might be the only things that won't need that.


Thanks,
Roland

Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines

2010-09-23 Thread Roland McGrath

 I think we should start with changing utrace_control(DETACH) anyway,
 then try to improve. I'll ressend the one-liner I already showed.

Ok.

 Hmm. I'll try to think more. Right now I don't really understand
 how to do this correctly.

I wasn't immediately sure either.

 OK, finish_callback_report() and utrace_control(DETACH) can set
 TIF_NOTIFY_RESUME. 

Right.  Those utrace_resume has the report.action==UTRACE_RESUME bail-out
case.  So either that would change or detach would also do UTRACE_REPORT.

 But what if there are no more attached engines?
 Looks like, utrace_resume(UTRACE_RESUME) needs to handle this special
 case. And utrace_reset() shouldn't clear task-utrace_flags, otherwise
 utrace_resume/utrace_get_signal won't be called.

Right.  Or else tracehook_notify_resume could call utrace_resume
unconditionally, but I'm not at all sure that is not worse.  The
original theory was that it should always be OK to have some
utrace_flags bits stay set when they are stale, because any kind of
reporting pass that got enabled would hit the report-spurious case
and clean the state up synchronously when it's safe.

 So, probably detach should set TIF_NOTIFY_RESUME, but utrace_reset()
 should do user_disable_single_step() too if no more engines. Confused.

If there are no more engines but the tracee is still running, we still
shouldn't do it there because it still might not be entirely safe.
If the tracee is not stopped, it's only safe to call in current.

 And in fact I don't understand why this is important. When it comes
 to multiracing, any engine can hit the unwanted/unexpected trap
 because another engine can ask for UTRACE_*STEP. 

Right.  An engine earlier in the list could swallow the signal so the
next engines' callbacks didn't see it.  But it doesn't know that some
later engines didn't also ask for stepping.  So there would have to be
some understood convention between engines.  For example, a later
engine could see info-si_signo==SIGTRAP et al and act on that even
when the incoming utrace_signal_action(action)==UTRACE_SIGNAL_IGN.
Of course, that doesn't help a non-stepping engine that is earlier
in the list to know that a later engine will be swallowing the signal.

The original theory on this was that we'd one day stop overloading
user signals for debugger-induced traps.  In some past TODO lists and
postings I referred to extension events.  The idea (in part) was
that things like hardware stepping would generate a special new flavor
of utrace event rather than a real signal that has to be intercepted.
Then engines' callbacks would easily see that this was a debugging
event induced by some engine and ignore it (or more likely, just never
get any callback unless your engine registered interest in stepping).
This would also address the case of asynchronous engine detach just
after a trap has actually hit, where today the SIGTRAP is queued and
then later won't be intercepted by a debugger, and instead kills the
user process.

But we don't have any of that now, and don't yet know if we will
really pursue any big improvements at this API level.

 The only really
 important (I think) case is when the last engine detaches.

That's the most important case, sure.  But in any case that is not
actually racy, we should avoid later spurious traps.

 IOW. Suppose that eninge E does utrace_control(STEP) or its callback
 returns UTRACE_*STEP. If we do not detach this engine, other engines
 will see the trap. 

That's only so if the tracee actually gets back to user mode before we
have another reporting pass.

 So why it is so important to clear X86_EFLAGS_TF if we detach E ?

Perhaps I am worrying too much about it.  The worst thing is if it
could really get stuck.  But that shouldn't be possible if there are
any engines at all, perhaps only any with UTRACE_EVENT(QUIESCE) set.
Worst case, one spurious SIGTRAP will get to a report_signal pass, but
nobody will return UTRACE_*STEP again, so there won't be another.  (Of
course, nobody will swallow that SIGTRAP and so it will terminate the
process first anyway, but that's another problem.)  

What seems important is any non-racy scenarios.  That is, where
perhaps it wasn't properly stopped and E detached asynchronously,
but in practice the tracee was known to be otherwise blocked elsewhere
or something, so the detach should have full effect before it returns
to user mode.  But that is just vague theory off hand.


Thanks,
Roland

Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines

2010-09-21 Thread Roland McGrath

 Change utrace_reset() to do user_disable_single_step(). Alternatively
 we could change ptrace layer, but I think this is not ptrace specific.

Yes, it's right to fix this in the utrace layer.  
But I'm not sure that's the right place in the code to fix it.

The expectation is that either we'll get to finish_resume_report,
which calls user_*_step, or that utrace_control is resuming us
without any expectation of a report.  For UTRACE_RESUME in that
case, utrace_control calls user_disable_single_step.  So probably
the UTRACE_DETACH case there should just do it to.


Thanks,
Roland

Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines

2010-09-21 Thread Roland McGrath

 OK, so what should we do for now?

If we can't come to a straightforward answer for the general case
fairly quickly, then we can do the simple change to start with.

 Without the multitracing utrace_resume()-user_disable_single_step()
 fixes the problem. Otherwise we probably need ENGINE_WANTS_SINGLESTEP.

No, no, we don't want that.  We don't need more state.  And, remember,
we really don't need to microoptimize cases when single-step was used
recently--the cost of taking one more single-step trap and percolating
through the signal paths was going to be pretty large anyway.

It's better to have a spurious report (preferably just spurious
TIF_NOTIFY_RESUME with no actual callbacks) following any detach,
or whatever it takes to ensure user_disable_single_step() always
runs if user_enable_*_step did before and there is no engine around
to see a report_quiesce callback pass.  If there is a report_quiesce
or report_signal callback pass where nobody returns UTRACE_*STEP,
then we should never leave stepping enabled when we return to user mode.

 Yes, my concern was running and !current, not exiting. This is OK on
 x86 but user_disable_single_step() is arch specific.

It's not really OK on x86 either, with either SMP or preemption.


Thanks,
Roland

Re: exit_ptrace() ptrace_report_signal() problems

2010-09-17 Thread Roland McGrath

 The problem is, utrace_control(UTRACE_RESUME) can't prevent the stop if
 the tracee has already returned UTRACE_STOP, but utrace_stop() didn't
 take utrace-lock yet.

So you are saying that utrace_barrier does not meet its documented API.
Right?  It says effect ... has been applied.  But that's not true if a
UTRACE_STOP return value will not be cleared by an immediate subsequent
utrace_control(,,UTRACE_RESUME).

 Basically, ptrace_detach_task(sig = -1) should do:
 
   - if we are going to do utrace_control(UTRACE_DETACH), we
 should first instruct the (running) tracee to not report
 a signal, otherwise that signal will be lost.

Right.

   - if the tracee has already reported a signal, we should
 set -resume = UTRACE_DETACH and resume the tracee, like
 explicit detach does.

Right.

 We can check ctx-siginfo != NULL to detect this case.

Ok.

  Especially if it's as I suspect, that we can do
  that without changing the utrace layer.
 
 No, this problem is orthogonal, or I missed something.
 
 Please look at this message
 
   https://www.redhat.com/archives/utrace-devel/2010-June/msg00075.html

Yes, I'd forgotten about that.  We do need to fix utrace_barrier to match
its documented guarantee, or else we cannot rely on it for ptrace.

 In particualar:
 
   Off hand I think it does matter today insofar as it violates the
   documented guarantees of the utrace_barrier API.  If utrace_barrier
   returns 0 it is said to mean that, Any effect of its return value (such
   as %UTRACE_STOP) has already been applied to @engine.  So e.g. if you
   get a wake-up sent by your callback before it returns UTRACE_STOP, and
   then call utrace_barrier followed by utrace_control(,,UTRACE_RESUME),
   then you should be guaranteed that your resumption cleared the
   callback's stop.
 
 Yes, but currently UTRACE_RESUME can't guarantee this. 

From the API perspective I had been thinking in, it's not utrace_control
that's supposed to guarantee it.  It's utrace_barrier that's not
supposed to return yet.  But, that is indeed a sort of inside-out way of
looking at it, really.  What utrace_barrier guarantees is that the
callback bookkeeping is done, and it's not supposed to wait for e.g. the
next engine's callback to run.

 utrace_control(RESUME) should remove ENGINE_STOP like UTRACE_DETACH does

I think you've now talked me into this.  There is no other way that
utrace_barrier can keep its guarantee about the return value effect
without also delaying while other engines' callbacks might run, which
seems much worse.


Thanks,
Roland

Re: ugdb breakpoints

2010-09-14 Thread Roland McGrath

The traditional method is to restore the original instruction replaced by
the breakpoint in text, single-step over that instruction, then restore the
breakpoint in text, then continue.  That method requires all-stop so that
while you are stepping the thread that just hit the breakpoint, you can't
have another thread run past that instruction and miss the breakpoint.

Both this traditional in-place method, and the instruction-copying method,
depend on using single-step.  So stepi has to work before break can work.


Thanks,
Roland

Re: gdbstub initial code, v8

2010-09-10 Thread Roland McGrath

 Please note that last year's gdbstub prototype used kernel uprobes as
 an optional gdb breakpoint implementation (i.e., a backend for the Z
 packets).  When/if the lkml uprobes patches actually get merged, ugdb
 should also use them.

That's something for later, and it's not quite so simple.  If a utrace
engine ever uses uprobes, it probably would need to use the utrace-based
version of uprobes.  If something different goes in upstream, it remains
to be seen how it would interact with utrace, and there would be
specific work required for that.  There are many more issues about that
too.  At any rate, this is all a distraction at the moment, and Oleg
doesn't need any more of those!


Thanks,
Roland

Re: gdbstub initial code, v8 ptrace

2010-09-10 Thread Roland McGrath

 I am a bit confused... OK, ugdb is wrong wrt multitracing.
 UTRACE_SIGNAL_REPORT case shouldn't return UTRACE_STOP | UTRACE_SIGNAL_IGN,
 it should return UTRACE_STOP | UTRACE_SIGNAL_REPORT to keep report-result.

No, UTRACE_SIGNAL_REPORT is not meant for a return value.  Its only use is
in the incoming argument to tell you that a given report_signal call is
standing in for a report_quiesce(0) call.

 But it needs to return UTRACE_SIGNAL_DELIVER?

That's what you return when you want the signal delivered.
When you are stopping the tracee to decide about the signal,
that's not what you want.

UTRACE_STOP | UTRACE_SIGNAL_IGN is correct to not deliver the signal right
now, and stop instead.  If you want to deliver the signal later, then
you'll resume with UTRACE_INTERRUPT to ensure you get back to report_signal
and that can fill in the details and return UTRACE_SIGNAL_DELIVER.

 Probably we can check orig_ka != NULL and treat orig_ka == NULL as
 UTRACE_SIGNAL_REPORT. Hopefully this can't be confused with
 UTRACE_SIGNAL_HANDLER.

I'm not really sure what you mean here.

 Not sure about UTRACE_SIGNAL_HOLD, but this is very unlikely race.

You probably don't want to use that, but I'm not entirely sure.  ptrace
doesn't use it, and the signal interception model is pretty much the same.


Thanks,
Roland

Re: gdbstub initial code, v7

2010-09-10 Thread Roland McGrath

 But I meant another case, when the stopped tracee doesn't have siginfo.
 Currently ugdb just sends this signal to tracee, and then it will be
 reported to gdb. Not sure if this is right or not, I can change this.
 (or perhap this doesn't matter, I dunno).

What do you mean by doesn't have siginfo?  You mean non-signal stops?
What non-signal stops does ugdb report?


Thanks,
Roland

Re: [PATCH 0/3] UTRACE_DETACH fixes

2010-09-10 Thread Roland McGrath

 I misunderstood the UTRACE_INTERRUPT without UTRACE_EVENT(QUIESCE) above
 as if you mean it is possible to get UTRACE_SIGNAL_REPORT without QUIESCE.

No, I meant the opposite: since you won't get UTRACE_SIGNAL_REPORT without
UTRACE_EVENT(QUIESCE), it might seem that UTRACE_INTERRUPT should not be
permitted, as UTRACE_REPORT is not.  But because that is not it's *only*
effect, we don't do permit it even though you won't get UTRACE_SIGNAL_REPORT.

 OK. This means ugdb should set QUIESCE, just to ensure its -report_signal()
 will be called.

If you ever want UTRACE_SIGNAL_REPORT calls, yes.

 OK, please forget. I must admit, this still looks a bit, well, unnatural
 to me. May be it makes sense to add
 
   _UTRACE_EVENT_SIGNAL_REPORT,
   _UTRACE_EVENT_SIGNAL_HANDLER,
 
 into utrace_events. Then ugdb could ask for UTRACE_SIGNAL_REPORT and
 avoid the unnecessary -report_quiesce() calls.

If anything, I think UTRACE_EVENT(RESUME) would make sense, perhaps with a
separate -report_resume callback.  That would request only the cases that
now get report_quiesce(0).  But, it really does not cost much to have 

if (event) return UTRACE_RESUME;

or similar in your report_quiesce callback.  It's beyond refinement of the
utrace API (that we're not doing now), it's just micro-optimization.

 Roland, could you also comment this patch?
 
   https://www.redhat.com/archives/utrace-devel/2010-August/msg00108.html
 
 Again, this looks like a bug to me, but I won't insist if it is not.

Sorry, that is still on my queue.  I haven't forgotten it.  
I just haven't gotten to reviewing it yet because of other
work and it needing a lot of hard thinking.


Thanks,
Roland

Re: gdbstub initial code, v7

2010-09-10 Thread Roland McGrath

 ugdb sets please stop flag and does utrace_control(INTERRUPT). However,
 in unlikely case the tracee can stop before -report_signal() reporting

I don't think this is the right thing to do.  When the intent is explicitly
to interrupt, there is no reason to stop before the interruption is
complete, i.e. report_signal.  If you only stop there, then you can always
process a signal injection with complete flexibility.  The gdb model and
the remote protocol doesn't currently have any concept of requesting a stop
that is not an interruption.


Thanks,
Roland

Re: gdbstub initial code, v7

2010-09-03 Thread Roland McGrath

 Currently GDB does not do anything special, that is if there is siginfo for
 signal SIGUSR1 but one does $C0B (SIGSEGV) does ptrace reset the siginfo or is
 left the SIGUSR1 siginfo for SIGSEGV?

The kernel considers this sloppy behavior on the debugger's part.  If
you inject a different signal, we expect you should PTRACE_SETSIGINFO
to something appropriate, or else that you really didn't care about
the bits being accurate.  If the resumption signal does not match the
siginfo_t.si_signo, then the kernel resets the siginfo as if the
debugger had just used kill with the new signal (i.e. si_pid, si_uid
point to the ptracer).


Thanks,
Roland

Re: [PATCH 0/3] UTRACE_DETACH fixes

2010-09-03 Thread Roland McGrath

 Hmm. I am not sure I understand. If -report_signal != NULL, then
 you can't expect -report_quiesce() after utrace_control(INTERRUPT) ?

Sorry, I used report_quiesce as shorthand for either report_quiesce, or
report_signal with UTRACE_SIGNAL_REPORT.

  So, it is technically kosher enough to use UTRACE_INTERRUPT without
  UTRACE_EVENT(QUIESCE) set, if you really know the situation and are sure
  about all those caveats above.
 
 How? see below.

I mentioned the examples.

  But, it's only UTRACE_EVENT(QUIESCE) that
  gives you any expectation of an extra callback for no event from either
  UTRACE_REPORT or UTRACE_INTERRUPT.
 
 Yes, this is clear too.

Apparently not. ;-)

 So. Now I am even more confused. First of all, ugdb (at least currently)
 does not need report_quiesce() and UTRACE_EVENT(QUIESCE) at all. OK, this
 is not 100% true due to multitracing/etc, lets ignore this. Anyway it
 makes sense to clear QUIESCE after $c to avoid the unnecessary callbacks.
 
 All it needs is -report_signal(). But, the problem is, it is not called
 after utrace_control(INTERRUPT) ! You need QUIESCE to ensure report_signal()
 will be called, and this looks at least strange to me.

QUIESCE is the only event type for no event.  If you only asked for
signal events, then you only get a callback when there is an actual signal
event.  If you want an extra callback before dequeuing any signal, then
that is what QUIESCE is for.

 Once again, I am not asking to change utrace now, but could you explain
 what is wrong with the patch below?

That gives an extra unrequested report_signal callback to every engine that
is only interested in some signal event.  Consider a crash-catcher
engine.  It would only track UTRACE_EVENT(SIGNAL_CORE) (and perhaps CLONE
for its bookkeeping).  This engine never asked for a callback before each
and every attempt to dequeue any signal, which is what you would give it
with your change.


Thanks,
Roland

Re: [PATCH 0/3] UTRACE_DETACH fixes

2010-09-02 Thread Roland McGrath

 But probably we need utrace_detached_ops-report_reap() anyway.
 -report_death can return UTRACE_DETACH, and after that utrace_report_death()
 can skip utrace_reset() and do -report_reap().

Yeah, that's a good point.  There's no good reason to do a special case
check for detached_ops there rather than just having the no-op report_reap
hook.

  If an engine used UTRACE_INTERRUPT without having first set QUIESCE,
  then it has no rightful expectation of getting any callback.
 
 Hmm. This is certainly new to me. Could you confirm?

Well, I didn't say it precisely correctly.  UTRACE_INTERRUPT serves two
purposes.  First, it just interrupts things like a signal would.  You could
use that on its own without even tracing any events at all, just do achieve
fault injection or similar.  Second, it's like UTRACE_REPORT.  

If you do have some other event bits set, then you can expect/rely on
getting those normal events soon if they would otherwise have
happened--i.e., if you know it's blocked in a syscall, and you have
UTRACE_EVENT(SYSCALL_EXIT) set, then you can expect to get a
report_syscall_exit soon.  (But, it's always possible to race with the
syscall finishing on its own, or being interrupted by an outside signal, so
that exit might have come before your utrace_control call if you were not
otherwise synchronizing with your established report_syscall_exit callback.)

As with UTRACE_REPORT, after UTRACE_INTERRUPT you can expect to get some
report_quiesce callback even if there is no other event you were tracing
that happens.  For this to actually happen, you need to have
UTRACE_EVENT(QUIESCE) set.  

So, it is technically kosher enough to use UTRACE_INTERRUPT without
UTRACE_EVENT(QUIESCE) set, if you really know the situation and are sure
about all those caveats above.  But, it's only UTRACE_EVENT(QUIESCE) that
gives you any expectation of an extra callback for no event from either
UTRACE_REPORT or UTRACE_INTERRUPT.  (And since that is the sole possible
purpose of UTRACE_REPORT, we diagnose a utrace_control call like that.)


Thanks,
Roland

Re: gdbstub initial code, v5

2010-08-24 Thread Roland McGrath

 When the main thread exits, gdbserver still exposes it to gdb as
 a running process. It is visible via info threads, you can switch
 to this thread, $Tp or $Hx result in OK as if this thread is alive.
 gdbserver even pretends that $vCont;x:DEAD_THEAD works, although
 this thread obviously can never report something.

This is sort of consistent with the kernel treatment.  The main thread
stays around as a zombie, acting as a moniker for the whole process.  But
indeed that is not actually useful for any thread-granularity control or
information (well, there is the dead thread's usage stats, but that's all).

 I don't think this is really right. This just confuses the user, and
 imho this should be considered like the minor bug.

I tend to agree, but don't think it's a big issue either way, really.

 ugdb doesn't do this. If the main thread exits - it exits like any
 other thread. I played with gdb, it seems to handle this case fine.

Sounds good to me!

   - The exit code (Wxx) can be wrong in mt-case.
 
 The problem is, -report_death can't safely access
 -group_exit_code with kernel  2.6.35. This is
 solveable.

Don't even worry about it.  If there is something trivial to do that makes
it better for earlier kernels, then go ahead.  But if the easy thing to do
gives correct results on =2.6.35 and racily wrong or random results on
older kernels, then we can just live with that.


Thanks,
Roland

Re: [PATCH 3/4] utrace_set_events: fix UTRACE_EVENT(REAP) case

2010-08-19 Thread Roland McGrath

 On 08/16, Roland McGrath wrote:
 
 - It is possible that both -death and -reap are true. In this
   case it is OK to clear UTRACE_EVENT(REAP), but set_events fails.
 
  No, it's not OK to clear it.  Once -reap is set, then the engine's
  ops-report_reap might or might not have been called already.
 
 Afaics - no.
 
 If utrace-death is set (and we check it under utrace-lock) we can
 ignore utrace-reap.
 
 In short, if ops-report_reap can be called before -death is cleared,
 then 2 possible callers of utrace_maybe_reap() can race with each other,
 but this can't happen.

Right.  If -death is still set, it means utrace_report_death is still
running.  So the utrace internals know that the report_reap callbacks
haven't started yet.  But from the utrace API perspective, all you can
know is that release_task() has already been called.  So I think it's
right for the API not to let you clear UTRACE_EVENT(REAP) at that point.

 utrace-death == T means:
 
   - (utrace_flags  _UTRACE_DEATH_EVENTS) == T

Yes.

   - utrace-death was set by utrace_report_death() which will take
 utrace-lock later and clear -death, only then it may call
 ops-report_reap().

Yes.

   - until utrace_report_death() clears -death, _UTRACE_DEATH_EVENTS
 must be set in -utrace_flags, otherwise utrace_maybe_reap(true)
 is buggy.

Yes.

 Note that both utrace-death and _UTRACE_DEATH_EVENTS are cleared
 atomically from utrace-lock pov.

Yes.

 IOW. utrace-death is true, then
 
   IF (utrace-reap)
 
   tracehook_prepare_releas()-utrace_maybe_reap(true) was
   already called, this is how utrace-reap was set.

Meaning release_task() was called--semantically reaping may have begun.

   But utrace_maybe_reap() did nothing and returned.
   We rely on the subsequent utrace_maybe_reap(false) from
   utrace_report_death() - but this can't happen until
   we drop utrace-lock

Correct.

   ELSE
   utrace-reap can't be set until we drop utrace-lock.   

Correct.

 Now that you merged c93fecc925ea7567168f0c94414b9021de2708c5
 get_utrace_lock() must not succeed if utrace-reap == T, this becomes
 a bit off-topic. However, I thought about relaxing the dead check in
 get_utrace_lock(), instead of utrace-reap we could check
 utrace-reap  !utrace-death. In fact, initially I was going to
 do this, but then decided to make the simpler patch for now.

When utrace-reap is set, it means release_task() has been called.  The
API caller has no way to know reaping hasn't already begun--except if
its report_death callback has not returned yet.  But I think that the
API definition of once release_task() has been called (i.e. entered,
maybe not returned yet), any utrace call will get -ESRCH, is a clear
and comprehensible constraint.  We should not open any holes in that.


Thanks,
Roland

Re: [PATCH] fix mark_engine_detached() vs start_callback() race

2010-08-19 Thread Roland McGrath

 This is the minimal temporary ugly fix for now, we should certainly
 cleanup and simplify this logic. The barriers in mark_engine_detached()
 and in start_callback() can't help and should be removed. If we ignore
 utrace_get_signal() we do not even need utrace_detached_quiesce(),
 start_callback() could just do

Agreed.  I applied the patch for now.

 I think in the longer term mark_engine_detached() should not change
 engine-flags at all but add QUIESCE to -utrace_flags. However, this
 breaks utrace_maybe_reap(reap = true) and we should avoid the race
 with finish_callback() which clears -reporting after report_quiesce().

What's the benefit to adding QUIESCE?  If any utrace code gets entered at
all, then any callback run will be able to do the special-case ops check
for detached engines.

 A bit off-topic, but I don't think finish_callback() should check
 engine-ops == utrace_detached_ops before return. Instead we should
 change finish_callback_report() to return the boolean. We shouldn't
 return true without setting report-detaches.

Ok.


Thanks,
Roland

Re: [PATCH?] avoid the unnecessary utrace_resume()-utrace_reset()

2010-08-19 Thread Roland McGrath

 utrace_resume(UTRACE_REPORT) always calls utrace_reset() because
 start_callback() obviously can't clear report-spurious when
 event == 0.
 
 Change start_callback() to correctly clear -spurious in this case.

Ok.

 Note: utrace_control(DETACH) does utrace_do_stop() and sets UTRACE_REPORT
 if the tracee is not stopped. It also does mark_engine_detached() which
 does not set QUIESCE in target-utrace_flags. This means we rely on
 report.spurious which should provoke utrace_reset() from utrace_resume()
 if target-utrace_flags doesn't have QUIESCE. A bit too subtle, imho.

Agreed.  There is no reason utrace_control can't set it in utrace_flags
in its !reset case.

 Also, UTRACE_REPORT can be lost because of UTRACE_INTERRUPT or normal
 signal: utrace_get_signal() checks utrace_flags  UTRACE_EVENT(QUIESCE)
 and returns otherwise. This should be fixed somehow. This check is wrong
 anyway, but it is not clear how we can fix the race with DETACH.

I see.  That would be fixed by utrace_control setting it.


Thanks,
Roland

Re: gdbstub initial code, v4

2010-08-19 Thread Roland McGrath

 Note the second attachment, GDBCAT. It is just the simple perl
 script which connects /proc/ugdb to tcp port. It can be used for
 remote debugging via tcp, or with (unpatched) gdb which can't do
 select() on /proc/ugdb. 

bash$ nc -l 2000  /proc/ugdb 

 Actually, it is more convenient to use
 it in any case, at least for logging purposes.

(gdb) set remote debug 1


Thanks,
Roland

Re: [PATCH 2/4] utrace_set_events: consolidate setting _UTRACE_DEATH_EVENTS checks

2010-08-16 Thread Roland McGrath

 Hmm. For unknown reason I do not see this 2/4 patch on utrace-devel,
 strange.
 
 So I am attaching it in case my email was really lost and you didn't
 get it.

Indeed, it did not come through to me or the list.

Thanks,
Roland

Re: [PATCH 2/4] utrace_set_events: consolidate setting _UTRACE_DEATH_EVENTS checks

2010-08-16 Thread Roland McGrath

 utrace_set_events() checks the setting _UTRACE_DEATH_EVENTS case twice,
 and it is not immediately obvious why the first check is needed, and why
 it is not racy (we are checking exit_state without tasklist). The code is
 correct, but looks confusing.

More comments are always good.

 In short: this double check allows to avoid tasklist when utrace_flags
 already has these bits while engine-flags doesn't.
 
 But multitracing is unlikely case, in the likely case if we add
 _UTRACE_DEATH_EVENTS to engine-flags we set these bits in utrace_flags
 too. I think it makes sense to consolidate these checks to make the
 code a bit more understandable.

I don't really agree about unlikely case.  In many uses, the systemtap
task-finder will have a utrace engine on every task in the system, for
example.  Moreover, this is an importantly distinct particular kind of
micro-optimization to be doing here: avoiding a system-wide lock.  Any
place that we take tasklist_lock at all, we are introducing a system-wide
slowdown or limitation on scaling, just because of our tracing of one task.
So optimizing the minimize that is qualitatively different and really much
more important than avoiding taking the utrace lock, for example.

But with that note in your mind, I am happy to take this patch now as a
simplification and let reoptimization come back later.


Thanks,
Roland

Re: [PATCH 1/3] get_utrace_lock() must not succeed if utrace-reap == T

2010-08-16 Thread Roland McGrath

 get_utrace_lock() must threat utrace-reap == T as engine-ops == NULL.

Yes, I think you're right.  This requires some changes to the kerneldoc and
utrace.tmpl, because it now says that you get EALREADY if report_reap is
already running.  Now it will be consistent with utrace_control, where you
get ESRCH either if report_reap might already be running or if it's
entirely detached and/or reaped.


Thanks,
Roland

Re: [PATCH 2/3] fix utrace_control(DETACH) utrace-death interaction

2010-08-16 Thread Roland McGrath

 The problem is, utrace_control(DETACH) does nothing and returns
 -EALREADY if utrace-death is set, this is not right. We can and
 should detach in this case, we only should skip utrace_reset() to
 avoid the race with utrace_report_death()-REPORT_CALLBACKS().

This behavior is the original (minimal) synchronization scheme from before
we had utrace_barrier.  See Interlock with final callbacks in
Documentation/DocBook/utrace.tmpl.

The expectation is that your report_death is going to clean up your -data
stuff and then return UTRACE_DETACH.  If utrace_control returns -EALREADY,
then you know that report_death is taking care of things.  I'd imagined
that you'd do something like:

mutex_lock(stuff engine-data points to);
mark in there that we are detaching;
ret = utrace_control(task, engine, UTRACE_DETACH);
if (ret == 0) {
/* detached.  tear our stuff down. */
...
return DONE;
} else if (ret == -EALREADY) {
/* report_death is running, i.e. waiting for our lock */
mutex_unlock(...);
return DONE;
} else ...

Put another way, you can have told your report_death beforehand that it
should return UTRACE_DETACH if it runs before your utrace_control call.


Thanks,
Roland

Re: [PATCH 0/3] UTRACE_DETACH fixes

2010-08-16 Thread Roland McGrath

 Whatever we do, start_callback() can see the old engine-flags but
 the new -ops = utrace_detached_ops. Just suppose that the caller
 of UTRACE_DETACH is interrupted right after setting engine-ops.

 * it can check the old flags before using the old ops, or check the old
 * flags before using the new ops, or check the new flags before using the
 * new ops, but can never check the new flags before using the old ops.

 Or do you think I miss something and this is false alarm?

 * Hence, utrace_detached_ops might be used with any old flags in place.
 * It has report_quiesce() and report_reap() callbacks to handle all cases.

report_reap is covered with the utrace_detached_reap() stub.  Every other
callback uses start_callback(), i.e. calls report_quiesce first.
utrace_detached_quiesce() returns UTRACE_DETACH, so start_callback() will
return NULL.

 [...] but I can no longer think properly today.

Sleep well. :-)


Thanks,
Roland

Re: problems with v3

2010-08-13 Thread Roland McGrath

 I don't know this area well, but considering that ser-unix.c is just
 chock full of tty-related goo, I think it is probably important for
 something.  My impression is that this API is not just used for target
 communication but also for manipulating gdb's own terminal.

Ah, I see.

 I've appended the patch I came up with.  I have not tried it at all, but
 it should all be pretty obvious.

Yeah, that seems like it should be reasonable, i.e. more or less like what
I'd figured it would do if it didn't have all these different backends.

However, I think the test you probably want is !S_ISCHR.
I believe that /proc/ugdb as is stats as S_ISREG, not S_ISFIFO.
Actually, better yet, just make it !isatty (fd).
Another pseudodevice that also behaves more like a socket than
like a tty might be S_ISCHR, but only a tty isatty.


Thanks,
Roland

Re: problems with v3 (Was: gdbstub initial code, v3)

2010-08-13 Thread Roland McGrath

 I seem to understand the problem(s). I am a bit surprized. Basically
 I have the questions about almost every utrace_ function. I'll try
 to recheck and summarize my concerns tomorrow with a fresh head, I
 hope I missed something.

Ok.  That stuff about pure kernel implementation issues doesn't need
to go to the archer list and Tom, only to utrace-devel.


Thanks,
Roland

Re: gdbstub initial code, v3

2010-08-12 Thread Roland McGrath

 Indeed, gdb sees that this fd is not pipe/tcp and uses the hardwire
 serial_ops, but hardwire_readchar() doesn't play well with select().
 
 Please teach gdb to use poll/select ?

If it makes it easier, could use:

bash$ nc -l -U /tmp/socket  /proc/ugdb 
(gdb) target remote |nc -U /tmp/socket

for the moment.  Silly of course, but just not to be blocked on cleaning up
gdb's serial-device handling to be more nicely nonblocking.


Thanks,
Roland

Re: problems with v3

2010-08-12 Thread Roland McGrath

I don't really know the gdb code, but I'm surprised it really has multiple
different serial backends.  I would have thought that after opening the
fd, you would treat them all about the same.  If tcsetattr et al work on
it, then you set the baud rate and whatever, if they say ENOTTY then fine.
It might make sense to default to a short read timeout for an actual serial
port (or UDP port), which might be unplugged or the other end dead, or
whatever; and to an infinite timeout for a non-tty, which should more
presumably have its fd shut down and read EOF or EPIPE or whatever when the
stub goes away, and otherwise perhaps the stub is just taking that long.
But aside from that, I don't know why you wouldn't treat all kinds of
remote protocol fd's the same wrt select/poll and blocking/nonblocking
reads/writes, be they serial-port fds, pipe socketpair sockets, network
sockets, or fds to a new magic file that pretends to be a tty or a socket
or whatever (or even a disk file of canned response playback!).


Thanks,
Roland

Re: gdbstub initial code

2010-07-19 Thread Roland McGrath

 At first I tried to support both multiprocess and !multiprocess
 modes, but this needs too many unnecessary code which I'd like
 to avoid, at least now. So this version requires multiprocess+
 in qSupported, otherwise target extended-remote fails.
 
 Please let me know if we need both modes.

No, that is fine.  We aren't trying to make it a universal implementation
of the gdb remote protocol in all variants to work with all gdb versions of
the past.  We're just trying to make it work well with current/future gdb.


Thanks,
Roland

Re: gdbstub initial code

2010-07-16 Thread Roland McGrath

 Note that currently I am not even trying to do something meaningful
 with utrace. My only goal for now is to implement the very basic things
 like attach/detach/stop/cont/exit correctly from the remote protocol
 pov. And I want to do this rightly, then we will discuss utrace
 issues.

Ok.  I think you might need select to work on the pseudofile for gdb to be
happy.  I'm not sure on the situation about that.


Thanks,
Roland

Re: gdbstub initial code

2010-07-13 Thread Roland McGrath

 Given these requirements, and given that the 'new' uprobes is close to
 -tip, would it be more useful to pursue an alternate syscall approach
 rather than gdbstub?

Feel free to pursue whatever you like.  For our own time allocation, we
see an effort along those lines now as a distraction from work that will
really make a qualitative difference in the debugging experience.  Any
new interface is an instant source of endless discussions (at best)
about many details that are ultimately trivial in the greater scheme of
things.  Aside from flaming its details a priori, no new interface is of
any interest to anyone unless its use is integrated into real, non-toy
userland debugging tools and it enables their delivery of qualitatively
significant, new or better aspects of the debugging experience.  

Starting with a whole new interface inevitably involves spending most of
the time on the combination of LKML flames about the interface trivia
and work on toy userland libraries and utilities to demonstrate using
the new kernel features.  That's a whole lot more time and effort and
friction to come around to doing the toy version of what real userland
debugging tools do today, and maybe then start on doing anything that's
actually new or different beyond cleaning up pet-peeve interface trivia,
if you don't get too side-tracked filling in practical holes in your toy
tools along the way first.

What Oleg is embarking on now is a prototyping exercise.  We're not
trying to find a new kind of backwards to bend over to have upstream
people like any new interface layers.  We're trying to get quickly to
the experimental baseline from which we can try to come up with some of
those qualitatively significant new things.  That means having a full,
adult-sized, real-world debugging tool plugged into new and unencumbered
kernel code paths and doing approximately its normal thing at least as
well (approximately) as normal.  In the end, more of the work is on the
userland side (or in fritter about what the user-kernel interface
details should be) than the actual guts of the kernel-side work.  We've
chosen the gdb remote protocol as a prototype vehicle because we start
with about 95% complete support in our closest-to-hand real-world tool
(gdb) for that baseline.  (We also happen to have some gdb-hacking
colleagues nearby to help us experiment with anything that might rely on
that remaining 5% or otherwise on teaching gdb new tricks.)  Hence we
hope to get very quickly to that baseline for experimentation.

From there we can start trying out the things that could really make a
big difference in what a debugger tool can do (or how well/fast it can
do them).  What matters for that isn't the little details of encodings
or syscall interfaces, but the large-granularity issues about how the
interface is used, like how many context switches back and forth to the
debugger it takes to get a task done, etc.  IMHO it would be pointless
to try to design any new interface before knowing concretely what kinds
of things on the big-idea scale make an important difference to the
actual debugging experience with a real-world tool like gdb.  When we've
shown what key features and characteristics deliver a big tangible
payoff, we can worry about how to formulate new interfaces or extensions
that both achieve those essential goods and meet with upstream tastes.


Thanks,
Roland

Re: gdbstub initial code

2010-07-13 Thread Roland McGrath

 What is the reasoning for selecting /proc/ugdb instead of something like
 /proc/pid/ugdb?

The protocol, and gdb, support dealing with many processes over one control
channel.  For normal the debugger model to work where it can attach to a
process on demand, with your model it would have to open another fd for
every process (or perhaps thread), maintain perhaps thousands of fds, do a
select dance, never be able to pipeline notifications from multiple
processes into a single receiving call in the debugger, etc.  Moreover,
that's just not what the protocol is and we already have a protocol with a
client that we can just support as is.


Thanks,
Roland

Re: gdbstub initial code

2010-07-12 Thread Roland McGrath

 Please see the attachment. Don't take this code seriously, this is
 the early prototype and everything should be rewritten. It barely
 uses utrace, only to stop the target.

You've got to start somewhere!  Thanks, Oleg.
It's great to see this get underway.

 This seems to work, but I had to export access_process_vm().

Yeah, that's a known issue.  We can discuss how to either work around or
change it at some point, but it's just a distraction at the moment.

 I still can't understand what utrace_xxx_pid() buys us, and I still
 think that utrace_prepare_examine() can't protect the task even for
 regset calls.

Please start a separate thread here about each of those issues.  The
first one is fairly simple, but what that means in practice depends on
the resolution of the second question, which is a more complex subject.


Thanks,
Roland

Re: gdbstub initial code

2010-07-12 Thread Roland McGrath

 When I had posted a prototype of a gdbstub which Frank and I had
 worked on.  http://lkml.org/lkml/2009/11/30/173, Peter and Ingo
 showed a preference for a combined gdbstub in kernel, i.e kgdb and the
 newer stub should use only one stub in kernel. Do we have plans to
 handle that?

Their actual idea there largely represents a misunderstanding of the
problem space.  But regardless, it's a distraction from the prototype
work that Oleg is doing now.  The actual possibilities for code sharing
between kgdb and something at all like what we're doing now are quite
small.  It's just not a problem at all to get prototyping progress done
with a new implementation of the fairly trivial gdb remote protocol
decoder, and contemplate consolidation later on.


Thanks,
Roland

Re: [PATCH 0/6] utrace: security problems

2010-07-07 Thread Roland McGrath

As to the unsafe_exec stuff, I'd long figured we would have something just
about like that.  (You might recall that an earlier utrace API had an
unsafe_exec engine callback, which had its own unresolved complications.)

For exec transitions (set-id, file caps, selinux), I'd originally figured
an engine's report_exec could check for changes and decide to detach itself
if appropriate.  We will figure out when we come to it whether that can
really cover all the exec angles or not.  setprocattr is the one other
troublesome wrinkle, which I haven't thought all that much about.

I don't think we need to, nor should, try to tie down the security-related
stuff at the outset.  We can work on prototype engines with the proviso
that they are for root only or for experimentation when one doesn't care
about security issues yet.  When we have at least a couple of different
engines with different access models, we will be better placed to figure
out how to tie in the security issues.

In the long run, the security_ptrace() granularity of hook is probably too
blunt an instrument.  We'll want to contemplate the different kinds of
engines with their different kinds of security-relevant interactions
and decide on security checking models that give the appropriate flexibility.
But it's premature to get into that before we have a bit of an ecosystem of
different sorts of modules to consider concretely.


Thanks,
Roland

Re: [PATCH 0/6] utrace: security problems

2010-07-07 Thread Roland McGrath

  For exec transitions (set-id, file caps, selinux), I'd originally figured
  an engine's report_exec could check for changes and decide to detach itself
  if appropriate.
 
 No, it can't. At this point S_ISUID/S_ISGID exid's were already dropped,
 or exec can fail before before tracehook_report_exec().

If an exec fails, nothing changes and there is no security-relevant event
to take notice of.  I don't really follow your other comment.  But ...

 Yes, agreed, let's forget this for now.

Indeed.

 The only question: do you think the trivial 1st patch is correct?

The one that just adds a macro defined to another existing macro?
Any change that preprocesses out to the same code is correct, sure...


Thanks,
Roland

Re: Possible problem with utrace_control

2010-06-23 Thread Roland McGrath

 OK. But then perhaps UTRACE_STICKY_STOP makes sense even without the
 races we discussed. It can simplify the cooperation in this case.

The only cooperation methods we should consider are those that cover all
their races.

 Yes. Let's consider the concrete example. The tracee is going to
 stop and calls utrace_stop(). Before it takes utrace-lock and
 sets state = TASK_TRACED, the debugger does utrace_control(DETACH).
 
 In this case utrace_stop() shouldn't stop, otherwise nobody will
 ever wake it up. That is why we clear this bit in -utrace_flags
 to provoke utrace_reset() which will check carefully the tracee
 has an engine with ENGINE_STOP.

Right.  I agree this race seems to apply to all kinds of resumption, not
just detach.  The question for the non-detach cases is what kind of
guarantee the utrace API is claiming to provide.  For detach it matters
strongly because the loser of the race is gone and there is no way for
it to be responsible for cleaning up the fallout.

For non-detach cases, the picture is more hazy.  The conclusions on what
matters might be different if we have utrace_wake_up() or equivalent.

Off hand I think it does matter today insofar as it violates the
documented guarantees of the utrace_barrier API.  If utrace_barrier
returns 0 it is said to mean that, Any effect of its return value (such
as %UTRACE_STOP) has already been applied to @engine.  So e.g. if you
get a wake-up sent by your callback before it returns UTRACE_STOP, and
then call utrace_barrier followed by utrace_control(,,UTRACE_RESUME),
then you should be guaranteed that your resumption cleared the
callback's stop.

Another example, you call utrace_set_events and disable some events,
including all the ones for which your callbacks ever return UTRACE_STOP.
If that returns 0, or then you call utrace_barrier and it returns 0,
then you know that either it was already stopped before you called
utrace_set_events, or that it's not stopped now.  In either case, you
know that utrace_control(,,UTRACE_RESUME) ensures that it's no longer
stopped.  (That is, stopped by your engine--it of course might still
be stopped by a different engine.)

  For the new API idea, I was talking about something like a utrace_wake_up()
  call.
 
 Now I don't understand you again...
 
 The 1st question: should the new API help us to kill ptrace_notify_stop()
 in utrace_stop() ?

Ideally yes.  This might still have to be some other kind of special case,
we'll have to figure that out.  That would be survivable if it's necessary.
The wait_chldexit wakeup would be covered by what we're discussing, it's
just another wait_queue_head_t.  The signal sending is more problematic.

But I wanted first to contemplate what API would cover new code cleanly
written.  The dismal old ptrace signal/wait synchronization might always be
a difficult outlying case.

 Hmm... After the previous email I thought that this utrace_wake_up()
 or whatever shouldn't be visible outside of utrace.c.

I'm sorry if I was not clear.  I have been talking entirely about new
utrace API features, not internals.  This is a discussion for your
engine-writer hat more than your utrace-maintainer hat.  I'm only
mentioning any utrace implementation details to elucidate what is
realistic or problematic to consider for the new API features.

 Or. Do you literally mean something like
 
   utrace_wait_for_event();// for debugger
   utrace_wake_up();   // for tracee
 
 which should (at least) cover all races with exit/detach/etc ?

I meant utrace_wake_up(wait_queue_head_t *) used in place of wake_up()
inside a utrace callback.  The waiter (tracer) side would use normal
linux/wait.h interfaces (wait_event macro, etc.).


Thanks,
Roland

Re: Possible problem with utrace_control

2010-06-22 Thread Roland McGrath

  utrace_control(current,,UTRACE_STOP) doesn't really do anything more.
 
 It does utrace_do_stop() which sets UTRACE_REPORT/TIF_NOTIFY_RESUME.
 Harmless, but makes sense to avoid.

Right.  I was talking about the API perspective.  Those internal details
are more or less invisible to a given engine.

 OTOH, I am wondering if we can give more power to utrace_control(STOP).
 Currently debugger can't stop the tracee unless it cooperates. However, if
 the tracee doesn't have UTRACE_EVENT(QUIESCE), then utrace_control(STOP)
 works. This is strange.

Why is it strange?  If you don't have any callback, then all your engine
does is get stopped.  If you have callbacks, then you always get callbacks
before the thread does anything (including stop).  Since your callback's
return value always replaces any other resume action for your engine, you
have to cooperate with yourself in not changing your mind the next time
your code is in charge of the tracee.

 Stupid question: what if we just remove the clear_engine_wants_stop()
 code in finish_callback_report() ?

That would change the API so that a callback's resume action is always
overridden by a utrace_control(,,UTRACE_STOP).  Today the practice from a
third-party debugger thread is to do:

ret = utrace_control(task, engine, UTRACE_STOP);
if (ret == 0) {
  // Already stopped, e.g. was in jctl or another engine's stop.
  // This means we can do our asynchronous examination here.
} else if (ret == -EINPROGRESS) {
  // it will get to report_quiesce soonish (modulo blocked in kernel)
}

If the examination you wanted to do was some predetermined sequence of
fetching and fiddling before resuming, then the report_* callback can just
do all that itself.  For example, fetch the registers for a sample and keep
going; insert some breakpoints and keep going; etc.

Today, the callback that goes with that asynchronous-side code has the
choice to do something and stop or to do something and not stop.  With this
change, it would have to explicitly request not stopping by calling
utrace_control(current, engine, UTRACE_RESUME) inside the callback.  It
would need to use its own bookkeeping above the utrace layer to know that
there was a stop it was supposed to clear (i.e. something the code sample
above would set before calling utrace_control).

This changes the set of races with asynchronous stop/resume scenarios.
But it doesn't remove the asynchronous-resume vs callback-stop race.
In concert with wake_up_tracer-after-stop perhaps it gets better.
We'd have to really think through the races with the new API.

  So this would fix that race.
  ...
  (But note if there is
  another engine that decides to stop anyway, then we'll have used a
  second CPU and/or ping-pong scheduled to run the tracer thread in
  parallel with a very short amount of work in the tracee before it just
  blocks anyway.)
 
 Hmm. Not sure I understand... Yes, the tracee still can stop if another
 tracer demands, but this is fine?

Correct.  When that is what's going to happen (e.g. there is another engine
attached that is a slow stopper like ptrace), then it's suboptimal to have
the scheduler decide to yield the CPU immediately on waking your first
tracer.

 Btw, before I forgot about this. shouldn't utrace_control(UTRACE_RESUME)
 remove ENGINE_STOP from -utrace_flags like DETACH does?

Perhaps so.  My memory is dim on the use of that bit in utrace_flags.  It's
to close certain races, but we'll have to remind each other of the details.

 Heh. it turns out that I misunderstood you in the past, I thought
 you dislike the whole idea of wake_up from utrace_stop().

No, just the over-general hook or the over-specific ptrace hack.
I talked about about a utrace_wake_up() that would work this way.

 Hmm. Not sure... at least I don't immediately see something simple.
 
 Except, just add wait_queue_head into struct utrace. This way utrace_barrier()
 can wait longer than needed if it notices utrace-reporting == engine. Or we
 can add it into utrace_engine, in this case utrace_stop() needs
 list_for_each(engine, utrace-attached).

You're talking about two different things here.  Whatever implementation
details change in utrace_barrier() is not directly apropos.  That would not
change the API-visible details at all, just make utrace_barrier() perform
better.  Let's not get into that before we hash out the new API ideas.

For the new API idea, I was talking about something like a utrace_wake_up()
call.  That might be implemented by a wait_queue_head_t * in utrace_engine,
for example.  The implementation internals could involve a simple
iteration, or involve more hidden bookkeeping.  I'm only talking about the
API idea for the moment.  I only mentioned implementation details at all to
indicate that a restricted API (one wake-up per callback per engine) could
be done with constant storage overhead and thus avoid a whole potential can
of worms--which is the only real reason to

Re: Possible problem with utrace_control

2010-06-21 Thread Roland McGrath

 I think Roland is right. Let's forget about utrace for the moment,
 this code looks like
 
   if (!CONDITION) {
   set_task_state(TASK_XXX);
   schedule();
   }
 
 and it can obviously race with
 
   CONDITION = true;
   wake_up_xxx();

This is Linux in-kernel synchronization 101.
The correct blocks of this sort do:

set_current_state(TASK_UNINTERRUPTIBLE);
while (!CONDITION) {
schedule();
}
set_current_state(TASK_RUNNING);

This is entirely orthogonal to anything having to do with utrace per se.
This pattern race-free because wake_up_* will reset the waiter to
TASK_RUNNING after making CONDITION true.  So in the race where the
while (!CONDITION) clause lets it get into schedule(), that just
returns immediately, after which the second iteration sees CONDITION true.

This is why I recommended using higher-level things like linux/wait.h
facilities, where the correct usage patterns are more obvious.

But the race we're actually talking about here is on the other side of
the coin.

 I am wondering if there is another way to handle such a race...
 Suppose we
 
   - add UTRACE_STOP_XXX
 
   - modify utrace_control(UTRACE_STOP_XXX) to do
 mark_engine_wants_stop() and nothing more.

utrace_control(current,,UTRACE_STOP) doesn't really do anything more.

So a callback doing this closes the old race: UTRACE_STOP is in force
when the tracer side wakes up, since utrace_control-on-current was done
before wake_up_tracer().  But...

   - finish_callback_report(UTRACE_STOP_XXX) does
 
   spin_lock(utrace-lock);
   if (engine_wants_stop(engine))
   action = UTRACE_STOP;
   else
   action = UTRACE_REUME;
   spin_unlock(utrace-lock);

Let's clarify what this means for the people following the utrace API
but not its implementation internals: currently, the callback's return
value overrides the engine's last-set state such as UTRACE_STOP, whether
that came from a previous utrace_control before the callback, a
utrace_control call inside the callback, or from the return value of a
previous callback.

So now we've moved to a different race (or a different instance of the
original race, if you like): after wake_up_tracer(), the tracer can come
and clear our engine's UTRACE_STOP state with its utrace_control call;
but then our callback return value just reestablishes UTRACE_STOP and we
still stop, having lost the asynchronous resumption from the tracer.

So Oleg has an idea for a new kind of callback return value, which I'd call
perhaps UTRACE_NOP.  This would mean to stay in UTRACE_STOP if the last
utrace_control call on this engine used UTRACE_STOP, or the last callback
return value left the engine in that state, otherwise UTRACE_RESUME.

 Then, probably producer could do
 
   utrace_control(current, UTRACE_STOP_XXX);
 
   slot = book_message_in_buffer(buf);
   if (slot  0)
   return UTRACE_STOP_XXX;
 
   ...
   return UTRACE_RESUME;
 
 Even better, UTRACE_STOP_XXX can live outside of UTRACE_RESUME_MASK.

Right, it doesn't belong in enum utrace_resume_action--it doesn't make
sense as an argument for utrace_control, for example.  So it could be
something like a return-value flag bit saying stop if marked for stop,
e.g. UTRACE_SINGLESTEP|UTRACE_STICKY_STOP for single-step, unless marked
to stop, in which case stay stopped.

So this would fix that race.  For the scenario we're discussing here,
that's probably just fine.  The only reason to stop was for the consumer
to run.  If it woke up quickly and ran enough to call utrace_control
while we were still finishing the return of the callback and utrace
bookkeeping, then we just don't stop at all.  (But note if there is
another engine that decides to stop anyway, then we'll have used a
second CPU and/or ping-pong scheduled to run the tracer thread in
parallel with a very short amount of work in the tracee before it just
blocks anyway.)

Now let's consider another scenario.  Take the normal debugger scenario,
i.e. a ptrace implementation done cleanly on the utrace API without the
magic kludges we have today, or a ptrace replacement built cleanly from
scratch in the ideal way, or the pending gdbstub implementation.  Here
what the callback does is wake up the tracer, and return UTRACE_STOP.
What the tracer thread does is block in some fashion until the tracee's
wake_up_tracer(), then do whatever logic (in ptrace, that means return
to user mode and let it make another ptrace system call, but that all
might be quite fast), and decide to examine the thread.  To make sure
its own callback is finished and UTRACE_STOP will be in force, it has to
synchronize with utrace_barrier.  So in maximum parallelism, this means,
having preempted the tracee just now by waking up, we now yield back to
the tracee to finish our callback, other engines' callbacks, and utrace
bookkeeping.

Re: Possible problem with utrace_control

2010-06-17 Thread Roland McGrath

You should use linux/wait.h or similar facilities rather than calling
schedule() and wake_up_process() directly.  This doesn't have anything to
do with utrace, it's just the clean practice for normal kinds of blocking
in the kernel, for a variety of reasons.  That has to do with how your
control thread blocks, which is just the normal set of issues for any
thread in kernel code.  You are using the lowest-level way of doing things,
which is not recommended--that's why various higher-level facilities exist.

The admonition to use utrace facilities for all blocks has to do with when
you make a traced thread block.  You are on the correct path in that regard.

The relevant parts of your scenario match what I described.  The race I
talked about between your control thread's utrace_control() and tracee
callbacks returning UTRACE_STOP explains what you are seeing.


Thanks,
Roland

Re: Possible problem with utrace_control

2010-06-13 Thread Roland McGrath

Thanks for your interest in utrace.

It's correct that passing UTRACE_REPORT to utrace_control should
always ensure you get some report_* callback before the tracee returns
to user mode.

However, I think your use may be susceptible to a race between
resumption by utrace_control, and the UTRACE_STOP done by the
callback.  If your callback looks something like:

wake_up_consumer();
return UTRACE_STOP;

then there is a window when the consumer can wake up and call
utrace_control before that callback has actually returned UTRACE_STOP
and had it processed.  In that event, your UTRACE_REPORT comes before
the UTRACE_STOP, and then you do nothing afterwards to resume it.  If
it ever did resume, it would indeed report (the UTRACE_REPORT is still
pending).  But that doesn't help you.

Off hand, I think there is only one way to address this race.  When
you get the notification to consume, before calling utrace_control,
call utrace_barrier first.  This will block the consumer until the
producer's callback has actually finished and had UTRACE_STOP recorded
for your engine.  At that point, you can be sure that your
utrace_control call will indeed wake up the stopped tracee.

Oleg or others here can double-check my reasoning and help explain the
situation if you share your code for them to see.

Because of this possibility, it might make sense to have utrace_control
return -EINPROGRESS for this situation.  That at least would alert you
that something might be amiss.  Unlike other cases of -EINPROGRESS, this
would not just mean that you need to call utrace_barrier to be sure the
effect has completed.  Rather, it means there is a danger of the
scenario I described above, where the effect you asked for is going to
be trumped by the callback's UTRACE_STOP.  So it doesn't quite fit in
with the other situations--it's not really an indication that you have
to follow up with a synchronization, it's an indication that your
synchronization logic is already wrong.  But at least it would relieve
the mystery of such a situation, assuming you are checking return values
and noticing in debugging.  I'm certainly open to opinions on this API
change.

The need to use utrace_barrier as a second synchronization after the
main wake-up (done by whatever normal means you are using) is rather
clunky, and when it matters (as we suspect it does in the occasional
race in your case) it leads to ping-pong scheduling.  So we would like
to improve the utrace API in this area at some point.  One idea we have
discussed before is a utrace_wake_up call.  (This assumes that some
variant of linux/wait.h wake_up* is what you are using in the guts of
wake_up_consumer.)  This would replace wake_up* for use in a callback,
and simply do the wake-up after all the callbacks are finished (or
perhaps just yours, but probably all), so your return value action is
already recorded.  In fact, it might do the wakeup only after the
UTRACE_STOP is not only recorded but actually performed, so that by the
time your consumer wakes up, the tracee is actually stopped and
available for third-party examination.  (You don't actually care about
that for your case--for your purposes, it would be optimal just to delay
the wakeup until your own callback's return value is recorded.  That
way, if your consumer acts quickly enoug on another CPU, it might even
resume the tracee before it actually yields the processor.)

As you might tell from the complexity of that paragraph, there are many
fuzzy nuances of exactly what that better API might be.  (And I didn't
even delve into the possibility that you're using some synchronization
mechanism other than wake_up* calls on a wait_queue_head_t.)  Hence we
haven't tried to work it out exactly yet.  We hope that seeing a variety
of uses such as yours will help us figure out how best to improve the
utrace API for tracee-callback-to-tracer synchronization handshakes.


Thanks,
Roland

Re: No rule to make target

2010-04-13 Thread Roland McGrath

 obj-m := crash_suspend.o

s/_/-/

Re: ptrace crash on PREEMPT 2.6.18-128.7.1.el5 kernel

2010-02-26 Thread Roland McGrath

 But I'm happy to have tracked it down to the utrace-based ptrace
 emulation, and was mostly just interested in knowing if preempt and
 utrace are fundamentally incompatible on x86_64, or something like
 that. I'll fight through the 2.6.18-164 issues instead, since the
 ptrace problem doesn't seem to be happening on that version.

It is probably the case that the RHEL5 utrace code cannot easily be made to
work reliably with CONFIG_PREEMPT.


Thanks,
Roland

Re: linux-next: add utrace tree

2010-01-20 Thread Roland McGrath

 Frank, please be clear as to which branch you want included (master or
 utrace-ptrace).  Also note that neither of those branches matches what
 was posted in the sense that they both have lots of history and merges
 not represented in the patches.  (I assume that they do produce the same
 final source tree, though).

Yes, the trees do match.  I certainly never expected our ancient git
history to get merged in directly upstream.  I've made a new branch on:
git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git
called:
next/master
(Actually it's on master.kernel.org and the public mirror is being a little
slow as I write this.)

This starts from v2.6.33-rc4 and then has commits for the 7 patches that
Oleg posted in December.  Beyond that, we've added one follow-on patch to
fix a bug Oleg just tracked down (Oleg will post that patch soon).  And
I've added one more commit with a MAINTAINERS update, shown below.

You can also find the same stuff from the series file and patch files in:
http://people.redhat.com/utrace/2.6-next/

If it makes things easier for linux-next to have this git branch either
rebased or merged from a different fork point, please let me know.


Thanks,
Roland

---
[PATCH] MAINTAINERS: add utrace

This updates the ptrace entry to cover utrace too.
They are part of the same maintenance effort.
Also add the utrace mailing list.

Signed-off-by: Roland McGrath rol...@redhat.com
---
 MAINTAINERS |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index c8f47bf..8da2a0a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4375,15 +4375,18 @@ M:  Jim Paris j...@jtan.com
 L: cbe-oss-...@ozlabs.org
 S: Maintained
 
-PTRACE SUPPORT
+PTRACE AND UTRACE SUPPORT
 M: Roland McGrath rol...@redhat.com
 M: Oleg Nesterov o...@redhat.com
+L: utrace-devel@redhat.com
 S: Maintained
 F: include/asm-generic/syscall.h
 F: include/linux/ptrace.h
 F: include/linux/regset.h
 F: include/linux/tracehook.h
-F: kernel/ptrace.c
+F: include/linux/utrace.h
+F: kernel/ptrace*
+F: kernel/utrace*
 
 PVRUSB2 VIDEO4LINUX DRIVER
 M: Mike Isely is...@pobox.com

Re: PTRACE_SYSCALL_ENTRY/EXIT

2010-01-18 Thread Roland McGrath

We don't have any particular plans to extend the ptrace interface.  
I strongly doubt we would even try to do anything like that until the
utrace-based ptrace interface is merged into Linux and the old ptrace
implementation gone.

In general, we are not looking for extensions to the ptrace interface.
It is an ugly hairball already and we are more interested in having 
the utrace API layer available inside the kernel and then embarking on
new and sane userland interfaces instead of shoehorning more into ptrace.

That said, some particular kinds of simple enhancements to ptrace are
really quite trivial to implement in the new utrace-based implementation.
The particular area you suggest is one of these.

What I would expect is not new variants of the one-shot interface like
PTRACE_SYSCALL.  Rather, I would envision new PTRACE_O_* options to enable
syscall entry and exit tracing analogous to the PTRACE_EVENT_* events you
can now enable.  This means that you make one PTRACE_SETOPTIONS call to
enable the set of events you want, and then use plain PTRACE_CONT (or
whatever).

If you really want exactly the one-shot behavior instead, then we could
consider that.  But, like I said, we are not looking to add much in the
way of new wrinkles to the dismal ptrace userland interface.


Thanks,
Roland

Re: s390 user_enable_single_step() (Was: odd utrace testing results on s390x)

2010-01-08 Thread Roland McGrath

 Ok, I changed the wording slightly:
 
 Clear the TIF_SINGLE_STEP bit in copy_thread. The new process did not get
 a PER event of its own. It is wrong deliver a SIGTRAP that was meant for
 the parent process.

Very good!

Thanks,
Roland

Re: s390 user_enable_single_step() (Was: odd utrace testing results on s390x)

2010-01-07 Thread Roland McGrath

  Right.  That means we should leave the status quo of not clearing
  TIF_SINGLE_STEP in user_disable_single_step.
 
 Ok, although it seems a bit strange not to do it. Perhaps I should add a
 comment about it.

It doesn't seem strange to me, but then I've just been through all this.
user_*_step is about what the task will do next.  TIF_SINGLE_STEP is about
what the task has done recently.  Of course more good comments always help.
I might be inclined to change the name of TIF_SINGLE_STEP so that its true
purpose is more obvious.  

AFAICT, in fact it is not even about single-step per se.  It means some PER
trap happened and should produce SIGTRAP.  Don't you get the same thing if
you haven't used single-step, but instead used PTRACE_POKEUSR to set up
per_struct with bits that say to trigger for some other reason?

How about calling it TIF_PER_PENDING?

 So after everthing has been converted to utrace we always will load the
 control registers in FixPerRegisters.

Right.  (This could well still change in the future.  But that's how it is
in utrace now.  And regardless of possible future implementation changes it
will always be the case that sometimes it will be called on current.)

 We could use the same compare of the control registers as the code in
 __switch_to. See below.

Yes, sounds good.

 The PSW_MASK_PER is the global PER enablement switch, the PER_EM_MASK
 bits enable the different PER events. We check for the PER_EM_MASK bits
 because it is easier to access in __switch_to. The return PSW is stored
 in the pt_regs structure, we would have to get a pointer to it (what
 regs = task_pt_regs(taks) does in FixPerRegisters). In
 FixPerRegisters we can as well use the PSW_MASK_PER bit.

I see.  Thanks for the explanation.


Thanks,
Roland

Re: s390 user_enable_single_step() (Was: odd utrace testing results on s390x)

2010-01-07 Thread Roland McGrath

 Hmm, command for tracehook_signal_handler say this for stepping:
 @stepping:   nonzero if debugger single-step or block-step in
 use

Are you saying you would like me to clarify that wording somehow?  It's
meant to be implicit that the arch code is not doing any special fakery
about single-step for signal handlers, only processing real single-step
traps (and faking them for a syscall instruction if the arch requires
that).  No other arch does it, so it didn't occur to me that s390 would.
Before tracehook some had ptrace_notify calls there, and the call to
tracehook_signal_handler replaced that call.

  In ptrace (including utrace-based ptrace), this winds up with sending a
  SIGTRAP.  So when we finally do get out of do_signal and TIF_SINGLE_STEP
  causes a second SIGTRAP, it's already pending and the second one makes no
  difference.
 
 So we have been lucky so far.

Actually, Oleg rightly points out:

 Confused again, perhaps I just misunderstood what you mean...
 
 Without utrace, tracehook_signal_handler() doesn't send SIGTRAP, it
 merely does ptrace_notify(SIGTRAP), this means that
[...]
 even without utrace we can have unexpected SIGTRAP.

That is quite true, and I just misremembered when writing that paragraph.

So indeed we have been lucky, but it's not the luck of the problem not
happening on s390, but the luck of nobody ever caring. :-)

 Ok, so with the full utrace the semantics of tracehook_signal_handler
 is more than just causing a SIGTRAP. It is an indication for a signal
 AND a SIGTRAP if single-stepping is active. 

In short, it is the indication of a signal handler having been set up, just
like its kerneldoc description says.  Whatever that should mean to tracing
(SIGTRAP or otherwise) is in the purview of the generic tracing layer, not
the arch layer.

 To make both cases work we
 should stop setting TIF_SINGLE_STEP in do_signal and pass
 current-thread.per_info.single_step to tracehook_signal_handler
 instead of test_thread_flag(TIF_SINGLE_STEP).

Correct.


Thanks,
Roland

Re: s390 user_enable_single_step() (Was: odd utrace testing results on s390x)

2010-01-07 Thread Roland McGrath

 Clear the TIF_SINGLE_STEP bit in copy_thread. If the new process is
 not auto-attached by the tracer it is wrong to delivere SIGTRAP to
 the new process.

The change is right, but this log entry is confusing.  auto-attached has
nothing to do with it, nor does anything about tracing the new process or
not.  The new process has not experienced a PER trap of its own, so it is
wrong to deliver a SIGTRAP that is meant for its creator.


Thanks,
Roland

Re: s390 user_enable_single_step() (Was: odd utrace testing results on s390x)

2010-01-07 Thread Roland McGrath

 I am confused as well. Yes, I thought about regs-psw.mask change too,
 but I don't understand why it helps..
[...]
 But. Acoording to the testing I did (unless I did something wrong
 again) this patch doesn't make any difference in this particular
 case. 6580807da14c423f0d0a708108e6df6ebc8bc83d does.

Those results are quite mysterious to me.  
I think we'll have to get Martin to sort it out definitively.


Thanks,
Roland

Re: s390 user_enable_single_step() (Was: odd utrace testing results on s390x)

2010-01-06 Thread Roland McGrath

 However, with or without CONFIG_UTRACE, 
 6580807da14c423f0d0a708108e6df6ebc8bc83d
 is needed on s390 too, otherwise the child gets unnecessary traps.

This confuses me.  user_disable_single_step on non-current doesn't do
anything not already done by the memset in copy_thread.  Ooh, except
perhaps it does not clear PSW_MASK_PER.  Maybe that matters.  That's
the only thing I can think of.  Maybe Martin can make sense of it.


Thanks,
Roland

Re: s390 user_enable_single_step() (Was: odd utrace testing results on s390x)

2010-01-04 Thread Roland McGrath

 The PER control registers only get reloaded on task switch. Can you test
 if this patch fixes your problem?

Long ago when I first worked with David Wilder on s390 arch code,
I remember we made this change.  It seems to have been forgotten
in the later rounds of reworking and merging.


Thanks,
Roland

Re: s390 user_enable_single_step() (Was: odd utrace testing results on s390x)

2010-01-04 Thread Roland McGrath

 This probably means that copy_process()-user_disable_single_step()
 is not enough to clear the this task wants single-stepping copied
 from parent.

I would suspect s390's TIF_SINGLE_STEP flag here.  That flag means a
single-step trap occurred.  This is what causes do_single_step to be
called before returning to user mode, rather than the machine trap doing it
directly as is done in the other arch implementations.

If I'm right, then this task wants single-stepping is not the problem,
and that really is fully cleared.  In fact, looking at s390's copy_thread
(arch/s390/kernel/process.c) it clears out all the state that is actually
touched by user_enable_single_step and user_disable_single_step.  So for
s390 the new fork.c call is actually superfluous AFAICT.

The problem is that the copied parent state includes the this task has a
pending single-step to report flag.  IMHO it clearly makes sense for
s390's copy_thread to clear this flag in a new task, which it does not do now.

An alternative to that would be just to make its user_disable_single_step
clear the flag.  That could in theory also have an effect on e.g. the
(authentic) pending step report of a tracee that was stopped with
TIF_SINGLE_STEP set when its tracer detached.  This might be considered a
good thing, but since every other arch posts the SIGTRAP immediately they
all have the equivalent issue and s390 doesn't need to be any better than
they are before we have a generic resolution to the whole subject of
tracer-induced signals (which we won't get into now).  I'm not even sure
from my insufficient reading of the s390 assembly code whether this path is
even possible, i.e. do_signal called before do_single_step.

Martin, I suggest having copy_thread clear TIF_SINGLE_STEP.
That bit is always task-private state that should not be copied.

Btw, given the complexity of FixPerRegisters (and its new additional cost
on task==current), you might want to make user_*_single_step bail out if
per_info.single_step is already set/clear on entry.


Thanks,
Roland

Re: [PATCH 0/7] utrace/ptrace

2009-12-23 Thread Roland McGrath

 Well. I had a lot of technical discussions with Roland about utrace,
 but I never asked him why he created this thing ;) To me, utrace
 looks like vfs. Currently we have the single and very poor filesystem,
 ptrace. Until we add the appropriate layer, we can't expect the
 further improvements is this area.

I think that is an excellent analogy, and it's one I've used before.

Oleg and I have had our hands pretty full just with the infrastructure
layer and with ptrace.  Having this layer in the kernel is what makes
it tractable for a lot of other people to collaborate on new features
in this space, and that's what we want to enable and accelerate.  
Some of those on the CC list have worked and are working on such
things, and I hope they will pipe up about those.

Given the date, I suspect we might not see much from anybody on this
(or anything) until January.  Myself, I expect to be largely offline
for the rest of the year.

As Oleg mentioned, I have a cleanup/reimplementation of seccomp using
utrace.  That is quite a trivial use--it demonstrates how easy the
utrace API makes it to do things like that, in contrast to previous
solutions with arch-specific assembly hacking and so forth.  I can
dust that patch off and post it if anybody cares.

Some other features based on utrace have been floating around for some
time, posted here before.  Those include uprobes, kmview, and the gdb
stub.  I don't which of those are quite ready for merging, but honing
and polishing them gets quite a lot more doable with utrace in the
tree instead of out.


Thanks,
Roland

Re: [PATCH 0/7] utrace/ptrace

2009-12-23 Thread Roland McGrath

 Do you have an estimate or better numbers how the overhead of 
 seccomp-over-utrace compares to the current in-tree seccomp?

I never measured it.  I would estimate that any difference one way or
another is in the noise.  The point of seccomp is to run a process that
almost never makes any system calls.  The only effects of utrace for that
use are on the system call path itself, and the essential effects there
(i.e. taking the tracing path vs the hot path) are the same as what the old
seccomp implementation does.

If you have some example uses of seccomp or something that can serve as a
benchmark for it, I would be glad to measure the difference.


Thanks,
Roland

Re: odd utrace testing results on s390x

2009-12-22 Thread Roland McGrath

 Damn, my fault. I forgot to cc you when I sent the fix for s390 (attached
 below), and I forgot to remind you about this fix when we discussed the
 testing on s390.

That change is upstream for 2.6.33 now.  I'll cherry-picked it into 
the 2.6.32/tracehook branch so it will be in the backport patchset too.


Thanks,
Roland

Re: odd utrace testing results on s390x

2009-12-22 Thread Roland McGrath

 Oh. I am still trying to parse arch/s390/kernel/entry.S to understand
 how can we fix these test-cases. I think I need the help, will continue
 tomorrow.

Martin Schwidefsky schwidef...@de.ibm.com is the s390 arch maintainer.
He is friendly and helpful.  You can ask him for help both with
understanding the intended s390 behavior before, and with understanding the
code paths.  He won't expect any of us to grok s390 assembly. :-)


Thanks,
Roland

Re: new Fedora 13 utrace kernel

2009-12-21 Thread Roland McGrath

 The utrace patch looks suspicious in utrace.h, which cause the compilation 
 failure without CONFIG_UTRACE. I have confirmed that the git tree looks sane.
 
 +static inline void utrace_init_task(struct task_struct *child)
 +{
 +}
 +{
 +}

Oops!  That snafu got fixed on the main branch but I forgot to put the
fix on the 2.6.32/utrace branch too.  Fixed now.


Thanks,
Roland

Re: x86: do_debug PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

2009-12-18 Thread Roland McGrath

 Please find the trivial test-case below. It hangs, because
 PTRACE_SINGLESTEP doesn't trigger the trap.

2.6.33-rc1 x86-64 works for me with either -m64 or -m32 version of that test.

 (not sure this matters, but I did the testing under kvm)

Apparently it does.  You should hack some printks into do_debug() and see
how kvm is differing from real hardware.  (Actually you can probably do
this with a notifier added by a module, not that you are shy about
recompiling!)  

Probably kvm's emulation of the hardware behavior wrt the DR6 bits is not
sufficiently faithful.  Conceivably, kvm is being consistent with some
older hardware and we have encoded assumptions that only newer hardware
meets.  But I'd guess it's just a plain kvm bug.


Thanks,
Roland

Re: x86: do_debug PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

2009-12-17 Thread Roland McGrath

Comparing to the old (2.6.32) logic, I think it might be this (untested).
I also note this is the sole use of get_si_code, seems like it should
just be rolled in here.


Thanks,
Roland


diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 3339917..16a88f5 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -530,7 +530,6 @@ dotraplinkage void __kprobes do_debug(struct pt_regs *regs, 
long error_code)
 {
struct task_struct *tsk = current;
unsigned long dr6;
-   int si_code;
 
get_debugreg(dr6, 6);
 
@@ -569,14 +568,15 @@ dotraplinkage void __kprobes do_debug(struct pt_regs 
*regs, long error_code)
 * We already checked v86 mode above, so we can check for kernel mode
 * by just checking the CPL of CS.
 */
+   dr6 = tsk-thread.debugreg6;
if ((dr6  DR_STEP)  !user_mode(regs)) {
tsk-thread.debugreg6 = ~DR_STEP;
set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
regs-flags = ~X86_EFLAGS_TF;
+   } else if (dr6  (DR_STEP | DR_TRAP_BITS)) {
+   send_sigtrap(tsk, regs, error_code, get_si_code(dr6));
}
-   si_code = get_si_code(tsk-thread.debugreg6);
-   if (tsk-thread.debugreg6  (DR_STEP | DR_TRAP_BITS))
-   send_sigtrap(tsk, regs, error_code, si_code);
+
preempt_conditional_cli(regs);
 
return;

Re: x86: do_debug PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

2009-12-17 Thread Roland McGrath

  +   dr6 = tsk-thread.debugreg6;
 
 why? we have tsk-thread.debugreg6 = dr6 above

Yeah, it's a little superfluous.  Except that the existing code uses
tsk-thread.debugreg6 and dr6 inconsistently.  It only matters either way
if some notifier function might change thread.debugreg6, which I wasn't
sure that none might.  i.e., does/should hw_breakpoint hide/remap the
watchpoint-fired bits when they are not for the same-numbered,
ptrace-installed virtual debugreg?  And does/should kprobes, kgdb, and
whatnot, hide DR_STEP from thread.debugreg6 for a step that's not from
user_enable_single_step?

  if ((dr6  DR_STEP)  !user_mode(regs)) {
  tsk-thread.debugreg6 = ~DR_STEP;
  set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
  regs-flags = ~X86_EFLAGS_TF;
 
 this looks strange... we set TIF_SINGLESTEP but clear X86_EFLAGS_TF

This was the original purpose of TIF_SINGLESTEP from long, long ago.  This
happens when TF was set in user mode and then it did syscall/sysenter so TF
is still set at the first instruction in kernel mode.  TF is cleared from
the interrupted kernel registers so the kernel can resume normally.  In the
original logic, TIF_SINGLESTEP served just to make it turn TF back on when
going to user mode.  Since then we grew the complicated step.c stuff and
it all fits together slightly differently than it did when the original
traps.c path was written.

 can't understand how this change can fix the problem. We should always
 send SIGTRAP if the task returns to user-mode with X86_EFLAGS_TF?

If the debug exception happened in user mode, then we should send SIGTRAP.

In the old (2.6.32) code with its goto-heavy logic the !user_mode(regs)
was goto clear_TF_reenable; and that is:

clear_TF_reenable:
set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
regs-flags = ~X86_EFLAGS_TF;
preempt_conditional_cli(regs);
return;

I thought the new logic was falling through to the send_sigtrap case after
if ((dr6  DR_STEP)  !user_mode(regs)).  But now I see that the subtle
use of dr6 vs tsk-thread.debugreg6 (without comments about it!) meant
that DR_STEP is cleared from tsk-thread.debugreg6 before we test it.

So I guess the idea there is that the !user_mode case would swallow the
step indication but still leave some DR_TRAP_BITS set and so you'd generate
a user SIGTRAP in honor of those (i.e. watchpoint hits).  But I thought the
hardware behavior was that a step will set DR_STEP in DR6 but not clear any
DR_TRAP_BITS set from before, so I'm not sure this can't sometimes send a
SIGTRAP twice for a combination of a watchpoint hit and a delayed step.

 OK. I blindly applied this patch, step-simple still fails.

Yeah, it was a quick reaction to the funny-looking control flow.
But I didn't really investigate what is actually happening.


Thanks,
Roland

Re: [RFC,PATCH 14/14] utrace core

2009-12-16 Thread Roland McGrath

 C does implicit casts from enum to integer types, right? So always using
 u32 here would not impose any extra typing on the user, or am I missing
 something subtle here?

No, that's right.  I had just been thinking of the documentation /
convenience issue.  I figured with u32 people might tend to think they
always had to write utrace_resume_action(action) like you do in
report_signal, or would want it to get the enum so e.g. you can write a
switch and have gcc give those warnings about covering all the enum cases.
But you have convinced me that the consistency of using u32 everywhere is
the what's really easiest to understand.

 I don't mind the sharing of the argument, it just looked odd to have the
 u32/unsigned int/enum thing intermixed, since you care about typing
 length (as good a criteria as any) I'd just be lazy and make everything
 u32 ;-)

That's good enough for me.

 As to the content, can't you accomplish the same thing by processing
 such exclusive parent registration before exposing the child in the
 pid-hash? Right before cgroup_fork_callback() in
 kernel/fork.c:copy_process() seems like the ideal site.

As Oleg mentioned, that would add some complications.  The task is not
really fully set up at that point and the clone might actually still fail.
The final stages where the clone can fail are necessarily inside some
important locks held while making the new task visible for lookup.

One of the key features of the utrace API is that callbacks are called in
uncomplicated contexts so you just don't have to worry about a lot of
strangeness and arcane constraints.  That means making such a callback
while holding any spin lock or suchlike is really out of the question.

So, either we have to make a callback where you suggest, before the task
really exists, or where we do now, after the task is visible to others.  An
unfinished task that doesn't yet have all its pointers in place, and still
might be freed before it could ever run, would add a whole lot of hair to
what the utrace API semantics would be.  If you get the callback that
early, then you can attach to it that early, and then you expect some
callbacks about it actually failing ever to exist.  And meanwhile, you
might have to know what you can and can't try to do with a task that is not
really set up yet.  It just seems like a really bad idea.

Hence, we are stuck with the post-clone callback being really post-clone so
that it's possible for other things in the system to see the new task and
try to utrace_attach it.  But, as I said, we are not actually relying on
having a way to rule that out at the utrace level today.  So I think we can
just take out this hack and reconsider it later when we have an active need.

 Best would be to fix up the utrace-ptrace implementation and get rid of
 those other kludges I think, not sure.. its all a bit involved and I'm
 not at all sure I'm fully aware of all the ptrace bits.

We haven't figured out all the best ways to clean up ptrace the rest of
the way yet.  We'd like to keep improving that incrementally after the
basics of utrace are in the tree.

  [...] Surely you are
  not suggesting that all these callbacks should be made with a spin lock
  held, because that would obviously be quite insane.
 
 Because there can be many engines attached? 

Because it's a callback API.  A callback is allowed to use mutex_lock(),
is expected to be preemptible, etc.  Having an interface where you let
somebody's function unwittingly run with spin locks held, preemption
disabled, and so forth, is just obviously a terrible interface.

 Or in case of utrace_reap() maybe push the spin_lock() into it?

I did a restructuring to make that possible.  IMHO it makes the control
flow a bit less clear.  But it came out a bit smaller in the text, so
I'll go with it.

 I'm well aware that ptrace had some quirky bits in, and this might well
 be one of the crazier parts of it, but to the un-initiated reviewer (me)
 this could have done with a comment explaining exactly why this
 particular site is not worth properly abstracting etc..

We'll add more comments there.

 Not if the comment right above the WARN_ON() says that its a clueless
 caller.. but if you're really worried about it, use something like:
 
 WARN(cond, Dumb-ass caller\n);

Oh, that's much better.  I've made all the WARN_ON's give some text.


Thanks,
Roland

post-2.6.32 utrace

2009-12-16 Thread Roland McGrath

Up until now, the utrace git trees were relative to 2.6.32.  Now Linus has
merged several of Oleg's small patches that were prerequisites for utrace,
those populating my tracehook branch.  So, I've now merged the latest
tree from Linus in, and split out 2.6.32/* branches.

We still have a tracehook branch.  These patches (which might be in -mm?)
are on the tracehook branch and still not merged (listed last to first):

a8f782e export __ptrace_detach() and do_notify_parent_cldstop()
c3473e1 ptrace_signal: check PT_PTRACED before reporting a signal
b396f5e tracehooks: check PT_PTRACED before reporting the single-step
45667dd tracehooks: kill some PT_PTRACED checks

These ones have been merged upstream now, but remain on 2.6.32/tracehook
(the basis for the backport branches 2.6.32/utrace*):

e8a2f23 ptrace: cleanup ptrace_init_task()-ptrace_link() path
a88b467 ptrace: x86: change syscall_trace_leave() to rely on tracehook when 
stepping
e01acf4 ptrace: x86: implement user_single_step_siginfo()
462a57b ptrace: change tracehook_report_syscall_exit() to handle stepping
172590d ptrace: powerpc: implement user_single_step_siginfo()
d63b43d ptrace: introduce user_single_step_siginfo() helper
d4ef551 (upstream) signals: check -group_stop_count after 
tracehook_get_signal()

There is so far no drift between the trunk utrace branch and the
2.6.32/utrace branch.  If we have more tweaks to utrace as part
of the next round of upstream submission, we will try to keep the
2.6.32/utrace backport branch identically updated pretty quickly.

Note this is after various changes discussed in the recent LKML review
thread (CC'd here) that I trust you have all been reading.  This includes
some callback API changes that are trivial but will break all your utrace
engine code today.


Thanks,
Roland

new Fedora 13 utrace kernel

2009-12-16 Thread Roland McGrath

In http://kojipkgs.fedoraproject.org/packages/kernel/2.6.32.1/10.fc13/
right now you can find a Fedora kernel with the latest utrace up to the
minute.  Coming soon to a rawhide near you (should be tomorrow).

This is a 2.6.32.1-based kernel with exactly the 2.6.32/* utrace patches as
you see them today in git and on people.redhat.com, including Oleg's
utrace-based ptrace.

If we have any more tweaks to the utrace code soon, we will build
an updated rawhide kernel shortly thereafter.


Thanks,
Roland

Re: [RFC,PATCH 14/14] utrace core

2009-12-14 Thread Roland McGrath

 On 12/14, Oleg Nesterov wrote:
 
  IOW, we must ensure that if ever clear TIF_NOTIFY_RESUME we must not
  miss -pending_attach, correct? and for this case we have mb() after
  clear_thread_flag(). Perhaps instead we should add mb__after_clear_bit()
  into arch/ hooks, but this needs a lot of arch-dependent changes.

Since it's utrace/tracehook code that relies on the barrier I think it
makes sense to have it in tracehook_notify_resume() or utrace_resume().
The arch requirement is having done clear_thread_flag() beforehand, so the
arch-independent code can reasonably assume whatever semantics that is
guaranteed to have.

 Cough. And I always read this rmb as mb. Even when I changed
 the comment to explain that we need a barrier between clear bit
 and read flags, I didn't notice it is in fact rmb...
 
 I guess we need the trivial fix, Roland?

You're the barrier man, send me what changes it should get.


Thanks,
Roland

Re: [RFC,PATCH 14/14] utrace core

2009-12-14 Thread Roland McGrath

 Yes, I think this is correct. It is fine to miss -pending_attach == T,
 and in any case the new attacher can come right after the check, even
 if it was checked under utrace-lock.

Right.

 It is important that the tracee can't miss, say, UTRACE_REPORT request
 (as you already explained), and every time the tracee clears -resume
 it calls splice_attaching().

Right.

  In the stopped cases, there are lots of locks and barriers and things
  after resuming.  (Oleg?)
 
 Every time the tracee resumes after TASK_TRACED it uses utrace-lock
 to synchronize with utrace_control/etc, it must see any changes.

And TASK_STOPPED?

Please send me patches to add whatever comments would make all this clear
enough to Peter when reading the code.


Thanks,
Roland

Re: Tests Failures on PPC64

2009-12-13 Thread Roland McGrath

 Yes. I straced gdb to be sure it really does PTRACE_SET_DEBUGREF to
 use the hardware watchpoint.
 
 There is something strange though. gdb does PTRACE_SINGLESTEP and only
 then PTRACE_CONT after watch xxx.

powerpc's data breakpoints are before-access, whereas x86's are
after-access.  In x86-speak, it's a fault-type exception rather than a
trap-type.  The only way to actually get the caught load or store to
complete is to clear the DABR, single-step, and then restore it.


Thanks,
Roland

Re: [RFC,PATCH 14/14] utrace core

2009-12-13 Thread Roland McGrath

 All that seems to do is call -release() and kmem_cache_free()s the
 utrace_engine thing, why can't that be done with utrace-lock held?

Calling -release with a lock held is clearly insane, sorry.  It is true
that any engine-writer who does anything like utrace_* calls inside their
release callback is doing things the wrong way.  But guaranteeing that
simple mistakes result in spin-lock deadlocks just seems clearly wrong to
me.  A main point of the utrace API is to make it easier to work in this
space and help you avoid writing the pathological bugs.  Adding picayune
gotchas like this just does not help anyone.  No other API callback is made
holding some internal implementation lock, and making this one the sole
exception seems just obviously ill-advised on its face.  I can't really
imagine what a justification for such an obfuscation would be.

 But yeah, passing that list along does seem like a better solution.

So you find it cleaner to have each caller of utrace_reset take another
output parameter and be followed with the same exact source code duplicated
in each call site, than to have utrace_reset() do the unlock and then the
common code itself.  I guess there is no accounting for taste.  

We try not to get excited about trivia, so on matters like this one we will
do whatever the consensus of gate-keeping reviewers wants.  My patch to
implement your suggestion adds 13 lines of source and 134 bytes of compiled
text (x86-64).  Is that what you prefer?

I'll note that the code as it stands uses the __releases annotation for
sparse, as well as thoroughly documenting the locking details in comments.
I gather that clear explanation of the code is in your eyes no excuse for
ever writing code that requires one to actually read the comments.  I'm not
sure that attitude can ever be satisfied by any code that is nontrivial or
makes any attempts at optimization.


Thanks,
Roland

Re: [RFC,PATCH 14/14] utrace core

2009-12-13 Thread Roland McGrath

I'm sorry for the delay.  I'm picking up with responding to the parts of
your review that I did not include in my first reply.  I appreciate very
much the discussion you've had with Oleg about the issues that I did not
address myself.  I look forward to your replies to my comments in that
first reply, which we have yet to see.

 Seems inconsistent on u32 vs enum utrace_resume_action.
 
 Why force enum utrace_resume_action into a u32?

The first argument to almost all callbacks (all the ones made when the
task is alive) is called action and it's consistent that its low bits
contain an enum utrace_resume_action.  The argument is u32 action in
the report_signal and report_syscall_entry callbacks, where it combines
an enum utrace_resume_action with an enum utrace_{signal,syscall}_action
(respectively).  It would indeed be more consistent to use u32 action
throughout, but it seemed nicer not to have engine writers always
writing utrace_resume_action(action) for all the cases where there are
no other bits in there.

The return value is a mirror of the u32 action argument.  In many
calls, it has only an enum utrace_resume_action in it.  But in some it
combines that with another enum utrace_*_action.  There I went for
consistency (and less typing) in the return type, since it doesn't make
any difference to how you have to write the code in your callback
function.  The main reason I used u32 at all instead of unsigned int
is just its shortness for less typing.

I don't really mind changing these details to whatever people think is
best.  The other people writing code to use the utrace API may have more
opinions than I do.  I guess it could even be OK to make the return
value and action argument always a plain enum utrace_resume_action,
and use a second in/out enum utrace_{signal,syscall}_action *
parameter in those particular calls.  But that does put some more
register pressure and loads/stores into this API.

 Seems inconsistent in the bitfield type, also it feels like that 3 the 7
 and the enum should be more tightly integrated, maybe add:
 
 UTRACE_RESUME_MAX
 
 #define UTRACE_RESUME_BITS (ilog2(UTRACE_RESUME_MAX))
 #define UTRACE_RESUME_MASK ((1  UTRACE_RESUME_BITS) - 1)

Yes, that's a good cleanup.  Thanks.
(ilog2(7) is 2, so ilog2() + 1 is what you meant.)

  +static struct utrace_engine *matching_engine(
[...]
 The function does a search, suggesting the function name ought to have
 something like find or search in it.

Ok, I'll make it find_matching_engine.

 Quite gross this.. can't we key off the
 tracehoook_report_clone_complete() and use a wakeup there?

Yes, we intended to clean this up at some point later.  But doing that
is just a distraction and complication right now so we put it off.  For
this case, however, I suppose we could just punt for the initial version.

This code exists to support the special semantics of calling
utrace_attach_task() from inside the parent's report_clone() callback.
That guarantees the parent that it wins any race with any third thread
calling utrace_attach_task().  This guarantees it will be first attacher
in the callback order, but the general subject of callback order is
already something we think we will want to revisit in the future after
we have more experience with lots of different engines trying to work
together.  It's most meaningful when using the UTRACE_ATTACH_EXCLUSIVE
flag--then you can use UTRACE_ATTACH_EXCLUSIVE|UTRACE_ATTACH_MATCH_OPS
to synchronize with other threads trying to attach the same kind of
engine to a task, and give special priority in that to the engine that
attaches in report_clone() from tracing the parent.

Really I came up with this special semantics originally just with ptrace
in mind.  ptrace is an engine that wants to exclude other tracer threads
attaching asynchronously with the same kind of engine, and that wants to
give special priority on a child to the parent's tracer (to implement
PTRACE_O_TRACECLONE et al).  However, Oleg's current ptrace code still
relies on the old hard-coded locking kludges to exclude the separate
attachers and to magically privilege the auto-attach from the parent's
tracer.  So we are not relying on this special semantics yet.

We could just punt utrace_attach_delay() and remove the associated
documentation about the special rights of report_clone() calling
utrace_attach_task().  If we decide it helps clean things up later when
we finish more cleanup of the ptrace world, then we can add the fancy
semantics back in later.

 Does this really need the inline?

It has one caller and that call is unconditional.

 Asymmetric locking like this is really better not done, and looking at
 the callsites its really no bother to clean that up, arguably even makes
 them saner.

By assymetric you mean that utrace_reap releases a lock (as the
__releases annotation indicates).  As should be obvious from the code, the
unlock is done before the loop that does -report_reap callbacks and
utrace_engine_put() (which can make

Re: s390: stepping PT_PTRACED (Was: utrace failed to compile for s390x)

2009-12-09 Thread Roland McGrath

 But... I tried to understand these check and failed. Why do we need them?
 They look unneeded to me, but of course I know nothing about s390.

It's not specific to s390.  Other arch's have equivalent logic.  As
with all things ptrace, I strongly suspect that they just blindly
copied the logic from some old i386 code and never contemplated the
actual intent.

AFAIK this is just part of some old defensive programming or partial
mitigation of the general problem that we overload the user signal
queue as a queue of debugger-induced hardware traps.  This is some
very incomplete mitigation--the general problem is a known issue we
want to address in the future--but it's the status quo, so better
not to tweak it now in case it might be closing an observable hole
today in some usage pattern that someone might actually experience.

The general problem has many corners and races and we still have
those with or without this check.  But consider e.g. the scenario
where the debugger PTRACE_SINGLESTEP'd into a long-blocking syscall
(read on a dangling pipe, etc.), then the debugger suddenly exits
without doing a proper PTRACE_DETACH.  That is an entirely non-racey
case where TIF_SINGLE_STEP was left set but -ptrace is clear, so
without the check you could get the spurious SIGTRAP killing the
tracee (the once-was-tracee-and-no-longer, that is).

Perhaps nowadays exit_ptrace() leads to ptrace_disable() -
user_disable_single_step() and so the TIF_* bit is clear before
resuming and getting here (or at least with utrace, it leads to
UTRACE_DETACH and eventually to user_disable_single_step()).  But at
least before that (and it looks like with the 2.6.32 ptrace code
still), it makes a difference in this scenario.  So it is still an
open can of worms I don't want us to dive into this week, but at
least this one worm-escape hole should stay as plugged as it has
been these last 10+ years.  (For extra credit, write up that case in
ptrace-tests and make it a KFAIL.)


Thanks,
Roland

Re: step-into-handler.c compilation failure on ppc64

2009-12-05 Thread Roland McGrath

How about this?

--- step-into-handler.c 10 Dec 2008 04:42:43 -0800  1.8
+++ step-into-handler.c 05 Dec 2009 09:18:54 -0800  
@@ -35,6 +35,7 @@
 #include sys/time.h
 #include string.h
 #include stddef.h
+#include stdint.h
 
 #if defined __x86_64__
 #define REGISTER_IP regs.rip
@@ -113,11 +114,11 @@ handler_alrm_get (void)
 {
 #if defined __powerpc64__
   /* ppc64 `handler_alrm' resolves to the function descriptor.  */
-  return *(void **) handler_alrm;
+  return *(void **) (uintptr_t) handler_alrm;
 /* __s390x__ defines both the symbols.  */
 #elif defined __s390__  !defined __s390x__
   /* s390 bit 31 is zero here but I am not sure if it cannot be arbitrary.  */
-  return (void *) (0x7fff  (unsigned long) handler_alrm);
+  return (void *) (0x7fff  (uintptr_t) handler_alrm);
 #else
   return handler_alrm;
 #endif

Re: [PATCH] utrace: don't set -ops = utrace_detached_ops lockless

2009-12-04 Thread Roland McGrath

   I forgot that there is another issue (iirc a bit discussed too).
   finish_callback_report() sets -ops = utrace_detached_ops lockless!
 
  You'll have to remind me why this is a problem.
 
   Re: [PATCH 85] ptrace_attach_task: rely on utrace_barrier(), don't 
 check -ops
   https://www.redhat.com/archives/utrace-devel/2009-October/msg00180.html
 
 We already discussed this, but forgot to finish.

Ah, yes.  I had that message still sitting in my folder to think about
again and reply.

 Do you agree with the patch?

I think so, yes.  It could use some more comments about the importance of
the lock.  I added a comment and merged it in.


Thanks,
Roland

Re: powerpc: syscall_dotrace() retcode (Was: powerpc: fork stepping)

2009-12-01 Thread Roland McGrath

We don't really intend to impose any new requirements on the arch behavior
here.  It's up to the arch folks to decide.  It does simplify life somewhat
on x86 that you can change the registers at the syscall-entry stop and then
after skipping the syscall, those registers will be unchanged from what you
set.  But it's never been that way on every other arch, and it doesn't need
to be.  The arch requirement on the tracehook_report_syscall_entry() return
value handling is that it make the syscall not be run, and that the
register state then left be compatible with using syscall_rollback() at the
tracehook_report_syscall_exit() spot.

As to what you get from ptrace explicitly fiddling with registers at
syscall entry, that has always been arch-specific before and we haven't
done anything to change that situation.  On every arch, there is some way
to change the syscall number to be run, and changing it to a known-bogus
number (e.g. -1) makes sure no syscall runs.  On every arch, it's possible
at the tracehook_report_syscall_exit() spot to change the registers to
exactly whatever you want userland to see.  That's enough as it stands to
make it possible to do whatever you want, some way or other.

If the powerpc maintainers want to change the behavior here, that is fine
by me.  But there is no need for that just to satisfy general ptrace
cleanups (or utrace).  Normal concerns require that no such change break
the ptrace behavior that userland could have relied on in the past.

So off hand I don't see a reason to change at all.  If every arch were to
change so that registers changed at syscall-entry were left unmolested by
aborting the syscall, then that might be a new consistency worth having.
But short of that, I don't really see a benefit.

All this implies that the ptrace-tests case relating to this needs to be
tailored differently for powerpc and each other arch so it expects and
verifies exactly the arch-specific behavior that's been seen in the past.


Thanks,
Roland

Re: clone bug (glibc?) (Was: clone-multi-ptrace test failure)

2009-12-01 Thread Roland McGrath

 So. Any ptrace test which uses clone() is broken, at least on x86_64.

If you use clone() directly then you need to have the code run in that
child be purely under your control.  You can't use miscellaneous libc
calls nor any libpthread calls, only ones you are sure do not require
any thread setup.  Given TLS, that means even using errno or anything
that might set it.  It also means any libc function that might use any
TLS you don't know about, i.e. really anything beyond the pure
computation calls like str*/mem*.  It also means running any dynamic
linker code, such as relying on dynamic linking without LD_BIND_NOW
(or -Wl,-z,now at compile time).  

The only thing that changed about this recently in glibc is that even
more code paths through the dynamic linker now happen to depend on
thread setup.

 Jan, Roland, how should we fix this? We can rewrite the code to use
 pthread_create(), this should be trivial. Unfortunately, libpthread
 is not trivial, it can shadow the problem and complicate the testing.

We should avoid library code more thoroughly, not use more of it.  
As well as being complex, it also varies a lot across systems and
interferes with using the same sources to translate to exact
kernel-level testing across various people's development environments.

I think the best bet is to link with -Wl,-z,now and then minimize the
library code you rely on.  (It really only matters to be extra careful
about that for the code running in the clone child.)  So you can use
syscall() if you are not relying on its error-case behavior--if the
system call fails, the function will set errno, which can rely on the
TLS setup.

 And the stupid question. If I create the subthread via pthread_create(),
 how can I know its tid? I grepped glibc-2.11, and afaics pthread_create
 returns the pointer to struct pthread which has pid_t tid but I can
 not find the helper which returns -tid and struct pthread is not
 exported.

There is no official exported way.  You can use syscall(__NR_gettid).
That kernel concept of a global thread ID number does not exist in
pthreads, it is a detail of the Linux implementation.


Thanks,
Roland

Re: [RFC,PATCH 14/14] utrace core

2009-12-01 Thread Roland McGrath

 Could we just drop the tracehook layer if this finally merged
 and call the low level functions directly?

We can certainly do appropriate streamlining cleanups later, by all
means.  The original purpose of the tracehook layer is simply this:

A person hacking on core kernel code or arch code should not have to
think about all the innards of tracing (ptrace, utrace, or anything
else).  If he comes across a tracehook_* call, he can just read its
kernel-doc for explanation of its parameters, return value, what it
expects about its context (locking et al), and what the semantic
significance of making that particular call is.  If changes to the
core/arch code in question do not require changing any of those
aspects, then said person need not consider tracing issues further.
If a change to a function's calling interface or contextual
assumptions looks warranted, then he knows he should discuss the
details with some tracing-related folks (i.e. find tracehook.h in
MAINTAINERS and thus get to me and Oleg).

Likewise, a person hacking on tracing code should not have to think
about every corner of interaction with the core code or arch code.
Each tracehook_* call's kernel-doc comments say everything that
matters about how and where it's called.  If some of those details
are problematic for what we want to do inside the tracing code, then
we know we have to hash out the details with the maintainers of the
core or arch code that makes those calls.  Otherwise we can keep our
focus on tracing infrastructure without spending time getting lost
in arcane details of the arch or core kernel code.

These two things seem permanently worthwhile to me: that the core
and arch source code use simple function calls without open-coding
any innards of the tracing infrastructure; and that these functions
have clear and complete documentation about their calling interfaces
and context (locking et al).  I find it natural and convenient that
such calls have a common prefix that makes it obvious to any reader
of the core code what subsystem the call relates to.

Beyond those ideas, I certainly don't care at all what the names of
these functions are, what common prefix is used, or in which source
files those declarations and kernel-doc comments appear.


Thanks,
Roland

Re: [RFC,PATCH 0/14] utrace/ptrace

2009-12-01 Thread Roland McGrath

Note to all: I'm on the road this week (having had a holiday last week)
and will be somewhat slow in replying on these threads, but I will be
sure to get to them all.

 Yes, nobody likes 2 implementations. I guess Roland and me hate
 CONFIG_UTRACE much more than anybody else.

:-) We both hate maintaining the old ptrace implementation, that is true!
The other thing we hate is having our work delayed arbitrarily again and
again to wait for the arch maintainers of arch's we don't use ourselves.

Oleg and I have been trying to follow the advice we get on how to get this
work merged in.  When multiple gate-keepers give conflicting advice, we
really don't know what to do next.  Our interest is in whatever path
smooths the merging of our work.  We'd greatly prefer to spend our time
hacking on our new code to make it better or different in cool and useful
ways than to spend more time guessing what order of patches and rejuggling
of the work will fit the changing whims of the next round of review.

We don't want to rush anyone, like uninterested arch maintainers.  We don't
want to break anyone who doesn't care about our work (we do test for ptrace
regressions but of course new code will always have new bugs so some
instances of instability in the transition to a new ptrace implementation
have to be expected no matter how hard we try).  We just don't want to keep
working out of tree.

The advantage of making the new code optional is, obviously, that you can
turn it off.  That is, lagging arch's won't be broken, just unable to turn
it on.  And, anyone who doesn't want to try anything new yet can just turn
it off and not be exposed to new code.

The advantage of making the new code nonoptional is, obviously, that you
can't turn it off.  The old implementation will be gone and we won't have
to maintain it any more (outside some -stable streams for a while).  The
kernel source will be cleaner with no optional old cruft code duplicating
functionality.  All ptrace users will be testing the new code, and so we'll
find its bugs and gain confidence that it works solidly.

Like I've said, we want to do whatever the people want.  My concern about
what Christoph has suggested is that it really seems like an open-ended
delay depending on some arch people who are not even in the conversation.
For anyone either who likes utrace or who is concerned about its details,
the sooner we are working in-tree the sooner we can really hash it out
thoroughly and get to having merged code that works well.  As long as there
is anything unfinished like arch work that we've decided is a gating event,
then the review and hashing out of the new code itself does not seem to get
very serious.  (To wit, this thread is still talking about merge order and
such, another long thread was about incidental trivia, and we've only just
started to see a tiny bit of actual code review today.)


Thanks,
Roland

Re: [RFC,PATCH 14/14] utrace core

2009-12-01 Thread Roland McGrath

 This above text seems inconsistent. First you say report_reap() is the
 final callback and should be used for cleanup, then you say
 report_death() is the penultimate callback but might not always happen
 and people would normally use that as cleanup.

 If we cannot rely on report_death() then I would suggest to simply
 recommend report_reap() as cleanup callback and leave it at that.

I'm sorry the explanation was not clear to you.  I'll try to make it clear
now and then I'd appreciate your suggestions on how the documentation could
be worded better.  (I do appreciate your suggestion here but it's one based
on an idea that's not factual, so I don't think we should follow it.)

There is no unreliable aspect to it.  If you call utrace_set_events() to
ask for report_death() callbacks then you will get that callback for that
task--guaranteed--unless utrace_set_events() returned an error code that
tells you unambiguously that you could not get that callback.

What's true is that report_reap() is the last callback you can ever
get for a given task.  If you had asked for report_death() callbacks,
then you always get report_death() and if you've asked for both these
callbacks then report_death() is always before report_reap().  
(This is a special ordering guarantee, because in actuality the
release_task() call that constitutes reap can happen in the parent
simultaneously with the task's own exit path.)

The one situation in which you cannot get report_death() is when the task
was already dead when you called utrace_set_events().  In that case, it
gives an error code as I mentioned above.  Even in that situation, you can
still ask to enable just the report_reap() callback.  With either of these,
if you enabled it successfully, then you are guaranteed to get it.

When you get report_death() and are not interested in getting report_reap()
afterwards, then you can return UTRACE_DETACH from report_death().  If you
don't detach, then you will get report_reap() later too.  The documentation
mentions using report_death() to detach and clean up because many kinds of
uses will have no interest in report_reap() at all.  The only reason to get
a report_reap() callback is if you are interested in tracking when a task's
(real) parent reaps it with wait* calls, but usually people are only really
interested in a task until it dies.

  +  para
  +There is nothing a kernel module can do to keep a structnamestruct
  +task_struct/structname alive outside of
  +functionrcu_read_lock/function. 
 
 Sure there is, get_task_struct() comes to mind.

__put_task_struct() is not exported to modules and so put_task_struct()
cannot be used by modules.

  +still valid until functionrcu_read_unlock/function.  The
  +infrastructure never holds task references of its own.  
 
 And here you imply its existence.

[I take it this refers to the next quoted bit:]

  Though neither
  +functionrcu_read_lock/function nor any other lock is held while
  +making a callback, it's always guaranteed that the structnamestruct
  +task_struct/structname and the structnamestruct
  +utrace_engine/structname passed as arguments remain valid
  +until the callback function returns.

No, we do not imply that the utrace infrastructure holds any task
reference.  The current task_struct is always live while it's running
and until it's passed to release_task().  The task_struct passed to
callbacks is current.  That's all it means.  

The true facts are that utrace callbacks are all made in the current task
except for report_reap(), which is made at the start of release_task().
So the kernel's core logic is holding a task reference at all times that
utrace callbacks are made.  If you really think it is clearer to explain
that set of facts as utrace holds a reference, then please suggest the
exact wording you have in mind.

 The above seems weird to me at best... why hold a pid around when you
 can keep a ref on the task struct? A pid might get reused leaving you
 with a completely different task than you thought you had.

The *_pid interfaces are only there because put_pid() can be used by
modules while put_task_struct() cannot.  If we can make __put_task_struct()
an exported symbol again, I would see no reason for these *_pid calls.

 Right, except you cannot generally rely on this report_death() thing, so
 a trace engine needs to deal with the report_reap() thing anyway.

This is a false antecedent.  I hope the explanation above made this
subject clear to you.

  +  sect2 id=interlocktitleInterlock with final callbacks/title
[...]
 Hrmm, better mention this earlier, or at least refer to this when
 describing the above cleanup bits.

This explanation is somewhat long and has its own whole section so as to
be thoroughly explicit and clear.  Do you think there should just be a
cross reference here in the earlier mention of report_reap() and
report_death()?  Or do you think

Re: utrace-ptrace gdb testsuite tesults

2009-11-25 Thread Roland McGrath

 In general everything where is a word thread has unstable results and
 nonstop tests are also a bit unstable.

So where exactly is the problem in these cases?  Are the tests overly
timing-sensitive where there is no actual behavior bug?  Or is gdb overly
timing-sensitive where there is no actual kernel bug?  Or is it just
unknown, and might be a kernel bug after all (even an undiagnosed one in
vanilla kernels)?

 There are IMO/hopefully very few cases tested by the gdb testsuite and still
 not covered by the ptrace-testsuite, I even do not much expect we will see
 again a new utrace regression caught by the gdb testsuite  uncaught by the
 ptrace-testsuite.

That's certainly good to hear.  If you are pretty confident about that,
then I am quite happy to consider nonregression on all of ptrace-tests the
sole gating test for kernel changes.  We just don't want to wind up having
other upstream reviewers notice a regression using gdb that we didn't
notice before we submitted a kernel change.

 Please point at some built or easily buildable kernel .rpm first.

http://kojipkgs.fedoraproject.org/scratch/roland/task_1825649/


Thanks,
Roland

http://koji.fedoraproject.org/scratch/roland/task_1825649/

2009-11-23 Thread Roland McGrath

At this URL find built rpms (x86_64 and i686 only) that you can install on
a Fedora 12 (or rawhide, probably) system.  These are the upstream kernel
du jour with the current utrace-ptrace branch code (see rpm changelog for
commit id).  (I tried an f12-flavored build too, but it looks like the
current upstream code does not build on ppc.)

I tried ptrace-tests on this kernel.  step-fork fails as expected, this
kernel doesn't have that (utrace-unrelated) upstream fix.  

On i686, I get no other ptrace-tests failures.

On x86_64, detach-sigkill-race consistently fails in tests/
but succeeds in biarch-tests/.

Hmm, looks like that fails on stock F-12 kernel (i.e. vanilla ptrace) too,
so not a regression.


Thanks,
Roland

Re: Q: UTRACE_SYSCALL_RESUMED logic

2009-11-23 Thread Roland McGrath

 On 11/18, Roland McGrath wrote:
 
   In any case, what is the rationality?
 
  The rationale is that if you see utrace_resume_action(action)==UTRACE_STOP
  in your callback, then you know another engine asked for stop
 
 Yes, but engine can't know if the next one is going to return
 UTRACE_STOP.

You can't know if the next one is going to change the registers and resume,
either.  That's just the general engine order issue.  The only point of
UTRACE_SYSCALL_RESUMED is to bring make it as possible to cooperate with a
stop-modify-resume engine as it already is with a modify-in-callback engine.

 OK, thanks. But shouldn't utrace_report_syscall_entry() reset
 report-action = UTRACE_RESUME after do_report_syscall_entry()?
 If we stop, report-action remains UTRACE_STOP when we do the
 reporting loop.

Yes, fixed.

 It is not clear to me why ptrace_report_syscall_entry() uses
 utrace_syscall_action() under if (UTRACE_SYSCALL_RESUMED).

That is what a callback should do if it doesn't have a specific intent.
Otherwise you clobber the choice made by a previous engine.

 This looks a bit strange because it returns the unconditional
 UTRACE_SYSCALL_RUN below. IOW, if ptrace should obey to another
 engine's request to abort this syscall, the code should use
 utrace_syscall_action() consistently.

Yes, it should not change the incoming action for a ptrace syscall report.

 OTOH, PTRACE_O_SYSEMU always aborts. Not sure I understand how
 different engines can be friendly to each other.

The friendly idea can only apply when an engine intends to be noninvasive
to userland behavior.  i.e., as the lone engine you would not perturb the
default behavior, so as a second engine you should not perturb what the
last engine chose.  When the ptrace engine is doing SYSEMU, it intends to
break the normal userland behavior.  There is just inherently no way to be
noninvasive about it.


Thanks,
Roland

Re: [PATCH 3] ptrace: introduce user_single_step_siginfo() helper

2009-11-22 Thread Roland McGrath

 Is it possible to add si_code and si_addr info 
info-si_code = TRAP_TRACE;
   
info-si_addr = instruction_pointer(regs);

This is exactly what arch-specific versions should do here.
The choice of TRAP_TRACE is an arch detail, not a common default.


Thanks,
Roland

Re: [PATCH 133] stepping, accommodate to utrace-cleanup changes

2009-11-20 Thread Roland McGrath

  The former is e.g. PTRACE_SINGLESTEP while an unrelated engine uses
  UTRACE_EVENT(SYSCALL_ENTRY).  The ptrace engine's report_quiesce returns
  UTRACE_SINGLESTEP.  finish_resume_report() calls user_enable_single_step().
  That seems fine.  Right?
 
 In this case ptrace_report_quiesce(event) returns UTRACE_RESUME, note
 that it does
 
   return event ? UTRACE_RESUME : ctx-resume;
 
 and event == SYSCALL_ENTRY. This looks correct.

With the utrace layer changes we're discussing, we need it to return
UTRACE_SINGLESTEP (i.e. ctx-resume) here.  AFAICT that never hurts
even with today's utrace layer.

  I see.  finish_resume_report() will only do utrace_stop() and then we'll
  go directly into the syscall unless someone used UTRACE_REPORT.
 
 Yes,
 
  utrace_stop() will only set TIF_NOTIFY_RESUME and utrace-resume.  In
  the real resume-to-user cases, that's fine because we'll handle that in
  utrace_resume() or utrace_get_signal() before doing anything else.
 
 Not sure I understand. utrace_stop() will not set TIF_NOTIFY_RESUME?
 
 We are talking about the case when the tracee stops in SYSCALL_ENTER,
 and then the tracer does utrace_control(UTRACE_SINGLESTEP) to resume.
 In this case utrace_control() sets -resume/TIF_NOTIFY_RESUME, yes.

I think you do understand fine.  Yes, that's what it will do.  I was just
recognizing that this isn't enough in the syscall-entry case.

  this
  seems like the right idea for how to get tracehook_report_syscall_exit
  called in the cases where it is in old ptrace.
 
 Afaics yes.
 
 But, what if another engine does utrace_control(UTRACE_INTERRUPT) ?

Hmm.  Yes, I think this is the one case that is unlike all the others.
That is, UTRACE_INTERRUPT at syscall-entry.  In all other cases,
nothing would care about utrace-resume until we get to
get_signal_to_deliver anyway, where the UTRACE_SIGNAL_REPORT pass will
trump any pending resume action anyway so you don't care that you lost
track of it when UTRACE_INTERRUPT came in.

Hmm.  So what does UTRACE_INTERRUPT mean here anyway?  It means that
we should report soon and that signal_pending() should be true until
that report.  So today that means you can get into the syscall with
signal_pending() and depending on the particular call that might cause
a normal blocking path not to block and/or a -EINTR/-ERESTART* return,
but some syscalls will just complete normally.

How about if we say that UTRACE_INTERRUPT at syscall-entry means that
the syscall really doesn't happen at all?  That is, instead, you force
an -ERESTARTNOHAND return and the normal roll-back such that you get
your report_signal callback before the syscall is entered.  That even
seems like a useful utrace API feature, as well as conveniently
smoothing over this quirk in the resume action handling.

My idea here is that this makes it OK to lose UTRACE_SINGLESTEP here
because from the user-mode-centric perspective no instruction has
happened yet.  The original guy who did UTRACE_SINGLESTEP (and perhaps
never cared about syscall tracing) will get a generic report_quiesce
or report_signal at which it needs to reassert UTRACE_SINGLESTEP.

This merits more thought.  For now, let's just leave an XXX comment
about a UTRACE_INTERRUPT effectively swallowing the UTRACE_SINGLESTEP
that should cause utrace_report_syscall_exit to be called.  I think we
can revisit this corner after we have merged up all the rest of it.

 Argh. I meant, from the caller pov, utrace_control(UTRACE_SINGLESTEP)
 does enable_step() immediately, like it did before utrace-cleanup
 changes.

I see.  From the utrace API perspective, I don't think anything changes
here.  In today's code, the resume action can take effect without a
subsequent utrace callback.  So that's the same as it ever was, and where
this actually happens is not something that a utrace caller should know or
can tell.

 IOW, suppose the tracee is stopped, and ptrace does UTRACE_SINGLESTEP.
 The tracee resumes, processes -resume, and does enable_step().
 From the ptrace pov, it looks as if utrace_control() does enable_step()
 before utrace_wakeup().

Sure.

 I meant, every time the tracee stops, it should process -resume
 after wakeup. Looks like, the patch you sent in another thread
 (which adds apply_resume_action()) does something like this, right?

Right.


Thanks,
Roland

Re: [PATCH 134] mv kernel/ptrace.c kernel/ptrace-utrace.c

2009-11-20 Thread Roland McGrath

 I am not sure what is the best way to do these renames but I hope
 this doesn't really matter, utrace-ptrace branch is only for us.

Sure, whatever you want to try is fine.

 The goal is to make it testable with and without CONFIG_UTRACE.

Ok.  We need to get upstream feedback on what they do or don't want like
that.  Some people have definitely said they don't want to see two
implementations in the tree at the same time.  But I can never guess what
they will say next week.  You'll get the joy of hashing that out with them. :-)


Thanks,
Roland

Re: [PATCH 0/4] utrace-resume fixes

2009-11-20 Thread Roland McGrath

 Whatever we do, perhaps it makes sense to apply your patch
 in https://www.redhat.com/archives/utrace-devel/2009-November/msg00109.html
 first and then do further changes?

Ok.  v2.6.32-rc8-245-g3d4f9cf has that.  I'll shelve this 4-patch series
while we keep discussing (one more reply to come from me as of now).  Then
you send me a fresh series done however you think is best given that
discussion.  (So if you want to do more relative to these 4 patches, just
resend them as part of the new series.)


Thanks,
Roland

Re: [PATCH 0/4] utrace-resume fixes

2009-11-20 Thread Roland McGrath

 Somehow I can't really understand this patch. I hope more or less
 I can see what it does, but the resulting code looks even more
 subtle to me.

Well, it was an untested draft and probably needed more comments.

 With this patch, apply_resume_action() is always called after
 utrace_stop(). Well, except for utrace_report_exit(), but I guess
 it could do apply_resume_action() too.

It could, but it just never matters at all.  The only thing that can
possibly be meaningful is UTRACE_INTERRUPT, and for that finish_report has
set TIF_SIGPENDING already anyway.

 Now it is not clear why utrace_control(SINGLESTEP) sets
 TIF_NOTIFY_RESUME, it is not needed after apply_resume_action()
 processes -resume. Yes, we have jctl stops, but we can move
 this code into utrace_finish_stop().

Yes, it is not really needed.  But note that it is actually now possible to
use UTRACE_SINGLESTEP without UTRACE_STOP first--not that it's really
useful, but we don't really have any reason to disallow it.  So in that
case it does make a difference.  We could just do:

@@ -1183,7 +1183,8 @@ int utrace_control(struct task_struct *target,
clear_engine_wants_stop(engine);
if (action  utrace-resume) {
utrace-resume = action;
-   set_notify_resume(target);
+   if (reset)
+   set_notify_resume(target);
}
break;

Since TIF_NOTIFY_RESUME will now always be superfluous when stopped.

 It is not clear to me why apply_resume_action(UTRACE_INTERRUPT) does
 set_tsk_thread_flag(TIF_SIGPENDING). In fact I don't understand why
 apply_resume_action() checks UTRACE_INTERRUPT at all. If it is called
 after utrace_stop(), action == start_report() which never resets
 UTRACE_INTERRUPT.

It's only because this code path is shared with the tail of
finish_resume_report, where it is the only thing processing
UTRACE_INTERRUPT in the real return-to-user cases.  i.e., in paths that
don't call finish_report(), TIF_SIGPENDING won't ever have been set by a
callback return value.  utrace_control does always set it up front, so it
is superfluous when that's how UTRACE_INTERRUPT got set.

 And since we process -resume after stop, it is not clear why we
 set -resume = report-resume_action before stop, apply_resume_action()
 could use min(report-resume_action, utrace-resume). Yes, yes,
 we have the nasty utrace-vfork_stop case, I mean it is not easy to
 understand the logic behind all these -resume changes.

Ok.  Feel free to clean it up if you think it makes things clearer.  To me,
it's just natural to do that MIN operation as soon as you know about it.
utrace_stop() always takes the lock anyway, so it's relatively free.  If
the ambient state is stored early, then e.g. utrace_control() won't bother
to set TIF_SIGPENDING or TIF_NOTIFY_RESUME.

I guess I think the logic is simple: apply a new minimum to -resume as
soon as you know about it.

 And afaics, this can't help to fix the tracehook_report_syscall_exit()
  TIF_SINGLESTEP problems in the multitracing case, UTRACE_INTERRUPT
 destroys UTRACE_SINGLESTEP. Ptrace can reassert it later, but this
 will be too late, the trace has already passed this tracehook.

See the other thread, but what I said there is let's take this case up in
its own thread later.

 I don't understand this skip_notify, utrace_stop() is always called
 with skip_notify == true?

Hmm.  Yes, this patch was unfinished when I sent it because your patches
were crossing paths with what I was doing.  We can skip TIF_NOTIFY_RESUME
when we'll always call apply_resume_action() after utrace_stop() before
user mode (i.e. report_exit doesn't matter).

Since report_exit doesn't matter one way or the other, perhaps it will be
cleaner to roll apply_resume_action() into utrace_stop() in some fashion.
I notice we're now actually using the REPORT() macro only four times, and
one of those is report_exit.  So maybe that and finish_report() could
change somehow too to refactor things.  I'll leave it to you to rework all
those and finish_* to the most useful organization of subroutines.

I think we're mutually clear now on the idea of when to do user_*_step()
calls (i.e. apply_resume_action).


Thanks,
Roland

Re: copy_process utrace_init_task (Was: [PATCH 133] stepping, accommodate to utrace-cleanup changes)

2009-11-18 Thread Roland McGrath

 Found the trivial but nasty problem.

Ah!  Good catch.

I added tracehook_init_task() in my tree.  I don't see much benefit in
sending any tracehook patch upstream for this.  tracehook_init_task()
corresponds to tracehook_free_task(), which is only added by utrace
(and both would just be empty in a separate preparatory patch).

I don't see any reason to fiddle the ptrace_init_task() call.  The
setup it does is all stuff that only matters for teardown done by
release_task() or even earlier in the exit sequence.  So it makes
enough sense that the setup side should happen later as it does now.
In the long run, the ptrace init stuff will all just go away.  Before
then I can't see any rationale for touching it.


Thanks,
Roland

Re: tracehook_report_syscall_exit() PT_PTRACED (Was: [PATCH 133] stepping, accommodate to utrace-cleanup changes)

2009-11-17 Thread Roland McGrath

 but now I think perhaps it would be better to send
 ptrace-change-tracehook_report_syscall_exit-to-handle-stepping_fix
 to akpm right now:
 
   --- a/include/linux/tracehook.h
   +++ b/include/linux/tracehook.h
   @@ -134,7 +134,7 @@ static inline __must_check int tracehook
 */
static inline void tracehook_report_syscall_exit(struct pt_regs *regs, 
 int step)
{
   -   if (step) {
   +   if (step  (task_ptrace(current)  PT_PTRACED)) {
   siginfo_t info;
   user_single_step_siginfo(current, regs, info);
   force_sig_info(SIGTRAP, info, current);
 
 What do you think?

Yes, this makes it consistent with the x86 behavior before the change,
which used tracehook_consider_fatal_signal(current, SIGTRAP) in its test.


Thanks,
Roland

Re: [PATCH] utrace: finish_report() must never set -resume = UTRACE_STOP

2009-11-16 Thread Roland McGrath

 Forgot to mention, your tree lacks these patches we sent upstream:

Right.  I'll merge these into some requirements branch (or maybe just the
existing tracehook branch) and merge that into utrace.


Thanks,
Roland

Re: [PATCH] utrace: utrace_attach_task() forgets to return when -utrace == NULL

2009-11-16 Thread Roland McGrath

 Now, if we race with another task doing utrace_task_alloc() and see
 -utrace != NULL, why should we see the correctly initialized *utrace?
 
 utrace_task_alloc() needs wmb(), and the code like above 
 read_barrier_depends().

Ok.  Please review what I put in (esp. commit c575558) and send patches
relative to the latest tree if that's not right.  Perhaps it is overkill to
do read_barrier_depends() in task_utrace_struct().  But AFAICT if we don't
actually need it all those places, that is only because of some
less-obvious barrier effect with checks on -utrace_flags or something.
Do you think it's not really needed in all those places we extract the
pointer before spin_lock et al?


Thanks,
Roland

1 2 3 4 >

1 - 100 of 378 matches

Mail list logo