Re: [PATCH 3/4] utrace_set_events: fix UTRACE_EVENT(REAP) case

2010-08-26 Thread Oleg Nesterov
On 08/19, Roland McGrath wrote:

  On 08/16, Roland McGrath wrote:
  
  Now that you merged c93fecc925ea7567168f0c94414b9021de2708c5
  get_utrace_lock() must not succeed if utrace-reap == T, this becomes
  a bit off-topic. However, I thought about relaxing the dead check in
  get_utrace_lock(), instead of utrace-reap we could check
  utrace-reap  !utrace-death. In fact, initially I was going to
  do this, but then decided to make the simpler patch for now.

 When utrace-reap is set, it means release_task() has been called.  The
 API caller has no way to know reaping hasn't already begun--except if
 its report_death callback has not returned yet.

No! the task can be already reaped even before -report_death() was
called. The task_struct itself can't go away, but that is all (perhaps
you meant this).

And this is the problem for ugdb. Damn, I knew this, that is why
ugdb_process_exit() doesn't try to read -group_exit_code. Still I
managed to forget that this has other implications.

 But I think that the
 API definition of once release_task() has been called (i.e. entered,
 maybe not returned yet), any utrace call will get -ESRCH, is a clear
 and comprehensible constraint.  We should not open any holes in that.

I am not sure. I had another reason in mind to check reap  !death in
get_utrace_lock(), synchronization with -report_death() callback.

But OK. Let's forget about more utrace changes for now.

Oleg.



Re: [PATCH 2/3] fix utrace_control(DETACH) utrace-death interaction

2010-08-26 Thread Oleg Nesterov
On 08/19, Roland McGrath wrote:

  Wait. It doesn't break this. It only breaks -EALREADY contract. And
  I don't understand why this is bad.

 The -EALREADY contract lets you have a report_death callback that does all
 your cleanup and then returns UTRACE_DETACH.

I think this is possible without the current -EALREADY contract.

 To have that plan, you need a
 way for an asynchronous detach attempt to be excluded once report_death has
 begun, and for that asynchronous caller to know that's the situation.

The problem is, with the current conract detach is never simple if you
have report_death. You always have to synchronize with this callback.

But see below.

 It breaks the documented API.  I am of course open to changing the API.
 But we need to get completely clear on what the new range of plans we'll
 support is, and make the documentation and the implementation match.

Yes. I thought it makes sense to change the API and docs if we can improve
things, before utrace is widely used.

But OK, lets's forget about this.

  Why? What is the point? What makes UTRACE_EVENT(DEATH) so special?
  I do not see the logic at all.

 It is special because it is the last interesting event to a traditional
 user such as ptrace.  Hence it is attractive to write a report_death
 callback that returns UTRACE_DETACH spontaneously, i.e. without an
 asynchronous caller (i.e. debugger thread) having requested the detach.

Sure. And this is still possible.

  If -report_death() does something which the caller of utrace_control()
  should know, then they should take care of synchronization anyway.
  With or without this patch, utrace_control(DEATH) can return 0 after
  -report_death() was already called.

 If your engine's report_death always returns UTRACE_DETACH, then
 utrace_control(UTRACE_DETACH) can never return 0 once report_death
 is about to be called nor any time thereafter.

I don't undestand this. OK, probably I missed something, this doesn't
matter.

 But, the current state of play from the broader perspective is that we
 really have no idea about the future viability of the utrace API.  I don't
 think it makes a lot of sense to put a great deal of effort into perfecting
 the corners of the API much further when we don't really have any good
 reasons to think that this API will be the basis of anything that survives
 in the long run.

Yes. I have to agree, this is good point.

So, lets forget about these changes, at least for now.

 The current focus of work is your ugdb prototype.  If some utrace changes
 make it greatly easier to get that to kind-of-working status, then they are
 worthwhile.

No. I didn't think about ugdb at all. I'll find the workaround for ugdb.
If nothing else, I can use utrace_engine_ops-release(). Or something else.

 If it
 works in practice but has disturbing robustness holes, it is OK to just
 comment that loudly now and move on to the next battle, IMHO.

OK, agreed.

Oleg.



Re: [PATCH 0/3] UTRACE_DETACH fixes

2010-08-26 Thread Oleg Nesterov
On 08/19, Roland McGrath wrote:

  OK, instead of filling utrace_detached_ops{} we can change start_callback()
 
  - if (want  UTRACE_EVENT(QUIESCE)) {
  + if ((want  UTRACE_EVENT(QUIESCE)) || ops == detached_ops) {

 If we're going to have a special-case check for utrace_detached_ops, then
 it can just short-circuit entirely and there is no need for any callbacks
 in utrace_detached_ops.  (Then we might even use NULL or (void *)-1l
 instead of utrace_detached_ops.)  All it needs to do is:

   report-detaches = true;
   utrace-reporting = NULL;
   return NULL;

Yes, I thought about this too, see the changelog.

But probably we need utrace_detached_ops-report_reap() anyway.
-report_death can return UTRACE_DETACH, and after that utrace_report_death()
can skip utrace_reset() and do -report_reap().

  And, in particular, I don't understand this code in utrace_get_signal:
 
  } else if (!(task-utrace_flags  UTRACE_EVENT(QUIESCE))) {
  /*
   * We only got here to clear utrace-signal_handler.
   */
  return -1;
  }
 
  How so? What if we were called because of utrace_control(INTERRUPT)
  and QUIESCE is not set?

 If an engine used UTRACE_INTERRUPT without having first set QUIESCE,
 then it has no rightful expectation of getting any callback.

Hmm. This is certainly new to me. Could you confirm?

If yes, shouldn't we change utrace_control()

- if (action = UTRACE_REPORT  action  UTRACE_RESUME 
+ if (action = UTRACE_INTERRUPT  action  UTRACE_RESUME 
  unlikely(!(engine-flags  UTRACE_EVENT(QUIESCE {

then?

I must admit, I do not understand why INTERRUPT needs QUIESCE.
utrace_get_signal() should respect QUIESCE, this is clear. But why
it is needed?

Oleg.



Re: [PATCH] fix mark_engine_detached() vs start_callback() race

2010-08-26 Thread Oleg Nesterov
On 08/19, Roland McGrath wrote:

  I think in the longer term mark_engine_detached() should not change
  engine-flags at all but add QUIESCE to -utrace_flags. However, this
  breaks utrace_maybe_reap(reap = true) and we should avoid the race
  with finish_callback() which clears -reporting after report_quiesce().

 What's the benefit to adding QUIESCE?  If any utrace code gets entered at
 all, then any callback run will be able to do the special-case ops check
 for detached engines.

Well yes, but otoh it could be the last engine. We shouldn't miss, say,
start_callback() from utrace_resume() (although currently -spurious helps).

Anyway, this needs more thinking, and probably you are right and QUIESCE
is not needed.

Oleg.