Re: Tests Failures on PPC64

2009-12-13 Thread Roland McGrath
 Yes. I straced gdb to be sure it really does PTRACE_SET_DEBUGREF to
 use the hardware watchpoint.
 
 There is something strange though. gdb does PTRACE_SINGLESTEP and only
 then PTRACE_CONT after watch xxx.

powerpc's data breakpoints are before-access, whereas x86's are
after-access.  In x86-speak, it's a fault-type exception rather than a
trap-type.  The only way to actually get the caught load or store to
complete is to clear the DABR, single-step, and then restore it.


Thanks,
Roland



Re: [RFC,PATCH 14/14] utrace core

2009-12-13 Thread Roland McGrath
 All that seems to do is call -release() and kmem_cache_free()s the
 utrace_engine thing, why can't that be done with utrace-lock held?

Calling -release with a lock held is clearly insane, sorry.  It is true
that any engine-writer who does anything like utrace_* calls inside their
release callback is doing things the wrong way.  But guaranteeing that
simple mistakes result in spin-lock deadlocks just seems clearly wrong to
me.  A main point of the utrace API is to make it easier to work in this
space and help you avoid writing the pathological bugs.  Adding picayune
gotchas like this just does not help anyone.  No other API callback is made
holding some internal implementation lock, and making this one the sole
exception seems just obviously ill-advised on its face.  I can't really
imagine what a justification for such an obfuscation would be.

 But yeah, passing that list along does seem like a better solution.

So you find it cleaner to have each caller of utrace_reset take another
output parameter and be followed with the same exact source code duplicated
in each call site, than to have utrace_reset() do the unlock and then the
common code itself.  I guess there is no accounting for taste.  

We try not to get excited about trivia, so on matters like this one we will
do whatever the consensus of gate-keeping reviewers wants.  My patch to
implement your suggestion adds 13 lines of source and 134 bytes of compiled
text (x86-64).  Is that what you prefer?

I'll note that the code as it stands uses the __releases annotation for
sparse, as well as thoroughly documenting the locking details in comments.
I gather that clear explanation of the code is in your eyes no excuse for
ever writing code that requires one to actually read the comments.  I'm not
sure that attitude can ever be satisfied by any code that is nontrivial or
makes any attempts at optimization.


Thanks,
Roland



Re: [RFC,PATCH 14/14] utrace core

2009-12-13 Thread Roland McGrath
I'm sorry for the delay.  I'm picking up with responding to the parts of
your review that I did not include in my first reply.  I appreciate very
much the discussion you've had with Oleg about the issues that I did not
address myself.  I look forward to your replies to my comments in that
first reply, which we have yet to see.

 Seems inconsistent on u32 vs enum utrace_resume_action.
 
 Why force enum utrace_resume_action into a u32?

The first argument to almost all callbacks (all the ones made when the
task is alive) is called action and it's consistent that its low bits
contain an enum utrace_resume_action.  The argument is u32 action in
the report_signal and report_syscall_entry callbacks, where it combines
an enum utrace_resume_action with an enum utrace_{signal,syscall}_action
(respectively).  It would indeed be more consistent to use u32 action
throughout, but it seemed nicer not to have engine writers always
writing utrace_resume_action(action) for all the cases where there are
no other bits in there.

The return value is a mirror of the u32 action argument.  In many
calls, it has only an enum utrace_resume_action in it.  But in some it
combines that with another enum utrace_*_action.  There I went for
consistency (and less typing) in the return type, since it doesn't make
any difference to how you have to write the code in your callback
function.  The main reason I used u32 at all instead of unsigned int
is just its shortness for less typing.

I don't really mind changing these details to whatever people think is
best.  The other people writing code to use the utrace API may have more
opinions than I do.  I guess it could even be OK to make the return
value and action argument always a plain enum utrace_resume_action,
and use a second in/out enum utrace_{signal,syscall}_action *
parameter in those particular calls.  But that does put some more
register pressure and loads/stores into this API.

 Seems inconsistent in the bitfield type, also it feels like that 3 the 7
 and the enum should be more tightly integrated, maybe add:
 
 UTRACE_RESUME_MAX
 
 #define UTRACE_RESUME_BITS (ilog2(UTRACE_RESUME_MAX))
 #define UTRACE_RESUME_MASK ((1  UTRACE_RESUME_BITS) - 1)

Yes, that's a good cleanup.  Thanks.
(ilog2(7) is 2, so ilog2() + 1 is what you meant.)

  +static struct utrace_engine *matching_engine(
[...]
 The function does a search, suggesting the function name ought to have
 something like find or search in it.

Ok, I'll make it find_matching_engine.

 Quite gross this.. can't we key off the
 tracehoook_report_clone_complete() and use a wakeup there?

Yes, we intended to clean this up at some point later.  But doing that
is just a distraction and complication right now so we put it off.  For
this case, however, I suppose we could just punt for the initial version.

This code exists to support the special semantics of calling
utrace_attach_task() from inside the parent's report_clone() callback.
That guarantees the parent that it wins any race with any third thread
calling utrace_attach_task().  This guarantees it will be first attacher
in the callback order, but the general subject of callback order is
already something we think we will want to revisit in the future after
we have more experience with lots of different engines trying to work
together.  It's most meaningful when using the UTRACE_ATTACH_EXCLUSIVE
flag--then you can use UTRACE_ATTACH_EXCLUSIVE|UTRACE_ATTACH_MATCH_OPS
to synchronize with other threads trying to attach the same kind of
engine to a task, and give special priority in that to the engine that
attaches in report_clone() from tracing the parent.

Really I came up with this special semantics originally just with ptrace
in mind.  ptrace is an engine that wants to exclude other tracer threads
attaching asynchronously with the same kind of engine, and that wants to
give special priority on a child to the parent's tracer (to implement
PTRACE_O_TRACECLONE et al).  However, Oleg's current ptrace code still
relies on the old hard-coded locking kludges to exclude the separate
attachers and to magically privilege the auto-attach from the parent's
tracer.  So we are not relying on this special semantics yet.

We could just punt utrace_attach_delay() and remove the associated
documentation about the special rights of report_clone() calling
utrace_attach_task().  If we decide it helps clean things up later when
we finish more cleanup of the ptrace world, then we can add the fancy
semantics back in later.

 Does this really need the inline?

It has one caller and that call is unconditional.

 Asymmetric locking like this is really better not done, and looking at
 the callsites its really no bother to clean that up, arguably even makes
 them saner.

By assymetric you mean that utrace_reap releases a lock (as the
__releases annotation indicates).  As should be obvious from the code, the
unlock is done before the loop that does -report_reap callbacks and
utrace_engine_put() (which can make 

Re: step-into-handler.c compilation failure on ppc64

2009-12-13 Thread Jan Kratochvil
On Sat, 05 Dec 2009 18:19:20 +0100, Roland McGrath wrote:
 How about this?
 
 --- step-into-handler.c   10 Dec 2008 04:42:43 -0800  1.8
 +++ step-into-handler.c   05 Dec 2009 09:18:54 -0800  
[...]
 @@ -113,11 +114,11 @@ handler_alrm_get (void)
  {
  #if defined __powerpc64__
/* ppc64 `handler_alrm' resolves to the function descriptor.  */
 -  return *(void **) handler_alrm;
 +  return *(void **) (uintptr_t) handler_alrm;
  /* __s390x__ defines both the symbols.  */
  #elif defined __s390__  !defined __s390x__
/* s390 bit 31 is zero here but I am not sure if it cannot be arbitrary.  
 */
[...]

On Sat, 05 Dec 2009 18:39:05 +0100, CAI Qian wrote:
 Thanks. Fixed.

I have to say it did not help for me (gcc-4.4.2-7.el6.ppc64).
error: dereferencing type-punned pointer will break strict-aliasing 
rules

Checked-in the union-based fix below (both tests PASS on ppc64).


Regards,
Jan

--- erestartsys.c   27 Nov 2009 22:50:31 -  1.13
+++ erestartsys.c   14 Dec 2009 00:38:42 -  1.14
@@ -38,6 +38,7 @@
 #include stddef.h
 #include pty.h
 #include string.h
+#include stdint.h
 
 #if defined __x86_64__
 # define REGISTER_IP .rip
@@ -298,8 +299,23 @@ main (int argc, char **argv)
   user = user_orig;
   user REGISTER_IP = (unsigned long) func;
 #ifdef __powerpc64__
-  user.nip = ((const unsigned long *) func)[0]; /* entry */
-  user.gpr[2] = ((const unsigned long *) func)[1]; /* TOC */
+  {
+/* ppc64 `func' resolves to the function descriptor.  */
+union
+  {
+   void (*f) (void);
+   struct
+ {
+   void *entry;
+   void *toc;
+ }
+   *p;
+  }
+const func_u = { func };
+
+user.nip = (uintptr_t) func_u.p-entry;
+user.gpr[2] = (uintptr_t) func_u.p-toc;
+  }
 #endif
   /* GDB amd64_linux_write_pc():  */
   /* We must be careful with modifying the program counter.  If we
--- step-into-handler.c 8 Dec 2008 18:23:41 -   1.8
+++ step-into-handler.c 14 Dec 2009 00:38:42 -  1.9
@@ -113,7 +113,19 @@ handler_alrm_get (void)
 {
 #if defined __powerpc64__
   /* ppc64 `handler_alrm' resolves to the function descriptor.  */
-  return *(void **) handler_alrm;
+  union
+{
+  void (*f) (int signo);
+  struct
+   {
+ void *entry;
+ void *toc;
+   }
+  *p;
+}
+  const func_u = { handler_alrm };
+
+  return func_u.p-entry;
 /* __s390x__ defines both the symbols.  */
 #elif defined __s390__  !defined __s390x__
   /* s390 bit 31 is zero here but I am not sure if it cannot be arbitrary.  */