Re: [PATCH 8/8] rpctrace: use condition variable to keep messages in sequence

2016-10-31 Thread Kalle Olavi Niemitalo
"Brent W. Baccala"  writes:

> My solution is to signal the condition variable right before the call to
> mach_msg, but this creates a race condition where messages can get
> reordered.
>
> As you know, I never liked this blocking behavior of mach_msg, but I just
> can't see any way around it now.  If you can suggest something, let me
> know...

If rpctrace could detect whether the memory_object_data_request
RPC was caused by one of its threads, it could set the condition
variable on behalf of that thread then.  However, this won't work
if rpctrace doesn't see the memory_object_data_request message.
The original thread might have sent a memory_object_data_request
message to a pager not traced by rpctrace, and that pager might
then have sent a message to the traced task via a port wrapped by
rpctrace.

If rpctrace could ask Mach whether the thread that was supposed
to call mach_msg is blocked because of a pager, that might solve
this.  IIRC, GNU Mach doesn't provide a way to ask that, but it
shouldn't be too hard to add.  rpctrace might have to
inefficiently poll the flag though.

Or examine the contents of the messages and use that to decide
whether they can be safely reordered.  In your example, vm_copy
should be safe to reorder against both mach_port_deallocate and
vm_allocate, if the address ranges and ports do not overlap.

Or implement system call tracing in GNU Mach.



Re: [PATCH 8/8] rpctrace: use condition variable to keep messages in sequence

2016-10-30 Thread Brent W. Baccala
On Sat, Oct 29, 2016 at 1:48 PM, Brent W. Baccala 
wrote:

> Yes, patch7 without patch8 can reorder messages as they're being resent,
> but patch8 uses the mutex introduced in patch7, so the ordering can't be
> reversed.
>
> They could be combined into a single patch.  Do you want me to prepare a
> combined patch and email it in?
>

OK, I'm attaching the combined patch7+patch8.

I've studied it a bit more, and it has a race condition that looks
unavoidable to me.

rpctrace on ext2fs exposes the problem.  ext2fs maps a region of memory
that it manages itself (using libpager), then calls vm_copy() on that
region, using its own task port.  This causes the kernel to make a
memory_object_data_request back to ext2fs, which makes some more requests
(mach_port_deallocate and vm_allocate) on the task port before replying
with a memory_object_data_supply, which allows the original vm_copy to
return.

So, it's calling vm_copy on its task port, and needs to do some more
operations on the task port before the vm_copy completes.  And just like
the problem I described two months ago ("mach_msg blocking on call to
vm_map"), the mach_msg call sending the vm_copy will block even if you're
not waiting for the reply message.

So... in rpctrace, I want to signal the condition variable (allowing the
next message to be processed) after the message has been sent, but since
mach_msg blocks, I can't do that without blocking the entire task port,
which deadlocks the process.

My solution is to signal the condition variable right before the call to
mach_msg, but this creates a race condition where messages can get
reordered.

As you know, I never liked this blocking behavior of mach_msg, but I just
can't see any way around it now.  If you can suggest something, let me
know...

agape
brent
From f12c28e9b16d7a3c628586b739eee3f4e004f753 Mon Sep 17 00:00:00 2001
From: Brent Baccala 
Date: Sun, 30 Oct 2016 16:13:34 -1000
Subject: [PATCH] multithread rpctrace to avoid deadlocks in the kernel

---
 utils/rpctrace.c | 41 ++---
 1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/utils/rpctrace.c b/utils/rpctrace.c
index c01f8d4..72ca614 100644
--- a/utils/rpctrace.c
+++ b/utils/rpctrace.c
@@ -141,6 +141,8 @@ struct traced_info
   mach_msg_type_name_t type;
   char *name;			/* null or a string describing this */
   task_t task;			/* The task who has the right. */
+  mach_port_seqno_t seqno;  /* next RPC to be processed on this port */
+  pthread_cond_t sequencer; /* used to sequence RPCs when they are processed out-of-order */
 };
 
 /* Each traced port has one receiver info and multiple send wrappers.
@@ -250,6 +252,8 @@ struct port_class *other_class;
 struct port_bucket *traced_bucket;
 FILE *ostream;
 
+pthread_mutex_t tracelock = PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP;
+
 /* These are the calls made from the tracing engine into
the output formatting code.  */
 
@@ -334,9 +338,13 @@ destroy_receiver_info (struct receiver_info *info)
   while (send_wrapper)
 {
   struct sender_info *next = send_wrapper->next;
+#if 0
+  if (refcounts_hard_references (_INFO (send_wrapper)->pi.refcounts) != 1)
+	fprintf(stderr, "refcounts_hard_references (%ld) == %d\n", TRACED_INFO(send_wrapper)->pi.port_right, refcounts_hard_references (_INFO (send_wrapper)->pi.refcounts));
   assert (
 	refcounts_hard_references (_INFO (send_wrapper)->pi.refcounts)
 	== 1);
+#endif
   /* Reset the receive_right of the send wrapper in advance to avoid
* destroy_receiver_info is called when the port info is destroyed. */
   send_wrapper->receive_right = NULL;
@@ -370,6 +378,8 @@ new_send_wrapper (struct receiver_info *receive, task_t task,
 	 receive->forward, TRACED_INFO (info)->pi.port_right, task2pid (task));
   TRACED_INFO (info)->type = MACH_MSG_TYPE_MOVE_SEND;
   TRACED_INFO (info)->task = task;
+  TRACED_INFO (info)->seqno = 0;
+  pthread_cond_init(& TRACED_INFO(info)->sequencer, NULL);
   info->receive_right = receive;
   info->next = receive->next;
   receive->next = info;
@@ -400,6 +410,8 @@ new_send_once_wrapper (mach_port_t right, mach_port_t *wrapper_right, task_t tas
 			   sizeof *info, );
   assert_perror (err);
   TRACED_INFO (info)->name = 0;
+  TRACED_INFO (info)->seqno = 0;
+  pthread_cond_init(& TRACED_INFO(info)->sequencer, NULL);
 }
 
   info->forward = right;
@@ -451,6 +463,8 @@ traced_clean (void *pi)
 {
   struct sender_info *info = pi;
 
+  pthread_mutex_lock();
+
   assert (TRACED_INFO (info)->type == MACH_MSG_TYPE_MOVE_SEND);
   free (TRACED_INFO (info)->name);
 
@@ -466,6 +480,8 @@ traced_clean (void *pi)
 
   info->receive_right = NULL;
 }
+
+  pthread_mutex_unlock();
 }
 
 /* Check if the receive right has been seen. */
@@ -1144,6 +1160,8 @@ trace_and_forward (mach_msg_header_t *inp, mach_msg_header_t *outp)
 
   assert (info);
 
+  pthread_mutex_lock();
+
   /* A 

Re: [PATCH 8/8] rpctrace: use condition variable to keep messages in sequence

2016-10-29 Thread Samuel Thibault
Brent W. Baccala, on Sat 29 Oct 2016 13:48:23 -1000, wrote:
> I had thought of the entire patch set being applied monolithically.

Yes, but that'll still appear as separate commits, that some people
might fall in the middle of while bisecting something.

> Yes, patch7 without patch8 can reorder messages as they're being resent, but
> patch8 uses the mutex introduced in patch7, so the ordering can't be reversed.

Then you could introduce the mutex in patch8.

> They could be combined into a single patch.

That could be better.

Samuel



Re: [PATCH 8/8] rpctrace: use condition variable to keep messages in sequence

2016-10-29 Thread Brent W. Baccala
On Oct 29, 2016 2:55 AM, "Samuel Thibault"  wrote:
>
> Hello,
>
> Better apply this patch before patch7, shouldn't we?  Otherwise there's
> a little git interval during which rpctrace is unreliable.
>
> Samuel

I had thought of the entire patch set being applied monolithically.

Yes, patch7 without patch8 can reorder messages as they're being resent,
but patch8 uses the mutex introduced in patch7, so the ordering can't be
reversed.

They could be combined into a single patch.  Do you want me to prepare a
combined patch and email it in?

agape
brent


Re: [PATCH 8/8] rpctrace: use condition variable to keep messages in sequence

2016-10-29 Thread Samuel Thibault
Hello,

Better apply this patch before patch7, shouldn't we?  Otherwise there's
a little git interval during which rpctrace is unreliable.

Samuel