Re: timecounter memory barriers

2018-09-18 Thread Mark Kettenis
> Date: Tue, 18 Sep 2018 17:59:21 +0200
> From: Alexander Bluhm 
> 
> Hi,
> 
> In timecounter code, generation number and timehands are volatile,
> but there are no memory barriers.  In general such code is wrong
> for SMP as it tells the compiler to care about ordering but ignores
> cache reordering.
> 
> NetBSD puts membar_producer() at places where I would put them.
> But in binuptime() they have a comment that barriers would be too
> expensive and give an argument for code that we don't have.
> 
>  * Memory barriers are also too expensive to use for such a
>  * performance critical function.  The good news is that we do not
>  * need memory barriers for this type of exclusion, as the thread
>  * updating timecounter_removals will issue a broadcast cross call
>  * before inspecting our l_tcgen value (this elides memory ordering
>  * issues).
> 
> FreeBSD has put atomic operations in binuptime() and tc_windup(),
> but I don't understand exctly what they do.
> 
> gen = atomic_load_acq_int(>th_generation);
> atomic_thread_fence_acq();
> atomic_thread_fence_rel();
> atomic_store_rel_int(>th_generation, ogen);

Those are atomic operations with acquire and release semantics.
They're pretty much a combination of an atomic instruction with a
memory barrier.  Combining these has some benefits on arcitectures,
such as x86, where atomic operations have builtin memory ordering
effects.  They confuse the hell out of me though.

> So I think with our membar functions the code should look like this.
> On amd64 they are just compiler barriers anyway.
> 
> ok?

I think this is correct.

ok kettenis@

> Index: kern/kern_tc.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_tc.c,v
> retrieving revision 1.33
> diff -u -p -r1.33 kern_tc.c
> --- kern/kern_tc.c28 May 2018 18:05:42 -  1.33
> +++ kern/kern_tc.c18 Sep 2018 15:39:54 -
> @@ -22,6 +22,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -141,8 +142,10 @@ binuptime(struct bintime *bt)
>   do {
>   th = timehands;
>   gen = th->th_generation;
> + membar_consumer();
>   *bt = th->th_offset;
>   bintime_addx(bt, th->th_scale * tc_delta(th));
> + membar_consumer();
>   } while (gen == 0 || gen != th->th_generation);
>  }
>  
> @@ -199,7 +202,9 @@ getnanouptime(struct timespec *tsp)
>   do {
>   th = timehands;
>   gen = th->th_generation;
> + membar_consumer();
>   bintime2timespec(>th_offset, tsp);
> + membar_consumer();
>   } while (gen == 0 || gen != th->th_generation);
>  }
>  
> @@ -212,7 +217,9 @@ getmicrouptime(struct timeval *tvp)
>   do {
>   th = timehands;
>   gen = th->th_generation;
> + membar_consumer();
>   bintime2timeval(>th_offset, tvp);
> + membar_consumer();
>   } while (gen == 0 || gen != th->th_generation);
>  }
>  
> @@ -225,7 +232,9 @@ getnanotime(struct timespec *tsp)
>   do {
>   th = timehands;
>   gen = th->th_generation;
> + membar_consumer();
>   *tsp = th->th_nanotime;
> + membar_consumer();
>   } while (gen == 0 || gen != th->th_generation);
>  }
>  
> @@ -238,7 +247,9 @@ getmicrotime(struct timeval *tvp)
>   do {
>   th = timehands;
>   gen = th->th_generation;
> + membar_consumer();
>   *tvp = th->th_microtime;
> + membar_consumer();
>   } while (gen == 0 || gen != th->th_generation);
>  }
>  
> @@ -390,6 +401,7 @@ tc_windup(void)
>   th = tho->th_next;
>   ogen = th->th_generation;
>   th->th_generation = 0;
> + membar_producer();
>   memcpy(th, tho, offsetof(struct timehands, th_generation));
>  
>   /*
> @@ -481,11 +493,13 @@ tc_windup(void)
>*/
>   if (++ogen == 0)
>   ogen = 1;
> + membar_producer();
>   th->th_generation = ogen;
>  
>   /* Go live with the new struct timehands. */
>   time_second = th->th_microtime.tv_sec;
>   time_uptime = th->th_offset.sec;
> + membar_producer();
>   timehands = th;
>  }
>  
> 
> 



timecounter memory barriers

2018-09-18 Thread Alexander Bluhm
Hi,

In timecounter code, generation number and timehands are volatile,
but there are no memory barriers.  In general such code is wrong
for SMP as it tells the compiler to care about ordering but ignores
cache reordering.

NetBSD puts membar_producer() at places where I would put them.
But in binuptime() they have a comment that barriers would be too
expensive and give an argument for code that we don't have.

 * Memory barriers are also too expensive to use for such a
 * performance critical function.  The good news is that we do not
 * need memory barriers for this type of exclusion, as the thread
 * updating timecounter_removals will issue a broadcast cross call
 * before inspecting our l_tcgen value (this elides memory ordering
 * issues).

FreeBSD has put atomic operations in binuptime() and tc_windup(),
but I don't understand exctly what they do.

gen = atomic_load_acq_int(>th_generation);
atomic_thread_fence_acq();
atomic_thread_fence_rel();
atomic_store_rel_int(>th_generation, ogen);

So I think with our membar functions the code should look like this.
On amd64 they are just compiler barriers anyway.

ok?

bluhm

Index: kern/kern_tc.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_tc.c,v
retrieving revision 1.33
diff -u -p -r1.33 kern_tc.c
--- kern/kern_tc.c  28 May 2018 18:05:42 -  1.33
+++ kern/kern_tc.c  18 Sep 2018 15:39:54 -
@@ -22,6 +22,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -141,8 +142,10 @@ binuptime(struct bintime *bt)
do {
th = timehands;
gen = th->th_generation;
+   membar_consumer();
*bt = th->th_offset;
bintime_addx(bt, th->th_scale * tc_delta(th));
+   membar_consumer();
} while (gen == 0 || gen != th->th_generation);
 }
 
@@ -199,7 +202,9 @@ getnanouptime(struct timespec *tsp)
do {
th = timehands;
gen = th->th_generation;
+   membar_consumer();
bintime2timespec(>th_offset, tsp);
+   membar_consumer();
} while (gen == 0 || gen != th->th_generation);
 }
 
@@ -212,7 +217,9 @@ getmicrouptime(struct timeval *tvp)
do {
th = timehands;
gen = th->th_generation;
+   membar_consumer();
bintime2timeval(>th_offset, tvp);
+   membar_consumer();
} while (gen == 0 || gen != th->th_generation);
 }
 
@@ -225,7 +232,9 @@ getnanotime(struct timespec *tsp)
do {
th = timehands;
gen = th->th_generation;
+   membar_consumer();
*tsp = th->th_nanotime;
+   membar_consumer();
} while (gen == 0 || gen != th->th_generation);
 }
 
@@ -238,7 +247,9 @@ getmicrotime(struct timeval *tvp)
do {
th = timehands;
gen = th->th_generation;
+   membar_consumer();
*tvp = th->th_microtime;
+   membar_consumer();
} while (gen == 0 || gen != th->th_generation);
 }
 
@@ -390,6 +401,7 @@ tc_windup(void)
th = tho->th_next;
ogen = th->th_generation;
th->th_generation = 0;
+   membar_producer();
memcpy(th, tho, offsetof(struct timehands, th_generation));
 
/*
@@ -481,11 +493,13 @@ tc_windup(void)
 */
if (++ogen == 0)
ogen = 1;
+   membar_producer();
th->th_generation = ogen;
 
/* Go live with the new struct timehands. */
time_second = th->th_microtime.tv_sec;
time_uptime = th->th_offset.sec;
+   membar_producer();
timehands = th;
 }