Re: [PATCH v2] spin loop primitives for busy waiting

2017-06-28 Thread Michael Ellerman
Nicholas Piggin  writes:

> Current busy-wait loops are implemented by repeatedly calling cpu_relax(),
> which gives the architecture a low-latency hook to reduce power
> consumption and/or SMT resource contention.
>
> This poses some difficulties for powerpc, which has SMT priority setting
> instructions (priorities determine how ifetch cycles are apportioned).
> powerpc's cpu_relax() is implemented by setting a low priority then
> setting normal priority. This has several problems:
>
>  - Changing thread priority can have some execution cost and potential
>    impact to other threads in the core. It's inefficient to execute these
>    instructions every time around a busy-wait loop.
>
>  - Depending on implementation details, a `low ; medium` sequence may
>    not have much if any effect. Some software with a similar pattern
>    actually inserts a lot of nops between the two, in order to cause a
>    few fetch cycles with the low priority.
>
>  - The busy-wait loop runs with regular priority. This might only be a few
>    fetch cycles, but if there are several threads running such loops, they
>    could cause a noticeable impact on a non-idle thread.
>
> Implement spin_begin and spin_end primitives that can be used around
> busy-wait loops and that default to no-ops, plus spin_cpu_relax, which
> defaults to cpu_relax.
>
> This will allow architectures to hook the entry and exit of busy-wait
> loops, and will allow powerpc to set low SMT priority at entry, and
> normal priority at exit.
>
> Suggested-by: Linus Torvalds 
> Signed-off-by: Nicholas Piggin 
> ---
>
> Since last time:
> - Fixed spin_do_cond with initial test as suggested by Linus.
> - Renamed it to spin_until_cond, which reads a little better.
>
>  include/linux/processor.h | 70 +++
>  1 file changed, 70 insertions(+)
>  create mode 100644 include/linux/processor.h

I'm gonna merge this via the powerpc tree unless anyone objects.

cheers


[PATCH v2] spin loop primitives for busy waiting

2017-05-28 Thread Nicholas Piggin
Current busy-wait loops are implemented by repeatedly calling cpu_relax(),
which gives the architecture a low-latency hook to reduce power
consumption and/or SMT resource contention.

This poses some difficulties for powerpc, which has SMT priority setting
instructions (priorities determine how ifetch cycles are apportioned).
powerpc's cpu_relax() is implemented by setting a low priority then
setting normal priority. This has several problems:

 - Changing thread priority can have some execution cost and potential
   impact to other threads in the core. It's inefficient to execute these
   instructions every time around a busy-wait loop.

 - Depending on implementation details, a `low ; medium` sequence may
   not have much if any effect. Some software with a similar pattern
   actually inserts a lot of nops between the two, in order to cause a
   few fetch cycles with the low priority.

 - The busy-wait loop runs with regular priority. This might only be a few
   fetch cycles, but if there are several threads running such loops, they
   could cause a noticeable impact on a non-idle thread.

Implement spin_begin and spin_end primitives that can be used around
busy-wait loops and that default to no-ops, plus spin_cpu_relax, which
defaults to cpu_relax.

This will allow architectures to hook the entry and exit of busy-wait
loops, and will allow powerpc to set low SMT priority at entry, and
normal priority at exit.

Suggested-by: Linus Torvalds 
Signed-off-by: Nicholas Piggin 
---

Since last time:
- Fixed spin_do_cond with initial test as suggested by Linus.
- Renamed it to spin_until_cond, which reads a little better.
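
As a rough illustration of the changelog above (not part of the patch), the
userspace sketch below counts how often the SMT priority would be changed
under each scheme. smt_prio_low() and smt_prio_medium() are hypothetical
counters standing in for the real priority-setting instructions, and the
hook definitions are only an assumption about how an architecture might
override the generic no-op defaults.

```c
#include <assert.h>

/* Hypothetical stand-ins for SMT priority instructions; they only count. */
static unsigned long prio_changes;
static void smt_prio_low(void)    { prio_changes++; }
static void smt_prio_medium(void) { prio_changes++; }

/* Old scheme: every cpu_relax() lowers and then restores the priority. */
static void old_cpu_relax(void)
{
	smt_prio_low();
	smt_prio_medium();
}

/*
 * New scheme (assumed arch override): priority is lowered once on entry
 * and restored once on exit, so the per-iteration relax has nothing to do.
 */
#define spin_begin()     smt_prio_low()
#define spin_cpu_relax() ((void)0)
#define spin_end()       smt_prio_medium()

/* Spin a fixed number of times with the old scheme; return priority changes. */
unsigned long spin_old(int iterations)
{
	prio_changes = 0;
	for (int i = 0; i < iterations; i++)
		old_cpu_relax();
	return prio_changes;
}

/* Same loop with the begin/relax/end hooks; return priority changes. */
unsigned long spin_new(int iterations)
{
	prio_changes = 0;
	spin_begin();
	for (int i = 0; i < iterations; i++)
		spin_cpu_relax();
	spin_end();
	return prio_changes;
}

/* Small driver: two changes per iteration versus two per loop. */
void run_example(void)
{
	assert(spin_old(1000) == 2000);
	assert(spin_new(1000) == 2);
}
```

Calling run_example() from a main() exercises both loops. The counts say
nothing about real hardware cost; they only show that the hooks move the
priority changes out of the loop body.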

 include/linux/processor.h | 70 +++
 1 file changed, 70 insertions(+)
 create mode 100644 include/linux/processor.h

diff --git a/include/linux/processor.h b/include/linux/processor.h
new file mode 100644
index ..da0c5e56ca02
--- /dev/null
+++ b/include/linux/processor.h
@@ -0,0 +1,70 @@
+/* Misc low level processor primitives */
+#ifndef _LINUX_PROCESSOR_H
+#define _LINUX_PROCESSOR_H
+
+#include 
+
+/*
+ * spin_begin is used before beginning a busy-wait loop, and must be paired
+ * with spin_end when the loop is exited. spin_cpu_relax must be called
+ * within the loop.
+ *
+ * The loop body should be as small and fast as possible, on the order of
+ * tens of instructions/cycles as a guide. It should avoid calling
+ * cpu_relax, or any "spin" or sleep type of primitive including nested uses
+ * of these primitives. It should not lock or take any other resource.
+ * Violations of these guidelines will not cause a bug, but may cause
+ * suboptimal performance.
+ *
+ * These loops are optimized to be used where wait times are expected to be
+ * less than the cost of a context switch (and associated overhead).
+ *
+ * Detection of resource owner and decision to spin or sleep or guest-yield
+ * (e.g., spin lock holder vcpu preempted, or mutex owner not on CPU) can be
+ * tested within the loop body.
+ */
+#ifndef spin_begin
+#define spin_begin()
+#endif
+
+#ifndef spin_cpu_relax
+#define spin_cpu_relax() cpu_relax()
+#endif
+
+/*
+ * spin_cpu_yield may be called to yield (undirected) to the hypervisor if
+ * necessary. This should be used if the wait is expected to take longer
+ * than context switch overhead, but we can't sleep or do a directed yield.
+ */
+#ifndef spin_cpu_yield
+#define spin_cpu_yield() cpu_relax_yield()
+#endif
+
+#ifndef spin_end
+#define spin_end()
+#endif
+
+/*
+ * spin_until_cond can be used to wait for a condition to become true. It
+ * may be expected that the first iteration will be true in the common case
+ * (no spinning), so that callers should not require a first "likely" test
+ * for the uncontended case before using this primitive.
+ *
+ * Usage and implementation guidelines are the same as for the spin_begin
+ * primitives, above.
+ */
+#ifndef spin_until_cond
+#define spin_until_cond(cond)					\
+do {								\
+	if (unlikely(!(cond))) {				\
+		spin_begin();					\
+		do {						\
+			spin_cpu_relax();			\
+		} while (!(cond));				\
+		spin_end();					\
+	}							\
+} while (0)
+
+#endif
+
+#endif /* _LINUX_PROCESSOR_H */
-- 
2.11.0
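
For readers who want to try spin_until_cond outside the kernel, here is a
minimal userspace sketch of how the macro behaves. The macro body has the
same shape as in the patch; everything else is an assumption made for
illustration: unlikely() is reduced to a plain boolean test, the three hooks
are replaced by counters, and ready() is a hypothetical condition that turns
true after a fixed number of polls.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in (assumption): the kernel's unlikely() adds a hint. */
#define unlikely(x) (!!(x))

/* Counting hooks (assumption) so the control flow can be observed. */
static int begin_calls, relax_calls, end_calls;
#define spin_begin()     (begin_calls++)
#define spin_cpu_relax() (relax_calls++)
#define spin_end()       (end_calls++)

/* Same shape as the macro in the patch. */
#define spin_until_cond(cond)					\
do {								\
	if (unlikely(!(cond))) {				\
		spin_begin();					\
		do {						\
			spin_cpu_relax();			\
		} while (!(cond));				\
		spin_end();					\
	}							\
} while (0)

/* Hypothetical condition: false for the first 'countdown' polls. */
static int countdown;
static bool ready(void)
{
	if (countdown == 0)
		return true;
	countdown--;
	return false;
}

/* Wait until ready() succeeds; return how many times we relaxed. */
int wait_with_countdown(int n)
{
	begin_calls = relax_calls = end_calls = 0;
	countdown = n;
	spin_until_cond(ready());
	assert(begin_calls == end_calls);	/* hooks stay paired */
	return relax_calls;
}
```

With n == 0 the initial test succeeds and neither spin_begin nor spin_end
runs, which is the uncontended fast path the header comment describes. With
n > 0 the hooks each run once and spin_cpu_relax runs n times.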


