Re: [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield

2016-11-15 Thread Christian Borntraeger
On 11/15/2016 02:37 PM, Russell King - ARM Linux wrote:
> On Tue, Nov 15, 2016 at 02:19:53PM +0100, Christian Borntraeger wrote:
>> On 11/15/2016 01:30 PM, Russell King - ARM Linux wrote:
>>> On Tue, Oct 25, 2016 at 11:03:11AM +0200, Christian Borntraeger wrote:
>>>> For spinning loops people do often use barrier() or cpu_relax().
>>>> For most architectures cpu_relax and barrier are the same, but on
>>>> some architectures cpu_relax can add some latency.
>>>> For example on power,sparc64 and arc, cpu_relax can shift the CPU
>>>> towards other hardware threads in an SMT environment.
>>>> On s390 cpu_relax does even more, it uses an hypercall to the
>>>> hypervisor to give up the timeslice.
>>>> In contrast to the SMT yielding this can result in larger latencies.
>>>> In some places this latency is unwanted, so another variant
>>>> "cpu_relax_lowlatency" was introduced. Before this is used in more
>>>> and more places, lets revert the logic and provide a cpu_relax_yield
>>>> that can be called in places where yielding is more important than
>>>> latency. By default this is the same as cpu_relax on all architectures.
>>>
>>> Rather than having to update all these architectures in this way, can't
>>> we put in some linux/*.h header something like:
>>>
>>> #ifndef cpu_relax_yield
>>> #define cpu_relax_yield() cpu_relax()
>>> #endif
>>>
>>> so only those architectures that need to do something need to be
>>> modified?
>>
>> These patches are part of linux-next since a month or so, changing that 
>> would invalidate all the next testing. If people want that, I can certainly
>> do that, though.
> 
> It's three weeks since you posted them.  For one of those weeks (the
> week you posted them) I was away, and missed them while catching up.
> Sorry, but it sometimes takes a while to spot things amongst the
> backlog, and normally takes some subsequent activity on the thread to
> bring it back into view.

Absolutely no need to apologize. Thank you for doing the review and for the
proposal.
I will do whatever the consensus is, but since this looks like tip/locking
material I will wait for Peter or Ingo to decide whether and how to do it.

Christian



Re: [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield

2016-11-15 Thread Russell King - ARM Linux
On Tue, Nov 15, 2016 at 02:19:53PM +0100, Christian Borntraeger wrote:
> On 11/15/2016 01:30 PM, Russell King - ARM Linux wrote:
> > On Tue, Oct 25, 2016 at 11:03:11AM +0200, Christian Borntraeger wrote:
> >> For spinning loops people do often use barrier() or cpu_relax().
> >> For most architectures cpu_relax and barrier are the same, but on
> >> some architectures cpu_relax can add some latency.
> >> For example on power,sparc64 and arc, cpu_relax can shift the CPU
> >> towards other hardware threads in an SMT environment.
> >> On s390 cpu_relax does even more, it uses an hypercall to the
> >> hypervisor to give up the timeslice.
> >> In contrast to the SMT yielding this can result in larger latencies.
> >> In some places this latency is unwanted, so another variant
> >> "cpu_relax_lowlatency" was introduced. Before this is used in more
> >> and more places, lets revert the logic and provide a cpu_relax_yield
> >> that can be called in places where yielding is more important than
> >> latency. By default this is the same as cpu_relax on all architectures.
> > 
> > Rather than having to update all these architectures in this way, can't
> > we put in some linux/*.h header something like:
> > 
> > #ifndef cpu_relax_yield
> > #define cpu_relax_yield() cpu_relax()
> > #endif
> > 
> > so only those architectures that need to do something need to be
> > modified?
> 
> These patches are part of linux-next since a month or so, changing that 
> would invalidate all the next testing. If people want that, I can certainly
> do that, though.

It's three weeks since you posted them.  For one of those weeks (the
week you posted them) I was away, and missed them while catching up.
Sorry, but it sometimes takes a while to spot things amongst the
backlog, and normally takes some subsequent activity on the thread to
bring it back into view.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield

2016-11-15 Thread Christian Borntraeger
On 11/15/2016 01:30 PM, Russell King - ARM Linux wrote:
> On Tue, Oct 25, 2016 at 11:03:11AM +0200, Christian Borntraeger wrote:
>> For spinning loops people do often use barrier() or cpu_relax().
>> For most architectures cpu_relax and barrier are the same, but on
>> some architectures cpu_relax can add some latency.
>> For example on power,sparc64 and arc, cpu_relax can shift the CPU
>> towards other hardware threads in an SMT environment.
>> On s390 cpu_relax does even more, it uses an hypercall to the
>> hypervisor to give up the timeslice.
>> In contrast to the SMT yielding this can result in larger latencies.
>> In some places this latency is unwanted, so another variant
>> "cpu_relax_lowlatency" was introduced. Before this is used in more
>> and more places, lets revert the logic and provide a cpu_relax_yield
>> that can be called in places where yielding is more important than
>> latency. By default this is the same as cpu_relax on all architectures.
> 
> Rather than having to update all these architectures in this way, can't
> we put in some linux/*.h header something like:
> 
> #ifndef cpu_relax_yield
> #define cpu_relax_yield() cpu_relax()
> #endif
> 
> so only those architectures that need to do something need to be
> modified?

These patches are part of linux-next since a month or so, changing that 
would invalidate all the next testing. If people want that, I can certainly
do that, though.




Re: [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield

2016-11-15 Thread Russell King - ARM Linux
On Tue, Oct 25, 2016 at 11:03:11AM +0200, Christian Borntraeger wrote:
> For spinning loops people do often use barrier() or cpu_relax().
> For most architectures cpu_relax and barrier are the same, but on
> some architectures cpu_relax can add some latency.
> For example on power,sparc64 and arc, cpu_relax can shift the CPU
> towards other hardware threads in an SMT environment.
> On s390 cpu_relax does even more, it uses an hypercall to the
> hypervisor to give up the timeslice.
> In contrast to the SMT yielding this can result in larger latencies.
> In some places this latency is unwanted, so another variant
> "cpu_relax_lowlatency" was introduced. Before this is used in more
> and more places, lets revert the logic and provide a cpu_relax_yield
> that can be called in places where yielding is more important than
> latency. By default this is the same as cpu_relax on all architectures.

Rather than having to update all these architectures in this way, can't
we put in some linux/*.h header something like:

#ifndef cpu_relax_yield
#define cpu_relax_yield() cpu_relax()
#endif

so only those architectures that need to do something need to be
modified?
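
A minimal userspace sketch of the fallback pattern proposed above, not kernel code. Assumptions: barrier() is written out with GCC/Clang inline asm, HYPOTHETICAL_ARCH_HAS_YIELD, arch_yield(), USES_GENERIC_FALLBACK and spin_until_set() are invented names for illustration, and the "arch" and "generic" parts that would live in separate headers are shown in one file.

```c
/* Userspace stand-in for the kernel's compiler barrier (GCC/Clang syntax). */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* "Arch header" part: what most architectures provide for cpu_relax(). */
#define cpu_relax() barrier()

/*
 * Only an architecture that needs special yielding behaviour would define
 * cpu_relax_yield itself. HYPOTHETICAL_ARCH_HAS_YIELD and arch_yield() are
 * invented here to model that case.
 */
#ifdef HYPOTHETICAL_ARCH_HAS_YIELD
static inline void arch_yield(void)
{
	barrier(); /* placeholder for an arch-specific yield */
}
#define cpu_relax_yield() arch_yield()
#define USES_GENERIC_FALLBACK 0
#endif

/* "Generic header" part: applies only if no arch definition exists. */
#ifndef cpu_relax_yield
#define cpu_relax_yield() cpu_relax()
#define USES_GENERIC_FALLBACK 1
#endif

/*
 * Hypothetical caller: spin until *flag becomes non-zero or max_spins is
 * reached, and return the number of iterations spent spinning.
 */
static inline unsigned long spin_until_set(const volatile int *flag,
					   unsigned long max_spins)
{
	unsigned long spins = 0;

	while (!*flag && spins < max_spins) {
		cpu_relax_yield();
		spins++;
	}
	return spins;
}
```

Built without HYPOTHETICAL_ARCH_HAS_YIELD, the caller ends up with cpu_relax(); built with it, the same caller picks up the arch-specific definition without any change to the generic part.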

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


[GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield

2016-10-25 Thread Christian Borntraeger
For spinning loops people often use barrier() or cpu_relax().
For most architectures cpu_relax and barrier are the same, but on
some architectures cpu_relax can add some latency.
For example on power, sparc64 and arc, cpu_relax can shift the CPU
towards other hardware threads in an SMT environment.
On s390 cpu_relax does even more: it uses a hypercall to the
hypervisor to give up the timeslice.
In contrast to the SMT yielding this can result in larger latencies.
In some places this latency is unwanted, so another variant
"cpu_relax_lowlatency" was introduced. Before this is used in more
and more places, let's reverse the logic and provide a cpu_relax_yield
that can be called in places where yielding is more important than
latency. By default this is the same as cpu_relax on all architectures.
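
A rough userspace sketch of the distinction described above, not the kernel implementation. Assumptions: sched_yield() stands in for an expensive "give up the timeslice" operation such as a hypercall, barrier() is written out with GCC/Clang inline asm, and yield_count, wait_yielding() and wait_lowlatency() are invented names for illustration.

```c
#include <sched.h>

/* Userspace stand-in for the kernel's compiler barrier (GCC/Clang syntax). */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical counter so the number of yields can be observed. */
static unsigned long yield_count;

/*
 * Assumption: sched_yield() models a costly yield (for example a
 * hypercall to the hypervisor). This is an analogy only.
 */
static inline void cpu_relax_yield(void)
{
	yield_count++;
	sched_yield();
	barrier();
}

/* Latency-sensitive variant: a compiler barrier and nothing else. */
static inline void cpu_relax_lowlatency(void)
{
	barrier();
}

/* Wait where giving the CPU away matters more than wake-up latency. */
static unsigned long wait_yielding(const volatile int *done,
				   unsigned long max_spins)
{
	unsigned long spins = 0;

	while (!*done && spins < max_spins) {
		cpu_relax_yield();
		spins++;
	}
	return spins;
}

/* Wait where the condition is expected to change almost immediately. */
static unsigned long wait_lowlatency(const volatile int *done,
				     unsigned long max_spins)
{
	unsigned long spins = 0;

	while (!*done && spins < max_spins) {
		cpu_relax_lowlatency();
		spins++;
	}
	return spins;
}
```

Both loops spin the same number of times in this model; only the yielding one pays for a yield on each iteration, which is the latency trade-off the text describes.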

Signed-off-by: Christian Borntraeger 
---
 arch/alpha/include/asm/processor.h      | 1 +
 arch/arc/include/asm/processor.h        | 2 ++
 arch/arm/include/asm/processor.h        | 1 +
 arch/arm64/include/asm/processor.h      | 1 +
 arch/avr32/include/asm/processor.h      | 1 +
 arch/blackfin/include/asm/processor.h   | 1 +
 arch/c6x/include/asm/processor.h        | 1 +
 arch/cris/include/asm/processor.h       | 1 +
 arch/frv/include/asm/processor.h        | 1 +
 arch/h8300/include/asm/processor.h      | 1 +
 arch/hexagon/include/asm/processor.h    | 1 +
 arch/ia64/include/asm/processor.h       | 1 +
 arch/m32r/include/asm/processor.h       | 1 +
 arch/m68k/include/asm/processor.h       | 1 +
 arch/metag/include/asm/processor.h      | 1 +
 arch/microblaze/include/asm/processor.h | 1 +
 arch/mips/include/asm/processor.h       | 1 +
 arch/mn10300/include/asm/processor.h    | 1 +
 arch/nios2/include/asm/processor.h      | 1 +
 arch/openrisc/include/asm/processor.h   | 1 +
 arch/parisc/include/asm/processor.h     | 1 +
 arch/powerpc/include/asm/processor.h    | 1 +
 arch/s390/include/asm/processor.h       | 3 ++-
 arch/s390/kernel/processor.c            | 4 ++--
 arch/score/include/asm/processor.h      | 1 +
 arch/sh/include/asm/processor.h         | 1 +
 arch/sparc/include/asm/processor_32.h   | 1 +
 arch/sparc/include/asm/processor_64.h   | 1 +
 arch/tile/include/asm/processor.h       | 1 +
 arch/unicore32/include/asm/processor.h  | 1 +
 arch/x86/include/asm/processor.h        | 1 +
 arch/x86/um/asm/processor.h             | 1 +
 arch/xtensa/include/asm/processor.h     | 1 +
 33 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/alpha/include/asm/processor.h b/arch/alpha/include/asm/processor.h
index 43a7559..0556fda 100644
--- a/arch/alpha/include/asm/processor.h
+++ b/arch/alpha/include/asm/processor.h
@@ -58,6 +58,7 @@ unsigned long get_wchan(struct task_struct *p);
   ((tsk) == current ? rdusp() : task_thread_info(tsk)->pcb.usp)
 
 #define cpu_relax() barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #define ARCH_HAS_PREFETCH
diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
index 16b630f..6c158d5 100644
--- a/arch/arc/include/asm/processor.h
+++ b/arch/arc/include/asm/processor.h
@@ -60,6 +60,7 @@ struct task_struct;
 #ifndef CONFIG_EZNPS_MTM_EXT
 
 #define cpu_relax() barrier()
+#define cpu_relax_yield()  cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #else
@@ -67,6 +68,7 @@ struct task_struct;
 #define cpu_relax() \
 	__asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
 
+#define cpu_relax_yield()  cpu_relax()
 #define cpu_relax_lowlatency() barrier()
 
 #endif
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 8a1e8e9..db660e0 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -82,6 +82,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define cpu_relax() barrier()
 #endif
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #define task_pt_regs(p) \
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 60e3482..3f9b0e5 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -149,6 +149,7 @@ static inline void cpu_relax(void)
 	asm volatile("yield" ::: "memory");
 }
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Thread switching */
diff --git a/arch/avr32/include/asm/processor.h b/arch/avr32/include/asm/processor.h
index 941593c..e412e8b 100644
--- a/arch/avr32/include/asm/processor.h
+++ b/arch/avr32/include/asm/processor.h
@@ -92,6 +92,7 @@ extern struct avr32_cpuinfo boot_cpu_data;
 #define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3))
 
 #define cpu_relax() barrier()
+#define cpu_relax_yield()  cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 #define cpu_sync_pipeline()