Re: [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield
On 11/15/2016 02:37 PM, Russell King - ARM Linux wrote:
> On Tue, Nov 15, 2016 at 02:19:53PM +0100, Christian Borntraeger wrote:
>> On 11/15/2016 01:30 PM, Russell King - ARM Linux wrote:
>>> On Tue, Oct 25, 2016 at 11:03:11AM +0200, Christian Borntraeger wrote:
>>>> For spinning loops people often use barrier() or cpu_relax().
>>>> For most architectures cpu_relax and barrier are the same, but on
>>>> some architectures cpu_relax can add some latency.
>>>> For example on power, sparc64 and arc, cpu_relax can shift the CPU
>>>> towards other hardware threads in an SMT environment.
>>>> On s390 cpu_relax does even more: it uses a hypercall to the
>>>> hypervisor to give up the timeslice.
>>>> In contrast to the SMT yielding this can result in larger latencies.
>>>> In some places this latency is unwanted, so another variant
>>>> "cpu_relax_lowlatency" was introduced. Before this is used in more
>>>> and more places, let's reverse the logic and provide a cpu_relax_yield
>>>> that can be called in places where yielding is more important than
>>>> latency. By default this is the same as cpu_relax on all architectures.
>>>
>>> Rather than having to update all these architectures in this way, can't
>>> we put in some linux/*.h header something like:
>>>
>>> #ifndef cpu_relax_yield
>>> #define cpu_relax_yield() cpu_relax()
>>> #endif
>>>
>>> so only those architectures that need to do something need to be
>>> modified?
>>
>> These patches have been part of linux-next for a month or so; changing
>> that would invalidate all the next testing. If people want that, I can
>> certainly do that, though.
>
> It's three weeks since you posted them. For one of those weeks (the
> week you posted them) I was away, and missed them while catching up.
> Sorry, but it sometimes takes a while to spot things amongst the
> backlog, and normally takes some subsequent activity on the thread to
> bring it back into view.

Absolutely no need to apologize. Thank you for doing the review and the
proposal. I will do whatever the consensus is, but since this looks like
tip/locking material I will wait for Peter or Ingo to decide whether and
how to do it.

Christian
Re: [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield
On Tue, Nov 15, 2016 at 02:19:53PM +0100, Christian Borntraeger wrote:
> On 11/15/2016 01:30 PM, Russell King - ARM Linux wrote:
> > On Tue, Oct 25, 2016 at 11:03:11AM +0200, Christian Borntraeger wrote:
> >> For spinning loops people often use barrier() or cpu_relax().
> >> For most architectures cpu_relax and barrier are the same, but on
> >> some architectures cpu_relax can add some latency.
> >> For example on power, sparc64 and arc, cpu_relax can shift the CPU
> >> towards other hardware threads in an SMT environment.
> >> On s390 cpu_relax does even more: it uses a hypercall to the
> >> hypervisor to give up the timeslice.
> >> In contrast to the SMT yielding this can result in larger latencies.
> >> In some places this latency is unwanted, so another variant
> >> "cpu_relax_lowlatency" was introduced. Before this is used in more
> >> and more places, let's reverse the logic and provide a cpu_relax_yield
> >> that can be called in places where yielding is more important than
> >> latency. By default this is the same as cpu_relax on all architectures.
> >
> > Rather than having to update all these architectures in this way, can't
> > we put in some linux/*.h header something like:
> >
> > #ifndef cpu_relax_yield
> > #define cpu_relax_yield() cpu_relax()
> > #endif
> >
> > so only those architectures that need to do something need to be
> > modified?
>
> These patches have been part of linux-next for a month or so; changing
> that would invalidate all the next testing. If people want that, I can
> certainly do that, though.

It's three weeks since you posted them. For one of those weeks (the
week you posted them) I was away, and missed them while catching up.
Sorry, but it sometimes takes a while to spot things amongst the
backlog, and normally takes some subsequent activity on the thread to
bring it back into view.
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
Re: [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield
On 11/15/2016 01:30 PM, Russell King - ARM Linux wrote:
> On Tue, Oct 25, 2016 at 11:03:11AM +0200, Christian Borntraeger wrote:
>> For spinning loops people often use barrier() or cpu_relax().
>> For most architectures cpu_relax and barrier are the same, but on
>> some architectures cpu_relax can add some latency.
>> For example on power, sparc64 and arc, cpu_relax can shift the CPU
>> towards other hardware threads in an SMT environment.
>> On s390 cpu_relax does even more: it uses a hypercall to the
>> hypervisor to give up the timeslice.
>> In contrast to the SMT yielding this can result in larger latencies.
>> In some places this latency is unwanted, so another variant
>> "cpu_relax_lowlatency" was introduced. Before this is used in more
>> and more places, let's reverse the logic and provide a cpu_relax_yield
>> that can be called in places where yielding is more important than
>> latency. By default this is the same as cpu_relax on all architectures.
>
> Rather than having to update all these architectures in this way, can't
> we put in some linux/*.h header something like:
>
> #ifndef cpu_relax_yield
> #define cpu_relax_yield() cpu_relax()
> #endif
>
> so only those architectures that need to do something need to be
> modified?

These patches have been part of linux-next for a month or so; changing
that would invalidate all the next testing. If people want that, I can
certainly do that, though.
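
For readers following the thread, here is a rough userspace sketch of what the quoted #ifndef proposal would mean for an architecture that does want a heavier yield. It is only an illustration under stated assumptions: arch_yield_timeslice(), yield_calls, wait_politely() and wait_low_latency() are hypothetical names, sched_yield() merely stands in for whatever a real architecture would do, and barrier() is approximated with a GCC/Clang asm statement. It is not the code from this series.

```c
#include <sched.h>

/* Userspace approximation of the kernel's compiler barrier (GCC/Clang). */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical counter, only here so the override is observable. */
static int yield_calls;

/* --- pretend architecture header ------------------------------------ */
#define cpu_relax() barrier()

/* Hypothetical heavy yield; sched_yield() is a stand-in, not a hypercall. */
static inline void arch_yield_timeslice(void)
{
	yield_calls++;
	sched_yield();
}
#define cpu_relax_yield() arch_yield_timeslice()

/* --- pretend generic header ------------------------------------------ */
/* The fallback is skipped because the architecture already defined it. */
#ifndef cpu_relax_yield
#define cpu_relax_yield() cpu_relax()
#endif

/* Caller that prefers giving up the CPU over low latency. */
static void wait_politely(int rounds)
{
	for (int i = 0; i < rounds; i++)
		cpu_relax_yield();
}

/* Caller that only wants the cheap variant. */
static void wait_low_latency(int rounds)
{
	for (int i = 0; i < rounds; i++)
		cpu_relax();
}
```

With this layout only the architecture that needs special behaviour carries an extra definition; callers pick cpu_relax() or cpu_relax_yield() depending on whether latency or yielding matters more.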
Re: [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield
On Tue, Oct 25, 2016 at 11:03:11AM +0200, Christian Borntraeger wrote:
> For spinning loops people often use barrier() or cpu_relax().
> For most architectures cpu_relax and barrier are the same, but on
> some architectures cpu_relax can add some latency.
> For example on power, sparc64 and arc, cpu_relax can shift the CPU
> towards other hardware threads in an SMT environment.
> On s390 cpu_relax does even more: it uses a hypercall to the
> hypervisor to give up the timeslice.
> In contrast to the SMT yielding this can result in larger latencies.
> In some places this latency is unwanted, so another variant
> "cpu_relax_lowlatency" was introduced. Before this is used in more
> and more places, let's reverse the logic and provide a cpu_relax_yield
> that can be called in places where yielding is more important than
> latency. By default this is the same as cpu_relax on all architectures.

Rather than having to update all these architectures in this way, can't
we put in some linux/*.h header something like:

#ifndef cpu_relax_yield
#define cpu_relax_yield() cpu_relax()
#endif

so only those architectures that need to do something need to be
modified?

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
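
As a minimal sketch of the default case in the proposal above: an architecture that defines only cpu_relax() picks up cpu_relax_yield() from the generic fallback, and a spin loop can use it unchanged. This is a userspace approximation, not kernel code. spin_until_set() is a hypothetical helper, barrier() is approximated with a GCC/Clang asm statement, and C11 atomics stand in for the kernel's own accessors.

```c
#include <stdatomic.h>

/* Userspace approximation of the kernel's compiler barrier (GCC/Clang). */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* --- pretend architecture header: provides cpu_relax() only ---------- */
#define cpu_relax() barrier()

/* --- pretend generic header: the fallback from the mail above -------- */
#ifndef cpu_relax_yield
#define cpu_relax_yield() cpu_relax()
#endif

/*
 * Hypothetical helper: spin until *flag becomes nonzero or max_spins
 * iterations have passed. Returns the number of relax calls made.
 */
static int spin_until_set(atomic_int *flag, int max_spins)
{
	int spins = 0;

	while (!atomic_load_explicit(flag, memory_order_acquire) &&
	       spins < max_spins) {
		cpu_relax_yield();	/* expands to cpu_relax() here */
		spins++;
	}
	return spins;
}
```

The point of the pattern is that the 30-odd architecture headers would not need a one-line addition each; whether that is worth redoing the linux-next testing is the open question in the thread.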
[GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield
For spinning loops people often use barrier() or cpu_relax().
For most architectures cpu_relax and barrier are the same, but on
some architectures cpu_relax can add some latency.
For example on power, sparc64 and arc, cpu_relax can shift the CPU
towards other hardware threads in an SMT environment.
On s390 cpu_relax does even more: it uses a hypercall to the
hypervisor to give up the timeslice.
In contrast to the SMT yielding this can result in larger latencies.
In some places this latency is unwanted, so another variant
"cpu_relax_lowlatency" was introduced. Before this is used in more
and more places, let's reverse the logic and provide a cpu_relax_yield
that can be called in places where yielding is more important than
latency. By default this is the same as cpu_relax on all architectures.

Signed-off-by: Christian Borntraeger
---
 arch/alpha/include/asm/processor.h      | 1 +
 arch/arc/include/asm/processor.h        | 2 ++
 arch/arm/include/asm/processor.h        | 1 +
 arch/arm64/include/asm/processor.h      | 1 +
 arch/avr32/include/asm/processor.h      | 1 +
 arch/blackfin/include/asm/processor.h   | 1 +
 arch/c6x/include/asm/processor.h        | 1 +
 arch/cris/include/asm/processor.h       | 1 +
 arch/frv/include/asm/processor.h        | 1 +
 arch/h8300/include/asm/processor.h      | 1 +
 arch/hexagon/include/asm/processor.h    | 1 +
 arch/ia64/include/asm/processor.h       | 1 +
 arch/m32r/include/asm/processor.h       | 1 +
 arch/m68k/include/asm/processor.h       | 1 +
 arch/metag/include/asm/processor.h      | 1 +
 arch/microblaze/include/asm/processor.h | 1 +
 arch/mips/include/asm/processor.h       | 1 +
 arch/mn10300/include/asm/processor.h    | 1 +
 arch/nios2/include/asm/processor.h      | 1 +
 arch/openrisc/include/asm/processor.h   | 1 +
 arch/parisc/include/asm/processor.h     | 1 +
 arch/powerpc/include/asm/processor.h    | 1 +
 arch/s390/include/asm/processor.h       | 3 ++-
 arch/s390/kernel/processor.c            | 4 ++--
 arch/score/include/asm/processor.h      | 1 +
 arch/sh/include/asm/processor.h         | 1 +
 arch/sparc/include/asm/processor_32.h   | 1 +
 arch/sparc/include/asm/processor_64.h   | 1 +
 arch/tile/include/asm/processor.h       | 1 +
 arch/unicore32/include/asm/processor.h  | 1 +
 arch/x86/include/asm/processor.h        | 1 +
 arch/x86/um/asm/processor.h             | 1 +
 arch/xtensa/include/asm/processor.h     | 1 +
 33 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/alpha/include/asm/processor.h b/arch/alpha/include/asm/processor.h
index 43a7559..0556fda 100644
--- a/arch/alpha/include/asm/processor.h
+++ b/arch/alpha/include/asm/processor.h
@@ -58,6 +58,7 @@ unsigned long get_wchan(struct task_struct *p);
   ((tsk) == current ? rdusp() : task_thread_info(tsk)->pcb.usp)
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #define ARCH_HAS_PREFETCH
diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
index 16b630f..6c158d5 100644
--- a/arch/arc/include/asm/processor.h
+++ b/arch/arc/include/asm/processor.h
@@ -60,6 +60,7 @@ struct task_struct;
 #ifndef CONFIG_EZNPS_MTM_EXT
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #else
@@ -67,6 +68,7 @@ struct task_struct;
 #define cpu_relax() \
 	__asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() barrier()
 
 #endif
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 8a1e8e9..db660e0 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -82,6 +82,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define cpu_relax()	barrier()
 #endif
 
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()	cpu_relax()
 
 #define task_pt_regs(p) \
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 60e3482..3f9b0e5 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -149,6 +149,7 @@ static inline void cpu_relax(void)
 	asm volatile("yield" ::: "memory");
 }
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency()	cpu_relax()
 
 /* Thread switching */
diff --git a/arch/avr32/include/asm/processor.h b/arch/avr32/include/asm/processor.h
index 941593c..e412e8b 100644
--- a/arch/avr32/include/asm/processor.h
+++ b/arch/avr32/include/asm/processor.h
@@ -92,6 +92,7 @@ extern struct avr32_cpuinfo boot_cpu_data;
 #define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3))
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency()	cpu_relax()
 #define cpu_sync_pipeline()