Re: svn commit: r352938 - head/sys/arm/include
On Tue, Oct 01, 2019 at 01:53:05PM -0600, Ian Lepore wrote:
> On Tue, 2019-10-01 at 22:49 +0300, Konstantin Belousov wrote:
> > On Tue, Oct 01, 2019 at 07:39:00PM +, Ian Lepore wrote:
> > > Author: ian
> > > Date: Tue Oct  1 19:39:00 2019
> > > New Revision: 352938
> > > URL: https://svnweb.freebsd.org/changeset/base/352938
> > >
> > > Log:
> > >   Add 8 and 16 bit versions of atomic_cmpset and atomic_fcmpset for arm.
> > >
> > >   This adds 8 and 16 bit versions of the cmpset and fcmpset functions. Macros
> > >   are used to generate all the flavors from the same set of instructions; the
> > >   macro expansion handles the couple minor differences between each size
> > >   variation (generating ldrexb/ldrexh/ldrex for 8/16/32, etc).
> > >
> > >   In addition to handling new sizes, the instruction sequences used for cmpset
> > >   and fcmpset are rewritten to be a bit shorter/faster, and the new sequence
> > >   will not return false when *dst==*old but the store-exclusive fails because
> > >   of concurrent writers. Instead, it just loops like ldrex/strex sequences
> > >   normally do until it gets a non-conflicted store. The manpage allows LL/SC
> > >   architectures to bogusly return false, but there's no reason to actually do
> > >   so, at least on arm.
> >
> > The reason is to avoid nested loops.  The outer control for retry was the
> > initial design decision for fcmpset() comparing to cmpset().  casueword()
> > also started following this approach after the fixes for ll/sc looping
> > after the external control.
>
> If the implementation is forbidden from looping, then the manpage
> should say so.  What I committed meets the requirements currently stated
> in the manpage.  Until somebody explains to me why it is somehow
> harmful to return the RIGHT information at a cost of either 0 or 1
> extra cpu cycle, it's staying the way it is.
The implementation is not forbidden from looping, but looping definitely
deteriorates the quality of the implementation.  The problem with the loop
is that the outer caller has no control over the inner loop, while it is
the outer caller who knows more about the terminating conditions.  I can
only point to casueword(9), where inner looping was causing CVE-level
issues.  I do not believe that any of the atomic(9) primitives on ll/sc
machines cause the CVE for us right now (but casueword did).

Note that even x86 does not use the comparison with dest, but relies on
the CPU flag to provide the result (this is the closest analog of not
looping for CAS).
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
Re: svn commit: r352938 - head/sys/arm/include
On Tue, 2019-10-01 at 22:49 +0300, Konstantin Belousov wrote:
> On Tue, Oct 01, 2019 at 07:39:00PM +, Ian Lepore wrote:
> > Author: ian
> > Date: Tue Oct  1 19:39:00 2019
> > New Revision: 352938
> > URL: https://svnweb.freebsd.org/changeset/base/352938
> >
> > Log:
> >   Add 8 and 16 bit versions of atomic_cmpset and atomic_fcmpset for arm.
> >
> >   This adds 8 and 16 bit versions of the cmpset and fcmpset functions. Macros
> >   are used to generate all the flavors from the same set of instructions; the
> >   macro expansion handles the couple minor differences between each size
> >   variation (generating ldrexb/ldrexh/ldrex for 8/16/32, etc).
> >
> >   In addition to handling new sizes, the instruction sequences used for cmpset
> >   and fcmpset are rewritten to be a bit shorter/faster, and the new sequence
> >   will not return false when *dst==*old but the store-exclusive fails because
> >   of concurrent writers. Instead, it just loops like ldrex/strex sequences
> >   normally do until it gets a non-conflicted store. The manpage allows LL/SC
> >   architectures to bogusly return false, but there's no reason to actually do
> >   so, at least on arm.
>
> The reason is to avoid nested loops.  The outer control for retry was the
> initial design decision for fcmpset() comparing to cmpset().  casueword()
> also started following this approach after the fixes for ll/sc looping
> after the external control.

If the implementation is forbidden from looping, then the manpage
should say so.  What I committed meets the requirements currently stated
in the manpage.  Until somebody explains to me why it is somehow
harmful to return the RIGHT information at a cost of either 0 or 1
extra cpu cycle, it's staying the way it is.

-- Ian
Re: svn commit: r352938 - head/sys/arm/include
On Tue, Oct 01, 2019 at 07:39:00PM +, Ian Lepore wrote:
> Author: ian
> Date: Tue Oct  1 19:39:00 2019
> New Revision: 352938
> URL: https://svnweb.freebsd.org/changeset/base/352938
>
> Log:
>   Add 8 and 16 bit versions of atomic_cmpset and atomic_fcmpset for arm.
>
>   This adds 8 and 16 bit versions of the cmpset and fcmpset functions. Macros
>   are used to generate all the flavors from the same set of instructions; the
>   macro expansion handles the couple minor differences between each size
>   variation (generating ldrexb/ldrexh/ldrex for 8/16/32, etc).
>
>   In addition to handling new sizes, the instruction sequences used for cmpset
>   and fcmpset are rewritten to be a bit shorter/faster, and the new sequence
>   will not return false when *dst==*old but the store-exclusive fails because
>   of concurrent writers. Instead, it just loops like ldrex/strex sequences
>   normally do until it gets a non-conflicted store. The manpage allows LL/SC
>   architectures to bogusly return false, but there's no reason to actually do
>   so, at least on arm.

The reason is to avoid nested loops.  The outer control for retry was the
initial design decision for fcmpset() as compared to cmpset().  casueword()
also started following this approach after the fixes for ll/sc looping
after the external control.
svn commit: r352938 - head/sys/arm/include
Author: ian
Date: Tue Oct  1 19:39:00 2019
New Revision: 352938
URL: https://svnweb.freebsd.org/changeset/base/352938

Log:
  Add 8 and 16 bit versions of atomic_cmpset and atomic_fcmpset for arm.

  This adds 8 and 16 bit versions of the cmpset and fcmpset functions. Macros
  are used to generate all the flavors from the same set of instructions; the
  macro expansion handles the couple minor differences between each size
  variation (generating ldrexb/ldrexh/ldrex for 8/16/32, etc).

  In addition to handling new sizes, the instruction sequences used for cmpset
  and fcmpset are rewritten to be a bit shorter/faster, and the new sequence
  will not return false when *dst==*old but the store-exclusive fails because
  of concurrent writers. Instead, it just loops like ldrex/strex sequences
  normally do until it gets a non-conflicted store. The manpage allows LL/SC
  architectures to bogusly return false, but there's no reason to actually do
  so, at least on arm.

  Reviewed by:	cognet

Modified:
  head/sys/arm/include/atomic-v4.h
  head/sys/arm/include/atomic-v6.h

Modified: head/sys/arm/include/atomic-v4.h
==============================================================================
--- head/sys/arm/include/atomic-v4.h	Tue Oct  1 18:32:27 2019	(r352937)
+++ head/sys/arm/include/atomic-v4.h	Tue Oct  1 19:39:00 2019	(r352938)
@@ -113,6 +113,43 @@ atomic_clear_64(volatile uint64_t *address, uint64_t c
 }
 
 static __inline int
+atomic_fcmpset_8(volatile uint8_t *p, volatile uint8_t *cmpval, volatile uint8_t newval)
+{
+	int ret;
+
+	__with_interrupts_disabled(
+	 {
+		ret = *p;
+		if (*p == *cmpval) {
+			*p = newval;
+			ret = 1;
+		} else {
+			*cmpval = *p;
+			ret = 0;
+		}
+	});
+	return (ret);
+}
+
+static __inline int
+atomic_fcmpset_16(volatile uint16_t *p, volatile uint16_t *cmpval, volatile uint16_t newval)
+{
+	int ret;
+
+	__with_interrupts_disabled(
+	 {
+		ret = *p;
+		if (*p == *cmpval) {
+			*p = newval;
+			ret = 1;
+		} else {
+			*cmpval = *p;
+			ret = 0;
+		}
+	});
+	return (ret);
+}
+
+static __inline int
 atomic_fcmpset_32(volatile u_int32_t *p, volatile u_int32_t *cmpval, volatile u_int32_t newval)
 {
 	int ret;
@@ -150,6 +187,40 @@ atomic_fcmpset_64(volatile u_int64_t *p, volatile u_in
 }
 
 static __inline int
+atomic_cmpset_8(volatile uint8_t *p, volatile uint8_t cmpval, volatile uint8_t newval)
+{
+	int ret;
+
+	__with_interrupts_disabled(
+	 {
+		if (*p == cmpval) {
+			*p = newval;
+			ret = 1;
+		} else {
+			ret = 0;
+		}
+	});
+	return (ret);
+}
+
+static __inline int
+atomic_cmpset_16(volatile uint16_t *p, volatile uint16_t cmpval, volatile uint16_t newval)
+{
+	int ret;
+
+	__with_interrupts_disabled(
+	 {
+		if (*p == cmpval) {
+			*p = newval;
+			ret = 1;
+		} else {
+			ret = 0;
+		}
+	});
+	return (ret);
+}
+
+static __inline int
 atomic_cmpset_32(volatile u_int32_t *p, volatile u_int32_t cmpval, volatile u_int32_t newval)
 {
 	int ret;
@@ -450,6 +521,10 @@ atomic_swap_32(volatile u_int32_t *p, u_int32_t v)
 #define atomic_fcmpset_rel_32	atomic_fcmpset_32
 #define atomic_fcmpset_acq_32	atomic_fcmpset_32
 #ifdef _KERNEL
+#define atomic_fcmpset_rel_8	atomic_fcmpset_8
+#define atomic_fcmpset_acq_8	atomic_fcmpset_8
+#define atomic_fcmpset_rel_16	atomic_fcmpset_16
+#define atomic_fcmpset_acq_16	atomic_fcmpset_16
 #define atomic_fcmpset_rel_64	atomic_fcmpset_64
 #define atomic_fcmpset_acq_64	atomic_fcmpset_64
 #endif
@@ -458,6 +533,10 @@ atomic_swap_32(volatile u_int32_t *p, u_int32_t v)
 #define atomic_cmpset_rel_32	atomic_cmpset_32
 #define atomic_cmpset_acq_32	atomic_cmpset_32
 #ifdef _KERNEL
+#define atomic_cmpset_rel_8	atomic_cmpset_8
+#define atomic_cmpset_acq_8	atomic_cmpset_8
+#define atomic_cmpset_rel_16	atomic_cmpset_16
+#define atomic_cmpset_acq_16	atomic_cmpset_16
 #define atomic_cmpset_rel_64	atomic_cmpset_64
 #define atomic_cmpset_acq_64	atomic_cmpset_64
 #endif

Modified: head/sys/arm/include/atomic-v6.h
==============================================================================
--- head/sys/arm/include/atomic-v6.h	Tue Oct  1 18:32:27 2019	(r352937)
+++ head/sys/arm/include/atomic-v6.h	Tue Oct  1 19:39:00 2019	(r352938)
@@ -190,224 +190,380 @@ ATOMIC_ACQ_REL(clear, 32)
 ATOMIC_ACQ_REL(clear, 64)
 ATOMIC_ACQ_REL_LONG(clear)
 
+#define ATOMIC_FCMPSET_CODE(RET, TYPE, SUF)	\
+