Re: svn commit: r352938 - head/sys/arm/include

2019-10-02 Thread Konstantin Belousov
On Tue, Oct 01, 2019 at 01:53:05PM -0600, Ian Lepore wrote:
> On Tue, 2019-10-01 at 22:49 +0300, Konstantin Belousov wrote:
> > On Tue, Oct 01, 2019 at 07:39:00PM +, Ian Lepore wrote:
> > > Author: ian
> > > Date: Tue Oct  1 19:39:00 2019
> > > New Revision: 352938
> > > URL: https://svnweb.freebsd.org/changeset/base/352938
> > > 
> > > Log:
> > >   Add 8 and 16 bit versions of atomic_cmpset and atomic_fcmpset for arm.
> > >   
> > >   This adds 8 and 16 bit versions of the cmpset and fcmpset functions. Macros
> > >   are used to generate all the flavors from the same set of instructions; the
> > >   macro expansion handles the couple minor differences between each size
> > >   variation (generating ldrexb/ldrexh/ldrex for 8/16/32, etc).
> > >   
> > >   In addition to handling new sizes, the instruction sequences used for cmpset
> > >   and fcmpset are rewritten to be a bit shorter/faster, and the new sequence
> > >   will not return false when *dst==*old but the store-exclusive fails because
> > >   of concurrent writers. Instead, it just loops like ldrex/strex sequences
> > >   normally do until it gets a non-conflicted store. The manpage allows LL/SC
> > >   architectures to bogusly return false, but there's no reason to actually do
> > >   so, at least on arm.
> > 
> > The reason is to avoid nested loops.  Putting control of the retry in
> > the outer caller was the initial design decision for fcmpset() as
> > compared to cmpset().  casueword() also adopted this approach after the
> > fixes that moved the ll/sc looping under external control.
> 
> If the implementation is forbidden from looping, then the manpage
> should say so.  What I committed meets the requirements currently stated
> in the manpage.  Until somebody explains to me why it is somehow
> harmful to return the RIGHT information at a cost of either 0 or 1
> extra CPU cycles, it's staying the way it is.

The implementation is not forbidden from looping, but looping definitely
degrades the quality of the implementation.

The problem with the loop is that the outer caller has no control over
the inner loop, while it is the outer caller that knows more about the
terminating conditions.

I can only point to casueword(9), where inner looping has caused CVE-level
issues.  I do not believe that any of the atomic(9) primitives on ll/sc
machines cause such a CVE for us right now (but casueword did).

Note that even x86 does not use a comparison with the destination, but
relies on the CPU flag to provide the result (this is the closest analog
of not looping for CAS).
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r352938 - head/sys/arm/include

2019-10-01 Thread Ian Lepore
On Tue, 2019-10-01 at 22:49 +0300, Konstantin Belousov wrote:
> On Tue, Oct 01, 2019 at 07:39:00PM +, Ian Lepore wrote:
> > Author: ian
> > Date: Tue Oct  1 19:39:00 2019
> > New Revision: 352938
> > URL: https://svnweb.freebsd.org/changeset/base/352938
> > 
> > Log:
> >   Add 8 and 16 bit versions of atomic_cmpset and atomic_fcmpset for arm.
> >   
> >   This adds 8 and 16 bit versions of the cmpset and fcmpset functions. Macros
> >   are used to generate all the flavors from the same set of instructions; the
> >   macro expansion handles the couple minor differences between each size
> >   variation (generating ldrexb/ldrexh/ldrex for 8/16/32, etc).
> >   
> >   In addition to handling new sizes, the instruction sequences used for cmpset
> >   and fcmpset are rewritten to be a bit shorter/faster, and the new sequence
> >   will not return false when *dst==*old but the store-exclusive fails because
> >   of concurrent writers. Instead, it just loops like ldrex/strex sequences
> >   normally do until it gets a non-conflicted store. The manpage allows LL/SC
> >   architectures to bogusly return false, but there's no reason to actually do
> >   so, at least on arm.
> 
> The reason is to avoid nested loops.  Putting control of the retry in
> the outer caller was the initial design decision for fcmpset() as
> compared to cmpset().  casueword() also adopted this approach after the
> fixes that moved the ll/sc looping under external control.

If the implementation is forbidden from looping, then the manpage
should say so.  What I committed meets the requirements currently stated
in the manpage.  Until somebody explains to me why it is somehow
harmful to return the RIGHT information at a cost of either 0 or 1
extra CPU cycles, it's staying the way it is.

-- Ian




Re: svn commit: r352938 - head/sys/arm/include

2019-10-01 Thread Konstantin Belousov
On Tue, Oct 01, 2019 at 07:39:00PM +, Ian Lepore wrote:
> Author: ian
> Date: Tue Oct  1 19:39:00 2019
> New Revision: 352938
> URL: https://svnweb.freebsd.org/changeset/base/352938
> 
> Log:
>   Add 8 and 16 bit versions of atomic_cmpset and atomic_fcmpset for arm.
>   
>   This adds 8 and 16 bit versions of the cmpset and fcmpset functions. Macros
>   are used to generate all the flavors from the same set of instructions; the
>   macro expansion handles the couple minor differences between each size
>   variation (generating ldrexb/ldrexh/ldrex for 8/16/32, etc).
>   
>   In addition to handling new sizes, the instruction sequences used for cmpset
>   and fcmpset are rewritten to be a bit shorter/faster, and the new sequence
>   will not return false when *dst==*old but the store-exclusive fails because
>   of concurrent writers. Instead, it just loops like ldrex/strex sequences
>   normally do until it gets a non-conflicted store. The manpage allows LL/SC
>   architectures to bogusly return false, but there's no reason to actually do
>   so, at least on arm.

The reason is to avoid nested loops.  Putting control of the retry in
the outer caller was the initial design decision for fcmpset() as
compared to cmpset().  casueword() also adopted this approach after the
fixes that moved the ll/sc looping under external control.


svn commit: r352938 - head/sys/arm/include

2019-10-01 Thread Ian Lepore
Author: ian
Date: Tue Oct  1 19:39:00 2019
New Revision: 352938
URL: https://svnweb.freebsd.org/changeset/base/352938

Log:
  Add 8 and 16 bit versions of atomic_cmpset and atomic_fcmpset for arm.
  
  This adds 8 and 16 bit versions of the cmpset and fcmpset functions. Macros
  are used to generate all the flavors from the same set of instructions; the
  macro expansion handles the couple minor differences between each size
  variation (generating ldrexb/ldrexh/ldrex for 8/16/32, etc).
  
  In addition to handling new sizes, the instruction sequences used for cmpset
  and fcmpset are rewritten to be a bit shorter/faster, and the new sequence
  will not return false when *dst==*old but the store-exclusive fails because
  of concurrent writers. Instead, it just loops like ldrex/strex sequences
  normally do until it gets a non-conflicted store. The manpage allows LL/SC
  architectures to bogusly return false, but there's no reason to actually do
  so, at least on arm.
  
  Reviewed by:  cognet

Modified:
  head/sys/arm/include/atomic-v4.h
  head/sys/arm/include/atomic-v6.h

Modified: head/sys/arm/include/atomic-v4.h
==================================================================
--- head/sys/arm/include/atomic-v4.h	Tue Oct  1 18:32:27 2019	(r352937)
+++ head/sys/arm/include/atomic-v4.h	Tue Oct  1 19:39:00 2019	(r352938)
@@ -113,6 +113,43 @@ atomic_clear_64(volatile uint64_t *address, uint64_t c
 }
 
 static __inline int
+atomic_fcmpset_8(volatile uint8_t *p, volatile uint8_t *cmpval, volatile uint8_t newval)
+{
+   int ret;
+
+   __with_interrupts_disabled(
+{
+   ret = *p;
+   if (*p == *cmpval) {
+   *p = newval;
+   ret = 1;
+   } else {
+   *cmpval = *p;
+   ret = 0;
+   }
+   });
+   return (ret);
+}
+static __inline int
+atomic_fcmpset_16(volatile uint16_t *p, volatile uint16_t *cmpval, volatile uint16_t newval)
+{
+   int ret;
+
+   __with_interrupts_disabled(
+{
+   ret = *p;
+   if (*p == *cmpval) {
+   *p = newval;
+   ret = 1;
+   } else {
+   *cmpval = *p;
+   ret = 0;
+   }
+   });
+   return (ret);
+}
+
+static __inline int
atomic_fcmpset_32(volatile u_int32_t *p, volatile u_int32_t *cmpval, volatile u_int32_t newval)
 {
int ret;
@@ -150,6 +187,40 @@ atomic_fcmpset_64(volatile u_int64_t *p, volatile u_in
 }
 
 static __inline int
+atomic_cmpset_8(volatile uint8_t *p, volatile uint8_t cmpval, volatile uint8_t newval)
+{
+   int ret;
+
+   __with_interrupts_disabled(
+{
+   if (*p == cmpval) {
+   *p = newval;
+   ret = 1;
+   } else {
+   ret = 0;
+   }
+   });
+   return (ret);
+}
+
+static __inline int
+atomic_cmpset_16(volatile uint16_t *p, volatile uint16_t cmpval, volatile uint16_t newval)
+{
+   int ret;
+
+   __with_interrupts_disabled(
+{
+   if (*p == cmpval) {
+   *p = newval;
+   ret = 1;
+   } else {
+   ret = 0;
+   }
+   });
+   return (ret);
+}
+
+static __inline int
atomic_cmpset_32(volatile u_int32_t *p, volatile u_int32_t cmpval, volatile u_int32_t newval)
 {
int ret;
@@ -450,6 +521,10 @@ atomic_swap_32(volatile u_int32_t *p, u_int32_t v)
 #define atomic_fcmpset_rel_32  atomic_fcmpset_32
 #define atomic_fcmpset_acq_32  atomic_fcmpset_32
 #ifdef _KERNEL
+#define atomic_fcmpset_rel_8   atomic_fcmpset_8
+#define atomic_fcmpset_acq_8   atomic_fcmpset_8
+#define atomic_fcmpset_rel_16  atomic_fcmpset_16
+#define atomic_fcmpset_acq_16  atomic_fcmpset_16
 #define atomic_fcmpset_rel_64  atomic_fcmpset_64
 #define atomic_fcmpset_acq_64  atomic_fcmpset_64
 #endif
@@ -458,6 +533,10 @@ atomic_swap_32(volatile u_int32_t *p, u_int32_t v)
 #define atomic_cmpset_rel_32   atomic_cmpset_32
 #define atomic_cmpset_acq_32   atomic_cmpset_32
 #ifdef _KERNEL
+#define atomic_cmpset_rel_8atomic_cmpset_8
+#define atomic_cmpset_acq_8atomic_cmpset_8
+#define atomic_cmpset_rel_16   atomic_cmpset_16
+#define atomic_cmpset_acq_16   atomic_cmpset_16
 #define atomic_cmpset_rel_64   atomic_cmpset_64
 #define atomic_cmpset_acq_64   atomic_cmpset_64
 #endif

Modified: head/sys/arm/include/atomic-v6.h
==================================================================
--- head/sys/arm/include/atomic-v6.h	Tue Oct  1 18:32:27 2019	(r352937)
+++ head/sys/arm/include/atomic-v6.h	Tue Oct  1 19:39:00 2019	(r352938)
@@ -190,224 +190,380 @@ ATOMIC_ACQ_REL(clear, 32)
 ATOMIC_ACQ_REL(clear, 64)
 ATOMIC_ACQ_REL_LONG(clear)
 
+#define ATOMIC_FCMPSET_CODE(RET, TYPE, SUF)   \
+