Re: __{read,write}_once

2019-11-29 Thread Taylor R Campbell
> Date: Sun, 24 Nov 2019 19:25:52 +
> From: Taylor R Campbell 
> 
> This thread is not converging on consensus, so we're discussing the
> semantics and naming of these operations as core and will come back
> with a decision by the end of the week.

We (core) carefully read the thread, and discussed this and the
related Linux READ_ONCE/WRITE_ONCE macros as well as the C11 atomic
API.

   For maxv: Please add conditional definitions in <sys/atomic.h>
   according to what KCSAN needs, and use atomic_load/store_relaxed
   for counters and other integer objects in the rest of your patch.
   (I didn't see any pointer loads there.)  For uvm's lossy counters,
   please use atomic_store_relaxed(p, 1 + atomic_load_relaxed(p)) and
   not an __add_once macro -- since these should really be per-CPU
   counters, we don't want to endorse this pattern by making it
   pretty.
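
   Spelled out for one of the uvmexp counters quoted later in this
   thread, that (deliberately unprettified) form would look like the
   following -- an illustrative sketch only, not a committed change:

	atomic_store_relaxed(&uvmexp.fltanget,
	    1 + atomic_load_relaxed(&uvmexp.fltanget));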

* Summary

We added a few macros to <sys/atomic.h> for the purpose,
atomic_load_<ordering>(p) and atomic_store_<ordering>(p, v).  The
orderings are relaxed, acquire, consume, and release, and are intended
to match C11 semantics.  See the new atomic_loadstore(9) man page for
reference.

Currently they are defined in terms of volatile loads and stores, but
we should eventually use the C11 atomic API instead in order to
provide the intended atomicity guarantees under all compilers without
having to rely on the folklore interpretations of volatile.
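
For readers unfamiliar with the idiom, here is a rough, hypothetical
sketch of a volatile-based definition (simplified; the real
<sys/atomic.h> definitions also enforce the size restrictions described
below and may carry KCSAN-related conditionals, per the note to maxv
above):

	/* Illustration only -- not the actual NetBSD definitions. */
	#define	demo_load_relaxed(p)					\
		(*(const volatile __typeof__(*(p)) *)(p))
	#define	demo_store_relaxed(p, v)				\
		(*(volatile __typeof__(*(p)) *)(p) = (v))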

* Details

There are four main properties involved in the operations under
discussion:

1. No tearing.  A 32-bit write can't be split into two separate 16-bit
   writes, for instance.

   * In _some_ cases, namely aligned pointers to sufficiently small
 objects, Linux READ_ONCE/WRITE_ONCE guarantee no tearing.

   * C11 atomic_load/store guarantees no tearing -- although on large
 objects it may involve locks, requiring the C11 type qualifier
 _Atomic and changing the ABI.

   This was the primary motivation for maxv's original question.

2. No fusing.  Consecutive writes can't be combined into one, for
   instance, nor can a write followed by a read skip the read and
   simply return the value that was just written.  (A short
   polling-loop sketch below illustrates why this matters.)

   * Linux's READ_ONCE/WRITE_ONCE and C11's atomic_load/store
 guarantee no fusing.

3. Data-dependent memory ordering.  If you read a pointer, and then
   dereference the pointer (maybe plus some offset), the reads happen
   in that order.

   * Linux's READ_ONCE guarantees this by issuing the analogue of
 membar_datadep_consumer on DEC Alpha, and nothing on other CPUs.

   * C11's atomic_load guarantees this with seq_cst, acquire, or
 consume memory ordering.

4. Cost.  There's no need to incur the cost of read/modify/write
   atomic operations, and for many purposes, no need to incur the
   cost of memory-ordering barriers.  (A brief C11 comparison follows
   this list.)
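
For comparison, a minimal sketch of how C11 spells a relaxed load and
store of the kind referred to above (illustrative only; note the
_Atomic qualifier it requires, which is part of the ABI concern
mentioned in item 1):

	#include <stdatomic.h>

	_Atomic unsigned int counter;

	unsigned int
	c11_peek(void)
	{

		return atomic_load_explicit(&counter, memory_order_relaxed);
	}

	void
	c11_poke(unsigned int v)
	{

		atomic_store_explicit(&counter, v, memory_order_relaxed);
	}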

To express these, we've decided to add a few macros that are similar
to Linux's READ_ONCE/WRITE_ONCE and C11's atomic_load/store_explicit
but are less error-prone and less cumbersome:

#include <sys/atomic.h>

- atomic_load_relaxed(p) is like *p, but guarantees no tearing and no
  fusing.  No ordering relative to memory operations on other objects
  is guaranteed.  

- atomic_store_relaxed(p, v) is like *p = v, but guarantees no tearing
  and no fusing.  No ordering relative to memory operations on other
  objects is guaranteed.

- atomic_store_release(p, v) and atomic_load_acquire(p) are,
  respectively, like *p = v and *p, but guarantee no tearing and no
  fusing.  They _also_ guarantee for logic like

        Thread A                                Thread B

        stuff();
        atomic_store_release(p, v);
                                                u = atomic_load_acquire(p);
                                                things();

  that _if_ the atomic_load_acquire(p) in thread B witnesses the state
  of the object at p set by atomic_store_release(p, v) in thread A,
  then all memory operations in stuff() happen before any memory
  operations in things().

  No guarantees if only one thread participates -- the store-release
  and load-acquire _must_ be paired.  (A short sketch of this pairing
  follows this list.)

- atomic_load_consume(p) is like atomic_load_acquire(p), but it only
  guarantees ordering for data-dependent memory references.  Like
  atomic_load_acquire, it must be paired with atomic_store_release.
  However, on most CPUs, it is as _cheap_ as atomic_load_relaxed.
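
As an illustration of the store-release/load-acquire pairing described
above, here is a minimal sketch; the struct, field, and function names
are invented for the example:

	struct frob {
		int	f_data;
	};

	struct frob	*frobp;		/* shared; NULL until published */

	/* Thread A: initialize the object, then publish it. */
	void
	frob_publish(struct frob *f)
	{

		f->f_data = 42;			/* stuff() */
		atomic_store_release(&frobp, f);
	}

	/* Thread B: observe the pointer, then use the object. */
	int
	frob_consume(void)
	{
		struct frob *f;

		f = atomic_load_acquire(&frobp);
		if (f == NULL)
			return -1;		/* not published yet */
		return f->f_data;		/* things(); guaranteed to see 42 */
	}

Since the consumer's only use of the object is data-dependent on the
loaded pointer, atomic_load_consume(&frobp) would also be enough here,
and is cheaper on most CPUs.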

The atomic load/store operations are defined _only_ on objects as
large as the architecture can support -- so, for example, on 32-bit
platforms they cannot be used on 64-bit quantities; attempts to do so
will lead to compile-time errors.  They are also defined _only_ on
aligned pointers -- using them on unaligned pointers may lead to
run-time crashes, even on architectures without strict alignment
requirements.
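
To make the no-fusing guarantee concrete, here is a hypothetical
polling loop of the kind these macros are meant for; the names are
invented:

	extern unsigned int demo_done;	/* set to 1 from another context */

	void
	demo_wait(void)
	{

		/*
		 * With a plain read, the compiler may hoist the load and
		 * read demo_done only once, spinning forever; the relaxed
		 * atomic load forces a fresh, untorn load each iteration.
		 * No ordering with respect to other objects is implied --
		 * use atomic_load_acquire if that is needed.
		 */
		while (atomic_load_relaxed(&demo_done) == 0)
			continue;
	}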

* Why the names atomic_{load,store}_<ordering>?

- Atomic.  Although `atomic' may suggest `expensive' to some people
  (and I'm guilty of making that connection in the past), what's
  really expensive is atomic _read/modify/write_ operations and
  _memory ordering guarantees_.

  Merely preventing tearing 

Re: __{read,write}_once

2019-11-25 Thread Edgar Fuß
> I feel like load_entire() and store_entire() get to the heart of the
> matter while being easy to speak, but _fully() or _completely() seem
> fine.
_integral()?


Re: __{read,write}_once

2019-11-24 Thread Taylor R Campbell
> Date: Wed, 6 Nov 2019 12:31:37 +0100
> From: Maxime Villard 
> 
> There are cases in the kernel where we read/write global memory locklessly,
> and accept the races either because it is part of the design (eg low-level
> scheduling) or we simply don't care (eg global stats).
> 
> In these cases, we want to access the memory only once, and need to ensure
> the compiler does not split that access in several pieces, some of which
> may be changed by concurrent accesses. There is a Linux article [1] about
> this, and also [2]. I'd like to introduce the following macros:
> 
>   __read_once(x)
>   __write_once(x, val)

This thread is not converging on consensus, so we're discussing the
semantics and naming of these operations as core and will come back
with a decision by the end of the week.

Thanks,
-Riastradh, on behalf of core


Re: __{read,write}_once

2019-11-22 Thread Kamil Rytarowski
On 22.11.2019 02:42, Robert Elz wrote:
> Date:Fri, 22 Nov 2019 01:04:56 +0100
> From:Kamil Rytarowski 
> Message-ID:  <1a9d9b40-42fe-be08-d7b3-e6ecead5b...@gmx.com>
> 
> 
>   | I think that picking C11 terminology is the way forward.
> 
> Use a name like that iff the intent is to also exactly match the
> semantics implied, otherwise it will only create more confusion.
> 
> kre
> 

I think this matches. We want the operation to happen as (or at least
appear to be) a single operation, and 'atomic' is the settled name for
that in English (alternatives would be: 'integral', 'intrinsic',
'elemental', 'essential', 'fundamental' or 'single operation').

If there are stronger feelings against it, I would go for
'__write_singleop()', '__read_singleop()'.





Re: __{read,write}_once

2019-11-22 Thread Andreas Gustafsson
Maxime Villard wrote:
> How about _onestep()? Or _nosplit()?

Or just conclude that there are no good options and we might as well
call it _once() to be compatible with other OSes.
-- 
Andreas Gustafsson, g...@gson.org


Re: __{read,write}_once

2019-11-22 Thread Maxime Villard

On 21/11/2019 at 23:08, David Young wrote:

On Thu, Nov 21, 2019 at 07:19:51PM +0100, Maxime Villard wrote:

On 18/11/2019 at 19:49, David Holland wrote:

On Sun, Nov 17, 2019 at 02:35:43PM +, Mindaugas Rasiukevicius wrote:
  > David Holland  wrote:
  > >  > I see the potential source of confusion, but just think about: what
  > >  > could "atomic" possibly mean for loads or stores?  A load is just a
  > >  > load, there is no RMW cycle, conceptually; inter-locking the operation
  > >  > does not make sense.  For anybody who attempts to reason about this,
  > >  > it should not be too confusing, plus there are man pages.
  > >
  > > ...that it's not torn.
  > >
  > > As far as names... there are increasingly many slightly different
  > > types of atomic and semiatomic operations.
  > >
  > > I think it would be helpful if someone came up with a comprehensive
  > > naming scheme for all of them (including ones we don't currently have
  > > that we're moderately likely to end up with later...)
  >
  > Yes, the meaning of "atomic" has different flavours and describes different
  > set of properties in different fields (operating systems vs databases vs
  > distributed systems vs ...) and, as we can see, even within the fields.
  >
  > Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already the
  > dominant terms.

Yes, but "relaxed" means something else... let me be clearer since I
wasn't before: I would expect e.g. atomic_inc_32_relaxed() to be
distinguished from atomic_inc_32() or maybe atomic_inc_32_ordered() by
whether or not multiple instances of it are globally ordered, not by
whether or not it's actually atomic relative to other cpus.

Checking the prior messages indicates we aren't currently talking
about atomic_inc_32_relaxed() but only about atomic_load_32_relaxed()
and atomic_store_32_relaxed(), which would be used together to
generate a local counter. This is less misleading, but I'm still not
convinced it's a good choice of names given that we might reasonably
later on want to have atomic_inc_32_relaxed() and
atomic_inc_32_ordered() that differ as above.

  > I think it is pointless to attempt to reinvent the wheel
  > here.  It is terminology used by C11 (and C++11) and accepted by various
  > technical literature and, at this point, by academia (if you look at the
  > recent papers on memory models -- it's pretty much settled).  These terms
  > are not too bad; it could be worse; and this bit is certainly not the worst
  > part of C11.  So, I would just move on.

Is it settled? My experience with the academic literature has been
that everyone uses their own terminology and the same words are
routinely found to have subtly different meanings from one paper to
the next and occasionally even within the same paper. :-/  But I
haven't been paying much attention lately due to being preoccupied
with other things.


So in the end which name do we use? Are people really unhappy with _racy()?
At least it has a general meaning, and does not imply atomicity or ordering.


_racy() doesn't really get at the intended meaning, and everything in
C11 is racy unless you arrange otherwise by using mutexes, atomics, etc.
The suffix has very little content.

Names such as load_/store_fully() or load_/store_entire() or
load_/store_completely() get to the actual semantics: at the program
step implied by the load_/store_entire() expression, the memory
constituting the named variable is loaded/stored in its entirety.  In
other words, the load cannot be drawn out over more than one local
program step.  Whether or not the load/store is performed in one step
with respect to interrupts or other threads is not defined.


Sounds like the right definition of the semantic. However:


I feel like load_entire() and store_entire() get to the heart of the
matter while being easy to speak, but _fully() or _completely() seem
fine.


I'm not a really big fan of _entire(), because it gives the impression that
the load/store is not entire otherwise - which it is, but just possibly not
in one step.

How about _onestep()? Or _nosplit()?


Re: __{read,write}_once

2019-11-21 Thread Martin Husemann
On Fri, Nov 22, 2019 at 08:42:19AM +0700, Robert Elz wrote:
> Date:Fri, 22 Nov 2019 01:04:56 +0100
> From:Kamil Rytarowski 
> Message-ID:  <1a9d9b40-42fe-be08-d7b3-e6ecead5b...@gmx.com>
> 
> 
>   | I think that picking C11 terminology is the way forward.
> 
> Use a name like that iff the intent is to also exactly match the
> semantics implied, otherwise it will only create more confusion.

And if the implied semantics isn't even totally clear, do not hide
it behind a macro at all?

Martin


re: __{read,write}_once

2019-11-21 Thread matthew green
_always()?


Re: __{read,write}_once

2019-11-21 Thread Robert Elz
Date:Fri, 22 Nov 2019 01:04:56 +0100
From:Kamil Rytarowski 
Message-ID:  <1a9d9b40-42fe-be08-d7b3-e6ecead5b...@gmx.com>


  | I think that picking C11 terminology is the way forward.

Use a name like that iff the intent is to also exactly match the
semantics implied, otherwise it will only create more confusion.

kre



Re: __{read,write}_once

2019-11-21 Thread David Young
On Thu, Nov 21, 2019 at 07:19:51PM +0100, Maxime Villard wrote:
> On 18/11/2019 at 19:49, David Holland wrote:
> >On Sun, Nov 17, 2019 at 02:35:43PM +, Mindaugas Rasiukevicius wrote:
> >  > David Holland  wrote:
> >  > >  > I see the potential source of confusion, but just think about: what
> >  > >  > could "atomic" possibly mean for loads or stores?  A load is just a
> >  > >  > load, there is no RMW cycle, conceptually; inter-locking the 
> > operation
> >  > >  > does not make sense.  For anybody who attempts to reason about this,
> >  > >  > it should not be too confusing, plus there are man pages.
> >  > >
> >  > > ...that it's not torn.
> >  > >
> >  > > As far as names... there are increasingly many slightly different
> >  > > types of atomic and semiatomic operations.
> >  > >
> >  > > I think it would be helpful if someone came up with a comprehensive
> >  > > naming scheme for all of them (including ones we don't currently have
> >  > > that we're moderately likely to end up with later...)
> >  >
> >  > Yes, the meaning of "atomic" has different flavours and describes 
> > different
> >  > set of properties in different fields (operating systems vs databases vs
> >  > distributed systems vs ...) and, as we can see, even within the fields.
> >  >
> >  > Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already 
> > the
> >  > dominant terms.
> >
> >Yes, but "relaxed" means something else... let me be clearer since I
> >wasn't before: I would expect e.g. atomic_inc_32_relaxed() to be
> >distinguished from atomic_inc_32() or maybe atomic_inc_32_ordered() by
> >whether or not multiple instances of it are globally ordered, not by
> >whether or not it's actually atomic relative to other cpus.
> >
> >Checking the prior messages indicates we aren't currently talking
> >about atomic_inc_32_relaxed() but only about atomic_load_32_relaxed()
> >and atomic_store_32_relaxed(), which would be used together to
> >generate a local counter. This is less misleading, but I'm still not
> >convinced it's a good choice of names given that we might reasonably
> >later on want to have atomic_inc_32_relaxed() and
> >atomic_inc_32_ordered() that differ as above.
> >
> >  > I think it is pointless to attempt to reinvent the wheel
> >  > here.  It is terminology used by C11 (and C++11) and accepted by various
> >  > technical literature and, at this point, by academia (if you look at the
> >  > recent papers on memory models -- it's pretty much settled).  These terms
> >  > are not too bad; it could be worse; and this bit is certainly not the 
> > worst
> >  > part of C11.  So, I would just move on.
> >
> >Is it settled? My experience with the academic literature has been
> >that everyone uses their own terminology and the same words are
> >routinely found to have subtly different meanings from one paper to
> >the next and occasionally even within the same paper. :-/  But I
> >haven't been paying much attention lately due to being preoccupied
> >with other things.
> 
> So in the end which name do we use? Are people really unhappy with _racy()?
> At least it has a general meaning, and does not imply atomicity or ordering.

_racy() doesn't really get at the intended meaning, and everything in
C11 is racy unless you arrange otherwise by using mutexes, atomics, etc.
The suffix has very little content.

Names such as load_/store_fully() or load_/store_entire() or
load_/store_completely() get to the actual semantics: at the program
step implied by the load_/store_entire() expression, the memory
constituting the named variable is loaded/stored in its entirety.  In
other words, the load cannot be drawn out over more than one local
program step.  Whether or not the load/store is performed in one step
with respect to interrupts or other threads is not defined.

I feel like load_entire() and store_entire() get to the heart of the
matter while being easy to speak, but _fully() or _completely() seem
fine.

Dave

-- 
David Young
dyo...@pobox.comUrbana, IL(217) 721-9981


Re: __{read,write}_once

2019-11-21 Thread Kamil Rytarowski
On 22.11.2019 00:53, Robert Elz wrote:
> Date:Thu, 21 Nov 2019 19:19:51 +0100
> From:Maxime Villard 
> Message-ID:  
> 
>   | So in the end which name do we use? Are people really unhappy with 
> _racy()?
>   | At least it has a general meaning, and does not imply atomicity or 
> ordering.
> 
> I dislike naming discussions, as in general, while there are often a
> bunch of obviously incorrect names (here for example, read_page_data() ...)
> there is very rarely one obviously right answer, and it all really just 
> becomes
> a matter of choice for whoever is doing it.
> 
> Nevertheless, perhaps something that says more what is actually happening,
> rather than mentions what doesn't matter (races here), so perhaps something
> like {read,write}_single_cycle() or {read,write}_1_bus_xact() or something
> along those lines ?
> 
> kre
> 

I think that picking C11 terminology is the way forward. It's settled
probably forever now and it will be simpler to find corresponding
documentation and specification of it.





Re: __{read,write}_once

2019-11-21 Thread Robert Elz
Date:Thu, 21 Nov 2019 19:19:51 +0100
From:Maxime Villard 
Message-ID:  

  | So in the end which name do we use? Are people really unhappy with _racy()?
  | At least it has a general meaning, and does not imply atomicity or ordering.

I dislike naming discussions, as in general, while there are often a
bunch of obviously incorrect names (here for example, read_page_data() ...)
there is very rarely one obviously right answer, and it all really just becomes
a matter of choice for whoever is doing it.

Nevertheless, perhaps something that says more what is actually happening,
rather than mentions what doesn't matter (races here), so perhaps something
like {read,write}_single_cycle() or {read,write}_1_bus_xact() or something
along those lines ?

kre



Re: __{read,write}_once

2019-11-21 Thread Maxime Villard

On 18/11/2019 at 19:49, David Holland wrote:

On Sun, Nov 17, 2019 at 02:35:43PM +, Mindaugas Rasiukevicius wrote:
  > David Holland  wrote:
  > >  > I see the potential source of confusion, but just think about: what
  > >  > could "atomic" possibly mean for loads or stores?  A load is just a
  > >  > load, there is no RMW cycle, conceptually; inter-locking the operation
  > >  > does not make sense.  For anybody who attempts to reason about this,
  > >  > it should not be too confusing, plus there are man pages.
  > >
  > > ...that it's not torn.
  > >
  > > As far as names... there are increasingly many slightly different
  > > types of atomic and semiatomic operations.
  > >
  > > I think it would be helpful if someone came up with a comprehensive
  > > naming scheme for all of them (including ones we don't currently have
  > > that we're moderately likely to end up with later...)
  >
  > Yes, the meaning of "atomic" has different flavours and describes different
  > set of properties in different fields (operating systems vs databases vs
  > distributed systems vs ...) and, as we can see, even within the fields.
  >
  > Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already the
  > dominant terms.

Yes, but "relaxed" means something else... let me be clearer since I
wasn't before: I would expect e.g. atomic_inc_32_relaxed() to be
distinguished from atomic_inc_32() or maybe atomic_inc_32_ordered() by
whether or not multiple instances of it are globally ordered, not by
whether or not it's actually atomic relative to other cpus.

Checking the prior messages indicates we aren't currently talking
about atomic_inc_32_relaxed() but only about atomic_load_32_relaxed()
and atomic_store_32_relaxed(), which would be used together to
generate a local counter. This is less misleading, but I'm still not
convinced it's a good choice of names given that we might reasonably
later on want to have atomic_inc_32_relaxed() and
atomic_inc_32_ordered() that differ as above.

  > I think it is pointless to attempt to reinvent the wheel
  > here.  It is terminology used by C11 (and C++11) and accepted by various
  > technical literature and, at this point, by academia (if you look at the
  > recent papers on memory models -- it's pretty much settled).  These terms
  > are not too bad; it could be worse; and this bit is certainly not the worst
  > part of C11.  So, I would just move on.

Is it settled? My experience with the academic literature has been
that everyone uses their own terminology and the same words are
routinely found to have subtly different meanings from one paper to
the next and occasionally even within the same paper. :-/  But I
haven't been paying much attention lately due to being preoccupied
with other things.


So in the end which name do we use? Are people really unhappy with _racy()?
At least it has a general meaning, and does not imply atomicity or ordering.


Re: __{read,write}_once

2019-11-18 Thread David Holland
On Sun, Nov 17, 2019 at 02:35:43PM +, Mindaugas Rasiukevicius wrote:
 > David Holland  wrote:
 > >  > I see the potential source of confusion, but just think about: what
 > >  > could "atomic" possibly mean for loads or stores?  A load is just a
 > >  > load, there is no RMW cycle, conceptually; inter-locking the operation
 > >  > does not make sense.  For anybody who attempts to reason about this,
 > >  > it should not be too confusing, plus there are man pages.
 > > 
 > > ...that it's not torn.
 > > 
 > > As far as names... there are increasingly many slightly different
 > > types of atomic and semiatomic operations.
 > > 
 > > I think it would be helpful if someone came up with a comprehensive
 > > naming scheme for all of them (including ones we don't currently have
 > > that we're moderately likely to end up with later...)
 > 
 > Yes, the meaning of "atomic" has different flavours and describes different
 > set of properties in different fields (operating systems vs databases vs
 > distributed systems vs ...) and, as we can see, even within the fields.
 > 
 > Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already the
 > dominant terms.

Yes, but "relaxed" means something else... let me be clearer since I
wasn't before: I would expect e.g. atomic_inc_32_relaxed() to be
distinguished from atomic_inc_32() or maybe atomic_inc_32_ordered() by
whether or not multiple instances of it are globally ordered, not by
whether or not it's actually atomic relative to other cpus.

Checking the prior messages indicates we aren't currently talking
about atomic_inc_32_relaxed() but only about atomic_load_32_relaxed()
and atomic_store_32_relaxed(), which would be used together to
generate a local counter. This is less misleading, but I'm still not
convinced it's a good choice of names given that we might reasonably
later on want to have atomic_inc_32_relaxed() and
atomic_inc_32_ordered() that differ as above.

 > I think it is pointless to attempt to reinvent the wheel
 > here.  It is terminology used by C11 (and C++11) and accepted by various
 > technical literature and, at this point, by academia (if you look at the
 > recent papers on memory models -- it's pretty much settled).  These terms
 > are not too bad; it could be worse; and this bit is certainly not the worst
 > part of C11.  So, I would just move on.

Is it settled? My experience with the academic literature has been
that everyone uses their own terminology and the same words are
routinely found to have subtly different meanings from one paper to
the next and occasionally even within the same paper. :-/  But I
haven't been paying much attention lately due to being preoccupied
with other things.

-- 
David A. Holland
dholl...@netbsd.org


Re: __{read,write}_once

2019-11-17 Thread Mindaugas Rasiukevicius
David Holland  wrote:
>  > I see the potential source of confusion, but just think about: what
>  > could "atomic" possibly mean for loads or stores?  A load is just a
>  > load, there is no RMW cycle, conceptually; inter-locking the operation
>  > does not make sense.  For anybody who attempts to reason about this,
>  > it should not be too confusing, plus there are man pages.
> 
> ...that it's not torn.
> 
> As far as names... there are increasingly many slightly different
> types of atomic and semiatomic operations.
> 
> I think it would be helpful if someone came up with a comprehensive
> naming scheme for all of them (including ones we don't currently have
> that we're moderately likely to end up with later...)

Yes, the meaning of "atomic" has different flavours and describes different
set of properties in different fields (operating systems vs databases vs
distributed systems vs ...) and, as we can see, even within the fields.

Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already the
dominant terms.  I think it is pointless to attempt to reinvent the wheel
here.  It is terminology used by C11 (and C++11) and accepted by various
technical literature and, at this point, by academia (if you look at the
recent papers on memory models -- it's pretty much settled).  These terms
are not too bad; it could be worse; and this bit is certainly not the worst
part of C11.  So, I would just move on.

-- 
Mindaugas


Re: __{read,write}_once

2019-11-16 Thread David Holland
On Sat, Nov 16, 2019 at 03:56:35PM +, Mindaugas Rasiukevicius wrote:
 > > > I suggest __atomic_load_relaxed()/__atomic_store_relaxed().
 > > 
 > > What I don't like with "atomic" in the name is that the instructions
 > > generated are not atomic strictly speaking, and I'd rather avoid the
 > > confusion with the really atomic instructions.
 > 
 > I see the potential source of confusion, but just think about: what could
 > "atomic" possibly mean for loads or stores?  A load is just a load, there
 > is no RMW cycle, conceptually; inter-locking the operation does not make
 > sense.  For anybody who attempts to reason about this, it should not be
 > too confusing, plus there are man pages.

...that it's not torn.

As far as names... there are increasingly many slightly different
types of atomic and semiatomic operations.

I think it would be helpful if someone came up with a comprehensive
naming scheme for all of them (including ones we don't currently have
that we're moderately likely to end up with later...)

Ordinarily I would go do this but I am completely swamped :-(


For the moment, I don't especially like *_relaxed as "relaxed" is
normally an ordering property, and while ordering and atomicity are
different things, they're not really separable in the naming scheme.

I might suggest "localatomic". Or maybe just "untorn", but that looks
weird.

however _relaxed is far better than *_once which is really confusing :(

-- 
David A. Holland
dholl...@netbsd.org


Re: __{read,write}_once

2019-11-16 Thread Mouse
> I see the potential source of confusion, but just think about: what
> could "atomic" possibly mean for loads or stores?

The same thing it means for other operations: that all other operations
act as if the load, or store, happened either entirely before or
entirely after every other event.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: __{read,write}_once

2019-11-16 Thread Maxime Villard

On 16/11/2019 at 15:31, Mindaugas Rasiukevicius wrote:

Maxime Villard  wrote:

Alright so let's add the macros with volatile (my initial patch). Which
name do we use? I actually like __{read,write}_racy().


I suggest __atomic_load_relaxed()/__atomic_store_relaxed().


What I don't like with "atomic" in the name is that the instructions generated
are not atomic strictly speaking, and I'd rather avoid the confusion with the
really atomic instructions.

But that's not a strong opinion, and I'm fine if we go with it anyway.


If these are to be provided as macros, then I also suggest to ensure that
they provide compiler-level barrier.


You mean __insn_barrier(), right? Surrounding the access (one before, one
after) also, right?


Re: __{read,write}_once

2019-11-16 Thread Mindaugas Rasiukevicius
Maxime Villard  wrote:
> >
> > I suggest __atomic_load_relaxed()/__atomic_store_relaxed().
> 
> What I don't like with "atomic" in the name is that the instructions
> generated are not atomic strictly speaking, and I'd rather avoid the
> confusion with the really atomic instructions.

I see the potential source of confusion, but just think about: what could
"atomic" possibly mean for loads or stores?  A load is just a load, there
is no RMW cycle, conceptually; inter-locking the operation does not make
sense.  For anybody who attempts to reason about this, it should not be
too confusing, plus there are man pages.

> 
> > If these are to be provided as macros, then I also suggest to ensure
> > that they provide compiler-level barrier.
> 
> You mean __insn_barrier(), right? Surrounding the access (one before, one
> after) also, right?

Correct.  Alternatively, they can be provided as real functions.  In C,
function calls generally act as compiler-level barriers (as long as they
are guaranteed to not be optimised out).
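
(For illustration, a minimal sketch of the macro-with-barriers form
being discussed, assuming __insn_barrier() from <sys/cdefs.h> as the
compiler-level barrier; the demo_ name and the GCC statement-expression
form are just one possible shape:)

	#define	demo_read_racy(p)					\
	({								\
		__insn_barrier();					\
		__typeof__(*(p)) demo_v_ =				\
		    *(const volatile __typeof__(*(p)) *)(p);		\
		__insn_barrier();					\
		demo_v_;						\
	})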

-- 
Mindaugas


Re: __{read,write}_once

2019-11-16 Thread Kamil Rytarowski
On 16.11.2019 15:31, Mindaugas Rasiukevicius wrote:
> Maxime Villard  wrote:
>> Alright so let's add the macros with volatile (my initial patch). Which
>> name do we use? I actually like __{read,write}_racy().
> 
> I suggest __atomic_load_relaxed()/__atomic_store_relaxed().  If these
> are to be provided as macros, then I also suggest to ensure that they
> provide compiler-level barrier.
> 

I'm in favor of this naming.





Re: __{read,write}_once

2019-11-16 Thread Mindaugas Rasiukevicius
Maxime Villard  wrote:
> Alright so let's add the macros with volatile (my initial patch). Which
> name do we use? I actually like __{read,write}_racy().

I suggest __atomic_load_relaxed()/__atomic_store_relaxed().  If these
are to be provided as macros, then I also suggest to ensure that they
provide compiler-level barrier.

-- 
Mindaugas


Re: __{read,write}_once

2019-11-15 Thread J. Lewis Muir
On 11/15, Maxime Villard wrote:
> Alright so let's add the macros with volatile (my initial patch). Which name 
> do we
> use? I actually like __{read,write}_racy().

+1.  I'm a fan of avoiding the "once" terminology just because the first
thing that comes to my mind is the threading concept of a function that
does something only one time regardless of how many times it is called
and regardless of from which thread it is called.

Lewis


Re: __{read,write}_once

2019-11-15 Thread Maxime Villard

Alright so let's add the macros with volatile (my initial patch). Which name do we use? I actually like __{read,write}_racy().



Re: __{read,write}_once

2019-11-11 Thread Robert Elz
Date:Mon, 11 Nov 2019 21:32:06 +0100
From:Joerg Sonnenberger 
Message-ID:  <2019203206.ga4...@bec.de>

  | The update needs to be uninterruptable on the local CPU in the sense that
  | context switches and interrupts don't tear the R and W part of the RMW cycle
  | apart.

If I understood the original message, even that was not required.

All that was really demanded was that each of the R and W parts
not be split into pieces where an intervening interrupt might alter
the "other" part, allowing a read to receive, or a write to produce,
a value that neither the baseline code nor the interrupt routine
ever intended.

But in
x += 2;
if an interrupt intervenes and does
x += 1;
it is acceptable for the eventual result to be old_x + 1 (which is very
unlikely), old_x + 2, or old_x + 3, but only those.

That's how I understood the "racy is OK" part of it.  Ie: we don't care
about race conditions like this.

What we don't want is if x started out at 0x0000ffff and is read in 2
halves, for the base code to read the 0000 (upper) half, then the
interrupt to change the value to 0x00010001, and then the base code to
read 0001 for the bottom half, leading to a result of 0x00000001 in x,
rather than 0x0000ffff or 0x00010001,
which is not acceptable (or anything else like that, and similarly for
writes).
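
(Concretely, on a machine whose widest memory access is 16 bits, and
assuming little-endian layout, such a torn 32-bit read of x amounts to
something like the following; `val' is invented for illustration:)

	uint16_t lo, hi;
	uint32_t val;

	hi = ((uint16_t *)&x)[1];	/* upper half, read first	*/
					/* <-- interrupt may fire here	*/
	lo = ((uint16_t *)&x)[0];	/* lower half, possibly updated	*/
	val = ((uint32_t)hi << 16) | lo;	/* neither old nor new	*/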

Or in other words, it is acceptable for some uses for the occasional
change to be lost (particularly for some event counters) if it is not
likely to happen frequently, but not for entirely bogus values to appear.

kre



Re: __{read,write}_once

2019-11-11 Thread Mindaugas Rasiukevicius
Maxime Villard  wrote:
> 
> Alright, thanks.
> 
> Do you think there is a performance degradation with using explicitly
> atomic operations (with the "lock" prefix on x86), compared to just using
> an aligned volatile which may not be exactly atomic in that sense (even
> if it happens to be on x86)?

Yes, interlocked atomic operations have considerable performance overhead
and should be treated as relatively expensive.  Whether it is significant,
of course, depends where and how they are used in the code.

You can find various measurements of the atomic operations on the Internet,
but they can differ quite a lot amongst various CPU models.

Also, just to be clear here: the atomic loads/stores (both the C11 routines
and READ_ONCE/WRITE_ONCE) are *not* interlocked operations.

> 
> Typically in sys/uvm/uvm_fault.c there are several lockless stat
> increments like the following:
> 
>   /* Increment the counters.*/
>   uvmexp.fltanget++;
> 
> In your (not just rmind@, but tech-kern@) opinion, is it better to switch
> to: atomic_inc_uint(&uvmexp.fltanget);
> or to
>   __add_once(uvmexp.fltanget, 1);
> as I did in my patch? Considering that the latter may possibly not be
> atomic on certain arches, which could possibly result in garbage when the
> value is read.

In this case it is assumed that inaccurate statistics are better than the
expense of atomic operations.  That is the reason why they are non-atomic
increments, although to prevent garbage (not just lost adds), they should
use atomic loads/stores (or just volatile for now) on a word.

Generally, the right solution to this problem is to convert these counters
to be per-CPU (or per-thread in userspace applications) and aggregate them
when needed (e.g. on fetching the stats).  See percpu(9) API.  This way you
get both the performance and accuracy/correctness.
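
(A hypothetical sketch of that per-CPU approach using percpu(9); all of
the demo_ names are invented:)

	#include <sys/types.h>
	#include <sys/percpu.h>

	static percpu_t *demo_faults_pc;

	void
	demo_counter_init(void)
	{

		demo_faults_pc = percpu_alloc(sizeof(uint64_t));
	}

	void
	demo_count_fault(void)
	{
		uint64_t *ctp;

		ctp = percpu_getref(demo_faults_pc);	/* current CPU's slot */
		(*ctp)++;				/* plain increment    */
		percpu_putref(demo_faults_pc);
	}

	static void
	demo_sum(void *p, void *arg, struct cpu_info *ci)
	{

		*(uint64_t *)arg += *(const uint64_t *)p;
	}

	/* Aggregate on demand, e.g. when the stats are fetched. */
	uint64_t
	demo_fault_total(void)
	{
		uint64_t total = 0;

		percpu_foreach(demo_faults_pc, demo_sum, &total);
		return total;
	}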

> 
> I've committed it right away, because this one seemed clear enough, feel
> free to comment or improve if you have a better idea:
> 
>   https://mail-index.netbsd.org/source-changes/2019/11/11/msg110726.html
> 

It would be better to find a solution which doesn't remove the optimisation
for the fast path.  At least on 64-bit architectures, a relaxed atomic load
(as 'volatile' for now) on xc->xc_donep could stay.

-- 
Mindaugas


Re: __{read,write}_once

2019-11-11 Thread Joerg Sonnenberger
On Mon, Nov 11, 2019 at 01:15:16PM -0500, Mouse wrote:
> > Uninterruptible means exactly that, there is a clear before and after
> > state and no interrupts can happen in between.
> 
> Is uninterruptible all you care about?  Or does it also need to be
> atomic with respect to other CPUs?  Eventually, of course, you'll want
> all those counter values on a single CPU - does that happen often
> enough to matter?
> 
> Also, does it actually need to be uninterruptible, or does it just need
> to be interrupt-tolerant?  I'm not as clear as I'd like on what the
> original desire here was, so I'm not sure but that we might be trying
> to solve a harder problem than necessary.

The update needs to be uninterruptable on the local CPU in the sense that
context switches and interrupts don't tear the R and W part of the RMW cycle
apart. x86 ALU instructions with memory operand as destination fit the
bill fine. It doesn't have to be atomic, no other CPU is supposed to
write to that cache line. It also generally doesn't matter whether they
see the old OR the new value, as long as it is either. We have full
control over alignment.

Joerg


Re: __{read,write}_once

2019-11-11 Thread Mouse
> Uninterruptible means exactly that, there is a clear before and after
> state and no interrupts can happen in between.

Is uninterruptible all you care about?  Or does it also need to be
atomic with respect to other CPUs?  Eventually, of course, you'll want
all those counter values on a single CPU - does that happen often
enough to matter?

Also, does it actually need to be uninterruptible, or does it just need
to be interrupt-tolerant?  I'm not as clear as I'd like on what the
original desire here was, so I'm not sure but that we might be trying
to solve a harder problem than necessary.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: __{read,write}_once

2019-11-11 Thread Joerg Sonnenberger
On Mon, Nov 11, 2019 at 11:51:26AM -0500, Mouse wrote:
> >>> (2) Use uninterruptible memory operations in per CPU memory,
> >>> aggregate passively on demand.
> > Problem is that (2) is natively only present on CISC platforms in
> > general.  Most RISC platforms can't do RMW in one instruction.
> 
> (2) says "uninterruptible", not "one instruction", though I'm not sure
> how large the difference is in practice.  (Also, some CISC platforms
> provide atomic memory RMW operations only under annoying restrictions;
> for example, I think the VAX has only three sorts of RMW memory
> accesses that are atomic with respect to other processors: ADAWI
> (16-bit-aligned 16-bit add), BB{SS,CC}I (test-and-{set/clear} single
> bits), and {INS,REM}Q{H,T}I (queue insert/remove at head/tail).)

The point here is that we really don't want to have bus locked
instructions for per-CPU counters. It would defeat the point of using
per-CPU counters in first place to a large degree. Uninterruptible means
exactly that, there is a clear before and after state and no interrupts
can happen in between. I don't know about ARM and friends for how
expensive masking interrupts is. It is quite expensive on x86. RMW
instructions are the other simple option for implementing them. (3)
would be the high effort version, doing RAS for kernel code as well.

Joerg


Re: __{read,write}_once

2019-11-11 Thread Mouse
>>> (2) Use uninterruptible memory operations in per CPU memory,
>>> aggregate passively on demand.
> Problem is that (2) is natively only present on CISC platforms in
> general.  Most RISC platforms can't do RMW in one instruction.

(2) says "uninterruptible", not "one instruction", though I'm not sure
how large the difference is in practice.  (Also, some CISC platforms
provide atomic memory RMW operations only under annoying restrictions;
for example, I think the VAX has only three sorts of RMW memory
accesses that are atomic with respect to other processors: ADAWI
(16-bit-aligned 16-bit add), BB{SS,CC}I (test-and-{set/clear} single
bits), and {INS,REM}Q{H,T}I (queue insert/remove at head/tail).)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: __{read,write}_once

2019-11-11 Thread Joerg Sonnenberger
On Mon, Nov 11, 2019 at 02:39:26PM +0100, Maxime Villard wrote:
> On 11/11/2019 at 13:51, Joerg Sonnenberger wrote:
> > On Mon, Nov 11, 2019 at 11:02:47AM +0100, Maxime Villard wrote:
> > > Typically in sys/uvm/uvm_fault.c there are several lockless stat 
> > > increments
> > > like the following:
> > > 
> > >   /* Increment the counters.*/
> > >   uvmexp.fltanget++;
> > 
> > Wasn't the general consensus here to ideally have per-cpu counters here
> > that are aggregated passively?
> 
> In this specific case it's not going to work, because uvmexp is supposed to
> be read from a core dump, and the stats are expected to be found in the
> symbol...

That's just a reason to add code for doing the aggregation, not a reason
to avoid it.

> > I can think of three different options depending the platform:
> > 
> > (1) Use full atomic operations. Fine for true UP platforms and when the
> > overhead of per-CPU precise accounting is too high.
> > 
> > (2) Use uninterruptible memory operations in per CPU memory, aggregate
> > passively on demand.
> > 
> > (3) Emulate uninterruptible memory operations with kernel RAS, aggregate
> > passively on demand.
> 
> Generally I would do (2), but in this specific case I suggest (1).
> atomic_inc_uint is supposed to be implemented on each platform already, so
> it's easy to do.
> 
> (3) is a big headache for a very small reason, and it's not going to
> prevent inter-CPU races.

Problem is that (2) is natively only present on CISC platforms in
general. Most RISC platforms can't do RMW in one instruction. Inter-CPU
races don't matter as long as only the counter of the current CPU is
modified. The main difference between (1) and (2) is the bus lock. (1)
and (3) is primarily a question on whether the atomic operation can be
inlined, if it can't, (3) would still be much nicer.

Joerg


Re: __{read,write}_once

2019-11-11 Thread Maxime Villard

On 11/11/2019 at 13:51, Joerg Sonnenberger wrote:

On Mon, Nov 11, 2019 at 11:02:47AM +0100, Maxime Villard wrote:

Typically in sys/uvm/uvm_fault.c there are several lockless stat increments
like the following:

/* Increment the counters.*/
uvmexp.fltanget++;


Wasn't the general consensus here to ideally have per-cpu counters here
that are aggregated passively?


In this specific case it's not going to work, because uvmexp is supposed to
be read from a core dump, and the stats are expected to be found in the
symbol...


I can think of three different options depending the platform:

(1) Use full atomic operations. Fine for true UP platforms and when the
overhead of per-CPU precise accounting is too high.

(2) Use uninterruptible memory operations in per CPU memory, aggregate
passively on demand.

(3) Emulate uninterruptible memory operations with kernel RAS, aggregate
passively on demand.


Generally I would do (2), but in this specific case I suggest (1).
atomic_inc_uint is supposed to be implemented on each platform already, so
it's easy to do.

(3) is a big headache for a very small reason, and it's not going to
prevent inter-CPU races.


Re: __{read,write}_once

2019-11-11 Thread Joerg Sonnenberger
On Mon, Nov 11, 2019 at 11:02:47AM +0100, Maxime Villard wrote:
> Typically in sys/uvm/uvm_fault.c there are several lockless stat increments
> like the following:
> 
>   /* Increment the counters.*/
>   uvmexp.fltanget++;

Wasn't the general consensus here to ideally have per-cpu counters here
that are aggregated passively? I can think of three different options
depending the platform:

(1) Use full atomic operations. Fine for true UP platforms and when the
overhead of per-CPU precise accounting is too high.

(2) Use uninterruptible memory operations in per CPU memory, aggregate
passively on demand.

(3) Emulate uninterruptible memory operations with kernel RAS, aggregate
passively on demand.

Essentially, the only race condition we care about for the statistic
counters is via interrupts or scheduling. We can implement the
equivalent of x86's add with memory operand as destination using RAS, so
the only overhead would be the function call in that case.

Joerg


Re: __{read,write}_once

2019-11-11 Thread Maxime Villard

On 08/11/2019 at 13:40, Mindaugas Rasiukevicius wrote:

Maxime Villard  wrote:

They are "atomic" in a sense that they prevent from tearing, fusing and
invented loads/stores.  Terms and conditions apply (e.g. they assume
properly aligned and word-sized accesses).  Additionally, READ_ONCE()
provides a data-dependency barrier, applicable only to DEC Alpha.  I
think it was the right decision on the Linux side (even though trying
to get all the data-dependency barriers right these days is kind of a
lost cause, IMO).

So, READ_ONCE()/WRITE_ONCE() is more or less equivalent to the C11
atomic load/stores routines with memory_order_relaxed (or
memory_order_consume, if you care about DEC Alpha).


But... Didn't Marco just say that 'volatile' accesses do not actually
prevent tearing/fusing/invented loads/stores? READ_ONCE/WRITE_ONCE only
do volatile accesses.


Let me try to clarify:

- The main purpose of READ_ONCE()/WRITE_ONCE() is to provide a way to
perform atomic loads/stores (in a sense of preventing from the said
behaviours), even though they help to get the memory ordering right too.
Currently, 'volatile' is a key instrument in achieving that.  However,
as stated before, terms and conditions apply: 'volatile' just in itself
does not provide the guarantee; the loads/stores also have to be properly
aligned and word-sized (these are the pre-C11 assumptions we always had).
Note: C11 introduces atomic _types_, so that the compiler could leverage
the type system and thus provide the necessary guarantees.

- Having re-read Marco's emails in this thread, I think we are very much
in agreement.  I think he merely points out that 'volatile' in itself is
not sufficient; it does not mean it's not necessary.

- There is quite a bit of confusion regarding 'volatile' amongst the
developers.  This is partly because 'volatile' is arguably underspecified
in the C standard.  AFAIK, some people in the C standardization committee
have a view that it provides weaker guarantees; however, others maintain
that the intent has always been clear and the wording is sufficient.
Without going into the details (somewhat philosophical anyway), at least
for now, 'volatile' is a de facto ingredient (one of a few, but absolutely
necessary) in achieving atomic loads/stores.

Sorry if this is a bit repetitive, but I hope it gets the point across.


Alright, thanks.

Do you think there is a performance degradation with using explicitly atomic
operations (with the "lock" prefix on x86), compared to just using an aligned
volatile which may not be exactly atomic in that sense (even if it happens to
be on x86)?

Typically in sys/uvm/uvm_fault.c there are several lockless stat increments
like the following:

/* Increment the counters.*/
uvmexp.fltanget++;

In your (not just rmind@, but tech-kern@) opinion, is it better to switch to:
atomic_inc_uint(&uvmexp.fltanget);
or to
__add_once(uvmexp.fltanget, 1);
as I did in my patch? Considering that the latter may possibly not be atomic
on certain arches, which could possibly result in garbage when the value is
read.


To fix that, do you agree that I should
   - Remove the first branch (because no lockless fastpath possible)
   - Move down the second branch (KASSERT) right after the mutex_enter
?


Feel free to write a patch and I'll have a look at it once I have a
little bit more free time.


I've committed it right away, because this one seemed clear enough, feel free
to comment or improve if you have a better idea:

https://mail-index.netbsd.org/source-changes/2019/11/11/msg110726.html

Maxime


Re: __{read,write}_once

2019-11-08 Thread Kamil Rytarowski
On 08.11.2019 13:40, Mindaugas Rasiukevicius wrote:
>>> There is more code in the NetBSD kernel which needs fixing.  I would say
>>> pretty much all lock-free code should be audited.
>> I believe KCSAN can greatly help with that, since it automatically reports
>> concurrent accesses. Up to us then to switch to atomic, or other kinds of
>> markers like READ_ONCE.
> Is there a CSAN i.e. such sanitizer for userspace applications?

No, but there is TSan.  If a CSan for userspace would be useful, it
probably wouldn't be too difficult to write.  Full TSan, at least, is
heavy and requires a 64-bit CPU.





Re: __{read,write}_once

2019-11-08 Thread Mindaugas Rasiukevicius
Maxime Villard  wrote:
> > They are "atomic" in a sense that they prevent from tearing, fusing and
> > invented loads/stores.  Terms and conditions apply (e.g. they assume
> > properly aligned and word-sized accesses).  Additionally, READ_ONCE()
> > provides a data-dependency barrier, applicable only to DEC Alpha.  I
> > think it was the right decision on the Linux side (even though trying
> > to get all the data-dependency barriers right these days is kind of a
> > lost cause, IMO).
> > 
> > So, READ_ONCE()/WRITE_ONCE() is more or less equivalent to the C11
> > atomic load/stores routines with memory_order_relaxed (or
> > memory_order_consume, if you care about DEC Alpha).
> 
> But... Didn't Marco just say that 'volatile' accesses do not actually
> prevent tearing/fusing/invented loads/stores? READ_ONCE/WRITE_ONCE only
> do volatile accesses.

Let me try to clarify:

- The main purpose of READ_ONCE()/WRITE_ONCE() is to provide a way to
perform atomic loads/stores (in a sense of preventing from the said
behaviours), even though they help to get the memory ordering right too.
Currently, 'volatile' is a key instrument in achieving that.  However,
as stated before, terms and conditions apply: 'volatile' just in itself
does not provide the guarantee; the loads/stores also have to be properly
aligned and word-sized (these are the pre-C11 assumptions we always had).
Note: C11 introduces atomic _types_, so that the compiler could leverage
the type system and thus provide the necessary guarantees.

- Having re-read Marco's emails in this thread, I think we are very much
in agreement.  I think he merely points out that 'volatile' in itself is
not sufficient; it does not mean it's not necessary.

- There is quite a bit of confusion regarding 'volatile' amongst the
developers.  This is partly because 'volatile' is arguably underspecified
in the C standard.  AFAIK, some people in the C standardization committee
have a view that it provides weaker guarantees; however, others maintain
that the intent has always been clear and the wording is sufficient.
Without going into the details (somewhat philosophical anyway), at least
for now, 'volatile' is a de facto ingredient (one of a few, but absolutely
necessary) in achieving atomic loads/stores.

Sorry if this is a bit repetitive, but I hope it gets the point across.

> 
> To fix that, do you agree that I should
>   - Remove the first branch (because no lockless fastpath possible)
>   - Move down the second branch (KASSERT) right after the mutex_enter
> ?

Feel free to write a patch and I'll have a look at it once I have a
little bit more free time.

> 
> > There is more code in the NetBSD kernel which needs fixing.  I would say
> > pretty much all lock-free code should be audited.
> 
> I believe KCSAN can greatly help with that, since it automatically reports
> concurrent accesses. Up to us then to switch to atomic, or other kinds of
> markers like READ_ONCE.

Is there a CSAN i.e. such sanitizer for userspace applications?

Thanks.

-- 
Mindaugas


Re: __{read,write}_once

2019-11-07 Thread David Holland
On Wed, Nov 06, 2019 at 10:14:48PM +0100, Kamil Rytarowski wrote:
 > I have no opinion here, just please do the right thing. Unless there are
 > any shortcomings it would be nice to follow closely C11.

There are plenty of shortcomings in C11. We should concentrate on
abstractions that are correct first.

(also, volatile is ~useless for this)

-- 
David A. Holland
dholl...@netbsd.org


Re: __{read,write}_once

2019-11-07 Thread Maxime Villard

On 06/11/2019 at 23:41, Mindaugas Rasiukevicius wrote:

Maxime Villard  wrote:

- If we do not want to stick with the C11 API (its emulation), then I
would suggest to use the similar terminology, e.g. atomic_load_relaxed()
and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE().  There is
no reason to invent new terminology; it might just add more confusion.


But... Linux's READ_ONCE/WRITE_ONCE are not actually atomic, is that
correct? So there is a significant semantic difference.


They are "atomic" in a sense that they prevent from tearing, fusing and
invented loads/stores.  Terms and conditions apply (e.g. they assume properly
aligned and word-sized accesses).  Additionally, READ_ONCE() provides a
data-dependency barrier, applicable only to DEC Alpha.  I think it was
the right decision on the Linux side (even though trying to get all the
data-dependency barriers right these days is kind of a lost cause, IMO).

So, READ_ONCE()/WRITE_ONCE() is more or less equivalent to the C11 atomic
load/stores routines with memory_order_relaxed (or memory_order_consume,
if you care about DEC Alpha).


But... Didn't Marco just say that 'volatile' accesses do not actually prevent
tearing/fusing/invented loads/stores? READ_ONCE/WRITE_ONCE only do volatile
accesses.


Also, in my original patch I marked two branches of subr_xcall.c, but it
seems to me now they are actually races: ie xc_wait(), the read of
'xc_donep' could be made by two 4-byte fetches on 32bit arches (at
least). Between the two, an xc thread could have updated the value. So
there is an actual race which could possibly result in returning early
while we shouldn't have. Does that look correct to you?


Correct.


Alright, so we have the first bug found by KCSAN :)

To fix that, do you agree that I should
 - Remove the first branch (because no lockless fastpath possible)
 - Move down the second branch (KASSERT) right after the mutex_enter
?


There is more code in the NetBSD kernel which needs fixing.  I would say
pretty much all lock-free code should be audited.


I believe KCSAN can greatly help with that, since it automatically reports
concurrent accesses. Up to us then to switch to atomic, or other kinds of
markers like READ_ONCE.


Re: __{read,write}_once

2019-11-06 Thread Mindaugas Rasiukevicius
Maxime Villard  wrote:
> > - If we do not want to stick with the C11 API (its emulation), then I
> > would suggest to use the similar terminology, e.g. atomic_load_relaxed()
> > and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE().  There is
> > no reason to invent new terminology; it might just add more confusion.
> 
> But... Linux's READ_ONCE/WRITE_ONCE are not actually atomic, is that
> correct? So there is a significant semantic difference.

They are "atomic" in a sense that they prevent from tearing, fusing and
invented loads/stores.  Terms and conditions apply (e.g. they assume properly
aligned and word-sized accesses).  Additionally, READ_ONCE() provides a
data-dependency barrier, applicable only to DEC Alpha.  I think it was
the right decision on the Linux side (even though trying to get all the
data-dependency barriers right these days is kind of a lost cause, IMO).

So, READ_ONCE()/WRITE_ONCE() is more or less equivalent to the C11 atomic
load/stores routines with memory_order_relaxed (or memory_order_consume,
if you care about DEC Alpha).

> Also, in my original patch I marked two branches of subr_xcall.c, but it
> seems to me now they are actually races: ie xc_wait(), the read of
> 'xc_donep' could be made by two 4-byte fetches on 32bit arches (at
> least). Between the two, an xc thread could have updated the value. So
> there is an actual race which could possibly result in returning early
> while we shouldn't have. Does that look correct to you?

Correct.

There is more code in the NetBSD kernel which needs fixing.  I would say
pretty much all lock-free code should be audited.

Back then, certain liberties compilers take were not fully understood or
were not considered real problems (after all, the C memory model concerning
many of these aspects was not defined).  Times changed, compilers became
much more aggressive, and the old code needs to catch up.

-- 
Mindaugas


Re: __{read,write}_once

2019-11-06 Thread Kamil Rytarowski
On 06.11.2019 20:38, Mindaugas Rasiukevicius wrote:
> Maxime Villard  wrote:
>> There are cases in the kernel where we read/write global memory
>> locklessly, and accept the races either because it is part of the design
>> (eg low-level scheduling) or we simply don't care (eg global stats).
>>
>> In these cases, we want to access the memory only once, and need to ensure
>> the compiler does not split that access in several pieces, some of which
>> may be changed by concurrent accesses. There is a Linux article [1] about
>> this, and also [2]. I'd like to introduce the following macros:
>>
>> <...>
> 
> A few comments for everybody here:
> 
> - There is a general need in the NetBSD kernel for atomic load/store
> semantics.  This is because plain accesses (loads/stores) are subject
> to _tearing_, _fusing_ and _invented_ loads/stores.  I created a new
> thread which might help to clarify these and various other aspects:
> 
> http://mail-index.netbsd.org/tech-kern/2019/11/06/msg025664.html
> 

Thank you.

> - In C11, this can be handled with atomic_{load,store}_explicit() and
> memory_order_relaxed (or stronger memory order).
> 
> - If we do not want to stick with the C11 API (its emulation), then I
> would suggest using similar terminology, e.g. atomic_load_relaxed()
> and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE().  There is
> no reason to invent new terminology; it might just add more confusion.
> 
> Thanks.
> 

I have no opinion here, just please do the right thing. Unless there are
any shortcomings, it would be nice to follow C11 closely.

Whether or not that information helps: we also need a C11/C++14
libatomic-like library in userland.





re: __{read,write}_once

2019-11-06 Thread matthew green
> - If we do not want to stick with the C11 API (its emulation), then I
> would suggest using similar terminology, e.g. atomic_load_relaxed()
> and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE().  There is
> no reason to invent new terminology; it might just add more confusion.

i really do not like the "once" name.

even though i'm familiar with the actual desired semantics,
i have to ignore my builtin meaning that comes first.  to me,
"once" initially reads as compiler will only do it one time,
ever, not every time it is called.  like pthread "once".

can we make them __read_always() and __write_always()?


.mrg.


Re: __{read,write}_once

2019-11-06 Thread Maxime Villard

On 06/11/2019 20:38, Mindaugas Rasiukevicius wrote:

Maxime Villard  wrote:

There are cases in the kernel where we read/write global memory
locklessly, and accept the races either because it is part of the design
(eg low-level scheduling) or we simply don't care (eg global stats).

In these cases, we want to access the memory only once, and need to ensure
the compiler does not split that access in several pieces, some of which
may be changed by concurrent accesses. There is a Linux article [1] about
this, and also [2]. I'd like to introduce the following macros:

<...>


A few comments for everybody here:

- There is a general need in the NetBSD kernel for atomic load/store
semantics.  This is because plain accesses (loads/stores) are subject
to _tearing_, _fusing_ and _invented_ loads/stores.  I created a new
thread which might help to clarify these and various other aspects:

http://mail-index.netbsd.org/tech-kern/2019/11/06/msg025664.html


Thanks


- In C11, this can be handled with atomic_{load,store}_explicit() and
memory_order_relaxed (or stronger memory order).

- If we do not want to stick with the C11 API (its emulation), then I
would suggest using similar terminology, e.g. atomic_load_relaxed()
and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE().  There is
no reason to invent new terminology; it might just add more confusion.


But... Linux's READ_ONCE/WRITE_ONCE are not actually atomic, is that correct?
So there is a significant semantic difference.

Also, in my original patch I marked two branches of subr_xcall.c, but it
seems to me now that they are actual races: ie in xc_wait(), the read of
'xc_donep' could be made with two 4-byte fetches on 32-bit arches (at least).
Between the two, an xc thread could have updated the value. So there is an
actual race which could possibly result in returning early when we shouldn't
have. Does that look correct to you?

Thanks


Re: __{read,write}_once

2019-11-06 Thread Mindaugas Rasiukevicius
Maxime Villard  wrote:
> There are cases in the kernel where we read/write global memory
> locklessly, and accept the races either because it is part of the design
> (eg low-level scheduling) or we simply don't care (eg global stats).
> 
> In these cases, we want to access the memory only once, and need to ensure
> the compiler does not split that access in several pieces, some of which
> may be changed by concurrent accesses. There is a Linux article [1] about
> this, and also [2]. I'd like to introduce the following macros:
>
> <...>

A few comments for everybody here:

- There is a general need in the NetBSD kernel for atomic load/store
semantics.  This is because plain accesses (loads/stores) are subject
to _tearing_, _fusing_ and _invented_ loads/stores.  I created a new
thread which might help to clarify these and various other aspects:

http://mail-index.netbsd.org/tech-kern/2019/11/06/msg025664.html

- In C11, this can be handled with atomic_{load,store}_explicit() and
memory_order_relaxed (or stronger memory order).

- If we do not want to stick with the C11 API (its emulation), then I
would suggest using similar terminology, e.g. atomic_load_relaxed()
and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE().  There is
no reason to invent new terminology; it might just add more confusion.

Thanks.

-- 
Mindaugas


Re: __{read,write}_once

2019-11-06 Thread Marco Elver
On Wed, 6 Nov 2019 at 18:08, Maxime Villard  wrote:
>
> On 06/11/2019 17:37, Marco Elver wrote:
> > On Wed, 6 Nov 2019 at 16:51, Kamil Rytarowski  wrote:
> >>
> >> On 06.11.2019 16:44, Kamil Rytarowski wrote:
> >>> On 06.11.2019 15:57, Jason Thorpe wrote:
> >>>>
> >>>>
> >>>>> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski  wrote:
> >>>>>
> >>>>> On 06.11.2019 14:37, Jason Thorpe wrote:
> >>>>>>
> >>>>>>
> >>>>>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski  wrote:
> >>>>>>>
> >>>>>>> I propose __write_relaxed() / __read_relaxed().
> >>>>>>
> >>>>>> ...except that seems to imply the opposite of what these do.
> >>>>>>
> >>>>>> -- thorpej
> >>>>>>
> >>>>>
> >>>>> Rationale?
> >>>>>
> >>>>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
> >>>>> not deal with atomics here.
> >>>>
> >>>> Fair enough.  To me, the names suggest "compiler is allowed to apply 
> >>>> relaxed constraints and tear the access if it wants" But apparently 
> >>>> the common meaning is "relax, bro, I know what I'm doing".  If that's 
> >>>> the case, I can roll with it.
> >
> > See below.
> >
> >>>> -- thorpej
> >>>>
> >>>
> >>> Unless I mean something this is exactly about relaxed constraints.
> >>
> >> miss*
> >>
> >>>
> >>> "Relaxed operation: there are no synchronization or ordering constraints
> >>> imposed on other reads or writes" and without "operation's atomicity is
> >>> guaranteed".
> >
> > In the memory consistency model world, "relaxed" has a very specific
> > meaning for loads/stores. It simply means that the compiler and
> > architecture is free to reorder the memory operation with respect to
> > previous or following memory operations in program order. But the
> > load/store still happens as one atomic operation.
> >
> > For programming-language memory models, "relaxed" appears in the
> > context of atomic memory operations, to define their ordering w.r.t.
> > other operations in program order. There is a distinction between
> > "relaxed atomic" and plain (unannotated) memory operations at the
> > programming language level.
> >
> > For plain operations, the compiler, in addition to the target
> > architecture, is free to reorder operations or even apply
> > optimizations that turn single plain operations into multiple
> > loads/stores [1, 2].
> > [1] https://lwn.net/Articles/793253/
> > [2] https://lwn.net/Articles/799218/
> >
> > In the Linux kernel READ_ONCE/WRITE_ONCE are one way to specify
> > "relaxed atomic read/write" for accesses that fit in a word (_ONCE
> > also works on things larger than a word, but probably aren't atomic
> > anymore). For most architectures the implementation is the same, and
> > merely avoids compiler optimizations; the exception is Alpha, where
> > READ_ONCE adds a memory barrier [3].
> > [3] https://www.kernel.org/doc/Documentation/memory-barriers.txt
> >
> >>> This is also similar to what suggested Google to apply to NetBSD in our
> >>> internal thread, but with a bit different naming.
> >
> > What you call the ops is entirely up to you. I would not use
> > '{read,write}_relaxed', since as you pointed out above, is probably
> > more confusing. This is why I suggested 'atomic_{read,write}_relaxed'.
> > If you do not like the 'atomic' in there, the next best thing is
> > probably '{read,write}_once'.
> >
> > Just note that you're now defining your own memory consistency model.
> > This is a complex topic that spans decades of work. The safest thing
> > is to stick as closely to the C11/C++11 memory model as possible [4].
> > The Linux kernel also has a memory model [5], but is significantly
> > more complex since it was defined after various primitives had been
> > introduced.
> > [4] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1479.htm
> > [5] 
> > https://github.com/torvalds/linux/blob/master/tools/memory-model/Documentation/explanation.txt
> >
> > Best Wishes,
>

Re: __{read,write}_once

2019-11-06 Thread Mouse
>   - Is 'volatile' a guarantee that the compiler will emit only one
> instruction to fetch the value?

No.  For some types, some compilers, and some architectures, it may be,
but it is implementation-dependent.  (For some types, on some
architectures, it cannot be; for example, consider a 64-bit volatile
value on an architecture without 64-bit memory access primitives.)

What volatile is designed for is to ensure that the generated code
actually accesses the value.  Consider

count = 0;
while (count < 10) {
        if (*devreg & 0x0400) count ++;
}

If devreg is "uint32_t *", the compiler is permitted to - and some
will - hoist the load, effectively turning this into

count = 0;
temp = *devreg;
while (count < 10) {
        if (temp & 0x0400) count ++;
}

which is unlikely to do what you want.  If devreg is instead declared
"volatile uint32_t *", then *devreg is volatile and thus must be
accessed each time through the loop.

volatile also compels such caching in some circumstances; for example,
if the code is

uint32_t *devreg;
uint32_t v;
v = *devreg;
call((v+4)*v);

then, under certain circumstances (for example, v is auto and its
address is never taken) the compiler is permitted to turn that into

call(((*devreg)+4) * *devreg)

and, on some architectures, that would be a win.  (It's less likely to
happen than the other way around, but, if addressing modes are cheap,
registers are few, and memory is fast compared to CPU cycles, the
compiler may do that.)  But make it "volatile uint32_t *devreg;" and
such an optimization is not permitted.

>   - Assuming there is only one instruction, strictly speaking the
> fetch is not guaranteed to be atomic, is that correct?

Right.  What is or isn't atomic is architecture-dependent; there is, I
think, _nothing_ you can do in C that can guarantee atomicity (or, for
that matter, guarantee non-atomicity), since atomicity is fundamentally
an architecture-dependent notion.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: __{read,write}_once

2019-11-06 Thread Maxime Villard

On 06/11/2019 17:37, Marco Elver wrote:

On Wed, 6 Nov 2019 at 16:51, Kamil Rytarowski  wrote:


On 06.11.2019 16:44, Kamil Rytarowski wrote:

On 06.11.2019 15:57, Jason Thorpe wrote:




On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski  wrote:

On 06.11.2019 14:37, Jason Thorpe wrote:




On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski  wrote:

I propose __write_relaxed() / __read_relaxed().


...except that seems to imply the opposite of what these do.

-- thorpej



Rationale?

This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
not deal with atomics here.


Fair enough.  To me, the names suggest "compiler is allowed to apply relaxed constraints and 
tear the access if it wants" But apparently the common meaning is "relax, bro, I know 
what I'm doing".  If that's the case, I can roll with it.


See below.


-- thorpej



Unless I mean something this is exactly about relaxed constraints.


miss*



"Relaxed operation: there are no synchronization or ordering constraints
imposed on other reads or writes" and without "operation's atomicity is
guaranteed".


In the memory consistency model world, "relaxed" has a very specific
meaning for loads/stores. It simply means that the compiler and
architecture is free to reorder the memory operation with respect to
previous or following memory operations in program order. But the
load/store still happens as one atomic operation.

For programming-language memory models, "relaxed" appears in the
context of atomic memory operations, to define their ordering w.r.t.
other operations in program order. There is a distinction between
"relaxed atomic" and plain (unannotated) memory operations at the
programming language level.

For plain operations, the compiler, in addition to the target
architecture, is free to reorder operations or even apply
optimizations that turn single plain operations into multiple
loads/stores [1, 2].
[1] https://lwn.net/Articles/793253/
[2] https://lwn.net/Articles/799218/

In the Linux kernel READ_ONCE/WRITE_ONCE are one way to specify
"relaxed atomic read/write" for accesses that fit in a word (_ONCE
also works on things larger than a word, but probably aren't atomic
anymore). For most architectures the implementation is the same, and
merely avoids compiler optimizations; the exception is Alpha, where
READ_ONCE adds a memory barrier [3].
[3] https://www.kernel.org/doc/Documentation/memory-barriers.txt


This is also similar to what suggested Google to apply to NetBSD in our
internal thread, but with a bit different naming.


What you call the ops is entirely up to you. I would not use
'{read,write}_relaxed', since as you pointed out above, is probably
more confusing. This is why I suggested 'atomic_{read,write}_relaxed'.
If you do not like the 'atomic' in there, the next best thing is
probably '{read,write}_once'.

Just note that you're now defining your own memory consistency model.
This is a complex topic that spans decades of work. The safest thing
is to stick as closely to the C11/C++11 memory model as possible [4].
The Linux kernel also has a memory model [5], but is significantly
more complex since it was defined after various primitives had been
introduced.
[4] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1479.htm
[5] 
https://github.com/torvalds/linux/blob/master/tools/memory-model/Documentation/explanation.txt

Best Wishes,
-- Marco


Thanks for the details.

I must say I still have a doubt about the instructions actually emitted
by the compiler.

Let's take READ_ONCE for example. What it does is basically just casting
the pointer to volatile and doing the access. Two points of confusion:

 - Is 'volatile' a guarantee that the compiler will emit only one
   instruction to fetch the value?

 - Assuming there is only one instruction, strictly speaking the fetch is
   not guaranteed to be atomic, is that correct? Because the address may
   not be naturally aligned; or the CPU may not be x86 and thus may not
   provide automatic atomicity in such a case.

Thanks


Re: __{read,write}_once

2019-11-06 Thread Marco Elver
On Wed, 6 Nov 2019 at 16:51, Kamil Rytarowski  wrote:
>
> On 06.11.2019 16:44, Kamil Rytarowski wrote:
> > On 06.11.2019 15:57, Jason Thorpe wrote:
> >>
> >>
> >>> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski  wrote:
> >>>
> >>> On 06.11.2019 14:37, Jason Thorpe wrote:
> >>>>
> >>>>
> >>>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski  wrote:
> >>>>>
> >>>>> I propose __write_relaxed() / __read_relaxed().
> >>>>
> >>>> ...except that seems to imply the opposite of what these do.
> >>>>
> >>>> -- thorpej
> >>>>
> >>>
> >>> Rationale?
> >>>
> >>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
> >>> not deal with atomics here.
> >>
> >> Fair enough.  To me, the names suggest "compiler is allowed to apply 
> >> relaxed constraints and tear the access if it wants" But apparently 
> >> the common meaning is "relax, bro, I know what I'm doing".  If that's the 
> >> case, I can roll with it.

See below.

> >> -- thorpej
> >>
> >
> > Unless I mean something this is exactly about relaxed constraints.
>
> miss*
>
> >
> > "Relaxed operation: there are no synchronization or ordering constraints
> > imposed on other reads or writes" and without "operation's atomicity is
> > guaranteed".

In the memory consistency model world, "relaxed" has a very specific
meaning for loads/stores. It simply means that the compiler and
architecture are free to reorder the memory operation with respect to
previous or following memory operations in program order. But the
load/store still happens as one atomic operation.

For programming-language memory models, "relaxed" appears in the
context of atomic memory operations, to define their ordering w.r.t.
other operations in program order. There is a distinction between
"relaxed atomic" and plain (unannotated) memory operations at the
programming language level.

For plain operations, the compiler, in addition to the target
architecture, is free to reorder operations or even apply
optimizations that turn single plain operations into multiple
loads/stores [1, 2].
[1] https://lwn.net/Articles/793253/
[2] https://lwn.net/Articles/799218/
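
As a hypothetical illustration (not taken from [1] or [2]) of the kind
of transformation plain accesses permit:

/* 'progress' is a plain, unannotated global. */
int progress;

void
worker(void)
{
        int i;

        progress = 1;           /* the compiler may fuse these two      */
        for (i = 0; i < 1000000; i++)
                ;               /* (busy work, no memory accesses)      */
        progress = 2;           /* stores into a single one, so another */
                                /* thread never observes the value 1    */
}

Marking the two stores in the WRITE_ONCE style (volatile accesses)
forbids that merging.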

In the Linux kernel READ_ONCE/WRITE_ONCE are one way to specify
"relaxed atomic read/write" for accesses that fit in a word (_ONCE
also works on things larger than a word, but probably aren't atomic
anymore). For most architectures the implementation is the same, and
merely avoids compiler optimizations; the exception is Alpha, where
READ_ONCE adds a memory barrier [3].
[3] https://www.kernel.org/doc/Documentation/memory-barriers.txt

> > This is also similar to what suggested Google to apply to NetBSD in our
> > internal thread, but with a bit different naming.

What you call the ops is entirely up to you. I would not use
'{read,write}_relaxed', since, as you pointed out above, it is probably
more confusing. This is why I suggested 'atomic_{read,write}_relaxed'.
If you do not like the 'atomic' in there, the next best thing is
probably '{read,write}_once'.

Just note that you're now defining your own memory consistency model.
This is a complex topic that spans decades of work. The safest thing
is to stick as closely to the C11/C++11 memory model as possible [4].
The Linux kernel also has a memory model [5], but is significantly
more complex since it was defined after various primitives had been
introduced.
[4] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1479.htm
[5] 
https://github.com/torvalds/linux/blob/master/tools/memory-model/Documentation/explanation.txt

Best Wishes,
-- Marco

> Adding Marco to this thread.
>



Re: __{read,write}_once

2019-11-06 Thread Kamil Rytarowski
On 06.11.2019 16:58, David Young wrote:
> I *think* the intention is for __read_once()/__write_once() to
> load/store the entire variable from/to memory precisely once.  They
> provide no guarantees about atomicity of the load/store.  Should
> something be said about ordering and visibility of stores?

The original intention is to mark reads and writes as racy-but-OK.

"once" is a bad name as it suggests some variation of RUN_ONCE().  I am for
the memory-ordering name "relaxed", borrowed from the terminology of atomics.





Re: __{read,write}_once

2019-11-06 Thread Andrew Cagney
On Wed, 6 Nov 2019 at 09:57, Jason Thorpe  wrote:
>
>
>
> > On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski  wrote:
> >
> > On 06.11.2019 14:37, Jason Thorpe wrote:
> >>
> >>
> >>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski  wrote:
> >>>
> >>> I propose __write_relaxed() / __read_relaxed().
> >>
> >> ...except that seems to imply the opposite of what these do.
> >>
> >> -- thorpej
> >>
> >
> > Rationale?
> >
> > This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
> > not deal with atomics here.
>
> Fair enough.  To me, the names suggest "compiler is allowed to apply relaxed 
> constraints and tear the access if it wants" But apparently the common 
> meaning is "relax, bro, I know what I'm doing".  If that's the case, I can 
> roll with it.

Honestly, without reading any code, my interpretation is more in line
with the former:
  hey relax, maybe it happens, maybe it doesn't, but well, nothing
  matters, so whatever


Re: __{read,write}_once

2019-11-06 Thread David Young
On Wed, Nov 06, 2019 at 06:57:07AM -0800, Jason Thorpe wrote:
> 
> 
> > On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski  wrote:
> > 
> > On 06.11.2019 14:37, Jason Thorpe wrote:
> >> 
> >> 
> >>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski  wrote:
> >>> 
> >>> I propose __write_relaxed() / __read_relaxed().
> >> 
> >> ...except that seems to imply the opposite of what these do.
> >> 
> >> -- thorpej
> >> 
> > 
> > Rationale?
> > 
> > This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
> > not deal with atomics here.
> 
> Fair enough.  To me, the names suggest "compiler is allowed to apply relaxed 
> constraints and tear the access if it wants" But apparently the common 
> meaning is "relax, bro, I know what I'm doing".  If that's the case, I can 
> roll with it.

After reading this conversation, I'm not sure of the semantics.

I *think* the intention is for __read_once()/__write_once() to
load/store the entire variable from/to memory precisely once.  They
provide no guarantees about atomicity of the load/store.  Should
something be said about ordering and visibility of stores?

If x is initialized to 0xf00dd00f, two threads start, and thread
1 performs __read_once(x) concurrently with thread 2 performing
__write_once(x, 0xfeedbeef), then what values can thread 1 read?
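
To make the first question concrete, a sketch (hypothetical; the
volatile casts are written out the way the proposal describes the
macros expanding):

#include <stdint.h>

uint32_t x = 0xf00dd00f;

/* Thread 1: __read_once(x). */
uint32_t
thread1(void)
{
        return *(volatile uint32_t *)&x;
}

/* Thread 2: __write_once(x, 0xfeedbeef). */
void
thread2(void)
{
        *(volatile uint32_t *)&x = 0xfeedbeef;
}

/*
 * If both accesses are performed as single 32-bit operations, thread1
 * can only observe 0xf00dd00f or 0xfeedbeef.  If either access can
 * tear into two 16-bit halves, mixed values such as 0xfeedd00f or
 * 0xf00dbeef become possible as well.
 */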

Do __read_once()/__write_once() have any semantics with respect to
interrupts?

Dave

-- 
David Young
dyo...@pobox.com    Urbana, IL    (217) 721-9981


Re: __{read,write}_once

2019-11-06 Thread Kamil Rytarowski
On 06.11.2019 16:44, Kamil Rytarowski wrote:
> On 06.11.2019 15:57, Jason Thorpe wrote:
>>
>>
>>> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski  wrote:
>>>
>>> On 06.11.2019 14:37, Jason Thorpe wrote:


> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski  wrote:
>
> I propose __write_relaxed() / __read_relaxed().

 ...except that seems to imply the opposite of what these do.

 -- thorpej

>>>
>>> Rationale?
>>>
>>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
>>> not deal with atomics here.
>>
>> Fair enough.  To me, the names suggest "compiler is allowed to apply relaxed 
>> constraints and tear the access if it wants" But apparently the common 
>> meaning is "relax, bro, I know what I'm doing".  If that's the case, I can 
>> roll with it.
>>
>> -- thorpej
>>
> 
> Unless I mean something this is exactly about relaxed constraints.

miss*

> 
> "Relaxed operation: there are no synchronization or ordering constraints
> imposed on other reads or writes" and without "operation's atomicity is
> guaranteed".
> 
> This is also similar to what suggested Google to apply to NetBSD in our
> internal thread, but with a bit different naming.
> 

Adding Marco to this thread.





Re: __{read,write}_once

2019-11-06 Thread Kamil Rytarowski
On 06.11.2019 15:57, Jason Thorpe wrote:
> 
> 
>> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski  wrote:
>>
>> On 06.11.2019 14:37, Jason Thorpe wrote:
>>>
>>>
 On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski  wrote:

 I propose __write_relaxed() / __read_relaxed().
>>>
>>> ...except that seems to imply the opposite of what these do.
>>>
>>> -- thorpej
>>>
>>
>> Rationale?
>>
>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
>> not deal with atomics here.
> 
> Fair enough.  To me, the names suggest "compiler is allowed to apply relaxed 
> constraints and tear the access if it wants" But apparently the common 
> meaning is "relax, bro, I know what I'm doing".  If that's the case, I can 
> roll with it.
> 
> -- thorpej
> 

Unless I mean something this is exactly about relaxed constraints.

"Relaxed operation: there are no synchronization or ordering constraints
imposed on other reads or writes" and without "operation's atomicity is
guaranteed".

This is also similar to what Google suggested applying to NetBSD in our
internal thread, but with slightly different naming.



signature.asc
Description: OpenPGP digital signature


Re: __{read,write}_once

2019-11-06 Thread Martin Husemann
On Wed, Nov 06, 2019 at 06:57:07AM -0800, Jason Thorpe wrote:
> > This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
> > not deal with atomics here.
> 
> Fair enough.  To me, the names suggest "compiler is allowed to apply
> relaxed constraints and tear the access if it wants" But apparently
> the common meaning is "relax, bro, I know what I'm doing".  If that's
> the case, I can roll with it.

What is the compiler / implementation supposed to do if there is no way
to do a single instruction load/store for the argument size?
Do we want sized versions instead and only provide the ones that are
available on a given architecture?

If both the names and the semantics are unclear, adding the macros is
maybe not a good idea ;-)

Martin


Re: __{read,write}_once

2019-11-06 Thread Jason Thorpe



> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski  wrote:
> 
> On 06.11.2019 14:37, Jason Thorpe wrote:
>> 
>> 
>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski  wrote:
>>> 
>>> I propose __write_relaxed() / __read_relaxed().
>> 
>> ...except that seems to imply the opposite of what these do.
>> 
>> -- thorpej
>> 
> 
> Rationale?
> 
> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
> not deal with atomics here.

Fair enough.  To me, the names suggest "compiler is allowed to apply relaxed
constraints and tear the access if it wants".  But apparently the common
meaning is "relax, bro, I know what I'm doing".  If that's the case, I can roll
with it.

-- thorpej



Re: __{read,write}_once

2019-11-06 Thread Jason Thorpe



> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski  wrote:
> 
> I propose __write_relaxed() / __read_relaxed().

...except that seems to imply the opposite of what these do.

-- thorpej



Re: __{read,write}_once

2019-11-06 Thread Kamil Rytarowski
On 06.11.2019 14:37, Jason Thorpe wrote:
> 
> 
>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski  wrote:
>>
>> I propose __write_relaxed() / __read_relaxed().
> 
> ...except that seems to imply the opposite of what these do.
> 
> -- thorpej
> 

Rationale?

This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
not deal with atomics here.





Re: __{read,write}_once

2019-11-06 Thread Kamil Rytarowski
On 06.11.2019 12:51, Maxime Villard wrote:
> On 06/11/2019 12:38, Martin Husemann wrote:
>> On Wed, Nov 06, 2019 at 12:31:37PM +0100, Maxime Villard wrote:
>>> __read_once(x)
>>> __write_once(x, val)
>>
>> The names are really not helpful for understanding what is supposed
>> to happen
>> here.
> 
> I don't know if you have a better suggestion, but it's basically the naming
> convention that seems to have been used for several years already. Maybe we
> could have something more explicit, like
> 
> __{read,write}_racy() or
> __{read,write}_parallel() or
> __{read,write}_concurrent()
> 
> But the last one is too long I think.

I propose __write_relaxed() / __read_relaxed().



signature.asc
Description: OpenPGP digital signature


Re: __{read,write}_once

2019-11-06 Thread Maxime Villard

On 06/11/2019 12:38, Martin Husemann wrote:

On Wed, Nov 06, 2019 at 12:31:37PM +0100, Maxime Villard wrote:

__read_once(x)
__write_once(x, val)


The names are really not helpful for understanding what is supposed to happen
here.


I don't know if you have a better suggestion, but it's basically the naming
convention that seems to have been used for several years already. Maybe we
could have something more explicit, like

__{read,write}_racy() or
__{read,write}_parallel() or
__{read,write}_concurrent()

But the last one is too long I think.


Re: __{read,write}_once

2019-11-06 Thread Martin Husemann
On Wed, Nov 06, 2019 at 12:31:37PM +0100, Maxime Villard wrote:
>   __read_once(x)
>   __write_once(x, val)

The names are really not helpful for understanding what is supposed to happen
here.

Martin


__{read,write}_once

2019-11-06 Thread Maxime Villard

There are cases in the kernel where we read/write global memory locklessly,
and accept the races either because it is part of the design (eg low-level
scheduling) or we simply don't care (eg global stats).

In these cases, we want to access the memory only once, and need to ensure
the compiler does not split that access in several pieces, some of which
may be changed by concurrent accesses. There is a Linux article [1] about
this, and also [2]. I'd like to introduce the following macros:

__read_once(x)
__write_once(x, val)

These macros:

 - Are supposed to ensure that reads/writes result in only one access. We
   basically just cast to volatile for now.

 - Serve as markers for KCSAN, to say "yes there is a race, but it's
   expected and you don't need to care".

 - Help understanding the code, ie the reader will easily see that the area
   can legitimately race.

In short, the main role they play is markers, to inform the reader and
sanitizers.
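
For concreteness, a rough sketch of what the macros could look like (a
sketch only, not the actual patch):

#define __read_once(x)                                          \
        (*(const volatile __typeof__(x) *)&(x))

#define __write_once(x, val)                                    \
        do {                                                    \
                *(volatile __typeof__(x) *)&(x) = (val);        \
        } while (0)

/* Example use: a global stat counter we knowingly race on. */
unsigned long nfaults;

void
fault_stat_bump(void)
{
        __write_once(nfaults, __read_once(nfaults) + 1);
}

The cast to volatile is what keeps the compiler from splitting or
merging the access; it adds no ordering or read-modify-write atomicity.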

I've made a patch [3], with several changes already to use these macros.

Feel free to comment

[1] https://lwn.net/Articles/508991/
[2] https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
[3] https://m00nbsd.net/garbage/kcsan/access-once.diff