Re: __{read,write}_once
> Date: Sun, 24 Nov 2019 19:25:52 +0000
> From: Taylor R Campbell
>
> This thread is not converging on consensus, so we're discussing the semantics and naming of these operations as core and will come back with a decision by the end of the week.

We (core) carefully read the thread, and discussed this and the related Linux READ_ONCE/WRITE_ONCE macros as well as the C11 atomic API.

For maxv: Please add conditional definitions in <sys/atomic.h> according to what KCSAN needs, and use atomic_load/store_relaxed for counters and other integer objects in the rest of your patch. (I didn't see any pointer loads there.)

For uvm's lossy counters, please use

	atomic_store_relaxed(p, 1 + atomic_load_relaxed(p))

and not an __add_once macro -- since these should really be per-CPU counters, we don't want to endorse this pattern by making it pretty.

* Summary

We added a few macros to <sys/atomic.h> for the purpose, atomic_load_<ordering>(p) and atomic_store_<ordering>(p, v). The orderings are relaxed, acquire, consume, and release, and are intended to match C11 semantics. See the new atomic_loadstore(9) man page for reference.

Currently they are defined in terms of volatile loads and stores, but we should eventually use the C11 atomic API instead in order to provide the intended atomicity guarantees under all compilers without having to rely on the folklore interpretations of volatile.

* Details

There are four main properties involved in the operations under discussion:

1. No tearing. A 32-bit write can't be split into two separate 16-bit writes, for instance.

   * In _some_ cases, namely aligned pointers to sufficiently small objects, Linux READ_ONCE/WRITE_ONCE guarantee no tearing.

   * C11 atomic_load/store guarantees no tearing -- although on large objects it may involve locks, requiring the C11 type qualifier _Atomic and changing the ABI. This was the primary motivation for maxv's original question.

2. No fusing. Consecutive writes can't be combined into one, for instance, and a write followed by a read can't skip the read to return the value that was written.

   * Linux's READ_ONCE/WRITE_ONCE and C11's atomic_load/store guarantee no fusing.

3. Data-dependent memory ordering. If you read a pointer, and then dereference the pointer (maybe plus some offset), the reads happen in that order.

   * Linux's READ_ONCE guarantees this by issuing the analogue of membar_datadep_consumer on DEC Alpha, and nothing on other CPUs.

   * C11's atomic_load guarantees this with seq_cst, acquire, or consume memory ordering.

4. Cost. There's no need to incur the cost of read/modify/write atomic operations, and for many purposes, no need to incur the cost of memory-ordering barriers.

To express these, we've decided to add a few macros that are similar to Linux's READ_ONCE/WRITE_ONCE and C11's atomic_load/store_explicit but are less error-prone and less cumbersome:

	#include <sys/atomic.h>

- atomic_load_relaxed(p) is like *p, but guarantees no tearing and no fusing. No ordering relative to memory operations on other objects is guaranteed.

- atomic_store_relaxed(p, v) is like *p = v, but guarantees no tearing and no fusing. No ordering relative to memory operations on other objects is guaranteed.

- atomic_store_release(p, v) and atomic_load_acquire(p) are, respectively, like *p = v and *p, but guarantee no tearing and no fusing. They _also_ guarantee, for logic like

	Thread A			Thread B
	stuff();
	atomic_store_release(p, v);	u = atomic_load_acquire(p);
					things();

  that _if_ the atomic_load_acquire(p) in thread B witnesses the state of the object at p set by atomic_store_release(p, v) in thread A, then all memory operations in stuff() happen before any memory operations in things(). No guarantees if only one thread participates -- the store-release and load-acquire _must_ be paired.

- atomic_load_consume(p) is like atomic_load_acquire(p), but it only guarantees ordering for data-dependent memory references. Like atomic_load_acquire, it must be paired with atomic_store_release. However, on most CPUs, it is as _cheap_ as atomic_load_relaxed.

The atomic load/store operations are defined _only_ on objects as large as the architecture can support -- so, for example, on 32-bit platforms they cannot be used on 64-bit quantities; attempts to do so will lead to compile-time errors. They are also defined _only_ on aligned pointers -- using them on unaligned pointers may lead to run-time crashes, even on architectures without strict alignment requirements.

* Why the names atomic_{load,store}_<ordering>?

- Atomic. Although `atomic' may suggest `expensive' to some people (and I'm guilty of making that connection in the past), what's really expensive is atomic _read/modify/write_ operations and _memory ordering guarantees_. Merely preventing tearing
Re: __{read,write}_once
> I feel like load_entire() and store_entire() get to the heart of the > matter while being easy to speak, but _fully() or _completely() seem > fine. _integral()?
Re: __{read,write}_once
> Date: Wed, 6 Nov 2019 12:31:37 +0100
> From: Maxime Villard
>
> There are cases in the kernel where we read/write global memory locklessly, and accept the races either because it is part of the design (eg low-level scheduling) or we simply don't care (eg global stats).
>
> In these cases, we want to access the memory only once, and need to ensure the compiler does not split that access in several pieces, some of which may be changed by concurrent accesses. There is a Linux article [1] about this, and also [2]. I'd like to introduce the following macros:
>
> 	__read_once(x)
> 	__write_once(x, val)

This thread is not converging on consensus, so we're discussing the semantics and naming of these operations as core and will come back with a decision by the end of the week.

Thanks,
-Riastradh, on behalf of core
Re: __{read,write}_once
On 22.11.2019 02:42, Robert Elz wrote:
> Date: Fri, 22 Nov 2019 01:04:56 +0100
> From: Kamil Rytarowski
> Message-ID: <1a9d9b40-42fe-be08-d7b3-e6ecead5b...@gmx.com>
>
> | I think that picking C11 terminology is the way forward.
>
> Use a name like that iff the intent is to also exactly match the
> semantics implied, otherwise it will only create more confusion.
>
> kre

I think this matches. We want the operation to happen as a single operation (or at least to look like one), and 'atomic' is the settled English name for that (there are options like: 'integral', 'intrinsic', 'elemental', 'essential', 'fundamental' or 'single operation'). If there are stronger feelings against it, I would go for '__write_singleop()', '__read_singleop()'.
Re: __{read,write}_once
Maxime Villard wrote: > How about _onestep()? Or _nosplit()? Or just conclude that there are no good options and we might as well call it _once() to be compatible with other OSes. -- Andreas Gustafsson, g...@gson.org
Re: __{read,write}_once
On 21/11/2019 at 23:08, David Young wrote:

On Thu, Nov 21, 2019 at 07:19:51PM +0100, Maxime Villard wrote:

On 18/11/2019 at 19:49, David Holland wrote:

On Sun, Nov 17, 2019 at 02:35:43PM +0000, Mindaugas Rasiukevicius wrote:

> David Holland wrote:
> > > I see the potential source of confusion, but just think about: what could "atomic" possibly mean for loads or stores? A load is just a load, there is no RMW cycle, conceptually; inter-locking the operation does not make sense. For anybody who attempts to reason about this, it should not be too confusing, plus there are man pages.
> >
> > ...that it's not torn.
> >
> > As far as names... there are increasingly many slightly different types of atomic and semiatomic operations.
> >
> > I think it would be helpful if someone came up with a comprehensive naming scheme for all of them (including ones we don't currently have that we're moderately likely to end up with later...)
>
> Yes, the meaning of "atomic" has different flavours and describes different set of properties in different fields (operating systems vs databases vs distributed systems vs ...) and, as we can see, even within the fields.
>
> Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already the dominant terms.

Yes, but "relaxed" means something else... let me be clearer since I wasn't before: I would expect e.g. atomic_inc_32_relaxed() to be distinguished from atomic_inc_32() or maybe atomic_inc_32_ordered() by whether or not multiple instances of it are globally ordered, not by whether or not it's actually atomic relative to other cpus.

Checking the prior messages indicates we aren't currently talking about atomic_inc_32_relaxed() but only about atomic_load_32_relaxed() and atomic_store_32_relaxed(), which would be used together to generate a local counter. This is less misleading, but I'm still not convinced it's a good choice of names given that we might reasonably later on want to have atomic_inc_32_relaxed() and atomic_inc_32_ordered() that differ as above.

> I think it is pointless to attempt to reinvent the wheel here. It is terminology used by C11 (and C++11) and accepted by various technical literature and, at this point, by academia (if you look at the recent papers on memory models -- it's pretty much settled). These terms are not too bad; it could be worse; and this bit is certainly not the worst part of C11. So, I would just move on.

Is it settled? My experience with the academic literature has been that everyone uses their own terminology and the same words are routinely found to have subtly different meanings from one paper to the next and occasionally even within the same paper. :-/ But I haven't been paying much attention lately due to being preoccupied with other things.

So in the end which name do we use? Are people really unhappy with _racy()? At least it has a general meaning, and does not imply atomicity or ordering.

_racy() doesn't really get at the intended meaning, and everything in C11 is racy unless you arrange otherwise by using mutexes, atomics, etc. The suffix has very little content. Names such as load_/store_fully() or load_/store_entire() or load_/store_completely() get to the actual semantics: at the program step implied by the load_/store_entire() expression, the memory constituting the named variable is loaded/stored in its entirety. In other words, the load cannot be drawn out over more than one local program step. Whether or not the load/store is performed in one step with respect to interrupts or other threads is not defined.

Sounds like the right definition of the semantic. However:

I feel like load_entire() and store_entire() get to the heart of the matter while being easy to speak, but _fully() or _completely() seem fine.

I'm not a really big fan of _entire(), because it gives the impression that the load/store is not entire otherwise - which it is, but just possibly not in one step. How about _onestep()? Or _nosplit()?
Re: __{read,write}_once
On Fri, Nov 22, 2019 at 08:42:19AM +0700, Robert Elz wrote:
> Date: Fri, 22 Nov 2019 01:04:56 +0100
> From: Kamil Rytarowski
> Message-ID: <1a9d9b40-42fe-be08-d7b3-e6ecead5b...@gmx.com>
>
> | I think that picking C11 terminology is the way forward.
>
> Use a name like that iff the intent is to also exactly match the
> semantics implied, otherwise it will only create more confusion.

And if the implied semantics isn't even totally clear, do not hide it behind a macro at all?

Martin
re: __{read,write}_once
_always()?
Re: __{read,write}_once
Date: Fri, 22 Nov 2019 01:04:56 +0100
From: Kamil Rytarowski
Message-ID: <1a9d9b40-42fe-be08-d7b3-e6ecead5b...@gmx.com>

| I think that picking C11 terminology is the way forward.

Use a name like that iff the intent is to also exactly match the semantics implied, otherwise it will only create more confusion.

kre
Re: __{read,write}_once
On Thu, Nov 21, 2019 at 07:19:51PM +0100, Maxime Villard wrote:
> On 18/11/2019 at 19:49, David Holland wrote:
> > On Sun, Nov 17, 2019 at 02:35:43PM +0000, Mindaugas Rasiukevicius wrote:
> > > David Holland wrote:
> > > > > I see the potential source of confusion, but just think about: what could "atomic" possibly mean for loads or stores? A load is just a load, there is no RMW cycle, conceptually; inter-locking the operation does not make sense. For anybody who attempts to reason about this, it should not be too confusing, plus there are man pages.
> > > >
> > > > ...that it's not torn.
> > > >
> > > > As far as names... there are increasingly many slightly different types of atomic and semiatomic operations.
> > > >
> > > > I think it would be helpful if someone came up with a comprehensive naming scheme for all of them (including ones we don't currently have that we're moderately likely to end up with later...)
> > >
> > > Yes, the meaning of "atomic" has different flavours and describes different set of properties in different fields (operating systems vs databases vs distributed systems vs ...) and, as we can see, even within the fields.
> > >
> > > Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already the dominant terms.
> >
> > Yes, but "relaxed" means something else... let me be clearer since I wasn't before: I would expect e.g. atomic_inc_32_relaxed() to be distinguished from atomic_inc_32() or maybe atomic_inc_32_ordered() by whether or not multiple instances of it are globally ordered, not by whether or not it's actually atomic relative to other cpus.
> >
> > Checking the prior messages indicates we aren't currently talking about atomic_inc_32_relaxed() but only about atomic_load_32_relaxed() and atomic_store_32_relaxed(), which would be used together to generate a local counter. This is less misleading, but I'm still not convinced it's a good choice of names given that we might reasonably later on want to have atomic_inc_32_relaxed() and atomic_inc_32_ordered() that differ as above.
> >
> > > I think it is pointless to attempt to reinvent the wheel here. It is terminology used by C11 (and C++11) and accepted by various technical literature and, at this point, by academia (if you look at the recent papers on memory models -- it's pretty much settled). These terms are not too bad; it could be worse; and this bit is certainly not the worst part of C11. So, I would just move on.
> >
> > Is it settled? My experience with the academic literature has been that everyone uses their own terminology and the same words are routinely found to have subtly different meanings from one paper to the next and occasionally even within the same paper. :-/ But I haven't been paying much attention lately due to being preoccupied with other things.
>
> So in the end which name do we use? Are people really unhappy with _racy()? At least it has a general meaning, and does not imply atomicity or ordering.

_racy() doesn't really get at the intended meaning, and everything in C11 is racy unless you arrange otherwise by using mutexes, atomics, etc. The suffix has very little content. Names such as load_/store_fully() or load_/store_entire() or load_/store_completely() get to the actual semantics: at the program step implied by the load_/store_entire() expression, the memory constituting the named variable is loaded/stored in its entirety. In other words, the load cannot be drawn out over more than one local program step. Whether or not the load/store is performed in one step with respect to interrupts or other threads is not defined.

I feel like load_entire() and store_entire() get to the heart of the matter while being easy to speak, but _fully() or _completely() seem fine.
Dave

--
David Young
dyo...@pobox.com    Urbana, IL    (217) 721-9981
Re: __{read,write}_once
On 22.11.2019 00:53, Robert Elz wrote:
> Date: Thu, 21 Nov 2019 19:19:51 +0100
> From: Maxime Villard
> Message-ID:
>
> | So in the end which name do we use? Are people really unhappy with _racy()?
> | At least it has a general meaning, and does not imply atomicity or ordering.
>
> I dislike naming discussions, as in general, while there are often a bunch of obviously incorrect names (here for example, read_page_data() ...) there is very rarely one obviously right answer, and it all really just becomes a matter of choice for whoever is doing it.
>
> Nevertheless, perhaps something that says more what is actually happening, rather than mentions what doesn't matter (races here), so perhaps something like {read,write}_single_cycle() or {read,write}_1_bus_xact() or something along those lines?
>
> kre

I think that picking C11 terminology is the way forward. It's settled, probably forever now, and it will be simpler to find the corresponding documentation and specification.
Re: __{read,write}_once
Date: Thu, 21 Nov 2019 19:19:51 +0100
From: Maxime Villard
Message-ID:

| So in the end which name do we use? Are people really unhappy with _racy()?
| At least it has a general meaning, and does not imply atomicity or ordering.

I dislike naming discussions, as in general, while there are often a bunch of obviously incorrect names (here for example, read_page_data() ...) there is very rarely one obviously right answer, and it all really just becomes a matter of choice for whoever is doing it.

Nevertheless, perhaps something that says more what is actually happening, rather than mentions what doesn't matter (races here), so perhaps something like {read,write}_single_cycle() or {read,write}_1_bus_xact() or something along those lines?

kre
Re: __{read,write}_once
On 18/11/2019 at 19:49, David Holland wrote:

On Sun, Nov 17, 2019 at 02:35:43PM +0000, Mindaugas Rasiukevicius wrote:

> David Holland wrote:
> > > I see the potential source of confusion, but just think about: what could "atomic" possibly mean for loads or stores? A load is just a load, there is no RMW cycle, conceptually; inter-locking the operation does not make sense. For anybody who attempts to reason about this, it should not be too confusing, plus there are man pages.
> >
> > ...that it's not torn.
> >
> > As far as names... there are increasingly many slightly different types of atomic and semiatomic operations.
> >
> > I think it would be helpful if someone came up with a comprehensive naming scheme for all of them (including ones we don't currently have that we're moderately likely to end up with later...)
>
> Yes, the meaning of "atomic" has different flavours and describes different set of properties in different fields (operating systems vs databases vs distributed systems vs ...) and, as we can see, even within the fields.
>
> Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already the dominant terms.

Yes, but "relaxed" means something else... let me be clearer since I wasn't before: I would expect e.g. atomic_inc_32_relaxed() to be distinguished from atomic_inc_32() or maybe atomic_inc_32_ordered() by whether or not multiple instances of it are globally ordered, not by whether or not it's actually atomic relative to other cpus.

Checking the prior messages indicates we aren't currently talking about atomic_inc_32_relaxed() but only about atomic_load_32_relaxed() and atomic_store_32_relaxed(), which would be used together to generate a local counter. This is less misleading, but I'm still not convinced it's a good choice of names given that we might reasonably later on want to have atomic_inc_32_relaxed() and atomic_inc_32_ordered() that differ as above.

> I think it is pointless to attempt to reinvent the wheel here. It is terminology used by C11 (and C++11) and accepted by various technical literature and, at this point, by academia (if you look at the recent papers on memory models -- it's pretty much settled). These terms are not too bad; it could be worse; and this bit is certainly not the worst part of C11. So, I would just move on.

Is it settled? My experience with the academic literature has been that everyone uses their own terminology and the same words are routinely found to have subtly different meanings from one paper to the next and occasionally even within the same paper. :-/ But I haven't been paying much attention lately due to being preoccupied with other things.

So in the end which name do we use? Are people really unhappy with _racy()? At least it has a general meaning, and does not imply atomicity or ordering.
Re: __{read,write}_once
On Sun, Nov 17, 2019 at 02:35:43PM +0000, Mindaugas Rasiukevicius wrote:
> David Holland wrote:
> > > I see the potential source of confusion, but just think about: what could "atomic" possibly mean for loads or stores? A load is just a load, there is no RMW cycle, conceptually; inter-locking the operation does not make sense. For anybody who attempts to reason about this, it should not be too confusing, plus there are man pages.
> >
> > ...that it's not torn.
> >
> > As far as names... there are increasingly many slightly different types of atomic and semiatomic operations.
> >
> > I think it would be helpful if someone came up with a comprehensive naming scheme for all of them (including ones we don't currently have that we're moderately likely to end up with later...)
>
> Yes, the meaning of "atomic" has different flavours and describes different set of properties in different fields (operating systems vs databases vs distributed systems vs ...) and, as we can see, even within the fields.
>
> Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already the dominant terms.

Yes, but "relaxed" means something else... let me be clearer since I wasn't before: I would expect e.g. atomic_inc_32_relaxed() to be distinguished from atomic_inc_32() or maybe atomic_inc_32_ordered() by whether or not multiple instances of it are globally ordered, not by whether or not it's actually atomic relative to other cpus.

Checking the prior messages indicates we aren't currently talking about atomic_inc_32_relaxed() but only about atomic_load_32_relaxed() and atomic_store_32_relaxed(), which would be used together to generate a local counter. This is less misleading, but I'm still not convinced it's a good choice of names given that we might reasonably later on want to have atomic_inc_32_relaxed() and atomic_inc_32_ordered() that differ as above.

> I think it is pointless to attempt to reinvent the wheel here. It is terminology used by C11 (and C++11) and accepted by various technical literature and, at this point, by academia (if you look at the recent papers on memory models -- it's pretty much settled). These terms are not too bad; it could be worse; and this bit is certainly not the worst part of C11. So, I would just move on.

Is it settled? My experience with the academic literature has been that everyone uses their own terminology and the same words are routinely found to have subtly different meanings from one paper to the next and occasionally even within the same paper. :-/ But I haven't been paying much attention lately due to being preoccupied with other things.

--
David A. Holland
dholl...@netbsd.org
Re: __{read,write}_once
David Holland wrote: > > I see the potential source of confusion, but just think about: what > > could "atomic" possibly mean for loads or stores? A load is just a > > load, there is no RMW cycle, conceptually; inter-locking the operation > > does not make sense. For anybody who attempts to reason about this, > > it should not be too confusing, plus there are man pages. > > ...that it's not torn. > > As far as names... there are increasingly many slightly different > types of atomic and semiatomic operations. > > I think it would be helpful if someone came up with a comprehensive > naming scheme for all of them (including ones we don't currently have > that we're moderately likely to end up with later...) Yes, the meaning of "atomic" has different flavours and describes different set of properties in different fields (operating systems vs databases vs distributed systems vs ...) and, as we can see, even within the fields. Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already the dominant terms. I think it is pointless to attempt to reinvent the wheel here. It is terminology used by C11 (and C++11) and accepted by various technical literature and, at this point, by academia (if you look at the recent papers on memory models -- it's pretty much settled). These terms are not too bad; it could be worse; and this bit is certainly not the worst part of C11. So, I would just move on. -- Mindaugas
Re: __{read,write}_once
On Sat, Nov 16, 2019 at 03:56:35PM +0000, Mindaugas Rasiukevicius wrote:
> > > I suggest __atomic_load_relaxed()/__atomic_store_relaxed().
> >
> > What I don't like with "atomic" in the name is that the instructions
> > generated are not atomic strictly speaking, and I'd rather avoid the
> > confusion with the really atomic instructions.
>
> I see the potential source of confusion, but just think about: what could "atomic" possibly mean for loads or stores? A load is just a load, there is no RMW cycle, conceptually; inter-locking the operation does not make sense. For anybody who attempts to reason about this, it should not be too confusing, plus there are man pages.

...that it's not torn.

As far as names... there are increasingly many slightly different types of atomic and semiatomic operations.

I think it would be helpful if someone came up with a comprehensive naming scheme for all of them (including ones we don't currently have that we're moderately likely to end up with later...) Ordinarily I would go do this but I am completely swamped :-(

For the moment, I don't especially like *_relaxed as "relaxed" is normally an ordering property, and while ordering and atomicity are different things, they're not really separable in the naming scheme. I might suggest "localatomic". Or maybe just "untorn", but that looks weird.

However, _relaxed is far better than *_once, which is really confusing :(

--
David A. Holland
dholl...@netbsd.org
Re: __{read,write}_once
> I see the potential source of confusion, but just think about: what > could "atomic" possibly mean for loads or stores? The same thing it means for other operations: that all other operations act as if the load, or store, happened either entirely before or entirely after every other event. /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: __{read,write}_once
On 16/11/2019 at 15:31, Mindaugas Rasiukevicius wrote:
> Maxime Villard wrote:
> > Alright so let's add the macros with volatile (my initial patch).
> > Which name do we use? I actually like __{read,write}_racy().
>
> I suggest __atomic_load_relaxed()/__atomic_store_relaxed().

What I don't like with "atomic" in the name is that the instructions generated are not atomic strictly speaking, and I'd rather avoid the confusion with the really atomic instructions. But that's not a strong opinion, and I'm fine if we go with it anyway.

> If these are to be provided as macros, then I also suggest to ensure
> that they provide compiler-level barrier.

You mean __insn_barrier(), right? Surrounding the access (one before, one after) also, right?
Re: __{read,write}_once
Maxime Villard wrote:
> > I suggest __atomic_load_relaxed()/__atomic_store_relaxed().
>
> What I don't like with "atomic" in the name is that the instructions
> generated are not atomic strictly speaking, and I'd rather avoid the
> confusion with the really atomic instructions.

I see the potential source of confusion, but just think about: what could "atomic" possibly mean for loads or stores? A load is just a load, there is no RMW cycle, conceptually; inter-locking the operation does not make sense. For anybody who attempts to reason about this, it should not be too confusing, plus there are man pages.

> > If these are to be provided as macros, then I also suggest to ensure
> > that they provide compiler-level barrier.
>
> You mean __insn_barrier(), right? Surrounding the access (one before, one
> after) also, right?

Correct. Alternatively, they can be provided as real functions. In C, function calls generally act as compiler-level barriers (as long as they are guaranteed to not be optimised out).

--
Mindaugas
Re: __{read,write}_once
On 16.11.2019 15:31, Mindaugas Rasiukevicius wrote:
> Maxime Villard wrote:
> > Alright so let's add the macros with volatile (my initial patch). Which
> > name do we use? I actually like __{read,write}_racy().
>
> I suggest __atomic_load_relaxed()/__atomic_store_relaxed(). If these
> are to be provided as macros, then I also suggest to ensure that they
> provide compiler-level barrier.

I'm in favor of this naming.
Re: __{read,write}_once
Maxime Villard wrote: > Alright so let's add the macros with volatile (my initial patch). Which > name do we use? I actually like __{read,write}_racy(). I suggest __atomic_load_relaxed()/__atomic_store_relaxed(). If these are to be provided as macros, then I also suggest to ensure that they provide compiler-level barrier. -- Mindaugas
Re: __{read,write}_once
On 11/15, Maxime Villard wrote: > Alright so let's add the macros with volatile (my initial patch). Which name > do we > use? I actually like __{read,write}_racy(). +1. I'm a fan of avoiding the "once" terminology just because the first thing that comes to my mind is the threading concept of a function that does something only one time regardless of how many times it is called and regardless of from which thread it is called. Lewis
Re: __{read,write}_once
Alright so let's add the macros with volatile (my initial patch). Which name do we use? I actually like __{read,write}_racy().
Re: __{read,write}_once
Date: Mon, 11 Nov 2019 21:32:06 +0100
From: Joerg Sonnenberger
Message-ID: <2019203206.ga4...@bec.de>

| The update needs to be uninterruptable on the local CPU in the sense that
| context switches and interrupts don't tear the R and W part of the RMW cycle
| apart.

If I understood the original message, even that was not required. All that was really demanded was that each of the R and W parts not be split into pieces where an intervening interrupt might alter the "other" part, allowing a read to receive, or a write to produce, a value that neither the baseline code nor the interrupt routine ever intended.

But in x += 2; if an interrupt intervenes and does x += 1; it is acceptable for the eventual result to be old_x + 1 (which is very unlikely), old_x + 2, or old_x + 3, but only those. That's how I understood the "racy is OK" part of it. Ie: we don't care about race conditions like this.

What we don't want is, if x started out at 0x0000ffff and is read in 2 halves, for the base code to read the 0xffff half, then the interrupt to change the value to 0x00010000, and then the base code to read 0x0001 for the top half, leading to a result of 0x0001ffff in x rather than 0x0000ffff or 0x00010000, which is not acceptable (or anything else like that, and similarly for writes).

Or in other words, it is acceptable for some uses for the occasional change to be lost (particularly for some event counters) if it is not likely to happen frequently, but not for entirely bogus values to appear.

kre
Re: __{read,write}_once
Maxime Villard wrote: > > Alright, thanks. > > Do you think there is a performance degradation with using explicitly > atomic operations (with the "lock" prefix on x86), compared to just using > an aligned volatile which may not be exactly atomic in that sense (even > if it happens to be on x86)? Yes, interlocked atomic operations have considerable performance overhead and should be treated as relatively expensive. Whether it is significant, of course, depends where and how they are used in the code. You can find various measurements of the atomic operations on the Internet, but they can differ quite a lot amongst various CPU models. Also, just to be clear here: the atomic loads/stores (both the C11 routines and READ_ONCE/WRITE_ONCE) are *not* interlocked operations. > > Typically in sys/uvm/uvm_fault.c there are several lockless stat > increments like the following: > > /* Increment the counters.*/ > uvmexp.fltanget++; > > In your (not just rmind@, but tech-kern@) opinion, is it better to switch > to: atomic_inc_uint(&uvmexp.fltanget); > or to > __add_once(uvmexp.fltanget, 1); > as I did in my patch? Considering that the latter may possibly not be > atomic on certain arches, which could possibly result in garbage when the > value is read. In this case it is assumed that inaccurate statistics is better than the expense of atomic operations. That is the reason why they are non-atomic increments, although to prevent from garbage (not only the losses of adds), they should use atomic loads/stores (or just volatile for now) on a word. Generally, the right solution to this problem is to convert these counters to be per-CPU (or per-thread in userspace applications) and aggregate them when needed (e.g. on fetching the stats). See percpu(9) API. This way you get both the performance and accuracy/correctness. 
> > I've committed it right away, because this one seemed clear enough, feel > free to comment or improve if you have a better idea: > > https://mail-index.netbsd.org/source-changes/2019/11/11/msg110726.html > It would be better to find a solution which doesn't remove the optimisation for the fast path. At least on 64-bit architectures, a relaxed atomic load (as 'volatile' for now) on xc->xc_donep could stay. -- Mindaugas
Re: __{read,write}_once
On Mon, Nov 11, 2019 at 01:15:16PM -0500, Mouse wrote: > > Uninterruptible means exactly that, there is a clear before and after > state and no interrupts can happen in between. > > Is uninterruptible all you care about? Or does it also need to be > atomic with respect to other CPUs? Eventually, of course, you'll want > all those counter values on a single CPU - does that happen often > enough to matter? > > Also, does it actually need to be uninterruptible, or does it just need > to be interrupt-tolerant? I'm not as clear as I'd like on what the > original desire here was, so I'm not sure but that we might be trying > to solve a harder problem than necessary. The update needs to be uninterruptible on the local CPU in the sense that context switches and interrupts don't tear the R and W part of the RMW cycle apart. x86 ALU instructions with memory operand as destination fit the bill fine. It doesn't have to be atomic, no other CPU is supposed to write to that cache line. It also generally doesn't matter whether they see the old OR the new value, as long as it is either. We have full control over alignment. Joerg
Re: __{read,write}_once
> Uninterruptible means exactly that, there is a clear before and after > state and no interrupts can happen in between. Is uninterruptible all you care about? Or does it also need to be atomic with respect to other CPUs? Eventually, of course, you'll want all those counter values on a single CPU - does that happen often enough to matter? Also, does it actually need to be uninterruptible, or does it just need to be interrupt-tolerant? I'm not as clear as I'd like on what the original desire here was, so I'm not sure but that we might be trying to solve a harder problem than necessary. /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: __{read,write}_once
On Mon, Nov 11, 2019 at 11:51:26AM -0500, Mouse wrote: > >>> (2) Use uninterruptible memory operations in per CPU memory, > >>> aggregate passively on demand. > > Problem is that (2) is natively only present on CISC platforms in > > general. Most RISC platforms can't do RMW in one instruction. > > (2) says "uninterruptible", not "one instruction", though I'm not sure > how large the difference is in practice. (Also, some CISC platforms > provide atomic memory RMW operations only under annoying restrictions; > for example, I think the VAX has only three sorts of RMW memory > accesses that are atomic with respect to other processors: ADAWI > (16-bit-aligned 16-bit add), BB{SS,CC}I (test-and-{set/clear} single > bits), and {INS,REM}Q{H,T}I (queue insert/remove at head/tail).) The point here is that we really don't want to have bus locked instructions for per-CPU counters. It would defeat the point of using per-CPU counters in first place to a large degree. Uninterruptible means exactly that, there is a clear before and after state and no interrupts can happen in between. I don't know about ARM and friends for how expensive masking interrupts is. It is quite expensive on x86. RMW instructions are the other simple option for implementing them. (3) would be the high effort version, doing RAS for kernel code as well. Joerg
Re: __{read,write}_once
>>> (2) Use uninterruptible memory operations in per CPU memory, >>> aggregate passively on demand. > Problem is that (2) is natively only present on CISC platforms in > general. Most RISC platforms can't do RMW in one instruction. (2) says "uninterruptible", not "one instruction", though I'm not sure how large the difference is in practice. (Also, some CISC platforms provide atomic memory RMW operations only under annoying restrictions; for example, I think the VAX has only three sorts of RMW memory accesses that are atomic with respect to other processors: ADAWI (16-bit-aligned 16-bit add), BB{SS,CC}I (test-and-{set/clear} single bits), and {INS,REM}Q{H,T}I (queue insert/remove at head/tail).) /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: __{read,write}_once
On Mon, Nov 11, 2019 at 02:39:26PM +0100, Maxime Villard wrote: > Le 11/11/2019 à 13:51, Joerg Sonnenberger a écrit : > > On Mon, Nov 11, 2019 at 11:02:47AM +0100, Maxime Villard wrote: > > > Typically in sys/uvm/uvm_fault.c there are several lockless stat > > > increments > > > like the following: > > > > > > /* Increment the counters.*/ > > > uvmexp.fltanget++; > > > > Wasn't the general consensus here to ideally have per-cpu counters here > > that are aggregated passively? > > In this specific case it's not going to work, because uvmexp is supposed to > be read from a core dump, and the stats are expected to be found in the > symbol... That's just a reason to add code for doing the aggregation, not a reason to avoid it. > > I can think of three different options depending the platform: > > > > (1) Use full atomic operations. Fine for true UP platforms and when the > > overhead of per-CPU precise accounting is too high. > > > > (2) Use uninterruptible memory operations in per CPU memory, aggregate > > passively on demand. > > > > (3) Emulate uninterruptible memory operations with kernel RAS, aggregate > > passively on demand. > > Generally I would do (2), but in this specific case I suggest (1). > atomic_inc_uint is supposed to be implemented on each platform already, so > it's easy to do. > > (3) is a big headache for a very small reason, and it's not going to > prevent inter-CPU races. Problem is that (2) is natively only present on CISC platforms in general. Most RISC platforms can't do RMW in one instruction. Inter-CPU races don't matter as long as only the counter of the current CPU is modified. The main difference between (1) and (2) is the bus lock. (1) and (3) is primarily a question on whether the atomic operation can be inlined, if it can't, (3) would still be much nicer. Joerg
Re: __{read,write}_once
Le 11/11/2019 à 13:51, Joerg Sonnenberger a écrit : On Mon, Nov 11, 2019 at 11:02:47AM +0100, Maxime Villard wrote: Typically in sys/uvm/uvm_fault.c there are several lockless stat increments like the following: /* Increment the counters.*/ uvmexp.fltanget++; Wasn't the general consensus here to ideally have per-cpu counters here that are aggregated passively? In this specific case it's not going to work, because uvmexp is supposed to be read from a core dump, and the stats are expected to be found in the symbol... I can think of three different options depending the platform: (1) Use full atomic operations. Fine for true UP platforms and when the overhead of per-CPU precise accounting is too high. (2) Use uninterruptible memory operations in per CPU memory, aggregate passively on demand. (3) Emulate uninterruptible memory operations with kernel RAS, aggregate passively on demand. Generally I would do (2), but in this specific case I suggest (1). atomic_inc_uint is supposed to be implemented on each platform already, so it's easy to do. (3) is a big headache for a very small reason, and it's not going to prevent inter-CPU races.
Re: __{read,write}_once
On Mon, Nov 11, 2019 at 11:02:47AM +0100, Maxime Villard wrote: > Typically in sys/uvm/uvm_fault.c there are several lockless stat increments > like the following: > > /* Increment the counters.*/ > uvmexp.fltanget++; Wasn't the general consensus here to ideally have per-cpu counters here that are aggregated passively? I can think of three different options depending on the platform: (1) Use full atomic operations. Fine for true UP platforms and when the overhead of per-CPU precise accounting is too high. (2) Use uninterruptible memory operations in per CPU memory, aggregate passively on demand. (3) Emulate uninterruptible memory operations with kernel RAS, aggregate passively on demand. Essentially, the only race condition we care about for the statistic counters is via interrupts or scheduling. We can implement the equivalent of x86's add with memory operand as destination using RAS, so the only overhead would be the function call in that case. Joerg
Re: __{read,write}_once
Le 08/11/2019 à 13:40, Mindaugas Rasiukevicius a écrit : Maxime Villard wrote: They are "atomic" in a sense that they prevent from tearing, fusing and invented loads/stores. Terms and conditions apply (e.g. they assume properly aligned and word-sized accesses). Additionally, READ_ONCE() provides a data-dependency barrier, applicable only to DEC Alpha. I think it was the right decision on the Linux side (even though trying to get all the data-dependency barriers right these days is kind of a lost cause, IMO). So, READ_ONCE()/WRITE_ONCE() is more or less equivalent to the C11 atomic load/stores routines with memory_order_relaxed (or memory_order_consume, if you care about DEC Alpha). But... Didn't Marco just say that 'volatile' accesses do not actually prevent tearing/fusing/invented loads/stores? READ_ONCE/WRITE_ONCE only do volatile accesses. Let me try to clarify: - The main purpose of READ_ONCE()/WRITE_ONCE() is to provide a way to perform atomic loads/stores (in a sense of preventing from the said behaviours), even though they help to get the memory ordering right too. Currently, 'volatile' is a key instrument in achieving that. However, as stated before, terms and conditions apply: 'volatile' just in itself does not provide the guarantee; the loads/stores also have to be properly aligned and word-sized (these are the pre-C11 assumptions we always had). Note: C11 introduces atomic _types_, so that the compiler could leverage the type system and thus provide the necessary guarantees. - Having re-read Marco's emails in this thread, I think we are very much in agreement. I think he merely points out that 'volatile' in itself is not sufficient; it does not mean it's not necessary. - There is quite a bit of confusion regarding 'volatile' amongst the developers. This is partly because 'volatile' is arguably underspecified in the C standard. 
AFAIK, some people in the C standardization committee have a view that it provides weaker guarantees; however, others maintain that the intent has always been clear and the wording is sufficient. Without going into the details (somewhat philosophical anyway), at least for now, 'volatile' is a de facto ingredient (one of a few, but absolutely necessary) in achieving atomic loads/stores. Sorry if this is a bit repetitive, but I hope it gets the point across. Alright, thanks. Do you think there is a performance degradation with using explicitly atomic operations (with the "lock" prefix on x86), compared to just using an aligned volatile which may not be exactly atomic in that sense (even if it happens to be on x86)? Typically in sys/uvm/uvm_fault.c there are several lockless stat increments like the following: /* Increment the counters.*/ uvmexp.fltanget++; In your (not just rmind@, but tech-kern@) opinion, is it better to switch to: atomic_inc_uint(&uvmexp.fltanget); or to __add_once(uvmexp.fltanget, 1); as I did in my patch? Considering that the latter may possibly not be atomic on certain arches, which could possibly result in garbage when the value is read. To fix that, do you agree that I should - Remove the first branch (because no lockless fastpath possible) - Move down the second branch (KASSERT) right after the mutex_enter ? Feel free to write a patch and I'll have a look at it once I have a little bit more free time. I've committed it right away, because this one seemed clear enough, feel free to comment or improve if you have a better idea: https://mail-index.netbsd.org/source-changes/2019/11/11/msg110726.html Maxime
Re: __{read,write}_once
On 08.11.2019 13:40, Mindaugas Rasiukevicius wrote: >>> There is more code in the NetBSD kernel which needs fixing. I would say >>> pretty much all lock-free code should be audited. >> I believe KCSAN can greatly help with that, since it automatically reports >> concurrent accesses. Up to us then to switch to atomic, or other kinds of >> markers like READ_ONCE. > Is there a CSAN i.e. such sanitizer for userspace applications? No, but there is TSan. If a CSan for userspace would be useful, it probably wouldn't be too difficult to write. Full TSan, at least, is heavy and requires a 64-bit CPU.
Re: __{read,write}_once
Maxime Villard wrote: > > They are "atomic" in a sense that they prevent from tearing, fusing and > > invented loads/stores. Terms and conditions apply (e.g. they assume > > properly aligned and word-sized accesses). Additionally, READ_ONCE() > > provides a data-dependency barrier, applicable only to DEC Alpha. I > > think it was the right decision on the Linux side (even though trying > > to get all the data-dependency barriers right these days is kind of a > > lost cause, IMO). > > > > So, READ_ONCE()/WRITE_ONCE() is more or less equivalent to the C11 > > atomic load/stores routines with memory_order_relaxed (or > > memory_order_consume, if you care about DEC Alpha). > > But... Didn't Marco just say that 'volatile' accesses do not actually > prevent tearing/fusing/invented loads/stores? READ_ONCE/WRITE_ONCE only > do volatile accesses. Let me try to clarify: - The main purpose of READ_ONCE()/WRITE_ONCE() is to provide a way to perform atomic loads/stores (in a sense of preventing from the said behaviours), even though they help to get the memory ordering right too. Currently, 'volatile' is a key instrument in achieving that. However, as stated before, terms and conditions apply: 'volatile' just in itself does not provide the guarantee; the loads/stores also have to be properly aligned and word-sized (these are the pre-C11 assumptions we always had). Note: C11 introduces atomic _types_, so that the compiler could leverage the type system and thus provide the necessary guarantees. - Having re-read Marco's emails in this thread, I think we are very much in agreement. I think he merely points out that 'volatile' in itself is not sufficient; it does not mean it's not necessary. - There is quite a bit of confusion regarding 'volatile' amongst the developers. This is partly because 'volatile' is arguably underspecified in the C standard. 
AFAIK, some people in the C standardization committee have a view that it provides weaker guarantees; however, others maintain that the intent has always been clear and the wording is sufficient. Without going into the details (somewhat philosophical anyway), at least for now, 'volatile' is a de facto ingredient (one of a few, but absolutely necessary) in achieving atomic loads/stores. Sorry if this is a bit repetitive, but I hope it gets the point across. > > To fix that, do you agree that I should > - Remove the first branch (because no lockless fastpath possible) > - Move down the second branch (KASSERT) right after the mutex_enter > ? Feel free to write a patch and I'll have a look at it once I have a little bit more free time. > > > There is more code in the NetBSD kernel which needs fixing. I would say > > pretty much all lock-free code should be audited. > > I believe KCSAN can greatly help with that, since it automatically reports > concurrent accesses. Up to us then to switch to atomic, or other kinds of > markers like READ_ONCE. Is there a CSAN i.e. such sanitizer for userspace applications? Thanks. -- Mindaugas
Re: __{read,write}_once
On Wed, Nov 06, 2019 at 10:14:48PM +0100, Kamil Rytarowski wrote: > I have no opinion here, just please do the right thing. Unless there are > any shortcomings it would be nice to follow closely C11. There are plenty of shortcomings in C11. We should concentrate on abstractions that are correct first. (also, volatile is ~useless for this) -- David A. Holland dholl...@netbsd.org
Re: __{read,write}_once
Le 06/11/2019 à 23:41, Mindaugas Rasiukevicius a écrit : Maxime Villard wrote: - If we do not want to stick with the C11 API (its emulation), then I would suggest to use the similar terminology, e.g. atomic_load_relaxed() and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE(). There is no reason invent new terminology; it might just add more confusion. But... Linux's READ_ONCE/WRITE_ONCE are not actually atomic, is that correct? So there is a significant semantic difference. They are "atomic" in a sense that they prevent from tearing, fusing and invented loads/stores. Terms and conditions apply (e.g. they assume properly aligned and word-sized accesses). Additionally, READ_ONCE() provides a data-dependency barrier, applicable only to DEC Alpha. I think it was the right decision on the Linux side (even though trying to get all the data-dependency barriers right these days is kind of a lost cause, IMO). So, READ_ONCE()/WRITE_ONCE() is more or less equivalent to the C11 atomic load/stores routines with memory_order_relaxed (or memory_order_consume, if you care about DEC Alpha). But... Didn't Marco just say that 'volatile' accesses do not actually prevent tearing/fusing/invented loads/stores? READ_ONCE/WRITE_ONCE only do volatile accesses. Also, in my original patch I marked two branches of subr_xcall.c, but it seems to me now they are actually races: ie xc_wait(), the read of 'xc_donep' could be made by two 4-byte fetches on 32bit arches (at least). Between the two, an xc thread could have updated the value. So there is an actual race which could possibly result in returning early while we shouldn't have. Does that look correct to you? Correct. Alright, so we have the first bug found by KCSAN :) To fix that, do you agree that I should - Remove the first branch (because no lockless fastpath possible) - Move down the second branch (KASSERT) right after the mutex_enter ? There is more code in the NetBSD kernel which needs fixing. 
I would say pretty much all lock-free code should be audited. I believe KCSAN can greatly help with that, since it automatically reports concurrent accesses. Up to us then to switch to atomic, or other kinds of markers like READ_ONCE.
Re: __{read,write}_once
Maxime Villard wrote: > > - If we do not want to stick with the C11 API (its emulation), then I > > would suggest to use the similar terminology, e.g. atomic_load_relaxed() > > and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE(). There is > > no reason invent new terminology; it might just add more confusion. > > But... Linux's READ_ONCE/WRITE_ONCE are not actually atomic, is that > correct? So there is a significant semantic difference. They are "atomic" in a sense that they prevent from tearing, fusing and invented loads/stores. Terms and conditions apply (e.g. they assume properly aligned and word-sized accesses). Additionally, READ_ONCE() provides a data-dependency barrier, applicable only to DEC Alpha. I think it was the right decision on the Linux side (even though trying to get all the data-dependency barriers right these days is kind of a lost cause, IMO). So, READ_ONCE()/WRITE_ONCE() is more or less equivalent to the C11 atomic load/stores routines with memory_order_relaxed (or memory_order_consume, if you care about DEC Alpha). > Also, in my original patch I marked two branches of subr_xcall.c, but it > seems to me now they are actually races: ie xc_wait(), the read of > 'xc_donep' could be made by two 4-byte fetches on 32bit arches (at > least). Between the two, an xc thread could have updated the value. So > there is an actual race which could possibly result in returning early > while we shouldn't have. Does that look correct to you? Correct. There is more code in the NetBSD kernel which needs fixing. I would say pretty much all lock-free code should be audited. Back in time, certain liberties compilers have were not fully understood or were not considered as real problems (after all, C memory model concerning many of these aspects was not defined). Times changed, compilers became much more aggressive and the old code needs to catch up. -- Mindaugas
Re: __{read,write}_once
On 06.11.2019 20:38, Mindaugas Rasiukevicius wrote: > Maxime Villard wrote: >> There are cases in the kernel where we read/write global memory >> locklessly, and accept the races either because it is part of the design >> (eg low-level scheduling) or we simply don't care (eg global stats). >> >> In these cases, we want to access the memory only once, and need to ensure >> the compiler does not split that access in several pieces, some of which >> may be changed by concurrent accesses. There is a Linux article [1] about >> this, and also [2]. I'd like to introduce the following macros: >> >> <...> > > A few comments for everybody here: > > - There is a general need in the NetBSD kernel for atomic load/store > semantics. This is because plain accesses (loads/stores) are subject > to _tearing_, _fusing_ and _invented_ loads/stores. I created a new > thread which might help to clarify these and various other aspects: > > http://mail-index.netbsd.org/tech-kern/2019/11/06/msg025664.html > Thank you. > - In C11, this can be handled with atomic_{load,store}_explicit() and > memory_order_relaxed (or stronger memory order). > > - If we do not want to stick with the C11 API (its emulation), then I > would suggest to use the similar terminology, e.g. atomic_load_relaxed() > and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE(). There is > no reason invent new terminology; it might just add more confusion. > > Thanks. > I have no opinion here, just please do the right thing. Unless there are any shortcomings it would be nice to follow C11 closely. Whether that information helps or not: we also need a C11/C++14 libatomic-like library in userland.
re: __{read,write}_once
> - If we do not want to stick with the C11 API (its emulation), then I > would suggest to use the similar terminology, e.g. atomic_load_relaxed() > and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE(). There is > no reason invent new terminology; it might just add more confusion. i really do not like the "once" name. even though i'm familiar with the actual desired semantics, i have to ignore my builtin meaning that comes first. to me, "once" initially reads as compiler will only do it one time, ever, not every time it is called. like pthread "once". can we make them __read_always() and __write_always()? .mrg.
Re: __{read,write}_once
Le 06/11/2019 à 20:38, Mindaugas Rasiukevicius a écrit : Maxime Villard wrote: There are cases in the kernel where we read/write global memory locklessly, and accept the races either because it is part of the design (eg low-level scheduling) or we simply don't care (eg global stats). In these cases, we want to access the memory only once, and need to ensure the compiler does not split that access in several pieces, some of which may be changed by concurrent accesses. There is a Linux article [1] about this, and also [2]. I'd like to introduce the following macros: <...> A few comments for everybody here: - There is a general need in the NetBSD kernel for atomic load/store semantics. This is because plain accesses (loads/stores) are subject to _tearing_, _fusing_ and _invented_ loads/stores. I created a new thread which might help to clarify these and various other aspects: http://mail-index.netbsd.org/tech-kern/2019/11/06/msg025664.html Thanks - In C11, this can be handled with atomic_{load,store}_explicit() and memory_order_relaxed (or stronger memory order). - If we do not want to stick with the C11 API (its emulation), then I would suggest to use the similar terminology, e.g. atomic_load_relaxed() and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE(). There is no reason invent new terminology; it might just add more confusion. But... Linux's READ_ONCE/WRITE_ONCE are not actually atomic, is that correct? So there is a significant semantic difference. Also, in my original patch I marked two branches of subr_xcall.c, but it seems to me now they are actually races: ie xc_wait(), the read of 'xc_donep' could be made by two 4-byte fetches on 32bit arches (at least). Between the two, an xc thread could have updated the value. So there is an actual race which could possibly result in returning early while we shouldn't have. Does that look correct to you? Thanks
Re: __{read,write}_once
Maxime Villard wrote: > There are cases in the kernel where we read/write global memory > locklessly, and accept the races either because it is part of the design > (eg low-level scheduling) or we simply don't care (eg global stats). > > In these cases, we want to access the memory only once, and need to ensure > the compiler does not split that access in several pieces, some of which > may be changed by concurrent accesses. There is a Linux article [1] about > this, and also [2]. I'd like to introduce the following macros: > > <...> A few comments for everybody here: - There is a general need in the NetBSD kernel for atomic load/store semantics. This is because plain accesses (loads/stores) are subject to _tearing_, _fusing_ and _invented_ loads/stores. I created a new thread which might help to clarify these and various other aspects: http://mail-index.netbsd.org/tech-kern/2019/11/06/msg025664.html - In C11, this can be handled with atomic_{load,store}_explicit() and memory_order_relaxed (or stronger memory order). - If we do not want to stick with the C11 API (its emulation), then I would suggest using similar terminology, e.g. atomic_load_relaxed() and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE(). There is no reason to invent new terminology; it might just add more confusion. Thanks. -- Mindaugas
Re: __{read,write}_once
On Wed, 6 Nov 2019 at 18:08, Maxime Villard wrote: > > Le 06/11/2019 à 17:37, Marco Elver a écrit : > > On Wed, 6 Nov 2019 at 16:51, Kamil Rytarowski wrote: > >> > >> On 06.11.2019 16:44, Kamil Rytarowski wrote: > >>> On 06.11.2019 15:57, Jason Thorpe wrote: > >>>> > >>>> > >>>>> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski wrote: > >>>>> > >>>>> On 06.11.2019 14:37, Jason Thorpe wrote: > >>>>>> > >>>>>> > >>>>>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote: > >>>>>>> > >>>>>>> I propose __write_relaxed() / __read_relaxed(). > >>>>>> > >>>>>> ...except that seems to imply the opposite of what these do. > >>>>>> > >>>>>> -- thorpej > >>>>>> > >>>>> > >>>>> Rationale? > >>>>> > >>>>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do > >>>>> not deal with atomics here. > >>>> > >>>> Fair enough. To me, the names suggest "compiler is allowed to apply > >>>> relaxed constraints and tear the access if it wants" But apparently > >>>> the common meaning is "relax, bro, I know what I'm doing". If that's > >>>> the case, I can roll with it. > > > > See below. > > > >>>> -- thorpej > >>>> > >>> > >>> Unless I mean something this is exactly about relaxed constraints. > >> > >> miss* > >> > >>> > >>> "Relaxed operation: there are no synchronization or ordering constraints > >>> imposed on other reads or writes" and without "operation's atomicity is > >>> guaranteed". > > > > In the memory consistency model world, "relaxed" has a very specific > > meaning for loads/stores. It simply means that the compiler and > > architecture is free to reorder the memory operation with respect to > > previous or following memory operations in program order. But the > > load/store still happens as one atomic operation. > > > > For programming-language memory models, "relaxed" appears in the > > context of atomic memory operations, to define their ordering w.r.t. > > other operations in program order. 
There is a distinction between > > "relaxed atomic" and plain (unannotated) memory operations at the > > programming language level. > > > > For plain operations, the compiler, in addition to the target > > architecture, is free to reorder operations or even apply > > optimizations that turn single plain operations into multiple > > loads/stores [1, 2]. > > [1] https://lwn.net/Articles/793253/ > > [2] https://lwn.net/Articles/799218/ > > > > In the Linux kernel READ_ONCE/WRITE_ONCE are one way to specify > > "relaxed atomic read/write" for accesses that fit in a word (_ONCE > > also works on things larger than a word, but probably aren't atomic > > anymore). For most architectures the implementation is the same, and > > merely avoids compiler optimizations; the exception is Alpha, where > > READ_ONCE adds a memory barrier [3]. > > [3] https://www.kernel.org/doc/Documentation/memory-barriers.txt > > > >>> This is also similar to what suggested Google to apply to NetBSD in our > >>> internal thread, but with a bit different naming. > > > > What you call the ops is entirely up to you. I would not use > > '{read,write}_relaxed', since as you pointed out above, is probably > > more confusing. This is why I suggested 'atomic_{read,write}_relaxed'. > > If you do not like the 'atomic' in there, the next best thing is > > probably '{read,write}_once'. > > > > Just note that you're now defining your own memory consistency model. > > This is a complex topic that spans decades of work. The safest thing > > is to stick as closely to the C11/C++11 memory model as possible [4]. > > The Linux kernel also has a memory model [5], but is significantly > > more complex since it was defined after various primitives had been > > introduced. > > [4] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1479.htm > > [5] > > https://github.com/torvalds/linux/blob/master/tools/memory-model/Documentation/explanation.txt > > > > Best Wishes, >
Re: __{read,write}_once
> - Is 'volatile' a guarantee that the compiler will emit only one > instruction to fetch the value? No. For some types, some compilers, and some architectures, it may be, but it is implementation-dependent. (For some types, on some architectures, it cannot be; for example, consider a 64-bit volatile value on an architecture without 64-bit memory access primitives.) What volatile is designed for is to ensure that the generated code actually accesses the value. Consider

count = 0;
while (count < 10) {
        if (*reg & 0x0400)
                count++;
}

If reg is "uint32_t *", the compiler is permitted to - and some will - hoist the load, effectively turning this into

count = 0;
temp = *reg;
while (count < 10) {
        if (temp & 0x0400)
                count++;
}

which is unlikely to do what you want. If reg is instead declared "volatile uint32_t *", then *reg is volatile and thus must be accessed each time through the loop. volatile can also compel such caching in some circumstances; for example, if the code is

uint32_t *reg;
uint32_t v;

v = *reg;
call((v+4)*v);

then, under certain circumstances (for example, v is auto and its address is never taken) the compiler is permitted to turn that into call(((*reg)+4) * *reg) and, on some architectures, that would be a win. (It's less likely to happen than the other way around, but, if addressing modes are cheap, registers are few, and memory is fast compared to CPU cycles, the compiler may do that.) But make it "volatile uint32_t *reg;" and such an optimization is not permitted. > - Assuming there is only one instruction, strictly speaking the > fetch is not guaranteed to be atomic, is that correct? Right. What is or isn't atomic is architecture-dependent; there is, I think, _nothing_ you can do in C that can guarantee atomicity (or, for that matter, guarantee non-atomicity), since atomicity is fundamentally an architecture-dependent notion.
/~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: __{read,write}_once
Le 06/11/2019 à 17:37, Marco Elver a écrit :
> On Wed, 6 Nov 2019 at 16:51, Kamil Rytarowski wrote:
>> On 06.11.2019 16:44, Kamil Rytarowski wrote:
>>> On 06.11.2019 15:57, Jason Thorpe wrote:
>>>> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski wrote:
>>>>> On 06.11.2019 14:37, Jason Thorpe wrote:
>>>>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote:
>>>>>>> I propose __write_relaxed() / __read_relaxed().
>>>>>>
>>>>>> ...except that seems to imply the opposite of what these do.
>>>>>>
>>>>>> -- thorpej
>>>>>
>>>>> Rationale?
>>>>>
>>>>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we
>>>>> do not deal with atomics here.
>>>>
>>>> Fair enough.  To me, the names suggest "compiler is allowed to apply
>>>> relaxed constraints and tear the access if it wants".  But apparently
>>>> the common meaning is "relax, bro, I know what I'm doing".  If that's
>>>> the case, I can roll with it.  See below.
>>>>
>>>> -- thorpej
>>>
>>> Unless I miss something, this is exactly about relaxed constraints:
>>>
>>> "Relaxed operation: there are no synchronization or ordering
>>> constraints imposed on other reads or writes", and without
>>> "operation's atomicity is guaranteed".
>
> In the memory consistency model world, "relaxed" has a very specific
> meaning for loads/stores.  It simply means that the compiler and
> architecture are free to reorder the memory operation with respect to
> previous or following memory operations in program order.  But the
> load/store still happens as one atomic operation.
>
> For programming-language memory models, "relaxed" appears in the
> context of atomic memory operations, to define their ordering w.r.t.
> other operations in program order.  There is a distinction between
> "relaxed atomic" and plain (unannotated) memory operations at the
> programming language level.  For plain operations, the compiler, in
> addition to the target architecture, is free to reorder operations or
> even apply optimizations that turn single plain operations into
> multiple loads/stores [1, 2].
> [1] https://lwn.net/Articles/793253/
> [2] https://lwn.net/Articles/799218/
>
> In the Linux kernel, READ_ONCE/WRITE_ONCE are one way to specify a
> "relaxed atomic read/write" for accesses that fit in a word (_ONCE also
> works on things larger than a word, but those probably aren't atomic
> anymore).  For most architectures the implementation is the same, and
> merely avoids compiler optimizations; the exception is Alpha, where
> READ_ONCE adds a memory barrier [3].
>
> [3] https://www.kernel.org/doc/Documentation/memory-barriers.txt
>
>> This is also similar to what Google suggested applying to NetBSD in
>> our internal thread, but with a bit different naming.
>
> What you call the ops is entirely up to you.  I would not use
> '{read,write}_relaxed', since, as you pointed out above, it is probably
> more confusing.  This is why I suggested 'atomic_{read,write}_relaxed'.
> If you do not like the 'atomic' in there, the next best thing is
> probably '{read,write}_once'.
>
> Just note that you're now defining your own memory consistency model.
> This is a complex topic that spans decades of work.  The safest thing
> is to stick as closely to the C11/C++11 memory model as possible [4].
> The Linux kernel also has a memory model [5], but it is significantly
> more complex since it was defined after various primitives had been
> introduced.
>
> [4] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1479.htm
> [5] https://github.com/torvalds/linux/blob/master/tools/memory-model/Documentation/explanation.txt
>
> Best Wishes,
> -- Marco

Thanks for the details.  I must say I still have a doubt about the
instructions actually emitted by the compiler.  Let's take READ_ONCE for
example.  What it does is basically just cast the pointer to volatile
and do the access.  Two points of confusion:

 - Is 'volatile' a guarantee that the compiler will emit only one
   instruction to fetch the value?
 - Assuming there is only one instruction, strictly speaking the fetch
   is not guaranteed to be atomic, is that correct?
Because the address may not be naturally aligned, or the CPU may not be
x86 and thus may not provide automatic atomicity in such a case.

Thanks
Re: __{read,write}_once
On Wed, 6 Nov 2019 at 16:51, Kamil Rytarowski wrote:
>
> On 06.11.2019 16:44, Kamil Rytarowski wrote:
> > On 06.11.2019 15:57, Jason Thorpe wrote:
> >>
> >>> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski wrote:
> >>>
> >>> On 06.11.2019 14:37, Jason Thorpe wrote:
> >>>>
> >>>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote:
> >>>>>
> >>>>> I propose __write_relaxed() / __read_relaxed().
> >>>>
> >>>> ...except that seems to imply the opposite of what these do.
> >>>>
> >>>> -- thorpej
> >>>
> >>> Rationale?
> >>>
> >>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
> >>> not deal with atomics here.
> >>
> >> Fair enough.  To me, the names suggest "compiler is allowed to apply
> >> relaxed constraints and tear the access if it wants"  But apparently
> >> the common meaning is "relax, bro, I know what I'm doing".  If that's
> >> the case, I can roll with it.  See below.
> >>
> >> -- thorpej
> >
> > Unless I mean something this is exactly about relaxed constraints.
>
> miss*
>
> > "Relaxed operation: there are no synchronization or ordering constraints
> > imposed on other reads or writes" and without "operation's atomicity is
> > guaranteed".

In the memory consistency model world, "relaxed" has a very specific
meaning for loads/stores.  It simply means that the compiler and
architecture are free to reorder the memory operation with respect to
previous or following memory operations in program order.  But the
load/store still happens as one atomic operation.

For programming-language memory models, "relaxed" appears in the context
of atomic memory operations, to define their ordering w.r.t. other
operations in program order.  There is a distinction between "relaxed
atomic" and plain (unannotated) memory operations at the programming
language level.
For plain operations, the compiler, in addition to the target
architecture, is free to reorder operations or even apply optimizations
that turn single plain operations into multiple loads/stores [1, 2].

[1] https://lwn.net/Articles/793253/
[2] https://lwn.net/Articles/799218/

In the Linux kernel, READ_ONCE/WRITE_ONCE are one way to specify a
"relaxed atomic read/write" for accesses that fit in a word (_ONCE also
works on things larger than a word, but those probably aren't atomic
anymore).  For most architectures the implementation is the same, and
merely avoids compiler optimizations; the exception is Alpha, where
READ_ONCE adds a memory barrier [3].

[3] https://www.kernel.org/doc/Documentation/memory-barriers.txt

> > This is also similar to what Google suggested applying to NetBSD in
> > our internal thread, but with a bit different naming.

What you call the ops is entirely up to you.  I would not use
'{read,write}_relaxed', since, as you pointed out above, it is probably
more confusing.  This is why I suggested 'atomic_{read,write}_relaxed'.
If you do not like the 'atomic' in there, the next best thing is
probably '{read,write}_once'.

Just note that you're now defining your own memory consistency model.
This is a complex topic that spans decades of work.  The safest thing is
to stick as closely to the C11/C++11 memory model as possible [4].  The
Linux kernel also has a memory model [5], but it is significantly more
complex since it was defined after various primitives had been
introduced.

[4] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1479.htm
[5] https://github.com/torvalds/linux/blob/master/tools/memory-model/Documentation/explanation.txt

Best Wishes,
-- Marco

> Adding Marco to this thread.
Re: __{read,write}_once
On 06.11.2019 16:58, David Young wrote:
> I *think* the intention is for __read_once()/__write_once() to
> load/store the entire variable from/to memory precisely once.  They
> provide no guarantees about atomicity of the load/store.  Should
> something be said about ordering and visibility of stores?

The original intention is to mark reads and writes as racy-but-OK.
"once" is a bad name, as it suggests some variation of RUN_ONCE().  I am
for the memory-ordering name "relaxed", borrowed from the terminology of
atomics.
Re: __{read,write}_once
On Wed, 6 Nov 2019 at 09:57, Jason Thorpe wrote:
>
>> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski wrote:
>>
>> On 06.11.2019 14:37, Jason Thorpe wrote:
>>>
>>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote:
>>>>
>>>> I propose __write_relaxed() / __read_relaxed().
>>>
>>> ...except that seems to imply the opposite of what these do.
>>>
>>> -- thorpej
>>
>> Rationale?
>>
>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
>> not deal with atomics here.
>
> Fair enough.  To me, the names suggest "compiler is allowed to apply
> relaxed constraints and tear the access if it wants"  But apparently
> the common meaning is "relax, bro, I know what I'm doing".  If that's
> the case, I can roll with it.

Honestly, without reading any code, my interpretation is more in line
with the former: hey relax, maybe it happens, maybe it doesn't, but,
well, nothing matters, so whatever.
Re: __{read,write}_once
On Wed, Nov 06, 2019 at 06:57:07AM -0800, Jason Thorpe wrote:
>
>> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski wrote:
>>
>> On 06.11.2019 14:37, Jason Thorpe wrote:
>>>
>>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote:
>>>>
>>>> I propose __write_relaxed() / __read_relaxed().
>>>
>>> ...except that seems to imply the opposite of what these do.
>>>
>>> -- thorpej
>>
>> Rationale?
>>
>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
>> not deal with atomics here.
>
> Fair enough.  To me, the names suggest "compiler is allowed to apply
> relaxed constraints and tear the access if it wants"  But apparently
> the common meaning is "relax, bro, I know what I'm doing".  If that's
> the case, I can roll with it.

After reading this conversation, I'm not sure of the semantics.  I
*think* the intention is for __read_once()/__write_once() to load/store
the entire variable from/to memory precisely once.  They provide no
guarantees about atomicity of the load/store.  Should something be said
about ordering and visibility of stores?

If x is initialized to 0xf00dd00f, two threads start, and thread 1
performs __read_once(x) concurrently with thread 2 performing
__write_once(x, 0xfeedbeef), then what values can thread 1 read?

Do __read_once()/__write_once() have any semantics with respect to
interrupts?

Dave

--
David Young
dyo...@pobox.com    Urbana, IL    (217) 721-9981
Re: __{read,write}_once
On 06.11.2019 16:44, Kamil Rytarowski wrote:
> On 06.11.2019 15:57, Jason Thorpe wrote:
>>
>>> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski wrote:
>>>
>>> On 06.11.2019 14:37, Jason Thorpe wrote:
>>>>
>>>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote:
>>>>>
>>>>> I propose __write_relaxed() / __read_relaxed().
>>>>
>>>> ...except that seems to imply the opposite of what these do.
>>>>
>>>> -- thorpej
>>>
>>> Rationale?
>>>
>>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
>>> not deal with atomics here.
>>
>> Fair enough.  To me, the names suggest "compiler is allowed to apply
>> relaxed constraints and tear the access if it wants"  But apparently
>> the common meaning is "relax, bro, I know what I'm doing".  If that's
>> the case, I can roll with it.
>>
>> -- thorpej
>
> Unless I mean something this is exactly about relaxed constraints.

miss*

> "Relaxed operation: there are no synchronization or ordering constraints
> imposed on other reads or writes" and without "operation's atomicity is
> guaranteed".
>
> This is also similar to what suggested Google to apply to NetBSD in our
> internal thread, but with a bit different naming.

Adding Marco to this thread.
Re: __{read,write}_once
On 06.11.2019 15:57, Jason Thorpe wrote:
>
>> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski wrote:
>>
>> On 06.11.2019 14:37, Jason Thorpe wrote:
>>>
>>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote:
>>>>
>>>> I propose __write_relaxed() / __read_relaxed().
>>>
>>> ...except that seems to imply the opposite of what these do.
>>>
>>> -- thorpej
>>
>> Rationale?
>>
>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
>> not deal with atomics here.
>
> Fair enough.  To me, the names suggest "compiler is allowed to apply
> relaxed constraints and tear the access if it wants"  But apparently
> the common meaning is "relax, bro, I know what I'm doing".  If that's
> the case, I can roll with it.
>
> -- thorpej

Unless I miss something, this is exactly about relaxed constraints:

"Relaxed operation: there are no synchronization or ordering constraints
imposed on other reads or writes" and without "operation's atomicity is
guaranteed".

This is also similar to what Google suggested applying to NetBSD in our
internal thread, but with a bit different naming.
Re: __{read,write}_once
On Wed, Nov 06, 2019 at 06:57:07AM -0800, Jason Thorpe wrote:
>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
>> not deal with atomics here.
>
> Fair enough.  To me, the names suggest "compiler is allowed to apply
> relaxed constraints and tear the access if it wants"  But apparently
> the common meaning is "relax, bro, I know what I'm doing".  If that's
> the case, I can roll with it.

What is the compiler / implementation supposed to do if there is no way
to do a single-instruction load/store for the argument size?  Do we want
sized versions instead, and only provide the ones that are available on
a given architecture?

If both names and semantics are unclear, adding the macros is maybe not
a good idea ;-)

Martin
Re: __{read,write}_once
> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski wrote:
>
> On 06.11.2019 14:37, Jason Thorpe wrote:
>>
>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote:
>>>
>>> I propose __write_relaxed() / __read_relaxed().
>>
>> ...except that seems to imply the opposite of what these do.
>>
>> -- thorpej
>
> Rationale?
>
> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
> not deal with atomics here.

Fair enough.  To me, the names suggest "compiler is allowed to apply
relaxed constraints and tear the access if it wants".  But apparently
the common meaning is "relax, bro, I know what I'm doing".  If that's
the case, I can roll with it.

-- thorpej
Re: __{read,write}_once
> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote:
>
> I propose __write_relaxed() / __read_relaxed().

...except that seems to imply the opposite of what these do.

-- thorpej
Re: __{read,write}_once
On 06.11.2019 14:37, Jason Thorpe wrote:
>
>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote:
>>
>> I propose __write_relaxed() / __read_relaxed().
>
> ...except that seems to imply the opposite of what these do.
>
> -- thorpej

Rationale?

This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
not deal with atomics here.
Re: __{read,write}_once
On 06.11.2019 12:51, Maxime Villard wrote:
> Le 06/11/2019 à 12:38, Martin Husemann a écrit :
>> On Wed, Nov 06, 2019 at 12:31:37PM +0100, Maxime Villard wrote:
>>> __read_once(x)
>>> __write_once(x, val)
>>
>> The names are really not helpful for understanding what is supposed
>> to happen here.
>
> I don't know if you have a better suggestion, but it's basically the
> naming convention that seems to have been used for several years
> already.  Maybe we could have something more explicit, like
>
> __{read,write}_racy() or
> __{read,write}_parallel() or
> __{read,write}_concurrent()
>
> But the last one is too long I think.

I propose __write_relaxed() / __read_relaxed().
Re: __{read,write}_once
Le 06/11/2019 à 12:38, Martin Husemann a écrit :
> On Wed, Nov 06, 2019 at 12:31:37PM +0100, Maxime Villard wrote:
>> __read_once(x)
>> __write_once(x, val)
>
> The names are really not helpful for understanding what is supposed to
> happen here.

I don't know if you have a better suggestion, but it's basically the
naming convention that seems to have been used for several years
already.  Maybe we could have something more explicit, like

	__{read,write}_racy() or
	__{read,write}_parallel() or
	__{read,write}_concurrent()

But the last one is too long, I think.
Re: __{read,write}_once
On Wed, Nov 06, 2019 at 12:31:37PM +0100, Maxime Villard wrote:
> __read_once(x)
> __write_once(x, val)

The names are really not helpful for understanding what is supposed to
happen here.

Martin
__{read,write}_once
There are cases in the kernel where we read/write global memory
locklessly, and accept the races either because it is part of the design
(eg low-level scheduling) or because we simply don't care (eg global
stats).  In these cases, we want to access the memory only once, and
need to ensure the compiler does not split that access into several
pieces, some of which may be changed by concurrent accesses.  There is a
Linux article [1] about this, and also [2].

I'd like to introduce the following macros:

	__read_once(x)
	__write_once(x, val)

These macros:

 - Are supposed to ensure that reads/writes result in only one access.
   We basically just cast to volatile for now.
 - Serve as markers for KCSAN, to say "yes there is a race, but it's
   expected and you don't need to care".
 - Help with understanding the code, ie the reader will easily see that
   the area can legitimately race.

In short, the main role they play is as markers, to inform the reader
and the sanitizers.  I've made a patch [3], with several changes already
to use these macros.  Feel free to comment.

[1] https://lwn.net/Articles/508991/
[2] https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
[3] https://m00nbsd.net/garbage/kcsan/access-once.diff