Re: __{read,write}_once
> Date: Sun, 24 Nov 2019 19:25:52 +0000
> From: Taylor R Campbell
>
> This thread is not converging on consensus, so we're discussing the semantics and naming of these operations as core and will come back with a decision by the end of the week.

We (core) carefully read the thread, and discussed this and the related Linux READ_ONCE/WRITE_ONCE macros as well as the C11 atomic API.

For maxv: Please add conditional definitions in <sys/atomic.h> according to what KCSAN needs, and use atomic_load/store_relaxed for counters and other integer objects in the rest of your patch. (I didn't see any pointer loads there.)

For uvm's lossy counters, please use

	atomic_store_relaxed(p, 1 + atomic_load_relaxed(p))

and not an __add_once macro -- since these should really be per-CPU counters, we don't want to endorse this pattern by making it pretty.

* Summary

We added a few macros to <sys/atomic.h> for the purpose, atomic_load_<ordering>(p) and atomic_store_<ordering>(p, v). The orderings are relaxed, acquire, consume, and release, and are intended to match C11 semantics. See the new atomic_loadstore(9) man page for reference.

Currently they are defined in terms of volatile loads and stores, but we should eventually use the C11 atomic API instead in order to provide the intended atomicity guarantees under all compilers without having to rely on the folklore interpretations of volatile.

* Details

There are four main properties involved in the operations under discussion:

1. No tearing. A 32-bit write can't be split into two separate 16-bit writes, for instance.

   * In _some_ cases, namely aligned pointers to sufficiently small objects, Linux READ_ONCE/WRITE_ONCE guarantee no tearing.

   * C11 atomic_load/store guarantees no tearing -- although on large objects it may involve locks, requiring the C11 type qualifier _Atomic and changing the ABI. This was the primary motivation for maxv's original question.

2. No fusing. Consecutive writes can't be combined into one, for instance, and a write followed by a read can't skip the read to return the value that was written.

   * Linux's READ_ONCE/WRITE_ONCE and C11's atomic_load/store guarantee no fusing.

3. Data-dependent memory ordering. If you read a pointer, and then dereference the pointer (maybe plus some offset), the reads happen in that order.

   * Linux's READ_ONCE guarantees this by issuing the analogue of membar_datadep_consumer on DEC Alpha, and nothing on other CPUs.

   * C11's atomic_load guarantees this with seq_cst, acquire, or consume memory ordering.

4. Cost. There's no need to incur the cost of read/modify/write atomic operations, and for many purposes, no need to incur the cost of memory-ordering barriers.

To express these, we've decided to add a few macros that are similar to Linux's READ_ONCE/WRITE_ONCE and C11's atomic_load/store_explicit but are less error-prone and less cumbersome:

	#include <sys/atomic.h>

- atomic_load_relaxed(p) is like *p, but guarantees no tearing and no fusing. No ordering relative to memory operations on other objects is guaranteed.

- atomic_store_relaxed(p, v) is like *p = v, but guarantees no tearing and no fusing. No ordering relative to memory operations on other objects is guaranteed.

- atomic_store_release(p, v) and atomic_load_acquire(p) are, respectively, like *p = v and *p, but guarantee no tearing and no fusing. They _also_ guarantee, for logic like

	Thread A			Thread B
	stuff();
	atomic_store_release(p, v);	u = atomic_load_acquire(p);
					things();

  that _if_ the atomic_load_acquire(p) in thread B witnesses the state of the object at p set by atomic_store_release(p, v) in thread A, then all memory operations in stuff() happen before any memory operations in things(). No guarantees if only one thread participates -- the store-release and load-acquire _must_ be paired.

- atomic_load_consume(p) is like atomic_load_acquire(p), but it only guarantees ordering for data-dependent memory references. Like atomic_load_acquire, it must be paired with atomic_store_release. However, on most CPUs, it is as _cheap_ as atomic_load_relaxed.

The atomic load/store operations are defined _only_ on objects as large as the architecture can support -- so, for example, on 32-bit platforms they cannot be used on 64-bit quantities; attempts to do so will lead to compile-time errors. They are also defined _only_ on aligned pointers -- using them on unaligned pointers may lead to run-time crashes, even on architectures without strict alignment requirements.

* Why the names atomic_{load,store}_<ordering>?

- Atomic. Although `atomic' may suggest `expensive' to some people (and I'm guilty of making that connection in the past), what's really expensive is atomic _read/modify/write_ operations and _memory ordering guarantees_. Merely preventing tearing
Re: __{read,write}_once
> I feel like load_entire() and store_entire() get to the heart of the > matter while being easy to speak, but _fully() or _completely() seem > fine. _integral()?
Re: __{read,write}_once
> Date: Wed, 6 Nov 2019 12:31:37 +0100
> From: Maxime Villard
>
> There are cases in the kernel where we read/write global memory locklessly, and accept the races either because it is part of the design (eg low-level scheduling) or we simply don't care (eg global stats).
>
> In these cases, we want to access the memory only once, and need to ensure the compiler does not split that access in several pieces, some of which may be changed by concurrent accesses. There is a Linux article [1] about this, and also [2]. I'd like to introduce the following macros:
>
> 	__read_once(x)
> 	__write_once(x, val)

This thread is not converging on consensus, so we're discussing the semantics and naming of these operations as core and will come back with a decision by the end of the week.

Thanks,
-Riastradh, on behalf of core
Re: __{read,write}_once
On 22.11.2019 02:42, Robert Elz wrote:
> Date: Fri, 22 Nov 2019 01:04:56 +0100
> From: Kamil Rytarowski
> Message-ID: <1a9d9b40-42fe-be08-d7b3-e6ecead5b...@gmx.com>
>
> | I think that picking C11 terminology is the way forward.
>
> Use a name like that iff the intent is to also exactly match the
> semantics implied, otherwise it will only create more confusion.
>
> kre

I think this matches. We want the operation to happen as a single operation (or at least to look like one), and 'atomic' is the settled English name for that (there are options like: 'integral', 'intrinsic', 'elemental', 'essential', 'fundamental' or 'single operation'). If there are stronger feelings against it, I would go for '__write_singleop()', '__read_singleop()'.
Re: __{read,write}_once
Maxime Villard wrote: > How about _onestep()? Or _nosplit()? Or just conclude that there are no good options and we might as well call it _once() to be compatible with other OSes. -- Andreas Gustafsson, g...@gson.org
Re: __{read,write}_once
On 21/11/2019 at 23:08, David Young wrote:

On Thu, Nov 21, 2019 at 07:19:51PM +0100, Maxime Villard wrote:

On 18/11/2019 at 19:49, David Holland wrote:

On Sun, Nov 17, 2019 at 02:35:43PM +0000, Mindaugas Rasiukevicius wrote:

> David Holland wrote:
> > > I see the potential source of confusion, but just think about: what could "atomic" possibly mean for loads or stores? A load is just a load, there is no RMW cycle, conceptually; inter-locking the operation does not make sense. For anybody who attempts to reason about this, it should not be too confusing, plus there are man pages.
> >
> > ...that it's not torn.
> >
> > As far as names... there are increasingly many slightly different types of atomic and semiatomic operations.
> >
> > I think it would be helpful if someone came up with a comprehensive naming scheme for all of them (including ones we don't currently have that we're moderately likely to end up with later...)
>
> Yes, the meaning of "atomic" has different flavours and describes different set of properties in different fields (operating systems vs databases vs distributed systems vs ...) and, as we can see, even within the fields.
>
> Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already the dominant terms.

Yes, but "relaxed" means something else... let me be clearer since I wasn't before: I would expect e.g. atomic_inc_32_relaxed() to be distinguished from atomic_inc_32() or maybe atomic_inc_32_ordered() by whether or not multiple instances of it are globally ordered, not by whether or not it's actually atomic relative to other cpus.

Checking the prior messages indicates we aren't currently talking about atomic_inc_32_relaxed() but only about atomic_load_32_relaxed() and atomic_store_32_relaxed(), which would be used together to generate a local counter. This is less misleading, but I'm still not convinced it's a good choice of names given that we might reasonably later on want to have atomic_inc_32_relaxed() and atomic_inc_32_ordered() that differ as above.

> I think it is pointless to attempt to reinvent the wheel here. It is terminology used by C11 (and C++11) and accepted by various technical literature and, at this point, by academia (if you look at the recent papers on memory models -- it's pretty much settled). These terms are not too bad; it could be worse; and this bit is certainly not the worst part of C11. So, I would just move on.

Is it settled? My experience with the academic literature has been that everyone uses their own terminology and the same words are routinely found to have subtly different meanings from one paper to the next and occasionally even within the same paper. :-/ But I haven't been paying much attention lately due to being preoccupied with other things.

So in the end which name do we use? Are people really unhappy with _racy()? At least it has a general meaning, and does not imply atomicity or ordering.

_racy() doesn't really get at the intended meaning, and everything in C11 is racy unless you arrange otherwise by using mutexes, atomics, etc. The suffix has very little content. Names such as load_/store_fully() or load_/store_entire() or load_/store_completely() get to the actual semantics: at the program step implied by the load_/store_entire() expression, the memory constituting the named variable is loaded/stored in its entirety. In other words, the load cannot be drawn out over more than one local program step. Whether or not the load/store is performed in one step with respect to interrupts or other threads is not defined.

Sounds like the right definition of the semantic. However:

I feel like load_entire() and store_entire() get to the heart of the matter while being easy to speak, but _fully() or _completely() seem fine.

I'm not a really big fan of _entire(), because it gives the impression that the load/store is not entire otherwise - which it is, but just possibly not in one step. How about _onestep()? Or _nosplit()?
Re: __{read,write}_once
On Fri, Nov 22, 2019 at 08:42:19AM +0700, Robert Elz wrote:
> Date: Fri, 22 Nov 2019 01:04:56 +0100
> From: Kamil Rytarowski
> Message-ID: <1a9d9b40-42fe-be08-d7b3-e6ecead5b...@gmx.com>
>
> | I think that picking C11 terminology is the way forward.
>
> Use a name like that iff the intent is to also exactly match the
> semantics implied, otherwise it will only create more confusion.

And if the implied semantics isn't even totally clear, do not hide it behind a macro at all?

Martin
re: __{read,write}_once
_always()?
Re: __{read,write}_once
Date: Fri, 22 Nov 2019 01:04:56 +0100
From: Kamil Rytarowski
Message-ID: <1a9d9b40-42fe-be08-d7b3-e6ecead5b...@gmx.com>

| I think that picking C11 terminology is the way forward.

Use a name like that iff the intent is to also exactly match the semantics implied, otherwise it will only create more confusion.

kre
Re: __{read,write}_once
On Thu, Nov 21, 2019 at 07:19:51PM +0100, Maxime Villard wrote:
> On 18/11/2019 at 19:49, David Holland wrote:
> > On Sun, Nov 17, 2019 at 02:35:43PM +0000, Mindaugas Rasiukevicius wrote:
> > > David Holland wrote:
> > > > > I see the potential source of confusion, but just think about: what could "atomic" possibly mean for loads or stores? A load is just a load, there is no RMW cycle, conceptually; inter-locking the operation does not make sense. For anybody who attempts to reason about this, it should not be too confusing, plus there are man pages.
> > > >
> > > > ...that it's not torn.
> > > >
> > > > As far as names... there are increasingly many slightly different types of atomic and semiatomic operations.
> > > >
> > > > I think it would be helpful if someone came up with a comprehensive naming scheme for all of them (including ones we don't currently have that we're moderately likely to end up with later...)
> > >
> > > Yes, the meaning of "atomic" has different flavours and describes different set of properties in different fields (operating systems vs databases vs distributed systems vs ...) and, as we can see, even within the fields.
> > >
> > > Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already the dominant terms.
> >
> > Yes, but "relaxed" means something else... let me be clearer since I wasn't before: I would expect e.g. atomic_inc_32_relaxed() to be distinguished from atomic_inc_32() or maybe atomic_inc_32_ordered() by whether or not multiple instances of it are globally ordered, not by whether or not it's actually atomic relative to other cpus.
> >
> > Checking the prior messages indicates we aren't currently talking about atomic_inc_32_relaxed() but only about atomic_load_32_relaxed() and atomic_store_32_relaxed(), which would be used together to generate a local counter. This is less misleading, but I'm still not convinced it's a good choice of names given that we might reasonably later on want to have atomic_inc_32_relaxed() and atomic_inc_32_ordered() that differ as above.
> >
> > > I think it is pointless to attempt to reinvent the wheel here. It is terminology used by C11 (and C++11) and accepted by various technical literature and, at this point, by academia (if you look at the recent papers on memory models -- it's pretty much settled). These terms are not too bad; it could be worse; and this bit is certainly not the worst part of C11. So, I would just move on.
> >
> > Is it settled? My experience with the academic literature has been that everyone uses their own terminology and the same words are routinely found to have subtly different meanings from one paper to the next and occasionally even within the same paper. :-/ But I haven't been paying much attention lately due to being preoccupied with other things.
>
> So in the end which name do we use? Are people really unhappy with _racy()? At least it has a general meaning, and does not imply atomicity or ordering.

_racy() doesn't really get at the intended meaning, and everything in C11 is racy unless you arrange otherwise by using mutexes, atomics, etc. The suffix has very little content. Names such as load_/store_fully() or load_/store_entire() or load_/store_completely() get to the actual semantics: at the program step implied by the load_/store_entire() expression, the memory constituting the named variable is loaded/stored in its entirety. In other words, the load cannot be drawn out over more than one local program step. Whether or not the load/store is performed in one step with respect to interrupts or other threads is not defined.

I feel like load_entire() and store_entire() get to the heart of the matter while being easy to speak, but _fully() or _completely() seem fine.
Dave

--
David Young
dyo...@pobox.com    Urbana, IL    (217) 721-9981
Re: __{read,write}_once
On 22.11.2019 00:53, Robert Elz wrote:
> Date: Thu, 21 Nov 2019 19:19:51 +0100
> From: Maxime Villard
> Message-ID:
>
> | So in the end which name do we use? Are people really unhappy with _racy()?
> | At least it has a general meaning, and does not imply atomicity or ordering.
>
> I dislike naming discussions, as in general, while there are often a bunch of obviously incorrect names (here for example, read_page_data() ...) there is very rarely one obviously right answer, and it all really just becomes a matter of choice for whoever is doing it.
>
> Nevertheless, perhaps something that says more what is actually happening, rather than mentions what doesn't matter (races here), so perhaps something like {read,write}_single_cycle() or {read,write}_1_bus_xact() or something along those lines?
>
> kre

I think that picking C11 terminology is the way forward. It's settled, probably forever now, and it will be simpler to find the corresponding documentation and specification.
Re: __{read,write}_once
Date: Thu, 21 Nov 2019 19:19:51 +0100
From: Maxime Villard
Message-ID:

| So in the end which name do we use? Are people really unhappy with _racy()?
| At least it has a general meaning, and does not imply atomicity or ordering.

I dislike naming discussions, as in general, while there are often a bunch of obviously incorrect names (here for example, read_page_data() ...) there is very rarely one obviously right answer, and it all really just becomes a matter of choice for whoever is doing it.

Nevertheless, perhaps something that says more what is actually happening, rather than mentions what doesn't matter (races here), so perhaps something like {read,write}_single_cycle() or {read,write}_1_bus_xact() or something along those lines?

kre
Re: __{read,write}_once
On 18/11/2019 at 19:49, David Holland wrote:

On Sun, Nov 17, 2019 at 02:35:43PM +0000, Mindaugas Rasiukevicius wrote:

> David Holland wrote:
> > > I see the potential source of confusion, but just think about: what could "atomic" possibly mean for loads or stores? A load is just a load, there is no RMW cycle, conceptually; inter-locking the operation does not make sense. For anybody who attempts to reason about this, it should not be too confusing, plus there are man pages.
> >
> > ...that it's not torn.
> >
> > As far as names... there are increasingly many slightly different types of atomic and semiatomic operations.
> >
> > I think it would be helpful if someone came up with a comprehensive naming scheme for all of them (including ones we don't currently have that we're moderately likely to end up with later...)
>
> Yes, the meaning of "atomic" has different flavours and describes different set of properties in different fields (operating systems vs databases vs distributed systems vs ...) and, as we can see, even within the fields.
>
> Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already the dominant terms.

Yes, but "relaxed" means something else... let me be clearer since I wasn't before: I would expect e.g. atomic_inc_32_relaxed() to be distinguished from atomic_inc_32() or maybe atomic_inc_32_ordered() by whether or not multiple instances of it are globally ordered, not by whether or not it's actually atomic relative to other cpus.

Checking the prior messages indicates we aren't currently talking about atomic_inc_32_relaxed() but only about atomic_load_32_relaxed() and atomic_store_32_relaxed(), which would be used together to generate a local counter. This is less misleading, but I'm still not convinced it's a good choice of names given that we might reasonably later on want to have atomic_inc_32_relaxed() and atomic_inc_32_ordered() that differ as above.

> I think it is pointless to attempt to reinvent the wheel here. It is terminology used by C11 (and C++11) and accepted by various technical literature and, at this point, by academia (if you look at the recent papers on memory models -- it's pretty much settled). These terms are not too bad; it could be worse; and this bit is certainly not the worst part of C11. So, I would just move on.

Is it settled? My experience with the academic literature has been that everyone uses their own terminology and the same words are routinely found to have subtly different meanings from one paper to the next and occasionally even within the same paper. :-/ But I haven't been paying much attention lately due to being preoccupied with other things.

So in the end which name do we use? Are people really unhappy with _racy()? At least it has a general meaning, and does not imply atomicity or ordering.
Re: __{read,write}_once
On Sun, Nov 17, 2019 at 02:35:43PM +0000, Mindaugas Rasiukevicius wrote:
> David Holland wrote:
> > > I see the potential source of confusion, but just think about: what could "atomic" possibly mean for loads or stores? A load is just a load, there is no RMW cycle, conceptually; inter-locking the operation does not make sense. For anybody who attempts to reason about this, it should not be too confusing, plus there are man pages.
> >
> > ...that it's not torn.
> >
> > As far as names... there are increasingly many slightly different types of atomic and semiatomic operations.
> >
> > I think it would be helpful if someone came up with a comprehensive naming scheme for all of them (including ones we don't currently have that we're moderately likely to end up with later...)
>
> Yes, the meaning of "atomic" has different flavours and describes different set of properties in different fields (operating systems vs databases vs distributed systems vs ...) and, as we can see, even within the fields.
>
> Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already the dominant terms.

Yes, but "relaxed" means something else... let me be clearer since I wasn't before: I would expect e.g. atomic_inc_32_relaxed() to be distinguished from atomic_inc_32() or maybe atomic_inc_32_ordered() by whether or not multiple instances of it are globally ordered, not by whether or not it's actually atomic relative to other cpus.

Checking the prior messages indicates we aren't currently talking about atomic_inc_32_relaxed() but only about atomic_load_32_relaxed() and atomic_store_32_relaxed(), which would be used together to generate a local counter. This is less misleading, but I'm still not convinced it's a good choice of names given that we might reasonably later on want to have atomic_inc_32_relaxed() and atomic_inc_32_ordered() that differ as above.

> I think it is pointless to attempt to reinvent the wheel here. It is terminology used by C11 (and C++11) and accepted by various technical literature and, at this point, by academia (if you look at the recent papers on memory models -- it's pretty much settled). These terms are not too bad; it could be worse; and this bit is certainly not the worst part of C11. So, I would just move on.

Is it settled? My experience with the academic literature has been that everyone uses their own terminology and the same words are routinely found to have subtly different meanings from one paper to the next and occasionally even within the same paper. :-/ But I haven't been paying much attention lately due to being preoccupied with other things.

--
David A. Holland
dholl...@netbsd.org
Re: __{read,write}_once
David Holland wrote: > > I see the potential source of confusion, but just think about: what > > could "atomic" possibly mean for loads or stores? A load is just a > > load, there is no RMW cycle, conceptually; inter-locking the operation > > does not make sense. For anybody who attempts to reason about this, > > it should not be too confusing, plus there are man pages. > > ...that it's not torn. > > As far as names... there are increasingly many slightly different > types of atomic and semiatomic operations. > > I think it would be helpful if someone came up with a comprehensive > naming scheme for all of them (including ones we don't currently have > that we're moderately likely to end up with later...) Yes, the meaning of "atomic" has different flavours and describes different set of properties in different fields (operating systems vs databases vs distributed systems vs ...) and, as we can see, even within the fields. Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already the dominant terms. I think it is pointless to attempt to reinvent the wheel here. It is terminology used by C11 (and C++11) and accepted by various technical literature and, at this point, by academia (if you look at the recent papers on memory models -- it's pretty much settled). These terms are not too bad; it could be worse; and this bit is certainly not the worst part of C11. So, I would just move on. -- Mindaugas
Re: __{read,write}_once
On Sat, Nov 16, 2019 at 03:56:35PM +0000, Mindaugas Rasiukevicius wrote:
> > > I suggest __atomic_load_relaxed()/__atomic_store_relaxed().
> >
> > What I don't like with "atomic" in the name is that the instructions
> > generated are not atomic strictly speaking, and I'd rather avoid the
> > confusion with the really atomic instructions.
>
> I see the potential source of confusion, but just think about: what could "atomic" possibly mean for loads or stores? A load is just a load, there is no RMW cycle, conceptually; inter-locking the operation does not make sense. For anybody who attempts to reason about this, it should not be too confusing, plus there are man pages.

...that it's not torn.

As far as names... there are increasingly many slightly different types of atomic and semiatomic operations.

I think it would be helpful if someone came up with a comprehensive naming scheme for all of them (including ones we don't currently have that we're moderately likely to end up with later...) Ordinarily I would go do this but I am completely swamped :-(

For the moment, I don't especially like *_relaxed as "relaxed" is normally an ordering property, and while ordering and atomicity are different things, they're not really separable in the naming scheme. I might suggest "localatomic". Or maybe just "untorn", but that looks weird.

However, _relaxed is far better than *_once, which is really confusing :(

--
David A. Holland
dholl...@netbsd.org
Re: __{read,write}_once
> I see the potential source of confusion, but just think about: what > could "atomic" possibly mean for loads or stores? The same thing it means for other operations: that all other operations act as if the load, or store, happened either entirely before or entirely after every other event. /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: __{read,write}_once
On 16/11/2019 at 15:31, Mindaugas Rasiukevicius wrote:
> Maxime Villard wrote:
> > Alright so let's add the macros with volatile (my initial patch).
> > Which name do we use? I actually like __{read,write}_racy().
>
> I suggest __atomic_load_relaxed()/__atomic_store_relaxed().

What I don't like with "atomic" in the name is that the instructions generated are not atomic strictly speaking, and I'd rather avoid the confusion with the really atomic instructions. But that's not a strong opinion, and I'm fine if we go with it anyway.

> If these are to be provided as macros, then I also suggest to ensure
> that they provide compiler-level barrier.

You mean __insn_barrier(), right? Surrounding the access (one before, one after) also, right?
Re: __{read,write}_once
Maxime Villard wrote:
> > I suggest __atomic_load_relaxed()/__atomic_store_relaxed().
>
> What I don't like with "atomic" in the name is that the instructions
> generated are not atomic strictly speaking, and I'd rather avoid the
> confusion with the really atomic instructions.

I see the potential source of confusion, but just think about: what could "atomic" possibly mean for loads or stores? A load is just a load, there is no RMW cycle, conceptually; inter-locking the operation does not make sense. For anybody who attempts to reason about this, it should not be too confusing, plus there are man pages.

> > If these are to be provided as macros, then I also suggest to ensure
> > that they provide compiler-level barrier.
>
> You mean __insn_barrier(), right? Surrounding the access (one before, one
> after) also, right?

Correct. Alternatively, they can be provided as real functions. In C, function calls generally act as compiler-level barriers (as long as they are guaranteed to not be optimised out).

--
Mindaugas
Re: __{read,write}_once
On 16.11.2019 15:31, Mindaugas Rasiukevicius wrote:
> Maxime Villard wrote:
> > Alright so let's add the macros with volatile (my initial patch). Which
> > name do we use? I actually like __{read,write}_racy().
>
> I suggest __atomic_load_relaxed()/__atomic_store_relaxed(). If these
> are to be provided as macros, then I also suggest to ensure that they
> provide compiler-level barrier.

I'm in favor of this naming.
Re: __{read,write}_once
Maxime Villard wrote: > Alright so let's add the macros with volatile (my initial patch). Which > name do we use? I actually like __{read,write}_racy(). I suggest __atomic_load_relaxed()/__atomic_store_relaxed(). If these are to be provided as macros, then I also suggest to ensure that they provide compiler-level barrier. -- Mindaugas
Re: __{read,write}_once
On 11/15, Maxime Villard wrote: > Alright so let's add the macros with volatile (my initial patch). Which name > do we > use? I actually like __{read,write}_racy(). +1. I'm a fan of avoiding the "once" terminology just because the first thing that comes to my mind is the threading concept of a function that does something only one time regardless of how many times it is called and regardless of from which thread it is called. Lewis
Re: __{read,write}_once
Alright so let's add the macros with volatile (my initial patch). Which name do we use? I actually like __{read,write}_racy().
Re: __{read,write}_once
Date: Mon, 11 Nov 2019 21:32:06 +0100
From: Joerg Sonnenberger
Message-ID: <2019203206.ga4...@bec.de>

| The update needs to be uninterruptable on the local CPU in the sense that
| context switches and interrupts don't tear the R and W part of the RMW cycle
| apart.

If I understood the original message, even that was not required. All that was really demanded was that each of the R and W parts not be split into pieces where an intervening interrupt might alter the "other" part, allowing a read to receive, or a write to produce, a value that neither the baseline code nor the interrupt routine ever intended.

But in x += 2; if an interrupt intervenes and does x += 1; it is acceptable for the eventual result to be old_x + 1 (which is very unlikely), old_x + 2, or old_x + 3, but only those. That's how I understood the "racy is OK" part of it. Ie: we don't care about race conditions like this.

What we don't want is, if x started out at 0x0000ffff and is read in 2 halves, for the base code to read the 0xffff half, then the interrupt to change the value to 0x00010000, and then the base code to read 0x0001 for the top half, leading to a result of 0x0001ffff in x rather than 0x0000ffff or 0x00010000, which is not acceptable (or anything else like that, and similarly for writes).

Or in other words, it is acceptable for some uses for the occasional change to be lost (particularly for some event counters) if it is not likely to happen frequently, but not for entirely bogus values to appear.

kre
Re: __{read,write}_once
Maxime Villard wrote: > > Alright, thanks. > > Do you think there is a performance degradation with using explicitly > atomic operations (with the "lock" prefix on x86), compared to just using > an aligned volatile which may not be exactly atomic in that sense (even > if it happens to be on x86)? Yes, interlocked atomic operations have considerable performance overhead and should be treated as relatively expensive. Whether it is significant, of course, depends where and how they are used in the code. You can find various measurements of the atomic operations on the Internet, but they can differ quite a lot amongst various CPU models. Also, just to be clear here: the atomic loads/stores (both the C11 routines and READ_ONCE/WRITE_ONCE) are *not* interlocked operations. > > Typically in sys/uvm/uvm_fault.c there are several lockless stat > increments like the following: > > /* Increment the counters.*/ > uvmexp.fltanget++; > > In your (not just rmind@, but tech-kern@) opinion, is it better to switch > to: atomic_inc_uint(&uvmexp.fltanget); > or to > __add_once(uvmexp.fltanget, 1); > as I did in my patch? Considering that the latter may possibly not be > atomic on certain arches, which could possibly result in garbage when the > value is read. In this case it is assumed that inaccurate statistics is better than the expense of atomic operations. That is the reason why they are non-atomic increments, although to prevent from garbage (not only the losses of adds), they should use atomic loads/stores (or just volatile for now) on a word. Generally, the right solution to this problem is to convert these counters to be per-CPU (or per-thread in userspace applications) and aggregate them when needed (e.g. on fetching the stats). See percpu(9) API. This way you get both the performance and accuracy/correctness. 
> > I've committed it right away, because this one seemed clear enough, feel > free to comment or improve if you have a better idea: > > https://mail-index.netbsd.org/source-changes/2019/11/11/msg110726.html > It would be better to find a solution which doesn't remove the optimisation for the fast path. At least on 64-bit architectures, a relaxed atomic load (as 'volatile' for now) on xc->xc_donep could stay. -- Mindaugas
Re: __{read,write}_once
On Mon, Nov 11, 2019 at 01:15:16PM -0500, Mouse wrote: > > Uninterruptible means exactly that, there is a clear before and after > state and no interrupts can happen in between. > > Is uninterruptible all you care about? Or does it also need to be > atomic with respect to other CPUs? Eventually, of course, you'll want > all those counter values on a single CPU - does that happen often > enough to matter? > > Also, does it actually need to be uninterruptible, or does it just need > to be interrupt-tolerant? I'm not as clear as I'd like on what the > original desire here was, so I'm not sure but that we might be trying > to solve a harder problem than necessary. The update needs to be uninterruptible on the local CPU in the sense that context switches and interrupts don't tear the R and W part of the RMW cycle apart. x86 ALU instructions with memory operand as destination fit the bill fine. It doesn't have to be atomic, no other CPU is supposed to write to that cache line. It also generally doesn't matter whether they see the old OR the new value, as long as it is either. We have full control over alignment. Joerg
Re: __{read,write}_once
> Uninterruptible means exactly that, there is a clear before and after > state and no interrupts can happen in between. Is uninterruptible all you care about? Or does it also need to be atomic with respect to other CPUs? Eventually, of course, you'll want all those counter values on a single CPU - does that happen often enough to matter? Also, does it actually need to be uninterruptible, or does it just need to be interrupt-tolerant? I'm not as clear as I'd like on what the original desire here was, so I'm not sure but that we might be trying to solve a harder problem than necessary. /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: __{read,write}_once
On Mon, Nov 11, 2019 at 11:51:26AM -0500, Mouse wrote: > >>> (2) Use uninterruptible memory operations in per CPU memory, > >>> aggregate passively on demand. > > Problem is that (2) is natively only present on CISC platforms in > > general. Most RISC platforms can't do RMW in one instruction. > > (2) says "uninterruptible", not "one instruction", though I'm not sure > how large the difference is in practice. (Also, some CISC platforms > provide atomic memory RMW operations only under annoying restrictions; > for example, I think the VAX has only three sorts of RMW memory > accesses that are atomic with respect to other processors: ADAWI > (16-bit-aligned 16-bit add), BB{SS,CC}I (test-and-{set/clear} single > bits), and {INS,REM}Q{H,T}I (queue insert/remove at head/tail).) The point here is that we really don't want to have bus locked instructions for per-CPU counters. It would defeat the point of using per-CPU counters in first place to a large degree. Uninterruptible means exactly that, there is a clear before and after state and no interrupts can happen in between. I don't know about ARM and friends for how expensive masking interrupts is. It is quite expensive on x86. RMW instructions are the other simple option for implementing them. (3) would be the high effort version, doing RAS for kernel code as well. Joerg
Re: __{read,write}_once
>>> (2) Use uninterruptible memory operations in per CPU memory, >>> aggregate passively on demand. > Problem is that (2) is natively only present on CISC platforms in > general. Most RISC platforms can't do RMW in one instruction. (2) says "uninterruptible", not "one instruction", though I'm not sure how large the difference is in practice. (Also, some CISC platforms provide atomic memory RMW operations only under annoying restrictions; for example, I think the VAX has only three sorts of RMW memory accesses that are atomic with respect to other processors: ADAWI (16-bit-aligned 16-bit add), BB{SS,CC}I (test-and-{set/clear} single bits), and {INS,REM}Q{H,T}I (queue insert/remove at head/tail).) /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: __{read,write}_once
On Mon, Nov 11, 2019 at 02:39:26PM +0100, Maxime Villard wrote: > Le 11/11/2019 à 13:51, Joerg Sonnenberger a écrit : > > On Mon, Nov 11, 2019 at 11:02:47AM +0100, Maxime Villard wrote: > > > Typically in sys/uvm/uvm_fault.c there are several lockless stat > > > increments > > > like the following: > > > > > > /* Increment the counters.*/ > > > uvmexp.fltanget++; > > > > Wasn't the general consensus here to ideally have per-cpu counters here > > that are aggregated passively? > > In this specific case it's not going to work, because uvmexp is supposed to > be read from a core dump, and the stats are expected to be found in the > symbol... That's just a reason to add code for doing the aggregation, not a reason to avoid it. > > I can think of three different options depending the platform: > > > > (1) Use full atomic operations. Fine for true UP platforms and when the > > overhead of per-CPU precise accounting is too high. > > > > (2) Use uninterruptible memory operations in per CPU memory, aggregate > > passively on demand. > > > > (3) Emulate uninterruptible memory operations with kernel RAS, aggregate > > passively on demand. > > Generally I would do (2), but in this specific case I suggest (1). > atomic_inc_uint is supposed to be implemented on each platform already, so > it's easy to do. > > (3) is a big headache for a very small reason, and it's not going to > prevent inter-CPU races. Problem is that (2) is natively only present on CISC platforms in general. Most RISC platforms can't do RMW in one instruction. Inter-CPU races don't matter as long as only the counter of the current CPU is modified. The main difference between (1) and (2) is the bus lock. (1) and (3) is primarily a question on whether the atomic operation can be inlined, if it can't, (3) would still be much nicer. Joerg
Re: __{read,write}_once
Le 11/11/2019 à 13:51, Joerg Sonnenberger a écrit : On Mon, Nov 11, 2019 at 11:02:47AM +0100, Maxime Villard wrote: Typically in sys/uvm/uvm_fault.c there are several lockless stat increments like the following: /* Increment the counters.*/ uvmexp.fltanget++; Wasn't the general consensus here to ideally have per-cpu counters here that are aggregated passively? In this specific case it's not going to work, because uvmexp is supposed to be read from a core dump, and the stats are expected to be found in the symbol... I can think of three different options depending the platform: (1) Use full atomic operations. Fine for true UP platforms and when the overhead of per-CPU precise accounting is too high. (2) Use uninterruptible memory operations in per CPU memory, aggregate passively on demand. (3) Emulate uninterruptible memory operations with kernel RAS, aggregate passively on demand. Generally I would do (2), but in this specific case I suggest (1). atomic_inc_uint is supposed to be implemented on each platform already, so it's easy to do. (3) is a big headache for a very small reason, and it's not going to prevent inter-CPU races.
Re: __{read,write}_once
On Mon, Nov 11, 2019 at 11:02:47AM +0100, Maxime Villard wrote: > Typically in sys/uvm/uvm_fault.c there are several lockless stat increments > like the following: > > /* Increment the counters.*/ > uvmexp.fltanget++; Wasn't the general consensus here to ideally have per-cpu counters here that are aggregated passively? I can think of three different options depending on the platform: (1) Use full atomic operations. Fine for true UP platforms and when the overhead of per-CPU precise accounting is too high. (2) Use uninterruptible memory operations in per CPU memory, aggregate passively on demand. (3) Emulate uninterruptible memory operations with kernel RAS, aggregate passively on demand. Essentially, the only race condition we care about for the statistic counters is via interrupts or scheduling. We can implement the equivalent of x86's add with memory operand as destination using RAS, so the only overhead would be the function call in that case. Joerg
Re: __{read,write}_once
Le 08/11/2019 à 13:40, Mindaugas Rasiukevicius a écrit : Maxime Villard wrote: They are "atomic" in a sense that they prevent from tearing, fusing and invented loads/stores. Terms and conditions apply (e.g. they assume properly aligned and word-sized accesses). Additionally, READ_ONCE() provides a data-dependency barrier, applicable only to DEC Alpha. I think it was the right decision on the Linux side (even though trying to get all the data-dependency barriers right these days is kind of a lost cause, IMO). So, READ_ONCE()/WRITE_ONCE() is more or less equivalent to the C11 atomic load/stores routines with memory_order_relaxed (or memory_order_consume, if you care about DEC Alpha). But... Didn't Marco just say that 'volatile' accesses do not actually prevent tearing/fusing/invented loads/stores? READ_ONCE/WRITE_ONCE only do volatile accesses. Let me try to clarify: - The main purpose of READ_ONCE()/WRITE_ONCE() is to provide a way to perform atomic loads/stores (in a sense of preventing from the said behaviours), even though they help to get the memory ordering right too. Currently, 'volatile' is a key instrument in achieving that. However, as stated before, terms and conditions apply: 'volatile' just in itself does not provide the guarantee; the loads/stores also have to be properly aligned and word-sized (these are the pre-C11 assumptions we always had). Note: C11 introduces atomic _types_, so that the compiler could leverage the type system and thus provide the necessary guarantees. - Having re-read Marco's emails in this thread, I think we are very much in agreement. I think he merely points out that 'volatile' in itself is not sufficient; it does not mean it's not necessary. - There is quite a bit of confusion regarding 'volatile' amongst the developers. This is partly because 'volatile' is arguably underspecified in the C standard. 
AFAIK, some people in the C standardization committee have a view that it provides weaker guarantees; however, others maintain that the intent has always been clear and the wording is sufficient. Without going into the details (somewhat philosophical anyway), at least for now, 'volatile' is a de facto ingredient (one of a few, but absolutely necessary) in achieving atomic loads/stores. Sorry if this is a bit repetitive, but I hope it gets the point across. Alright, thanks. Do you think there is a performance degradation with using explicitly atomic operations (with the "lock" prefix on x86), compared to just using an aligned volatile which may not be exactly atomic in that sense (even if it happens to be on x86)? Typically in sys/uvm/uvm_fault.c there are several lockless stat increments like the following: /* Increment the counters.*/ uvmexp.fltanget++; In your (not just rmind@, but tech-kern@) opinion, is it better to switch to: atomic_inc_uint(&uvmexp.fltanget); or to __add_once(uvmexp.fltanget, 1); as I did in my patch? Considering that the latter may possibly not be atomic on certain arches, which could possibly result in garbage when the value is read. To fix that, do you agree that I should - Remove the first branch (because no lockless fastpath possible) - Move down the second branch (KASSERT) right after the mutex_enter ? Feel free to write a patch and I'll have a look at it once I have a little bit more free time. I've committed it right away, because this one seemed clear enough, feel free to comment or improve if you have a better idea: https://mail-index.netbsd.org/source-changes/2019/11/11/msg110726.html Maxime
Re: __{read,write}_once
On 08.11.2019 13:40, Mindaugas Rasiukevicius wrote: >>> There is more code in the NetBSD kernel which needs fixing. I would say >>> pretty much all lock-free code should be audited. >> I believe KCSAN can greatly help with that, since it automatically reports >> concurrent accesses. Up to us then to switch to atomic, or other kinds of >> markers like READ_ONCE. > Is there a CSAN i.e. such sanitizer for userspace applications? No, but there is TSan. If a CSan for userspace would be useful, it probably wouldn't be too difficult to write. Full TSan, at least, is heavy and requires a 64-bit CPU.
Re: __{read,write}_once
Maxime Villard wrote: > > They are "atomic" in a sense that they prevent from tearing, fusing and > > invented loads/stores. Terms and conditions apply (e.g. they assume > > properly aligned and word-sized accesses). Additionally, READ_ONCE() > > provides a data-dependency barrier, applicable only to DEC Alpha. I > > think it was the right decision on the Linux side (even though trying > > to get all the data-dependency barriers right these days is kind of a > > lost cause, IMO). > > > > So, READ_ONCE()/WRITE_ONCE() is more or less equivalent to the C11 > > atomic load/stores routines with memory_order_relaxed (or > > memory_order_consume, if you care about DEC Alpha). > > But... Didn't Marco just say that 'volatile' accesses do not actually > prevent tearing/fusing/invented loads/stores? READ_ONCE/WRITE_ONCE only > do volatile accesses. Let me try to clarify: - The main purpose of READ_ONCE()/WRITE_ONCE() is to provide a way to perform atomic loads/stores (in a sense of preventing from the said behaviours), even though they help to get the memory ordering right too. Currently, 'volatile' is a key instrument in achieving that. However, as stated before, terms and conditions apply: 'volatile' just in itself does not provide the guarantee; the loads/stores also have to be properly aligned and word-sized (these are the pre-C11 assumptions we always had). Note: C11 introduces atomic _types_, so that the compiler could leverage the type system and thus provide the necessary guarantees. - Having re-read Marco's emails in this thread, I think we are very much in agreement. I think he merely points out that 'volatile' in itself is not sufficient; it does not mean it's not necessary. - There is quite a bit of confusion regarding 'volatile' amongst the developers. This is partly because 'volatile' is arguably underspecified in the C standard. 
AFAIK, some people in the C standardization committee have a view that it provides weaker guarantees; however, others maintain that the intent has always been clear and the wording is sufficient. Without going into the details (somewhat philosophical anyway), at least for now, 'volatile' is a de facto ingredient (one of a few, but absolutely necessary) in achieving atomic loads/stores. Sorry if this is a bit repetitive, but I hope it gets the point across. > > To fix that, do you agree that I should > - Remove the first branch (because no lockless fastpath possible) > - Move down the second branch (KASSERT) right after the mutex_enter > ? Feel free to write a patch and I'll have a look at it once I have a little bit more free time. > > > There is more code in the NetBSD kernel which needs fixing. I would say > > pretty much all lock-free code should be audited. > > I believe KCSAN can greatly help with that, since it automatically reports > concurrent accesses. Up to us then to switch to atomic, or other kinds of > markers like READ_ONCE. Is there a CSAN i.e. such sanitizer for userspace applications? Thanks. -- Mindaugas
Re: __{read,write}_once
On Wed, Nov 06, 2019 at 10:14:48PM +0100, Kamil Rytarowski wrote: > I have no opinion here, just please do the right thing. Unless there are > any shortcomings it would be nice to follow closely C11. There are plenty of shortcomings in C11. We should concentrate on abstractions that are correct first. (also, volatile is ~useless for this) -- David A. Holland dholl...@netbsd.org
Re: __{read,write}_once
Le 06/11/2019 à 23:41, Mindaugas Rasiukevicius a écrit : Maxime Villard wrote: - If we do not want to stick with the C11 API (its emulation), then I would suggest to use the similar terminology, e.g. atomic_load_relaxed() and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE(). There is no reason invent new terminology; it might just add more confusion. But... Linux's READ_ONCE/WRITE_ONCE are not actually atomic, is that correct? So there is a significant semantic difference. They are "atomic" in a sense that they prevent from tearing, fusing and invented loads/stores. Terms and conditions apply (e.g. they assume properly aligned and word-sized accesses). Additionally, READ_ONCE() provides a data-dependency barrier, applicable only to DEC Alpha. I think it was the right decision on the Linux side (even though trying to get all the data-dependency barriers right these days is kind of a lost cause, IMO). So, READ_ONCE()/WRITE_ONCE() is more or less equivalent to the C11 atomic load/stores routines with memory_order_relaxed (or memory_order_consume, if you care about DEC Alpha). But... Didn't Marco just say that 'volatile' accesses do not actually prevent tearing/fusing/invented loads/stores? READ_ONCE/WRITE_ONCE only do volatile accesses. Also, in my original patch I marked two branches of subr_xcall.c, but it seems to me now they are actually races: ie xc_wait(), the read of 'xc_donep' could be made by two 4-byte fetches on 32bit arches (at least). Between the two, an xc thread could have updated the value. So there is an actual race which could possibly result in returning early while we shouldn't have. Does that look correct to you? Correct. Alright, so we have the first bug found by KCSAN :) To fix that, do you agree that I should - Remove the first branch (because no lockless fastpath possible) - Move down the second branch (KASSERT) right after the mutex_enter ? There is more code in the NetBSD kernel which needs fixing. 
I would say pretty much all lock-free code should be audited. I believe KCSAN can greatly help with that, since it automatically reports concurrent accesses. Up to us then to switch to atomic, or other kinds of markers like READ_ONCE.
Re: __{read,write}_once
Maxime Villard wrote: > > - If we do not want to stick with the C11 API (its emulation), then I > > would suggest to use the similar terminology, e.g. atomic_load_relaxed() > > and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE(). There is > > no reason invent new terminology; it might just add more confusion. > > But... Linux's READ_ONCE/WRITE_ONCE are not actually atomic, is that > correct? So there is a significant semantic difference. They are "atomic" in a sense that they prevent from tearing, fusing and invented loads/stores. Terms and conditions apply (e.g. they assume properly aligned and word-sized accesses). Additionally, READ_ONCE() provides a data-dependency barrier, applicable only to DEC Alpha. I think it was the right decision on the Linux side (even though trying to get all the data-dependency barriers right these days is kind of a lost cause, IMO). So, READ_ONCE()/WRITE_ONCE() is more or less equivalent to the C11 atomic load/stores routines with memory_order_relaxed (or memory_order_consume, if you care about DEC Alpha). > Also, in my original patch I marked two branches of subr_xcall.c, but it > seems to me now they are actually races: ie xc_wait(), the read of > 'xc_donep' could be made by two 4-byte fetches on 32bit arches (at > least). Between the two, an xc thread could have updated the value. So > there is an actual race which could possibly result in returning early > while we shouldn't have. Does that look correct to you? Correct. There is more code in the NetBSD kernel which needs fixing. I would say pretty much all lock-free code should be audited. Back in time, certain liberties compilers have were not fully understood or were not considered as real problems (after all, C memory model concerning many of these aspects was not defined). Times changed, compilers became much more aggressive and the old code needs to catch up. -- Mindaugas
Re: __{read,write}_once
On 06.11.2019 20:38, Mindaugas Rasiukevicius wrote: > Maxime Villard wrote: >> There are cases in the kernel where we read/write global memory >> locklessly, and accept the races either because it is part of the design >> (eg low-level scheduling) or we simply don't care (eg global stats). >> >> In these cases, we want to access the memory only once, and need to ensure >> the compiler does not split that access in several pieces, some of which >> may be changed by concurrent accesses. There is a Linux article [1] about >> this, and also [2]. I'd like to introduce the following macros: >> >> <...> > > A few comments for everybody here: > > - There is a general need in the NetBSD kernel for atomic load/store > semantics. This is because plain accesses (loads/stores) are subject > to _tearing_, _fusing_ and _invented_ loads/stores. I created a new > thread which might help to clarify these and various other aspects: > > http://mail-index.netbsd.org/tech-kern/2019/11/06/msg025664.html > Thank you. > - In C11, this can be handled with atomic_{load,store}_explicit() and > memory_order_relaxed (or stronger memory order). > > - If we do not want to stick with the C11 API (its emulation), then I > would suggest to use the similar terminology, e.g. atomic_load_relaxed() > and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE(). There is > no reason invent new terminology; it might just add more confusion. > > Thanks. > I have no opinion here, just please do the right thing. Unless there are any shortcomings it would be nice to follow C11 closely. Whether that information helps or not: we also need a C11/C++14 libatomic-like library in userland.
re: __{read,write}_once
> - If we do not want to stick with the C11 API (its emulation), then I > would suggest to use the similar terminology, e.g. atomic_load_relaxed() > and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE(). There is > no reason invent new terminology; it might just add more confusion. i really do not like the "once" name. even though i'm familiar with the actual desired semantics, i have to ignore my builtin meaning that comes first. to me, "once" initially reads as compiler will only do it one time, ever, not every time it is called. like pthread "once". can we make them __read_always() and __write_always()? .mrg.
Re: __{read,write}_once
Le 06/11/2019 à 20:38, Mindaugas Rasiukevicius a écrit : Maxime Villard wrote: There are cases in the kernel where we read/write global memory locklessly, and accept the races either because it is part of the design (eg low-level scheduling) or we simply don't care (eg global stats). In these cases, we want to access the memory only once, and need to ensure the compiler does not split that access in several pieces, some of which may be changed by concurrent accesses. There is a Linux article [1] about this, and also [2]. I'd like to introduce the following macros: <...> A few comments for everybody here: - There is a general need in the NetBSD kernel for atomic load/store semantics. This is because plain accesses (loads/stores) are subject to _tearing_, _fusing_ and _invented_ loads/stores. I created a new thread which might help to clarify these and various other aspects: http://mail-index.netbsd.org/tech-kern/2019/11/06/msg025664.html Thanks - In C11, this can be handled with atomic_{load,store}_explicit() and memory_order_relaxed (or stronger memory order). - If we do not want to stick with the C11 API (its emulation), then I would suggest to use the similar terminology, e.g. atomic_load_relaxed() and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE(). There is no reason invent new terminology; it might just add more confusion. But... Linux's READ_ONCE/WRITE_ONCE are not actually atomic, is that correct? So there is a significant semantic difference. Also, in my original patch I marked two branches of subr_xcall.c, but it seems to me now they are actually races: ie xc_wait(), the read of 'xc_donep' could be made by two 4-byte fetches on 32bit arches (at least). Between the two, an xc thread could have updated the value. So there is an actual race which could possibly result in returning early while we shouldn't have. Does that look correct to you? Thanks
Re: __{read,write}_once
Maxime Villard wrote: > There are cases in the kernel where we read/write global memory > locklessly, and accept the races either because it is part of the design > (eg low-level scheduling) or we simply don't care (eg global stats). > > In these cases, we want to access the memory only once, and need to ensure > the compiler does not split that access in several pieces, some of which > may be changed by concurrent accesses. There is a Linux article [1] about > this, and also [2]. I'd like to introduce the following macros: > > <...> A few comments for everybody here: - There is a general need in the NetBSD kernel for atomic load/store semantics. This is because plain accesses (loads/stores) are subject to _tearing_, _fusing_ and _invented_ loads/stores. I created a new thread which might help to clarify these and various other aspects: http://mail-index.netbsd.org/tech-kern/2019/11/06/msg025664.html - In C11, this can be handled with atomic_{load,store}_explicit() and memory_order_relaxed (or stronger memory order). - If we do not want to stick with the C11 API (its emulation), then I would suggest using similar terminology, e.g. atomic_load_relaxed() and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE(). There is no reason to invent new terminology; it might just add more confusion. Thanks. -- Mindaugas
Re: __{read,write}_once
On Wed, 6 Nov 2019 at 18:08, Maxime Villard wrote: > > Le 06/11/2019 à 17:37, Marco Elver a écrit : > > On Wed, 6 Nov 2019 at 16:51, Kamil Rytarowski wrote: > >> > >> On 06.11.2019 16:44, Kamil Rytarowski wrote: > >>> On 06.11.2019 15:57, Jason Thorpe wrote: > >>>> > >>>> > >>>>> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski wrote: > >>>>> > >>>>> On 06.11.2019 14:37, Jason Thorpe wrote: > >>>>>> > >>>>>> > >>>>>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote: > >>>>>>> > >>>>>>> I propose __write_relaxed() / __read_relaxed(). > >>>>>> > >>>>>> ...except that seems to imply the opposite of what these do. > >>>>>> > >>>>>> -- thorpej > >>>>>> > >>>>> > >>>>> Rationale? > >>>>> > >>>>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do > >>>>> not deal with atomics here. > >>>> > >>>> Fair enough. To me, the names suggest "compiler is allowed to apply > >>>> relaxed constraints and tear the access if it wants" But apparently > >>>> the common meaning is "relax, bro, I know what I'm doing". If that's > >>>> the case, I can roll with it. > > > > See below. > > > >>>> -- thorpej > >>>> > >>> > >>> Unless I mean something this is exactly about relaxed constraints. > >> > >> miss* > >> > >>> > >>> "Relaxed operation: there are no synchronization or ordering constraints > >>> imposed on other reads or writes" and without "operation's atomicity is > >>> guaranteed". > > > > In the memory consistency model world, "relaxed" has a very specific > > meaning for loads/stores. It simply means that the compiler and > > architecture is free to reorder the memory operation with respect to > > previous or following memory operations in program order. But the > > load/store still happens as one atomic operation. > > > > For programming-language memory models, "relaxed" appears in the > > context of atomic memory operations, to define their ordering w.r.t. > > other operations in program order. 
There is a distinction between > > "relaxed atomic" and plain (unannotated) memory operations at the > > programming language level. > > > > For plain operations, the compiler, in addition to the target > > architecture, is free to reorder operations or even apply > > optimizations that turn single plain operations into multiple > > loads/stores [1, 2]. > > [1] https://lwn.net/Articles/793253/ > > [2] https://lwn.net/Articles/799218/ > > > > In the Linux kernel READ_ONCE/WRITE_ONCE are one way to specify > > "relaxed atomic read/write" for accesses that fit in a word (_ONCE > > also works on things larger than a word, but probably aren't atomic > > anymore). For most architectures the implementation is the same, and > > merely avoids compiler optimizations; the exception is Alpha, where > > READ_ONCE adds a memory barrier [3]. > > [3] https://www.kernel.org/doc/Documentation/memory-barriers.txt > > > >>> This is also similar to what suggested Google to apply to NetBSD in our > >>> internal thread, but with a bit different naming. > > > > What you call the ops is entirely up to you. I would not use > > '{read,write}_relaxed', since as you pointed out above, is probably > > more confusing. This is why I suggested 'atomic_{read,write}_relaxed'. > > If you do not like the 'atomic' in there, the next best thing is > > probably '{read,write}_once'. > > > > Just note that you're now defining your own memory consistency model. > > This is a complex topic that spans decades of work. The safest thing > > is to stick as closely to the C11/C++11 memory model as possible [4]. > > The Linux kernel also has a memory model [5], but is significantly > > more complex since it was defined after various primitives had been > > introduced. > > [4] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1479.htm > > [5] > > https://github.com/torvalds/linux/blob/master/tools/memory-model/Documentation/explanation.txt > > > > Best Wishes, >
Re: __{read,write}_once
> - Is 'volatile' a guarantee that the compiler will emit only one > instruction to fetch the value? No. For some types, some compilers, and some architectures, it may be, but it is implementation-dependent. (For some types, on some architectures, it cannot be; for example, consider a 64-bit volatile value on an architecture without 64-bit memory access primitives.) What volatile is designed for is to ensure that the generated code actually accesses the value. Consider

count = 0;
while (count < 10) {
        if (*reg & 0x0400)
                count++;
}

If reg is "uint32_t *", the compiler is permitted to - and some will - hoist the load, effectively turning this into

count = 0;
temp = *reg;
while (count < 10) {
        if (temp & 0x0400)
                count++;
}

which is unlikely to do what you want. If reg is instead declared "volatile uint32_t *", then *reg is volatile and thus must be accessed each time through the loop. volatile can also compel such caching in some circumstances; for example, if the code is

uint32_t *reg;
uint32_t v;

v = *reg;
call((v+4)*v);

then, under certain circumstances (for example, v is auto and its address is never taken) the compiler is permitted to turn that into call(((*reg)+4) * *reg) and, on some architectures, that would be a win. (It's less likely to happen than the other way around, but, if addressing modes are cheap, registers are few, and memory is fast compared to CPU cycles, the compiler may do that.) But make it "volatile uint32_t *reg;" and such an optimization is not permitted. > - Assuming there is only one instruction, strictly speaking the > fetch is not guaranteed to be atomic, is that correct? Right. What is or isn't atomic is architecture-dependent; there is, I think, _nothing_ you can do in C that can guarantee atomicity (or, for that matter, guarantee non-atomicity), since atomicity is fundamentally an architecture-dependent notion.
/~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: __{read,write}_once
Le 06/11/2019 à 17:37, Marco Elver a écrit :
> On Wed, 6 Nov 2019 at 16:51, Kamil Rytarowski wrote:
>> On 06.11.2019 16:44, Kamil Rytarowski wrote:
>>> On 06.11.2019 15:57, Jason Thorpe wrote:
>>>> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski wrote:
>>>>> On 06.11.2019 14:37, Jason Thorpe wrote:
>>>>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote:
>>>>>>> I propose __write_relaxed() / __read_relaxed().
>>>>>>
>>>>>> ...except that seems to imply the opposite of what these do.
>>>>>>
>>>>>> -- thorpej
>>>>>
>>>>> Rationale?
>>>>>
>>>>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we
>>>>> do not deal with atomics here.
>>>>
>>>> Fair enough.  To me, the names suggest "compiler is allowed to apply
>>>> relaxed constraints and tear the access if it wants".  But apparently
>>>> the common meaning is "relax, bro, I know what I'm doing".  If that's
>>>> the case, I can roll with it.  See below.
>>>>
>>>> -- thorpej
>>>
>>> Unless I miss something, this is exactly about relaxed constraints:
>>>
>>> "Relaxed operation: there are no synchronization or ordering
>>> constraints imposed on other reads or writes", and without
>>> "operation's atomicity is guaranteed".
>
> In the memory consistency model world, "relaxed" has a very specific
> meaning for loads/stores.  It simply means that the compiler and
> architecture are free to reorder the memory operation with respect to
> previous or following memory operations in program order.  But the
> load/store still happens as one atomic operation.
>
> For programming-language memory models, "relaxed" appears in the
> context of atomic memory operations, to define their ordering w.r.t.
> other operations in program order.  There is a distinction between
> "relaxed atomic" and plain (unannotated) memory operations at the
> programming language level.  For plain operations, the compiler, in
> addition to the target architecture, is free to reorder operations or
> even apply optimizations that turn single plain operations into
> multiple loads/stores [1, 2].
> [1] https://lwn.net/Articles/793253/
> [2] https://lwn.net/Articles/799218/
>
> In the Linux kernel, READ_ONCE/WRITE_ONCE are one way to specify a
> "relaxed atomic read/write" for accesses that fit in a word (_ONCE also
> works on things larger than a word, but those probably aren't atomic
> anymore).  For most architectures the implementation is the same, and
> merely avoids compiler optimizations; the exception is Alpha, where
> READ_ONCE adds a memory barrier [3].
>
> [3] https://www.kernel.org/doc/Documentation/memory-barriers.txt
>
>> This is also similar to what Google suggested applying to NetBSD in
>> our internal thread, but with a bit different naming.
>
> What you call the ops is entirely up to you.  I would not use
> '{read,write}_relaxed', since, as you pointed out above, it is probably
> more confusing.  This is why I suggested 'atomic_{read,write}_relaxed'.
> If you do not like the 'atomic' in there, the next best thing is
> probably '{read,write}_once'.
>
> Just note that you're now defining your own memory consistency model.
> This is a complex topic that spans decades of work.  The safest thing
> is to stick as closely to the C11/C++11 memory model as possible [4].
> The Linux kernel also has a memory model [5], but it is significantly
> more complex since it was defined after various primitives had been
> introduced.
>
> [4] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1479.htm
> [5] https://github.com/torvalds/linux/blob/master/tools/memory-model/Documentation/explanation.txt
>
> Best Wishes,
> -- Marco

Thanks for the details.  I must say I still have a doubt about the
instructions actually emitted by the compiler.  Let's take READ_ONCE for
example.  What it does is basically just cast the pointer to volatile
and do the access.  Two points of confusion:

 - Is 'volatile' a guarantee that the compiler will emit only one
   instruction to fetch the value?
 - Assuming there is only one instruction, strictly speaking the fetch
   is not guaranteed to be atomic, is that correct?
Because the address may not be naturally aligned, or the CPU may not be
x86 and thus may not provide automatic atomicity in such a case.

Thanks
Re: __{read,write}_once
On Wed, 6 Nov 2019 at 16:51, Kamil Rytarowski wrote:
>
> On 06.11.2019 16:44, Kamil Rytarowski wrote:
> > On 06.11.2019 15:57, Jason Thorpe wrote:
> >>
> >>> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski wrote:
> >>>
> >>> On 06.11.2019 14:37, Jason Thorpe wrote:
> >>>>
> >>>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote:
> >>>>>
> >>>>> I propose __write_relaxed() / __read_relaxed().
> >>>>
> >>>> ...except that seems to imply the opposite of what these do.
> >>>>
> >>>> -- thorpej
> >>>
> >>> Rationale?
> >>>
> >>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
> >>> not deal with atomics here.
> >>
> >> Fair enough.  To me, the names suggest "compiler is allowed to apply
> >> relaxed constraints and tear the access if it wants"  But apparently
> >> the common meaning is "relax, bro, I know what I'm doing".  If that's
> >> the case, I can roll with it.  See below.
> >>
> >> -- thorpej
> >
> > Unless I mean something this is exactly about relaxed constraints.
>
> miss*
>
> > "Relaxed operation: there are no synchronization or ordering constraints
> > imposed on other reads or writes" and without "operation's atomicity is
> > guaranteed".

In the memory consistency model world, "relaxed" has a very specific
meaning for loads/stores.  It simply means that the compiler and
architecture are free to reorder the memory operation with respect to
previous or following memory operations in program order.  But the
load/store still happens as one atomic operation.

For programming-language memory models, "relaxed" appears in the context
of atomic memory operations, to define their ordering w.r.t. other
operations in program order.  There is a distinction between "relaxed
atomic" and plain (unannotated) memory operations at the programming
language level.
For plain operations, the compiler, in addition to the target
architecture, is free to reorder operations or even apply optimizations
that turn single plain operations into multiple loads/stores [1, 2].

[1] https://lwn.net/Articles/793253/
[2] https://lwn.net/Articles/799218/

In the Linux kernel, READ_ONCE/WRITE_ONCE are one way to specify a
"relaxed atomic read/write" for accesses that fit in a word (_ONCE also
works on things larger than a word, but those probably aren't atomic
anymore).  For most architectures the implementation is the same, and
merely avoids compiler optimizations; the exception is Alpha, where
READ_ONCE adds a memory barrier [3].

[3] https://www.kernel.org/doc/Documentation/memory-barriers.txt

> > This is also similar to what Google suggested applying to NetBSD in
> > our internal thread, but with a bit different naming.

What you call the ops is entirely up to you.  I would not use
'{read,write}_relaxed', since, as you pointed out above, it is probably
more confusing.  This is why I suggested 'atomic_{read,write}_relaxed'.
If you do not like the 'atomic' in there, the next best thing is
probably '{read,write}_once'.

Just note that you're now defining your own memory consistency model.
This is a complex topic that spans decades of work.  The safest thing is
to stick as closely to the C11/C++11 memory model as possible [4].  The
Linux kernel also has a memory model [5], but it is significantly more
complex since it was defined after various primitives had been
introduced.

[4] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1479.htm
[5] https://github.com/torvalds/linux/blob/master/tools/memory-model/Documentation/explanation.txt

Best Wishes,
-- Marco

> Adding Marco to this thread.
Re: __{read,write}_once
On 06.11.2019 16:58, David Young wrote:
> I *think* the intention is for __read_once()/__write_once() to
> load/store the entire variable from/to memory precisely once.  They
> provide no guarantees about atomicity of the load/store.  Should
> something be said about ordering and visibility of stores?

The original intention is to mark reads and writes as racy-but-OK.
"once" is a bad name, as it suggests some variation of RUN_ONCE().  I am
for the memory-ordering name "relaxed", borrowed from the terminology of
atomics.
Re: __{read,write}_once
On Wed, 6 Nov 2019 at 09:57, Jason Thorpe wrote:
>
>> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski wrote:
>>
>> On 06.11.2019 14:37, Jason Thorpe wrote:
>>>
>>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote:
>>>>
>>>> I propose __write_relaxed() / __read_relaxed().
>>>
>>> ...except that seems to imply the opposite of what these do.
>>>
>>> -- thorpej
>>
>> Rationale?
>>
>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
>> not deal with atomics here.
>
> Fair enough.  To me, the names suggest "compiler is allowed to apply
> relaxed constraints and tear the access if it wants"  But apparently
> the common meaning is "relax, bro, I know what I'm doing".  If that's
> the case, I can roll with it.

Honestly, without reading any code, my interpretation is more in line
with the former: hey relax, maybe it happens, maybe it doesn't, but,
well, nothing matters, so whatever.
Re: __{read,write}_once
On Wed, Nov 06, 2019 at 06:57:07AM -0800, Jason Thorpe wrote:
>
>> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski wrote:
>>
>> On 06.11.2019 14:37, Jason Thorpe wrote:
>>>
>>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote:
>>>>
>>>> I propose __write_relaxed() / __read_relaxed().
>>>
>>> ...except that seems to imply the opposite of what these do.
>>>
>>> -- thorpej
>>
>> Rationale?
>>
>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
>> not deal with atomics here.
>
> Fair enough.  To me, the names suggest "compiler is allowed to apply
> relaxed constraints and tear the access if it wants"  But apparently
> the common meaning is "relax, bro, I know what I'm doing".  If that's
> the case, I can roll with it.

After reading this conversation, I'm not sure of the semantics.  I
*think* the intention is for __read_once()/__write_once() to load/store
the entire variable from/to memory precisely once.  They provide no
guarantees about atomicity of the load/store.  Should something be said
about ordering and visibility of stores?

If x is initialized to 0xf00dd00f, two threads start, and thread 1
performs __read_once(x) concurrently with thread 2 performing
__write_once(x, 0xfeedbeef), then what values can thread 1 read?

Do __read_once()/__write_once() have any semantics with respect to
interrupts?

Dave

--
David Young
dyo...@pobox.com    Urbana, IL    (217) 721-9981
Re: __{read,write}_once
On 06.11.2019 16:44, Kamil Rytarowski wrote:
> On 06.11.2019 15:57, Jason Thorpe wrote:
>>
>>> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski wrote:
>>>
>>> On 06.11.2019 14:37, Jason Thorpe wrote:
>>>>
>>>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote:
>>>>>
>>>>> I propose __write_relaxed() / __read_relaxed().
>>>>
>>>> ...except that seems to imply the opposite of what these do.
>>>>
>>>> -- thorpej
>>>
>>> Rationale?
>>>
>>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
>>> not deal with atomics here.
>>
>> Fair enough.  To me, the names suggest "compiler is allowed to apply
>> relaxed constraints and tear the access if it wants"  But apparently
>> the common meaning is "relax, bro, I know what I'm doing".  If that's
>> the case, I can roll with it.
>>
>> -- thorpej
>
> Unless I mean something this is exactly about relaxed constraints.

miss*

> "Relaxed operation: there are no synchronization or ordering constraints
> imposed on other reads or writes" and without "operation's atomicity is
> guaranteed".
>
> This is also similar to what suggested Google to apply to NetBSD in our
> internal thread, but with a bit different naming.

Adding Marco to this thread.
Re: __{read,write}_once
On 06.11.2019 15:57, Jason Thorpe wrote:
>
>> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski wrote:
>>
>> On 06.11.2019 14:37, Jason Thorpe wrote:
>>>
>>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote:
>>>>
>>>> I propose __write_relaxed() / __read_relaxed().
>>>
>>> ...except that seems to imply the opposite of what these do.
>>>
>>> -- thorpej
>>
>> Rationale?
>>
>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
>> not deal with atomics here.
>
> Fair enough.  To me, the names suggest "compiler is allowed to apply
> relaxed constraints and tear the access if it wants"  But apparently
> the common meaning is "relax, bro, I know what I'm doing".  If that's
> the case, I can roll with it.
>
> -- thorpej

Unless I miss something, this is exactly about relaxed constraints:

"Relaxed operation: there are no synchronization or ordering constraints
imposed on other reads or writes" and without "operation's atomicity is
guaranteed".

This is also similar to what Google suggested applying to NetBSD in our
internal thread, but with a bit different naming.
Re: __{read,write}_once
On Wed, Nov 06, 2019 at 06:57:07AM -0800, Jason Thorpe wrote:
>> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
>> not deal with atomics here.
>
> Fair enough.  To me, the names suggest "compiler is allowed to apply
> relaxed constraints and tear the access if it wants"  But apparently
> the common meaning is "relax, bro, I know what I'm doing".  If that's
> the case, I can roll with it.

What is the compiler / implementation supposed to do if there is no way
to do a single-instruction load/store for the argument size?  Do we want
sized versions instead, and only provide the ones that are available on
a given architecture?

If both names and semantics are unclear, adding the macros is maybe not
a good idea ;-)

Martin
Re: __{read,write}_once
> On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski wrote:
>
> On 06.11.2019 14:37, Jason Thorpe wrote:
>>
>>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote:
>>>
>>> I propose __write_relaxed() / __read_relaxed().
>>
>> ...except that seems to imply the opposite of what these do.
>>
>> -- thorpej
>
> Rationale?
>
> This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
> not deal with atomics here.

Fair enough.  To me, the names suggest "compiler is allowed to apply
relaxed constraints and tear the access if it wants".  But apparently
the common meaning is "relax, bro, I know what I'm doing".  If that's
the case, I can roll with it.

-- thorpej
Re: __{read,write}_once
> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote:
>
> I propose __write_relaxed() / __read_relaxed().

...except that seems to imply the opposite of what these do.

-- thorpej
Re: __{read,write}_once
On 06.11.2019 14:37, Jason Thorpe wrote:
>
>> On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote:
>>
>> I propose __write_relaxed() / __read_relaxed().
>
> ...except that seems to imply the opposite of what these do.
>
> -- thorpej

Rationale?

This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
not deal with atomics here.
Re: __{read,write}_once
On 06.11.2019 12:51, Maxime Villard wrote:
> Le 06/11/2019 à 12:38, Martin Husemann a écrit :
>> On Wed, Nov 06, 2019 at 12:31:37PM +0100, Maxime Villard wrote:
>>> __read_once(x)
>>> __write_once(x, val)
>>
>> The names are really not helpful for understanding what is supposed
>> to happen here.
>
> I don't know if you have a better suggestion, but it's basically the
> naming convention that seems to have been used for several years
> already.  Maybe we could have something more explicit, like
>
> __{read,write}_racy() or
> __{read,write}_parallel() or
> __{read,write}_concurrent()
>
> But the last one is too long I think.

I propose __write_relaxed() / __read_relaxed().
Re: __{read,write}_once
Le 06/11/2019 à 12:38, Martin Husemann a écrit :
> On Wed, Nov 06, 2019 at 12:31:37PM +0100, Maxime Villard wrote:
>> __read_once(x)
>> __write_once(x, val)
>
> The names are really not helpful for understanding what is supposed to
> happen here.

I don't know if you have a better suggestion, but it's basically the
naming convention that seems to have been used for several years
already.  Maybe we could have something more explicit, like

	__{read,write}_racy() or
	__{read,write}_parallel() or
	__{read,write}_concurrent()

But the last one is too long, I think.
Re: __{read,write}_once
On Wed, Nov 06, 2019 at 12:31:37PM +0100, Maxime Villard wrote:
> __read_once(x)
> __write_once(x, val)

The names are really not helpful for understanding what is supposed to
happen here.

Martin
__{read,write}_once
There are cases in the kernel where we read/write global memory
locklessly, and accept the races either because it is part of the design
(eg low-level scheduling) or because we simply don't care (eg global
stats).  In these cases, we want to access the memory only once, and
need to ensure the compiler does not split that access into several
pieces, some of which may be changed by concurrent accesses.  There is a
Linux article [1] about this, and also [2].

I'd like to introduce the following macros:

	__read_once(x)
	__write_once(x, val)

These macros:

 - Are supposed to ensure that reads/writes result in only one access.
   We basically just cast to volatile for now.
 - Serve as markers for KCSAN, to say "yes there is a race, but it's
   expected and you don't need to care".
 - Help with understanding the code, ie the reader will easily see that
   the area can legitimately race.

In short, the main role they play is as markers, to inform the reader
and the sanitizers.  I've made a patch [3], with several changes already
to use these macros.  Feel free to comment.

[1] https://lwn.net/Articles/508991/
[2] https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
[3] https://m00nbsd.net/garbage/kcsan/access-once.diff