On Mon, Jul 01, 2019 at 08:05:51PM +0000, Vineet Gupta wrote:
> On 5/31/19 1:21 AM, Peter Zijlstra wrote:
> > On Thu, May 30, 2019 at 11:22:42AM -0700, Vineet Gupta wrote:
> >> Had an interesting lunch time discussion with our hardware architects
> >> pertinent to "minimal guarantees expected of a CPU" section of
> >> memory-barriers.txt
> >>
> >> | (*) These guarantees apply only to properly aligned and sized scalar
> >> | variables. "Properly sized" currently means variables that are
> >> | the same size as "char", "short", "int" and "long". "Properly
> >> | aligned" means the natural alignment, thus no constraints for
> >> | "char", two-byte alignment for "short", four-byte alignment for
> >> | "int", and either four-byte or eight-byte alignment for "long",
> >> | on 32-bit and 64-bit systems, respectively.
> >>
> >> I'm not sure how to interpret "natural alignment" for the case of double
> >> load/stores on 32-bit systems where the hardware and ABI allow for 4 byte
> >> alignment (ARCv2 LDD/STD, ARM LDRD/STRD ....)
> >
> > Natural alignment: !((uintptr_t)ptr % sizeof(*ptr))
> >
> > For any u64 type, that would give 8 byte alignment. the problem
> > otherwise being that your data spans two lines/pages etc..
> >
> >> I presume (and the question) that lkmm doesn't expect such 8 byte
> >> load/stores to be atomic unless 8-byte aligned
> >>
> >> ARMv7 arch ref manual seems to confirm this. Quoting
> >>
> >> | LDM, LDC, LDC2, LDRD, STM, STC, STC2, STRD, PUSH, POP, RFE, SRS,
> >> | VLDM, VLDR, VSTM, and VSTR instructions are executed as a sequence
> >> | of word-aligned word accesses. Each 32-bit word access is
> >> | guaranteed to be single-copy atomic. A subsequence of two or more
> >> | word accesses from the sequence might not exhibit single-copy
> >> | atomicity
> >>
> >> While it seems reasonable form hardware pov to not implement such
> >> atomicity by default it seems there's an additional burden on
> >> application writers. They could be happily using a lockless algorithm
> >> with just a shared flag between 2 threads w/o need for any explicit
> >> synchronization.
> >
> > If you're that careless with lockless code, you deserve all the pain you
> > get.
> >
> >> But upgrade to a new compiler which aggressively "packs" struct
> >> rendering long long 32-bit aligned (vs. 64-bit before) causing the
> >> code to suddenly stop working. Is the onus on them to declare such
> >> memory as c11 atomic or some such.
> >
> > When a programmer wants guarantees they already need to know wth they're
> > doing.
> >
> > And I'll stand by my earlier conviction that any architecture that has a
> > native u64 (be it a 64bit arch or a 32bit with double-width
> > instructions) but has an ABI that allows u32 alignment on them is daft.
>
> So I agree with Paul's assertion that it is strange for 8-byte type being
> 4-byte aligned on a 64-bit system, but is it totally broken even if the
> ISA of the said 64-bit arch allows LD/ST to be augmented with acq/rel
> respectively.
>
> Say the ISA guarantees single-copy atomicity for aligned cases (i.e. for
> 8-byte data only if it is naturally aligned) and in lack thereof
> programmer needs to use the proper acq/release

Apologies if I'm missing some context here, but it's not clear to me why
the use of acquire/release instructions has anything to do with
single-copy atomicity of unaligned accesses. The ordering they provide
doesn't necessarily prevent tearing, although a CPU architecture could
obviously provide that guarantee if it wanted to. Generally though, I
wouldn't expect the two to go hand-in-hand like you're suggesting.

Will
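
For reference, a minimal C sketch of the natural-alignment test quoted
from Peter earlier in the thread. The macro body is his expression; the
names IS_NATURALLY_ALIGNED, struct demo_buf, is_naturally_aligned() and
u64_aligned_at() are hypothetical and exist only for illustration. It
assumes that converting a pointer to uintptr_t yields the numeric
address, as on common platforms. The check only inspects addresses: it
says nothing about whether a particular CPU tears an access, and it is
unrelated to acquire/release ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 8, "sketch assumes an 8-byte uint64_t");

/* Natural alignment as defined in the thread, for a typed pointer. */
#define IS_NATURALLY_ALIGNED(ptr) (!((uintptr_t)(ptr) % sizeof(*(ptr))))

/* Same test on a raw address, so no misaligned typed pointer is formed. */
static inline int is_naturally_aligned(uintptr_t addr, size_t size)
{
    return addr % size == 0;
}

/* Hypothetical buffer, forced to 8-byte alignment so offsets are predictable. */
struct demo_buf {
    _Alignas(8) unsigned char bytes[16];
};

/* Would a u64 placed at byte offset 'off' in the buffer be naturally aligned? */
static inline int u64_aligned_at(const struct demo_buf *b, size_t off)
{
    return is_naturally_aligned((uintptr_t)(b->bytes + off), sizeof(uint64_t));
}
```

With this layout, a u64 at offset 0 or 8 passes the test, while one at
offset 4 (the packed, 32-bit aligned case discussed above) does not.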