Jack, I haven't had time to look at the outstanding issues, and realistically won't before stage3 ends - so I don't suppose it will be eligible for GCC 5 :(
Iain

On 8 Jan 2015, at 21:56, Jack Howarth wrote:

> Iain,
>     Were you planning to try to get this committed before stage4 or
> will it have to wait for the next major gcc release?
>             Jack
>
> On Thu, Nov 13, 2014 at 3:34 PM, Iain Sandoe <i...@codesourcery.com> wrote:
>> Hello Richard, Joseph,
>>
>> Thanks for your reviews,
>>
>> On 13 Nov 2014, at 07:40, Richard Henderson wrote:
>>
>>> On 11/12/2014 10:18 PM, Iain Sandoe wrote:
>>>
>>>> # ifndef USE_ATOMIC
>>>> #  define USE_ATOMIC 1
>>>> # endif
>>>
>>> Why would USE_ATOMIC be defined previously?
>>
>> This was left over from a mode where I allowed the User to jam the mode to
>> OSSpinLocks to test the performance.
>>
>> I apologise, [weak excuse follows] with the turbulence of Darwin on trunk,
>> my trunk version of the patch had got behind my 4.9 one (most of the work
>> has been hammered out there while we try to get bootstrap restored).
>>
>> Re-synced and retested with a patched trunk that bootstraps with some
>> band-aid.
>>
>>>> inline static void LockUnlock(uint32_t *l) {
>>>>   __atomic_store_4((_Atomic(uint32_t)*)l, 0, __ATOMIC_RELEASE);
>>>> }
>>>
>>> Gnu coding style, please.  All through the file here.
>>
>> Fixed.
>>
>>>> # define LOCK_SIZE sizeof(uint32_t)
>>>> # define NLOCKS (PAGE_SIZE / LOCK_SIZE)
>>>> static uint32_t locks[NLOCKS];
>>>
>>> Um, surely not LOCK_SIZE, but CACHELINE_SIZE.  It's the granularity of the
>>> target region that's at issue, not the size of the lock itself.
>>
>> The algorithm I've used is intentionally different from the pthreads-based
>> posix one; here's the rationale, as I see it:
>>
>> /* Algorithm motivations.
>>
>>    Layout Assumptions:
>>    o Darwin has a number of sub-targets with common atomic types that have no
>>      'native' in-line handling, but are smaller than a cache-line.
>>      E.g. PPC32 needs locking for >= 8 byte quantities, X86/m32 for >= 16.
>>
>>    o The _Atomic alignment of a "natural type" is no greater than the type
>>      size.
>>
>>    o There are no special guarantees about the alignment of _Atomic
>>      aggregates other than those determined by the psABI.
>>
>>    o There are no guarantees that placement of an entity won't cause it to
>>      straddle a cache-line boundary.
>>
>>    o Realistic User code will likely place several _Atomic qualified types
>>      in close proximity (such that they fall within the same cache-line).
>>      Similarly, arrays of _Atomic qualified items.
>>
>>    Performance Assumptions:
>>    o Collisions of address hashes for items (which make up the lock keys)
>>      constitute the largest performance issue.
>>
>>    o We want to avoid unnecessary flushing of [lock table] cache-line(s)
>>      when items are accessed.
>>
>>    Implementation:
>>    We maintain a table of locks, each lock being 4 bytes (at present).
>>    This choice of lock size gives some measure of equality in hash-collision
>>    statistics between the 'atomic' and 'OSSpinLock' implementations, since
>>    the lock size is fixed at 4 bytes for the latter.
>>
>>    The table occupies one physical page, and we attempt to align it to a
>>    page boundary, appropriately.
>>
>>    For entities that need a lock, with sizes < one cache-line:
>>    Each entity that requires a lock chooses the lock to use from the table
>>    on the basis of a hash determined by its size and address.  The lower
>>    log2(size) address bits are discarded on the assumption that the
>>    alignment of entities will not be smaller than their size.
>>    CHECKME: this is not verified for aggregates; it might be something that
>>    could/should be enforced from the front ends (since _Atomic types are
>>    allowed to have increased alignment c.f. 'normal').
>>
>>    For entities that need a lock, with sizes >= one cacheline_size:
>>    We assume that the entity alignment >= log2(cacheline_size) and discard
>>    log2(cacheline_size) bits from the address.
>>    We then apply size/cacheline_size locks to cover the entity.
>>
>>    The idea is that this will typically result in distinct hash keys for
>>    items placed close together.  The keys are mangled further such that the
>>    size is included in the hash.
>>
>>    Finally, to attempt to make the lock table entries be accessed in a
>>    scattered manner, to avoid repeated cache-line flushes, the hash is
>>    rearranged to attempt to maximise the noise in the upper bits.  */
>>
>> NOTE that the CHECKME above doesn't put us in any worse position than the
>> pthreads implementation (likely slightly better, since we have a smaller
>> granularity with the current scheme).
>>
>>>> #if USE_ATOMIC
>>>>   LockLock (&locks[addr_hash (ptr, 1)]);
>>>> #else
>>>>   OSSpinLockLock (&locks[addr_hash (ptr, 1)]);
>>>> #endif
>>>
>>> Better to #define LockLock OSSpinLockLock within the area above, so as to
>>> avoid the ifdefs here.
>>
>> Done.
>>
>> Thoughts on the rationale - or OK now?
>>
>> thanks
>> Iain
>>
>> I'm not aware of any other PRs that relate, but will do a final scan through
>> and ask around the darwin folks.
>>
>> libatomic:
>>
>> 	PR target/59305
>> 	* config/darwin/host-config.h: New.
>> 	* config/darwin/lock.c: New.
>> 	* configure.tgt (DEFAULT_X86_CPU): New.
>> 	(target): New entry for darwin.