Jack,

I haven't had time to look at the outstanding issues, and realistically won't 
before stage3 ends - so I don't suppose it will be eligible for GCC5 :(

Iain

On 8 Jan 2015, at 21:56, Jack Howarth wrote:

> Iain,
>     Were you planning to try to get this committed before stage4 or
> will it have to wait for the next major gcc release?
>                  Jack
> 
> On Thu, Nov 13, 2014 at 3:34 PM, Iain Sandoe <i...@codesourcery.com> wrote:
>> Hello Richard, Joseph,
>> 
>> Thanks for your reviews,
>> 
>> On 13 Nov 2014, at 07:40, Richard Henderson wrote:
>> 
>>> On 11/12/2014 10:18 PM, Iain Sandoe wrote:
>>> 
>>>> #  ifndef USE_ATOMIC
>>>> #    define USE_ATOMIC 1
>>>> #  endif
>>> 
>>> Why would USE_ATOMIC be defined previously?
>> 
>> This was left over from a mode where I allowed the user to force the mode 
>> to OSSpinLocks to test the performance.
>> 
>> I apologise [weak excuse follows]: with the turbulence of Darwin on trunk, 
>> my trunk version of the patch had got behind my 4.9 one (most of the work 
>> has been hammered out there while we try to get bootstrap restored).
>> 
>> Re-synced and retested with a patched trunk that bootstraps with some 
>> band-aid.
>> 
>>>> inline static void LockUnlock(uint32_t *l) {
>>>>  __atomic_store_4((_Atomic(uint32_t)*)l, 0, __ATOMIC_RELEASE);
>>>> }
>>> 
>>> GNU coding style, please.  All through the file here.
>> 
>> Fixed.
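>> 
>> For reference, the lock/unlock pair in GNU style might look something like
>> this (a sketch only: the lock side is not shown in the quoted patch, so it
>> is assumed here to be a simple test-and-set spin; names and details are
>> illustrative rather than the committed code):
>> 
>>   static inline void
>>   lock_lock (uint32_t *l)
>>   {
>>     /* Spin until the previous value was 0, i.e. we took the lock.  */
>>     while (__atomic_exchange_4 (l, 1, __ATOMIC_ACQUIRE) != 0)
>>       ;
>>   }
>> 
>>   static inline void
>>   lock_unlock (uint32_t *l)
>>   {
>>     /* Release store, as quoted above: frees the lock and publishes
>>        the protected data.  */
>>     __atomic_store_4 (l, 0, __ATOMIC_RELEASE);
>>   }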
>> 
>>>> #  define LOCK_SIZE sizeof(uint32_t)
>>>> #  define NLOCKS (PAGE_SIZE / LOCK_SIZE)
>>>> static uint32_t locks[NLOCKS];
>>> 
>>> Um, surely not LOCK_SIZE, but CACHELINE_SIZE.  It's the granularity of the
>>> target region that's at issue, not the size of the lock itself.
>> 
>> The algorithm I've used is intentionally different from the pthreads-based 
>> POSIX one; here's the rationale as I see it:
>> 
>> /* Algorithm motivations.
>> 
>> Layout Assumptions:
>>  o Darwin has a number of sub-targets with common atomic types that have no
>>    'native' in-line handling, but are smaller than a cache-line.
>>    E.g. PPC32 needs locking for >= 8-byte quantities, X86/m32 for >= 16.
>> 
>>  o The _Atomic alignment of a "natural type" is no greater than the type
>>    size.
>> 
>>  o There are no special guarantees about the alignment of _Atomic aggregates
>>    other than those determined by the psABI.
>> 
>>  o There are no guarantees that placement of an entity won't cause it to
>>    straddle a cache-line boundary.
>> 
>>  o Realistic user code will likely place several _Atomic qualified types in
>>    close proximity (such that they fall within the same cache-line).
>>    Similarly, arrays of _Atomic qualified items.
>> 
>> Performance Assumptions:
>>  o Collisions of address hashes for items (which make up the lock keys)
>>    constitute the largest performance issue.
>> 
>>  o We want to avoid unnecessary flushing of [lock table] cache-line(s) when
>>    items are accessed.
>> 
>> Implementation:
>>  We maintain a table of locks, each lock being 4 bytes (at present).
>>  This choice of lock size gives some measure of equality in hash-collision
>>  statistics between the 'atomic' and 'OSSpinLock' implementations, since the
>>  lock size is fixed at 4 bytes for the latter.
>> 
>>  The table occupies one physical page, and we attempt to align it to a
>>  page boundary.
>> 
>>  For entities that need a lock, with sizes < one cache-line:
>>    Each entity that requires a lock chooses the lock to use from the table
>>  on the basis of a hash determined by its size and address.  The lower
>>  log2(size) address bits are discarded on the assumption that the
>>  alignment of entities will not be smaller than their size.
>>  CHECKME: this is not verified for aggregates; it might be something that
>>  could/should be enforced from the front ends (since _Atomic types are
>>  allowed to have increased alignment cf. 'normal').
>> 
>>  For entities that need a lock, with sizes >= one cacheline_size:
>>    We assume that the entity alignment is >= cacheline_size and discard
>>  the low log2(cacheline_size) bits from the address.
>>  We then apply size/cacheline_size locks to cover the entity.
>> 
>>  The idea is that this will typically result in distinct hash keys for items
>>  placed close together.  The keys are mangled further such that the size is
>>  included in the hash.
>> 
>>  Finally, to try to scatter accesses to the lock-table entries (and so
>>  avoid repeated cache-line flushes), the hash is rearranged to put the
>>  most noise into the upper bits.
>> */
>> 
>> NOTE that the CHECKME above doesn't put us in any worse position than the 
>> pthreads implementation (likely slightly better since we have a smaller 
>> granularity with the current scheme).
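>> 
>> To make that concrete, here is a minimal sketch of the hashing idea for
>> the sub-cache-line case (illustrative only: the helper name addr_hash
>> matches the call sites below, but the bit-mixing shown is simplified,
>> and PAGE_SIZE is assumed to be 4096 for the sketch):
>> 
>>   #include <stddef.h>
>>   #include <stdint.h>
>> 
>>   #define PAGE_SIZE 4096  /* assumed for the sketch */
>>   #define LOCK_SIZE sizeof (uint32_t)
>>   #define NLOCKS (PAGE_SIZE / LOCK_SIZE)
>> 
>>   /* One page of 4-byte locks, page-aligned as described above.  */
>>   static uint32_t locks[NLOCKS] __attribute__ ((aligned (PAGE_SIZE)));
>> 
>>   static inline size_t
>>   addr_hash (void *ptr, size_t size)
>>   {
>>     uintptr_t a = (uintptr_t) ptr;
>>     /* Discard the low log2(size) bits; for naturally-aligned
>>        entities they carry no information.  Assumes size >= 1.  */
>>     a >>= __builtin_ctzl ((unsigned long) size);
>>     /* Fold the size into the key so that different-width accesses
>>        at the same address take different locks.  */
>>     a ^= (uintptr_t) size;
>>     return (size_t) (a % NLOCKS);
>>   }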
>> 
>>>> #if USE_ATOMIC
>>>>  LockLock (&locks[addr_hash (ptr, 1)]);
>>>> #else
>>>>  OSSpinLockLock(&locks[addr_hash (ptr, 1)]);
>>>> #endif
>>> 
>>> Better to #define LockLock  OSSpinLockLock within the area above, so as to
>>> avoid the ifdefs here.
>> Done.
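>> 
>> i.e. something along these lines (a sketch of the suggestion; the names
>> assume the quoted patch, and the exact spelling in the committed version
>> may differ):
>> 
>>   #if USE_ATOMIC
>>   /* LockLock / LockUnlock are the inline-atomic versions defined
>>      earlier in the file.  */
>>   #else
>>   #  include <libkern/OSAtomic.h>
>>   #  define LockLock   OSSpinLockLock
>>   #  define LockUnlock OSSpinLockUnlock
>>   #endif
>> 
>>   /* Call sites are then unconditional, e.g.:
>>      LockLock (&locks[addr_hash (ptr, 1)]);  */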
>> 
>> Thoughts on the rationale - or OK now?
>> 
>> thanks
>> Iain
>> 
>> I'm not aware of any other PRs that relate, but will do a final scan 
>> through and ask around among the Darwin folks.
>> 
>> libatomic:
>> 
>>        PR target/59305
>>        * config/darwin/host-config.h: New.
>>        * config/darwin/lock.c: New.
>>        * configure.tgt (DEFAULT_X86_CPU): New.
>>        (target): New entry for Darwin.
>> 
