(Now using macOS Mail program in plain text mode, hope this works)
> On 2 Apr 2022, at 21:31, Honnappa Nagarahalli <[email protected]>
> wrote:
>
> <snip>
>
>>> +__rte_experimental
>>> +static inline bool
>>> +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t
>>> +*begin_sn) {
>>> + uint32_t end_sn;
>>> +
>>> + /* make sure the data loads happens before the sn load */
>>> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>>> +
>>> + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
>>
>> Since we are reading and potentially returning the sequence number here
>> (repeating the read of the protected data), we need to use load-acquire.
>> I assume it is not expected that the user will call
>> rte_seqlock_read_lock() again.
> Good point, we need a load-acquire (due to changes done in v3).
>
>>
>> Seeing this implementation, I might actually prefer the original
>> implementation, I think it is cleaner. But I would like for the begin
>> function
>> also to wait for an even sequence number, the end function would only have
>> to check for same sequence number, this might improve performance a little
>> bit as readers won't perform one or several broken reads while a write is in
>> progress. The function names are a different thing though.
> I think we need to be optimizing for the case where there is no contention
> between readers and writers (as that happens most of the time). From this
> perspective, not checking for an even seq number in the begin function would
> reduce one 'if' statement.
The number of statements in C is not relevant, instead we need to look at the
generated code. On x86, I would assume an if-statement like “if ((sn & 1) ||
(sn == sl->sn))” to generate two separate evaluations with their own
conditional jump instructions. On AArch64, the two evaluations could probably
be combined using a CCMP instruction and need only one conditional branch
instruction. With branch prediction, it is doubtful we will see any difference
in performance.
>
> Going back to the earlier model is better as well, because of the
> load-acquire required in the 'rte_seqlock_read_tryunlock' function. The
> earlier model would not introduce the load-acquire for the no contention case.
The earlier model still had load-acquire in the read_begin function which would
have to invoked again. There is no difference in the number or type of memory
accesses. We just need to copy the implementation of read_begin into the
read_tryunlock function if we decide that the user should not have to re-invoke
read_begin on a failed read_tryunlock.
>
>>
>> The writer side behaves much more like a lock with mutual exclusion so
>> write_lock/write_unlock makes sense.
>>
>>> +
>>> + if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
>>> + *begin_sn = end_sn;
>>> + return false;
>>> + }
>>> +
>>> + return true;
>>> +}
>>> +