Sorry for the late feedback. I missed this one.

On 10/26/2015 05:07 PM, Petri Savolainen wrote:
> Updated odp_sync_stores() specification and added odp_sync_loads
> to pair it. Used GCC __atomic_thread_fence to implement both of
> those.
>
> Signed-off-by: Petri Savolainen <petri.savolai...@nokia.com>
> ---
...
> +/**
> + * Synchronize loads
> + *
> + * This call implements a read memory barrier. It ensures that all (non-atomic
> + * or relaxed atomic) loads that precede this call happen before any load
> + * operation that follows it. It prevents loads moving from after the call to
> + * before it.
> + *
> + * ODP synchronization mechanisms (e.g. barrier, locks, queue dequeues)
> + * include read barrier, so this call is not needed when using those.
> + *
The API here is fine. What bothers me is the note saying that all ODP sync
mechanisms include this barrier.
Because the Kalray architecture does not have cache coherency, a read memory
barrier is *very* expensive.
We have to invalidate the complete cache, which is cheap in itself, but the
cache then has to be refilled afterwards.

What our current implementation does is simply invalidate the appropriate
structures as needed.
Barriers and locks only cause a write memory barrier. Queue dequeue also ensures
that the dequeued struct (packet, timer, buffer, etc.) is up to date with
other threads and devices.
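
For reference, here is a minimal sketch of the load/store pairing the quoted
spec describes. The patch only says it uses GCC __atomic_thread_fence; the
release/acquire memory orders, the sketch_* wrappers, and the payload/ready
variables are assumptions for illustration, not the actual ODP code.

```c
#include <stdint.h>

/* Hypothetical stand-ins for odp_sync_stores()/odp_sync_loads().
 * The memory orders are assumed, not taken from the patch. */
static inline void sketch_sync_stores(void)
{
	__atomic_thread_fence(__ATOMIC_RELEASE);
}

static inline void sketch_sync_loads(void)
{
	__atomic_thread_fence(__ATOMIC_ACQUIRE);
}

static uint32_t payload; /* plain (non-atomic) data */
static uint32_t ready;   /* flag accessed with relaxed atomics */

/* Writer: publish the data, then raise the flag. */
void producer(uint32_t value)
{
	payload = value;
	sketch_sync_stores(); /* payload store ordered before the flag store */
	__atomic_store_n(&ready, 1, __ATOMIC_RELAXED);
}

/* Reader: returns 1 and fills *out once the flag has been observed. */
int consumer(uint32_t *out)
{
	if (!__atomic_load_n(&ready, __ATOMIC_RELAXED))
		return 0;
	sketch_sync_loads(); /* flag load ordered before the payload load */
	*out = payload;
	return 1;
}
```

On a cache-coherent target the fence in consumer() is comparatively cheap. On
a target without cache coherency, that same call is where the invalidate and
later refill cost described above would land.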
