This is regarding ODP 226 (Tests assuming pthreads).

Kalray is facing a problem that is actually larger than this thread vs. process question. The basic question is: when N processors share the same memory (a shmem object), is it acceptable to force a cache update of the whole shmem'd area on the other N-1 processors as soon as a single processor updates any byte in that area?

Typically, one processor writes something into the shmem and another one wants to read it. Only the latter really needs to invalidate its cache. Kalray typically does a cache invalidation on small ODP objects (e.g. atomics), but invalidating caches everywhere, on all processors, costs too much for shared memory areas. As a result, updates to "shmemed" areas are not automatically visible to other CPUs.

They would therefore need something like a "refresh" method on the shmem object, called by the processor that actually needs the data. <shmem_object>.refresh() would invalidate the local cache (of that single core) and possibly initiate a prefetch. For symmetry, a <shmem_object>.flush() may also be needed, to flush pending writes on a shmem object. Kalray has write-through caching at this stage, so flush() is less pressing for them.

On some implementations these methods could map to memory barriers, but memory barriers are not local to shmem objects, and invalidating the whole CPU cache for a single shmem object is costly. It would make sense to have refresh() and flush() act on the whole shmem object. However, Kalray pointed out that allocating very small shmem fragments is expensive (shmem areas have names and handles, and have page granularity). Allocating a shmem for a single atomic int is not efficient. They would therefore like these methods to act on sub-areas of the shared memory. (Or do we see a case for shmem_malloc() here?)

Hope this helps in understanding the problem...

Christophe.
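To make the sub-area idea concrete, here is a minimal C sketch of refresh()/flush() acting on an (offset, length) range of a shmem block rather than on the whole object. Everything in it is hypothetical and not ODP API. This includes shm_region_t, shm_refresh(), shm_flush(), the 64-byte CACHE_LINE and the cache_line_invalidate()/cache_line_writeback() hooks. On a non-coherent target, the hooks would wrap the platform's cache maintenance instructions. Here they are no-op stubs that only count the cache lines they would touch. The C11 fences stand in for the ordering a real implementation would need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache line size; the real value is platform-specific. */
#define CACHE_LINE 64u

/* Hypothetical stand-in for a shmem block: base address and size. */
typedef struct {
    uint8_t *base;
    size_t   size;
} shm_region_t;

/* Hypothetical per-line cache hooks. A real non-coherent port would issue
 * the platform's invalidate / write-back instruction here; these stubs
 * only count how many lines would be touched. */
static size_t lines_touched;
static void cache_line_invalidate(const void *p) { (void)p; lines_touched++; }
static void cache_line_writeback(const void *p)  { (void)p; lines_touched++; }

/* Number of cache lines overlapping [off, off + len) inside the region,
 * clamped to the region size; *first gets the first line's address. */
static size_t line_span(const shm_region_t *r, size_t off, size_t len,
                        uintptr_t *first)
{
    assert(r != NULL);
    if (len == 0 || off >= r->size)
        return 0;
    if (len > r->size - off)
        len = r->size - off;
    uintptr_t start = (uintptr_t)(r->base + off) & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end   = (uintptr_t)(r->base + off + len);
    *first = start;
    return (size_t)((end - start + CACHE_LINE - 1) / CACHE_LINE);
}

/* Reader side: order the following loads after what was already observed
 * (e.g. a "data ready" flag), then drop only this core's stale copies of
 * the requested sub-area. */
void shm_refresh(const shm_region_t *r, size_t off, size_t len)
{
    uintptr_t p = 0;
    atomic_thread_fence(memory_order_acquire);
    size_t n = line_span(r, off, len, &p);
    for (size_t i = 0; i < n; i++)
        cache_line_invalidate((const void *)(p + i * CACHE_LINE));
}

/* Writer side: push the dirty lines of the sub-area out to memory, then
 * order those writes before any later store (e.g. setting a flag). */
void shm_flush(const shm_region_t *r, size_t off, size_t len)
{
    uintptr_t p = 0;
    size_t n = line_span(r, off, len, &p);
    for (size_t i = 0; i < n; i++)
        cache_line_writeback((const void *)(p + i * CACHE_LINE));
    atomic_thread_fence(memory_order_release);
}
```

A writer would fill a range and call shm_flush() on just that range before signalling. The reader would call shm_refresh() on the same range after seeing the signal and before reading. Only the lines overlapping the requested range are touched, so a small object such as an atomic int inside a page-granular shmem block costs one or two lines instead of the whole area.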
_______________________________________________ lng-odp mailing list lng-odp@lists.linaro.org https://lists.linaro.org/mailman/listinfo/lng-odp