Hi,

This is partly an answer to Gary's mail included below. It is about shmem
allocation in process mode. I don't think I really agree with you, Gary: like
Ola, I believe it is important (for performance) that shmem areas are mapped
at the same virtual address on different ODP threads, even if the latter
happen to be implemented as Linux processes.

I do see a solution for doing this, but it implies redefining ODP threads as
we (Gary and myself) originally wanted them to be:

"in linux, an odp thread is any thread or process *descendant of the ODP
initiation process* which has called odp_init_local()" (def 1)

This definition was rejected (mostly by Petri) who wanted to define an ODP
thread as: "in linux, an odp thread is any thread or process which has
called odp_init_local()" (def 2). This was accepted as the definition, but
I still believe we should go for def 1 instead.

Indeed, if the main ODP instantiation process performs a huge virtual address
space reservation, e.g. something like:

shmem_base = mmap(NULL, SHM_TOT_SZ, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
madvise(shmem_base, SHM_TOT_SZ, MADV_DONTNEED);

where SHM_TOT_SZ is the upper bound on the total size of all possible shmem
areas, then this area will be mapped in all processes and threads descended
from the ODP instantiation process, i.e. in all ODP threads if def 1 is
accepted.
Note that the only purpose of these two system calls is to reserve virtual
address space, not memory. I have an open question with Barry on whether
there are better ways to do so.
But my point is that, if we accept def 1 instead of def 2, we can
pre-reserve a huge virtual address space in the instantiation process and
know for sure that it will be inherited by all ODP threads (whether they are
threads or processes).
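
A minimal sketch of this inheritance guarantee, assuming Linux and a 256 MiB stand-in for SHM_TOT_SZ (the function name reservation_is_inherited() is hypothetical): the parent reserves the range and forks, and the child checks with mprotect(), which fails with ENOMEM if any page of the range is unmapped, that the whole reservation exists at the same address in its own address space.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RESERVE_SZ (256UL * 1024 * 1024)  /* hypothetical stand-in for SHM_TOT_SZ */

/* Returns 1 if a child forked after the reservation has the whole range
 * mapped at the same address, 0 if not, -1 on error. */
int reservation_is_inherited(void)
{
    /* Reserve address space only: no access, no swap/commit accounting. */
    void *base = mmap(NULL, RESERVE_SZ, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(base, RESERVE_SZ);
        return -1;
    }
    if (pid == 0) {
        /* mprotect() fails with ENOMEM if any page in the range is not
         * mapped, so success means the reservation was inherited intact. */
        _exit(mprotect(base, RESERVE_SZ, PROT_NONE) == 0 ? 0 : 1);
    }

    int status = 0;
    int ok = waitpid(pid, &status, 0) == pid &&
             WIFEXITED(status) && WEXITSTATUS(status) == 0;
    assert(munmap(base, RESERVE_SZ) == 0);
    return ok;
}
```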

The range shmem_base to shmem_base + SHM_TOT_SZ can then be treated as the
address range where any ODP shared memory is mmapped. We need an algorithm
to allocate addresses within this range, e.g. two ODP internal functions:
void *_odp_shm_space_alloc(size_t size);
void _odp_shm_space_free(void *ptr);
which hand out addresses within this range and deal with fragmentation.
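
One possible shape for _odp_shm_space_alloc()/_odp_shm_space_free() is a simple first-fit range allocator that merges neighbouring free ranges on release. This is only an illustration under stated assumptions: the shm_space_* names, the 4096-byte granularity and the fixed table sizes are hypothetical, and the allocator state is a plain struct, whereas in process mode it would itself have to live in shared memory and be protected by a lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPACE_ALIGN      4096u  /* assumed page size */
#define SPACE_MAX_ALLOCS 64     /* arbitrary limit for the sketch */

typedef struct {
    uintptr_t start;
    size_t len;
} space_range_t;

/* Hypothetical allocator state (would need to be shared + locked in process mode). */
typedef struct {
    space_range_t avail[SPACE_MAX_ALLOCS + 1]; /* free ranges, sorted, never adjacent */
    space_range_t used[SPACE_MAX_ALLOCS];      /* live allocations, unordered */
    int navail, nused;
} shm_space_t;

void shm_space_init(shm_space_t *s, void *base, size_t len)
{
    assert(len % SPACE_ALIGN == 0);
    s->avail[0].start = (uintptr_t)base;
    s->avail[0].len = len;
    s->navail = 1;
    s->nused = 0;
}

static void avail_remove(shm_space_t *s, int i)
{
    memmove(&s->avail[i], &s->avail[i + 1],
            (size_t)(s->navail - i - 1) * sizeof(space_range_t));
    s->navail--;
}

/* First fit: carve the request from the lowest free range large enough. */
void *shm_space_alloc(shm_space_t *s, size_t size)
{
    size_t rounded = (size + SPACE_ALIGN - 1) & ~(size_t)(SPACE_ALIGN - 1);
    if (size == 0 || rounded < size || s->nused == SPACE_MAX_ALLOCS)
        return NULL;
    for (int i = 0; i < s->navail; i++) {
        if (s->avail[i].len < rounded)
            continue;
        uintptr_t addr = s->avail[i].start;
        s->avail[i].start += rounded;
        s->avail[i].len -= rounded;
        if (s->avail[i].len == 0)
            avail_remove(s, i);
        s->used[s->nused].start = addr;
        s->used[s->nused].len = rounded;
        s->nused++;
        return (void *)addr;
    }
    return NULL; /* address space exhausted or too fragmented */
}

/* Return a range to the free list, merging it with adjacent free ranges. */
int shm_space_free(shm_space_t *s, void *ptr)
{
    uintptr_t start = (uintptr_t)ptr;
    int u = 0;
    while (u < s->nused && s->used[u].start != start)
        u++;
    if (u == s->nused)
        return -1; /* not an address handed out by this allocator */
    size_t len = s->used[u].len;
    s->used[u] = s->used[--s->nused];

    int i = 0; /* index of the first free range above ptr */
    while (i < s->navail && s->avail[i].start < start)
        i++;
    int prev = i > 0 && s->avail[i - 1].start + s->avail[i - 1].len == start;
    int next = i < s->navail && start + len == s->avail[i].start;

    if (prev && next) {
        s->avail[i - 1].len += len + s->avail[i].len;
        avail_remove(s, i);
    } else if (prev) {
        s->avail[i - 1].len += len;
    } else if (next) {
        s->avail[i].start = start;
        s->avail[i].len += len;
    } else { /* isolated range: insert it at position i */
        memmove(&s->avail[i + 1], &s->avail[i],
                (size_t)(s->navail - i) * sizeof(space_range_t));
        s->avail[i].start = start;
        s->avail[i].len = len;
        s->navail++;
    }
    return 0;
}
```

Since free ranges are always separated by at least one live allocation, there are never more than SPACE_MAX_ALLOCS + 1 of them, which is why the avail table has that size.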

Then, when odp_shm_reserve(area1_sz) is called, the real memory can be
allocated and mmapped with the MAP_FIXED flag at the address returned by
_odp_shm_space_alloc(area1_sz).
Any other ODP thread performing odp_shm_lookup() and odp_shm_get_addr() on
an existing handle would then mmap it at the same address, again using
the MAP_FIXED flag.
Note that when MAP_FIXED is used, the kernel replaces any existing mapping
in that range with the new one, which is exactly the behaviour we want here.
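
A minimal sketch of the reserve/lookup sequence, assuming Linux, a plain file under /tmp standing in for the shared memory object, and hypothetical helper names. The child plays the ODP thread calling odp_shm_reserve() and stores a raw pointer inside the area; the parent plays the ODP thread calling odp_shm_lookup(), maps the same object with MAP_FIXED at the same address inside its inherited reservation, and can then dereference that pointer directly.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RESERVE_SZ (256UL * 1024 * 1024)  /* hypothetical stand-in for SHM_TOT_SZ */
#define AREA_SZ    4096UL

/* Shared area holding a raw pointer into itself. */
struct area {
    int value;
    int *ptr;
};

/* Map the file 'path' at exactly 'addr' (inside the reservation);
 * MAP_FIXED replaces the PROT_NONE reservation pages at that address. */
static struct area *map_fixed(void *addr, const char *path, int size_it)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;
    if (size_it && ftruncate(fd, AREA_SZ) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(addr, AREA_SZ, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd); /* the mapping survives the close */
    return p == MAP_FAILED ? NULL : p;
}

/* Returns the value the parent reads through the child's pointer, or -1. */
int same_address_demo(void)
{
    void *base = mmap(NULL, RESERVE_SZ, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    char path[] = "/tmp/shm_demo_XXXXXX"; /* hypothetical backing file */
    int fd = mkstemp(path);
    if (fd < 0) {
        munmap(base, RESERVE_SZ);
        return -1;
    }
    close(fd);

    pid_t pid = fork();
    if (pid == 0) {
        /* "odp_shm_reserve()": map at the address chosen for this area. */
        struct area *a = map_fixed(base, path, 1);
        if (a == NULL)
            _exit(1);
        a->value = 42;
        a->ptr = &a->value; /* absolute address stored in shared memory */
        _exit(0);
    }
    int status = 0;
    int child_ok = pid > 0 && waitpid(pid, &status, 0) == pid &&
                   WIFEXITED(status) && WEXITSTATUS(status) == 0;

    /* "odp_shm_lookup()": same object, same address, in another process. */
    struct area *a = child_ok ? map_fixed(base, path, 0) : NULL;
    unlink(path);
    int result = a ? *a->ptr : -1; /* valid only because a == base here too */
    munmap(base, RESERVE_SZ);
    return result;
}
```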

An odp_shm_free() would of course call _odp_shm_space_free(), returning the
address range to the "pool" and remapping it with PROT_NONE and the
MAP_NORESERVE flag, or whatever better way there is to tell the kernel that
this is just a virtual address space reservation.
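
Assuming the remapping approach described above, the release step could be a single MAP_FIXED call that swaps the real mapping for a fresh PROT_NONE, MAP_NORESERVE anonymous one, dropping the memory while keeping the address range held. shm_space_unmap_keep_reserved() is a hypothetical name, and whether a better kernel mechanism exists is the open question with Barry mentioned earlier.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: release the memory behind [addr, addr + len) but keep
 * the address range reserved. MAP_FIXED atomically replaces the old mapping
 * with an inaccessible, non-reserved anonymous one. */
int shm_space_unmap_keep_reserved(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);
    void *p = mmap(addr, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED,
                   -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}
```

Because the replacement is done in one call, there is no window in which another mmap() in the same process could grab the range, as there would be with munmap() followed by a new reservation.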

As far as I can see, there is a large performance gain in being able to
guarantee that a given shm area maps at the same address in every ODP
thread: pointers stored in shared memory can then be used directly, without
translating a per-process base address plus offset on each access.
Accepting def 1 gives us this possibility. Otherwise, I cannot see how 2
unrelated processes could be guaranteed to have the same address range
available at any time within their respective address spaces...

Christophe.


The mail received from Gary:

I had intended to try some patches on the ODP Linux-generic shmem
implementation in order to allow child processes to access shared memory as
reliably as pthreads do now.  Essentially the problem is that while
pthreads share a single memory space with all threads within their thread
group (except for thread-local storage, that is) - forked processes have
unique memory spaces and shared memory or memory-mapped files CANNOT BE
GUARANTEED to be mapped at the same address from one process to the next.
Hence instances of both of these memory areas must be referenced between
processes via the use of shared memory object names known to all processes
in question.  Each process then binds that shared memory object to an
address unique to its local address space - and elements within the shared
memory / file space are referenced via known offsets from the local base
address.  I do not believe there is any reasonable means for the kernel to
guarantee matching virtual addresses across processes for memory-mapped
system-scoped objects... it would necessitate dynamic remapping of the rest
of a given process's virtual address space when a shared object was mapped
in.  What a nightmare that could be!
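
For illustration, the base-plus-offset scheme described above might look like the following minimal sketch (hypothetical names, not existing ODP code). Links inside the shared area are stored as offsets and turned into pointers using each process's own base address, which is the per-access translation that a same-address guarantee would make unnecessary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t shm_off_t;
#define SHM_OFF_NULL UINT64_MAX /* marks "no object" */

/* Convert a pointer into the area to an offset from the area's local base. */
static inline shm_off_t shm_to_off(const void *base, const void *p)
{
    return (shm_off_t)((const char *)p - (const char *)base);
}

/* Convert an offset back to a pointer valid in the calling process. */
static inline void *shm_from_off(void *base, shm_off_t off)
{
    assert(off != SHM_OFF_NULL);
    return (char *)base + off;
}

/* A list kept in shared memory, linked by offsets instead of pointers. */
struct node {
    int value;
    shm_off_t next;
};

/* Walk the list from 'head', using this process's own mapping address. */
int list_sum(void *base, shm_off_t head)
{
    int sum = 0;
    for (shm_off_t off = head; off != SHM_OFF_NULL;) {
        struct node *n = shm_from_off(base, off);
        sum += n->value;
        off = n->next;
    }
    return sum;
}
```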

In the case of ODP there seems to be a "master catalog" of shared memory
created by the parent ODP instance - and the addresses of shared memory
objects reserved by ODP threads belonging to that instance are stored in
this 'catalog' and then later referenced by the children.  Neither the
address of the 'master catalog' itself nor the addresses of the reserved
areas listed within the 'catalog' may be used reliably from forked child
processes.  Forked processes will have to reference all of these shared
memory objects via filenames - so the 'master catalog' itself must have a
'well known' filename specific to the parent ODP instance - and the
reserved memory areas 'registered' within the 'catalog' must store
filenames into the 'catalog'.  Anonymous shared memory objects will not
work with the 'process' child model.

A solution would be to store only the object names into the 'master
catalog'.  A separate process-specific table would then be added to contain
the addresses of the objects as mapped in each process's address space.
Then when a child ODP thread needs to reference a previously reserved
shared memory object, it would first check for a matching entry in the
local process-specific address table.  If an existing entry is found in the
local table then it can use this address mapping directly... otherwise it
must remap the shared memory object in question into its own address space
and store the address into its local mapping table for future reference.
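
A minimal sketch of the per-process address table described above, assuming fixed-size arrays and hypothetical names (local_map_find() and local_map_add() are not ODP functions). The mapping step itself (e.g. shm_open() plus mmap()) is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOCAL_MAX_ENTRIES 64 /* arbitrary limit for the sketch */
#define SHM_NAME_LEN      32

/* One entry: shared memory object name -> address in *this* process. */
typedef struct {
    char name[SHM_NAME_LEN];
    void *addr;
} local_map_entry_t;

/* Per-process table; it is never placed in shared memory. */
typedef struct {
    local_map_entry_t entry[LOCAL_MAX_ENTRIES];
    int count;
} local_map_t;

/* Local address of 'name', or NULL if this process has not mapped it yet. */
void *local_map_find(const local_map_t *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->entry[i].name, name) == 0)
            return t->entry[i].addr;
    return NULL;
}

/* Record the address at which this process has just mapped 'name'. */
int local_map_add(local_map_t *t, const char *name, void *addr)
{
    assert(t != NULL && name != NULL);
    if (t->count == LOCAL_MAX_ENTRIES || strlen(name) >= SHM_NAME_LEN)
        return -1;
    strcpy(t->entry[t->count].name, name);
    t->entry[t->count].addr = addr;
    t->count++;
    return 0;
}
```

A lookup then becomes: call local_map_find(); if it returns NULL, open the object by the name found in the master catalog, mmap() it, and record the result with local_map_add().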

Possibly an addition to odp_init_local would do this mapping for the parent
instance's "master catalog" when a child process begins executing.  When a
new shared memory object is created and reserved by an ODP child thread, it
would register a name in the "master catalog" shared memory object and then
store the local address mapping in the local process-specific address
table.  odp_term_local would have to unlink the local reference and address
binding for each entry in the local process-specific address table when
called from the last thread within a process or ODP instance.

Gary R.
_______________________________________________
lng-odp mailing list
lng-odp@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lng-odp
