Hi,

This is partly an answer to Gary's mail, included below. It is about shmem allocation in process mode. I don't think I really agree with you, Gary: like Ola, I believe it is important (for performance) that shmem areas are seen at the same address in the virtual address space of every ODP thread, even when ODP threads happen to be implemented as Linux processes.
I do see a solution for doing this, but it implies redefining ODP threads as we (Gary and I) originally wanted them to be:

"in Linux, an ODP thread is any thread or process *descendant of the ODP initiation process* which has called odp_init_local()" (def 1)

This definition was rejected (mostly by Petri), who wanted to define an ODP thread as:

"in Linux, an ODP thread is any thread or process which has called odp_init_local()" (def 2)

Def 2 was accepted, but I still believe we should go for def 1 instead. Indeed, if the main ODP instantiation process performs a huge virtual space reservation, e.g. something like:

shmem_base = mmap(NULL, SHM_TOT_SZ, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
madvise(shmem_base, SHM_TOT_SZ, MADV_DONTNEED);

where SHM_TOT_SZ is the grand maximum of the sum of all possible shmem areas, then this area will be mapped in all processes and threads descended from the ODP instantiation process, i.e. in all ODP threads if def 1 is accepted. Note that the only goal of these two system calls is to reserve virtual space, not memory. I have an open question with Barry to see if there are better ways to do so.

But my point is that if we accept def 1 instead of def 2, we can pre-reserve a huge virtual address space in the instantiation process and know for sure that it will be inherited by all ODP threads (threads as well as processes). The range shmem_base to shmem_base + SHM_TOT_SZ can then be used as the address range where any ODP shared memory gets mmapped. We need an algorithm to hand out addresses within this range, e.g. two ODP internal functions:

void *_odp_shm_space_alloc(int size);
void _odp_shm_space_free(void *ptr);

which return addresses within this range and handle defragmentation. Then, when an odp_shm_reserve(area1_sz) is performed, the real memory can be allocated and mmapped with the MAP_FIXED flag at the address returned by _odp_shm_space_alloc(area1_sz).
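A minimal sketch of this reservation scheme, assuming Linux/glibc on a 64-bit system. _odp_shm_space_alloc() and _odp_shm_space_free() are the hypothetical internal helpers proposed above; the first-fit granule allocator, the SHM_TOT_SZ/SHM_GRANULE values and the shm_space_init()/shm_map_fixed() helpers are illustrative assumptions, not existing ODP code. For brevity the allocator bookkeeping is process-private here; with ODP threads as processes it would itself have to live in shared memory set up before the first fork(), otherwise two processes could hand out the same range.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Illustrative sizes only (assumptions, not ODP values). */
#define SHM_TOT_SZ  ((size_t)1 << 30)           /* 1 GiB of address space */
#define SHM_GRANULE ((size_t)1 << 16)           /* 64 KiB allocation unit */
#define SHM_NCHUNK  (SHM_TOT_SZ / SHM_GRANULE)  /* 16384 granules */

static char *shmem_base;
/* used[i]: granule i belongs to an allocation.
 * run_len[i]: granules in the allocation starting at i (0 if none).
 * Process-private in this sketch; a multi-process version must share it. */
static unsigned char used[SHM_NCHUNK];
static size_t run_len[SHM_NCHUNK];

/* Reserve address space only: PROT_NONE + MAP_NORESERVE commits no memory.
 * Meant to run once in the instantiation process, before any fork(). */
int shm_space_init(void)
{
    void *p = mmap(NULL, SHM_TOT_SZ, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    shmem_base = p;
    return 0;
}

/* Hypothetical _odp_shm_space_alloc(): first fit over whole granules. */
void *_odp_shm_space_alloc(int size)
{
    if (size <= 0 || shmem_base == NULL)
        return NULL;
    size_t need = ((size_t)size + SHM_GRANULE - 1) / SHM_GRANULE;
    size_t run = 0;
    for (size_t i = 0; i < SHM_NCHUNK; i++) {
        run = used[i] ? 0 : run + 1;      /* length of free run ending at i */
        if (run == need) {
            size_t start = i + 1 - need;
            for (size_t j = start; j <= i; j++)
                used[j] = 1;
            run_len[start] = need;
            return shmem_base + start * SHM_GRANULE;
        }
    }
    return NULL;                          /* reserved range exhausted */
}

/* Hypothetical _odp_shm_space_free(): turn the range back into a pure
 * PROT_NONE reservation (MAP_FIXED replaces the old mapping) and mark its
 * granules free so later first-fit scans can reuse them in larger runs. */
void _odp_shm_space_free(void *ptr)
{
    size_t start = (size_t)((char *)ptr - shmem_base) / SHM_GRANULE;
    size_t n = run_len[start];
    assert(n > 0);                        /* ptr must come from the allocator */
    if (mmap(ptr, n * SHM_GRANULE, PROT_NONE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED,
             -1, 0) == MAP_FAILED)
        return;                           /* leave the range marked as used */
    for (size_t j = start; j < start + n; j++)
        used[j] = 0;
    run_len[start] = 0;
}

/* Back part of the reserved range with real shared memory at a fixed
 * address: anonymous if fd < 0, otherwise the object behind fd. */
void *shm_map_fixed(void *addr, size_t len, int fd)
{
    int flags = MAP_SHARED | MAP_FIXED | (fd < 0 ? MAP_ANONYMOUS : 0);
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE, flags, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

For the lookup path, another ODP process would call shm_map_fixed() with a file descriptor for the named object and the address recorded at reserve time. Because the PROT_NONE reservation is inherited by every descendant, the kernel will not have placed any other non-fixed mapping inside that range.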
Any other ODP thread performing an odp_shm_lookup() and odp_shm_get_addr() on an existing handle would then mmap it at the same existing address using the MAP_FIXED flag. Note that when MAP_FIXED is used, the kernel replaces any existing mapping with the new one, which is exactly the behaviour we want here. An odp_shm_free() would of course call _odp_shm_space_free(), returning the address range to the "pool" and remapping it with the MAP_NORESERVE flag (or whatever better way there is to tell the kernel that this is just a virtual space reservation).

As far as I can see, there is a huge performance gain in being able to guarantee that a given shm area maps at the same address in every ODP thread. Accepting def 1 gives this possibility. If not, I cannot see how two unrelated processes could be guaranteed to have the same address range available at any time within their respective address spaces...

Christophe.

The mail received from Gary:

I had intended to try some patches on the ODP linux-generic shmem implementation in order to allow child processes to access shared memory as reliably as pthreads do now. Essentially the problem is that while pthreads share a single memory space with all threads within their thread group (except for thread-local storage, that is), forked processes have unique memory spaces, and shared memory or memory-mapped files CANNOT BE GUARANTEED to be mapped at the same address from one process to the next.

Hence instances of both of these memory areas must be referenced between processes via shared memory object names known to all processes in question. Each process then binds that shared memory object to an address unique to its local address space, and elements within the shared memory / file space are referenced via known offsets from the local base address.

I do not believe there is any reasonable means for the kernel to guarantee matching virtual addresses across processes for memory-mapped system-scoped objects...
it would necessitate dynamic remapping of the rest of a given process's virtual address space whenever a shared object was mapped in. What a nightmare that could be!

In the case of ODP there seems to be a "master catalog" of shared memory created by the parent ODP instance, and the addresses of shared memory objects reserved by ODP threads belonging to that instance are stored in this "catalog" and later referenced by the children. Neither the address of the "master catalog" itself nor the addresses of the reserved areas listed within the "catalog" can be used reliably from forked child processes. Forked processes will have to reference all of these shared memory objects via filenames, so the "master catalog" itself must have a "well known" filename specific to the parent ODP instance, and the reserved memory areas "registered" within the "catalog" must store filenames in the "catalog". Anonymous shared memory objects will not work with the "process" child model.

A solution would be to store only the object names in the "master catalog". A separate process-specific table would then be added to contain the addresses of the objects as mapped in each process's address space. Then, when a child ODP thread needs to reference a previously reserved shared memory object, it would first check for a matching entry in the local process-specific address table. If an existing entry is found in the local table, it can use this address mapping directly; otherwise it must remap the shared memory object in question into its own address space and store the address in its local mapping table for future reference. Possibly an addition to odp_init_local() would do this mapping of the parent instance's "master catalog" when a child process begins executing.
When a new shared memory object is created and reserved by an ODP child thread, it would register a name in the "master catalog" shared memory object and then store the local address mapping in the local process-specific address table. odp_term_local() would have to unlink the local reference and address binding for each entry in the local process-specific address table when called from the last thread within a process or ODP instance.

Gary R.
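A minimal sketch of Gary's name-based scheme, assuming Linux and plain files as the named objects (a real implementation might use shm_open() or files on a hugetlbfs mount instead). All identifiers here (struct catalog, shm_reserve_named(), shm_lookup_named(), shm_term_local(), the table sizes) are hypothetical. For brevity the master catalog is an anonymous MAP_SHARED mapping, which only children forked after its creation can see; Gary's proposal gives it a well-known filename instead. The cross-process lock a real catalog needs is omitted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_OBJS 64
#define NAME_LEN 64

/* Shared "master catalog": names and sizes only, never addresses. */
struct catalog {
    int count;
    struct { char name[NAME_LEN]; size_t len; } obj[MAX_OBJS];
};

/* Process-specific table: name -> address valid in this process only. */
struct local_entry { char name[NAME_LEN]; void *addr; size_t len; };
static struct local_entry local_tbl[MAX_OBJS];
static int local_cnt;

/* Create the catalog in memory shared with children forked afterwards. */
struct catalog *catalog_create(void)
{
    void *p = mmap(NULL, sizeof(struct catalog), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;   /* zero-filled, so count == 0 */
}

/* Map an open object wherever the kernel chooses and remember it locally.
 * Takes ownership of fd; the mapping stays valid after close(). */
static void *map_and_record(int fd, const char *name, size_t len)
{
    void *p = MAP_FAILED;
    if (local_cnt < MAX_OBJS)
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return NULL;
    snprintf(local_tbl[local_cnt].name, NAME_LEN, "%s", name);
    local_tbl[local_cnt].addr = p;
    local_tbl[local_cnt].len = len;
    local_cnt++;
    return p;
}

/* Reserve: create the named object, map it, register only its name. */
void *shm_reserve_named(struct catalog *cat, const char *name, size_t len)
{
    if (strlen(name) >= NAME_LEN || cat->count == MAX_OBJS)
        return NULL;
    int fd = open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        unlink(name);
        return NULL;
    }
    void *p = map_and_record(fd, name, len);
    if (p == NULL) {
        unlink(name);
        return NULL;
    }
    /* A real catalog needs a cross-process lock around this update. */
    snprintf(cat->obj[cat->count].name, NAME_LEN, "%s", name);
    cat->obj[cat->count].len = len;
    cat->count++;
    return p;
}

/* Lookup: local table first, then remap by name from the catalog. */
void *shm_lookup_named(const struct catalog *cat, const char *name)
{
    for (int i = 0; i < local_cnt; i++)
        if (strcmp(local_tbl[i].name, name) == 0)
            return local_tbl[i].addr;
    for (int i = 0; i < cat->count; i++) {
        if (strcmp(cat->obj[i].name, name) == 0) {
            int fd = open(name, O_RDWR);
            return fd < 0 ? NULL : map_and_record(fd, name, cat->obj[i].len);
        }
    }
    return NULL;
}

/* odp_term_local()-style cleanup: drop every mapping of this process. */
void shm_term_local(void)
{
    for (int i = 0; i < local_cnt; i++) {
        int rc = munmap(local_tbl[i].addr, local_tbl[i].len);
        assert(rc == 0);
        (void)rc;
    }
    local_cnt = 0;
}
```

The address returned by shm_lookup_named() is only meaningful in the calling process, so data inside an area would have to be linked by offsets from that local base rather than by raw pointers, which is the cost Christophe's def 1 proposal aims to avoid.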
_______________________________________________ lng-odp mailing list lng-odp@lists.linaro.org https://lists.linaro.org/mailman/listinfo/lng-odp