Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
On 2 May 2010, at 04:03, Samuel K. Gutierrez wrote:

> As far as I can tell, calling shmctl IPC_RMID is immediately destroying
> the shared memory segment even though there is at least one process
> attached to it. This is interesting and confusing because Solaris 10's
> behavior description of shmctl IPC_RMID is similar to that of Linux'.
>
> I call shmctl IPC_RMID immediately after one process has attached to the
> segment because, at least on Linux, this only marks the segment for
> destruction. The segment is only actually destroyed after all attached
> processes have terminated. I'm relying on this behavior for resource
> cleanup upon application termination (normal/abnormal).

I think you should look into this a little deeper; it certainly used to be the case on Linux that setting IPC_RMID would also prevent any further processes from attaching to the segment.

You're right that minimising the window during which the region exists without that bit set is good, both in terms of wall-clock time and lines of code. What we used to do here was have all processes on a node perform an out-of-band intra-node barrier before creating the segment, and another in-band barrier immediately after creating it. Without this, if one process on a node has problems and aborts during startup before it gets to the shared memory code, you are almost guaranteed to leave an un-attached segment behind.

As to performance, there should be no difference in use between Sys-V shared memory and file-backed shared memory: the instructions issued and the MMU flags for the page should both be the same, so the performance should be identical. The one area you do need to keep an eye on for performance is on NUMA machines, where it's important which process on a node touches each page first; you can end up using different areas (pages, not regions) for communicating in different directions between the same pair of processes. I don't believe this is any different to mmap-backed shared memory though.

> Because of this, sysv support may be limited to Linux systems - that is,
> until we can get a better sense of which systems provide the shmctl
> IPC_RMID behavior that I am relying on.

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
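[Editor's note: a minimal sketch of the create/attach/IPC_RMID pattern being discussed, assuming Linux semantics. This is a single-process illustration only, not the Open MPI sysv component code; a real component would exchange the segment id out of band so other local processes can attach.]

    /* Minimal sketch of the create/attach/IPC_RMID pattern discussed above.
     * Error handling and segment naming are simplified for illustration. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        size_t size = 4096;

        /* Create a new segment. */
        int shmid = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
        if (-1 == shmid) {
            perror("shmget");
            return EXIT_FAILURE;
        }

        /* Attach before requesting removal so the segment stays alive. */
        void *addr = shmat(shmid, NULL, 0);
        if ((void *)-1 == addr) {
            perror("shmat");
            return EXIT_FAILURE;
        }

        /* On Linux this only marks the segment for destruction: the kernel
         * tears it down once the last attached process detaches or exits,
         * which is what provides automatic cleanup on abnormal exit. */
        if (-1 == shmctl(shmid, IPC_RMID, NULL)) {
            perror("shmctl(IPC_RMID)");
            return EXIT_FAILURE;
        }

        /* ... use the shared memory as normal ... */

        shmdt(addr); /* last detach: the kernel now destroys the segment */
        return EXIT_SUCCESS;
    }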
Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
On 01/05/10 23:03, Samuel K. Gutierrez wrote:

> I call shmctl IPC_RMID immediately after one process has
> attached to the segment because, at least on Linux, this
> only marks the segment for destruction.

That's correct: looking at the kernel code (at least in the current git master), the function that handles this - do_shm_rmid() in ipc/shm.c - only destroys the segment if nobody is attached to it; otherwise it marks the segment as IPC_PRIVATE to stop others finding it, and with SHM_DEST so that it is automatically destroyed on the last detach.

cheers,
Chris
-- 
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computational Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
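[Editor's note: a small user-space test of the behaviour described here, assuming Linux's deferred-destroy semantics; the expectation that the id remains queryable until the last detach is an assumption drawn from the description above, not from the Open MPI code.]

    /* Demonstrates destroy-on-last-detach: after IPC_RMID while attached,
     * the id stays valid until the last detach, then disappears. */
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
        if (-1 == shmid) { perror("shmget"); return EXIT_FAILURE; }

        void *addr = shmat(shmid, NULL, 0);
        if ((void *)-1 == addr) { perror("shmat"); return EXIT_FAILURE; }

        /* Mark for destruction while still attached (sets SHM_DEST). */
        shmctl(shmid, IPC_RMID, NULL);

        struct shmid_ds ds;
        /* Still attached, so querying the id is expected to succeed. */
        printf("before detach: IPC_STAT %s\n",
               shmctl(shmid, IPC_STAT, &ds) == 0 ? "ok" : "failed");

        shmdt(addr); /* last detach: kernel destroys the segment */

        /* The id should now be invalid. */
        if (-1 == shmctl(shmid, IPC_STAT, &ds)) {
            printf("after detach: segment gone (%s)\n", strerror(errno));
        }
        return EXIT_SUCCESS;
    }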
Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
On 02/05/10 06:49, Ashley Pittman wrote:

> I think you should look into this a little deeper, it
> certainly used to be the case on Linux that setting
> IPC_RMID would also prevent any further processes from
> attaching to the segment.

That certainly appears to be the case in the current master of the kernel: IPC_PRIVATE is set on the segment with the comment:

  /* Do not find it any more */

That flag means that ipcget() - used by sys_shmget() - takes a different code path and now calls ipcget_new() rather than ipcget_public().

cheers,
Chris
-- 
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computational Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
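[Editor's note: a sketch of the user-visible effect of that kernel behaviour - once IPC_RMID has been requested, a lookup by key no longer finds the segment, even though an already-attached process keeps using it. The key value is arbitrary and purely illustrative.]

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        key_t key = 0x4f4d5049; /* arbitrary example key */

        int shmid = shmget(key, 4096, IPC_CREAT | IPC_EXCL | 0600);
        if (-1 == shmid) { perror("shmget(create)"); return EXIT_FAILURE; }

        void *addr = shmat(shmid, NULL, 0);
        if ((void *)-1 == addr) { perror("shmat"); return EXIT_FAILURE; }

        /* Request removal while attached; the key is cleared so later
         * shmget() calls cannot find this segment. */
        shmctl(shmid, IPC_RMID, NULL);

        /* A "late joiner" looking the segment up by key now fails. */
        if (-1 == shmget(key, 4096, 0600)) {
            printf("lookup by key after IPC_RMID fails: %s\n",
                   strerror(errno));
        }

        shmdt(addr);
        return EXIT_SUCCESS;
    }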
Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
On May 2 2010, Ashley Pittman wrote:

> On 2 May 2010, at 04:03, Samuel K. Gutierrez wrote:
>
> As to performance there should be no difference in use between sys-V
> shared memory and file-backed shared memory, the instructions issued
> and the MMU flags for the page should both be the same so the
> performance should be identical.

Not necessarily, and possibly not so even for far-future Linuces. On at least one system I used, the poxious kernel wrote the complete file to disk before returning - all right, it did that for System V shared memory, too, just to a 'hidden' file! But, if I recall, on another it did that only for file-backed shared memory - however, it's a decade ago now and I may be misremembering.

Of course, that's a serious issue mainly for large segments. I was using multi-GB ones. I don't know how big the ones you need are.

> The one area you do need to keep an eye on for performance is on numa
> machines where it's important which process on a node touches each page
> first, you can end up using different areas (pages, not regions) for
> communicating in different directions between the same pair of
> processes. I don't believe this is any different to mmap backed shared
> memory though.

On some systems it may be, but in bizarre, inconsistent, undocumented and unpredictable ways :-( Also, there are usually several system (and sometimes user) configuration options that change the behaviour, so you have to allow for that. My experience of trying to use those is that different uses have incompatible requirements, and most of the critical configuration parameters apply to ALL uses!

In my view, the configuration variability is the number one nightmare for trying to write portable code that uses any form of shared memory. ARMCI seem to agree.

> Because of this, sysv support may be limited to Linux systems - that is,
> until we can get a better sense of which systems provide the shmctl
> IPC_RMID behavior that I am relying on.

And, I suggest, whether they have an evil gotcha on one of the areas that Ashley Pittman noted.

Regards,
Nick Maclaren.
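[Editor's note: a rough sketch of the first-touch point raised in this exchange. On a NUMA machine, pages of a shared segment are typically placed on the node of the CPU that first writes them, so having each process touch only the slice it will write into keeps its outgoing buffers local. The one-send-region-per-rank layout is an assumption for illustration, not Open MPI's actual sm layout.]

    #include <stddef.h>
    #include <string.h>

    void touch_my_region(void *segment_base, size_t region_size,
                         int my_local_rank)
    {
        /* Each local process first-touches its own slice so those pages
         * are allocated on its NUMA node, rather than on the node of
         * whichever process happened to create the segment. */
        char *mine = (char *)segment_base
                   + (size_t)my_local_rank * region_size;
        memset(mine, 0, region_size);
    }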
[OMPI devel] Unchecked malloc()'s in OMPI 1.4.x
Hi there,

I've been playing around with Coccinelle, the semantic patching system (packaged in Ubuntu 10.04), and using it to try to detect things like unchecked malloc(). It's not perfect - for instance it flags calls to assert() on the result of the malloc as bad, even though they're not - but the rest seem real.

I've done a test run on OMPI 1.4.2rc3r23065 and it's flagged the following; I hope these are useful!

ompi/mca/btl/openib/btl_openib_component.c line 997 - prepare_device_for_use()
    init_data = malloc(sizeof(mca_btl_openib_frag_init_data_t));

ompi/mca/btl/openib/btl_openib_component.c line 2414 - btl_openib_component_init()
    init_data = malloc(sizeof(mca_btl_openib_frag_init_data_t));

ompi/mca/btl/openib/connect/btl_openib_connect_base.c line 104 - prepare_device_for_use()
    available = calloc(1, sizeof(all));

ompi/mca/btl/portals/btl_portals.c line 104 - mca_btl_portals_add_procs()
    portals_procs = malloc(nprocs * sizeof(ptl_process_id_t));

ompi/mpi/f77/comm_spawn_multiple_f.c line 109 - mpi_comm_spawn_multiple_f()
    c_info = malloc (array_size * sizeof(MPI_Info));

opal/class/opal_hash_table.c line 431 - opal_hash_table_set_value_ptr()
    node->hn_key = malloc(key_size);

orte/mca/ras/alps/ras_alps_module.c line 243 - orte_ras_alps_read_appinfo_file()
    cpBuf=malloc(szLen+1);    /* Allocate buffer */

All the best,
Chris
-- 
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computational Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
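[Editor's note: a minimal sketch of the kind of check these call sites need. The function name, error code, and struct are placeholders for illustration - they are not the actual Open MPI code or error constants.]

    #include <stdlib.h>

    #define MY_ERR_OUT_OF_RESOURCE (-2)  /* placeholder error code */

    struct frag_init_data;  /* stand-in for mca_btl_openib_frag_init_data_t */

    int prepare_something(struct frag_init_data **out)
    {
        struct frag_init_data *init_data = malloc(sizeof(*init_data));
        if (NULL == init_data) {
            /* Bail out cleanly instead of dereferencing NULL later. */
            return MY_ERR_OUT_OF_RESOURCE;
        }
        *out = init_data;
        return 0;
    }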