Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing

2010-05-02 Thread Ashley Pittman

On 2 May 2010, at 04:03, Samuel K. Gutierrez wrote:
> As far as I can tell, calling shmctl IPC_RMID is immediately destroying
> the shared memory segment even though there is at least one process
> attached to it.  This is interesting and confusing because Solaris 10's
> behavior description of shmctl IPC_RMID is similar to that of Linux's.
> 
> I call shmctl IPC_RMID immediately after one process has attached to the
> segment because, at least on Linux, this only marks the segment for
> destruction.  The segment is only actually destroyed after all attached
> processes have terminated.  I'm relying on this behavior for resource
> cleanup upon application termination (normal/abnormal).

I think you should look into this a little deeper; it certainly used to be the 
case on Linux that setting IPC_RMID would also prevent any further processes 
from attaching to the segment.

You're right that minimising the window during which the region exists without 
that bit set is good, both in terms of wall-clock time and lines of code.  What 
we used to do here was to have all processes on a node perform an out-of-band 
intra-node barrier before creating the segment and another in-band barrier 
immediately after creating it.  Without this, if one process on a node has 
problems and aborts during startup before it gets to the shared memory code, 
then you are almost guaranteed to leave an unattached segment behind.
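
For concreteness, a minimal sketch of that sort of sequence is below; 
intra_node_barrier() is a placeholder for whatever out-of-band/in-band 
synchronisation the runtime provides, the rank handling is illustrative rather 
than anything in the Open MPI tree, and error handling is omitted.

#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

extern void intra_node_barrier(void);   /* placeholder, not a real call */

void *setup_shared_segment(int node_rank, key_t key, size_t size)
{
    int shmid = -1;
    void *addr;

    /* Everyone reaches the shared-memory code before anything is created,
     * so an early abort elsewhere can't leave a segment behind. */
    intra_node_barrier();

    if (node_rank == 0) {
        shmid = shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);
    }

    /* The segment now exists; the other local processes look it up. */
    intra_node_barrier();

    if (node_rank != 0) {
        shmid = shmget(key, size, 0600);
    }
    addr = shmat(shmid, NULL, 0);

    /* Once every local process is attached, mark the segment for
     * destruction straight away.  On Linux this only sets SHM_DEST: the
     * memory is freed on the last detach/exit, so nothing is leaked on
     * abnormal termination. */
    intra_node_barrier();
    if (node_rank == 0) {
        shmctl(shmid, IPC_RMID, NULL);
    }

    return addr;
}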

As to performance, there should be no difference in use between sys-V shared 
memory and file-backed shared memory; the instructions issued and the MMU flags 
for the page should both be the same, so the performance should be identical.

The one area you do need to keep an eye on for performance is NUMA machines, 
where it's important which process on a node touches each page first; you can 
end up using different areas (pages, not regions) for communicating in 
different directions between the same pair of processes.  I don't believe this 
is any different to mmap-backed shared memory though.
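
To illustrate the first-touch point, something like the sketch below (the names 
are made up, not Open MPI code): each process faults in only the pages it will 
write to, so those pages are allocated on its own NUMA node.

#include <stddef.h>
#include <string.h>

/* Each process zeroes (and therefore first-touches) only the slot it will
 * send from, so those pages land on its local NUMA node. */
void first_touch_my_slot(char *segment_base, size_t slot_size, int my_slot)
{
    char *mine = segment_base + (size_t)my_slot * slot_size;
    memset(mine, 0, slot_size);   /* fault the pages in locally */
}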

> Because of this, sysv support may be limited to Linux systems - that is,
> until we can get a better sense of which systems provide the shmctl
> IPC_RMID behavior that I am relying on.

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk




Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing

2010-05-02 Thread Christopher Samuel
On 01/05/10 23:03, Samuel K. Gutierrez wrote:

> I call shmctl IPC_RMID immediately after one process has
> attached to the segment because, at least on Linux, this
> only marks the segment for destruction.

That's correct: looking at the kernel code (at least in the
current git master), the function that handles this - do_shm_rmid()
in ipc/shm.c - only destroys the segment if nobody is attached
to it; otherwise it marks the segment as IPC_PRIVATE to stop
others finding it, and with SHM_DEST so that it is automatically
destroyed on the last detach.
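
From userspace the observable behaviour is therefore something like this
little test (error checking omitted, and of course this is Linux-specific):

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    char *p = shmat(id, NULL, 0);

    /* Mark for destruction while still attached: the segment remains
     * usable for processes that are already attached... */
    shmctl(id, IPC_RMID, NULL);

    strcpy(p, "still usable after IPC_RMID");
    puts(p);

    /* ...and the memory is actually freed here, on the last detach. */
    shmdt(p);
    return 0;
}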

cheers,
Chris
-- 
  Christopher Samuel - Senior Systems Administrator
  VLSCI - Victorian Life Sciences Computational Initiative
  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
  http://www.vlsci.unimelb.edu.au/


Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing

2010-05-02 Thread Christopher Samuel
On 02/05/10 06:49, Ashley Pittman wrote:

> I think you should look into this a little deeper; it
> certainly used to be the case on Linux that setting
> IPC_RMID would also prevent any further processes from
> attaching to the segment.

That certainly appears to be the case in the current master
of the kernel; IPC_PRIVATE is set on the segment with the
comment:

 /* Do not find it any more */

That flag means that ipcget() - used by sys_shmget() -
takes a different code path and now calls ipcget_new()
rather than ipcget_public().
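
So a late arrival that tries to look up the segment by key after the RMID
would see something like the sketch below ('key' here is just whatever key
the creator used, not a particular Open MPI value):

#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Once the creator has called shmctl(id, IPC_RMID, ...), a later shmget()
 * on the same key no longer finds the existing segment. */
int lookup_segment(key_t key, size_t size)
{
    int id = shmget(key, size, 0600);    /* no IPC_CREAT: lookup only */

    if (id < 0 && ENOENT == errno) {
        fprintf(stderr, "segment gone or already marked for removal\n");
    }
    return id;
}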

cheers,
Chris
-- 
  Christopher Samuel - Senior Systems Administrator
  VLSCI - Victorian Life Sciences Computational Initiative
  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
  http://www.vlsci.unimelb.edu.au/


Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing

2010-05-02 Thread N.M. Maclaren

On May 2 2010, Ashley Pittman wrote:

> On 2 May 2010, at 04:03, Samuel K. Gutierrez wrote:

> As to performance, there should be no difference in use between sys-V
> shared memory and file-backed shared memory; the instructions issued and
> the MMU flags for the page should both be the same, so the performance
> should be identical.


Not necessarily, and possibly not so even for far-future Linuces.
On at least one system I used, the poxious kernel wrote the complete
file to disk before returning - all right, it did that for System V
shared memory, too, just to a 'hidden' file!  But, if I recall, on
another it did that only for file-backed shared memory - however, it's
a decade ago now and I may be misremembering.

Of course, that's a serious issue mainly for large segments.  I was
using multi-GB ones.  I don't know how big the ones you need are.

> The one area you do need to keep an eye on for performance is NUMA
> machines, where it's important which process on a node touches each page
> first; you can end up using different areas (pages, not regions) for
> communicating in different directions between the same pair of processes.
> I don't believe this is any different to mmap-backed shared memory
> though.


On some systems it may be, but in bizarre, inconsistent, undocumented
and unpredictable ways :-(  Also, there are usually several system (and
sometimes user) configuration options that change the behaviour, so you
have to allow for that.  My experience of trying to use those is that
different uses have incompatible requirements, and most of the critical
configuration parameters apply to ALL uses!

In my view, the configuration variability is the number one nightmare
for trying to write portable code that uses any form of shared memory.
ARMCI seem to agree.


>> Because of this, sysv support may be limited to Linux systems - that is,
>> until we can get a better sense of which systems provide the shmctl
>> IPC_RMID behavior that I am relying on.


And, I suggest, whether they have an evil gotcha in one of the areas that
Ashley Pittman noted.


Regards,
Nick Maclaren.




[OMPI devel] Unchecked malloc()'s in OMPI 1.4.x

2010-05-02 Thread Christopher Samuel
Hi there,

I've been playing around with Coccinelle, the semantic
patching system (packaged in Ubuntu 10.04), using it to
try to detect things like unchecked malloc().  It's not
perfect - for instance, it flags up calls to assert() on
the result of the malloc as bad even though they're not -
but the rest seem real.

I've done a test run on OMPI 1.4.2rc3r23065 and it's
flagged the following; I hope these are useful!


ompi/mca/btl/openib/btl_openib_component.c
line 997 - prepare_device_for_use()

 init_data = malloc(sizeof(mca_btl_openib_frag_init_data_t));


ompi/mca/btl/openib/btl_openib_component.c
line 2414 - btl_openib_component_init()

  init_data = malloc(sizeof(mca_btl_openib_frag_init_data_t));


ompi/mca/btl/openib/connect/btl_openib_connect_base.c
line 104 - prepare_device_for_use()

 available = calloc(1, sizeof(all));


ompi/mca/btl/portals/btl_portals.c
line 104 - mca_btl_portals_add_procs()

  portals_procs = malloc(nprocs * sizeof(ptl_process_id_t));


ompi/mpi/f77/comm_spawn_multiple_f.c
line 109 - mpi_comm_spawn_multiple_f()

 c_info = malloc (array_size * sizeof(MPI_Info));


opal/class/opal_hash_table.c
line 431 - opal_hash_table_set_value_ptr()

 node->hn_key = malloc(key_size);


orte/mca/ras/alps/ras_alps_module.c
line 243 - orte_ras_alps_read_appinfo_file()

 cpBuf=malloc(szLen+1);  /* Allocate buffer */


All the best,
Chris
-- 
  Christopher Samuel - Senior Systems Administrator
  VLSCI - Victorian Life Sciences Computational Initiative
  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
  http://www.vlsci.unimelb.edu.au/