[OMPI users] freeing attributes on communicators

2009-03-12 Thread Robert Latham
Hello all.

I'm using openmpi-1.3 in this example, linux, gcc-4.3.2, configured
with nothing special.

If I run the following simple C code under valgrind, single process, I
get some errors about reading and writing already-freed memory:

---
#include <mpi.h>
#include <stdio.h>

int delete_fn(MPI_Comm comm, int keyval, void *attr, void *extra) {
    MPI_Keyval_free(&keyval);
    return 0;
}

int main (int argc, char **argv)
{
    MPI_Comm duped;
    int keyval;
    MPI_Init(&argc, &argv);
    MPI_Comm_dup(MPI_COMM_SELF, &duped);

    MPI_Keyval_create(MPI_NULL_COPY_FN, delete_fn, &keyval, NULL);

    MPI_Attr_put(MPI_COMM_SELF, keyval, NULL);
    MPI_Attr_put(duped, keyval, NULL);

    MPI_Comm_free(&duped);
    MPI_Finalize();
    return 0;
}
---

My main question here: Am I doing something wrong, or have I managed
to confuse openmpi's reference counts somehow?

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                  B29D F333 664A 4280 315B


Re: [OMPI users] MPI_File_write_ordered does not truncate files

2009-02-18 Thread Robert Latham
On Wed, Feb 18, 2009 at 02:24:03PM -0700, Ralph Castain wrote:
> Hi Rob
>
> Guess I'll display my own ignorance here:
>
>>>  MPI_File_open( MPI_COMM_WORLD, "foo.txt",
>>>MPI_MODE_CREATE | MPI_MODE_WRONLY,
>>>MPI_INFO_NULL, &fh );
>
>
> Since the file was opened with MPI_MODE_CREATE, shouldn't it have been  
> truncated so the prior contents were removed? I think that's the root of 
> the confusion here. It appears that MPI_MODE_CREATE doesn't cause the 
> opened file to be truncated, but instead just leaves it "as-is".
>
> Is that correct?

"The modes MPI_MODE_RDONLY, MPI_MODE_RDWR, MPI_MODE_WRONLY,
MPI_MODE_CREATE, and MPI_MODE_EXCL have identical semantics to their
POSIX counterparts"

MPI_MODE_CREATE behaves like O_CREAT.

There is no MPI-IO flag corresponding to O_TRUNC.  Guess you'd have to
call MPI_FILE_SET_SIZE after MPI_FILE_OPEN.
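
Something along these lines (an untested sketch; error checking
omitted) should give you O_TRUNC-like behavior:

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "foo.txt",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    /* MPI_MODE_CREATE (like O_CREAT) leaves existing contents alone,
     * so shrink the file explicitly.  Note this call is collective:
     * every process that opened the file has to make it. */
    MPI_File_set_size(fh, 0);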

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                  B29D F333 664A 4280 315B


Re: [OMPI users] MPI_Type_create_darray causes MPI_File_set_view to crash when ndims=2, array_of_gsizes[0]>array_of_gsizes[1]

2008-11-12 Thread Robert Latham
On Fri, Oct 31, 2008 at 11:19:39AM -0400, Antonio Molins wrote:
> Hi again,
>
> The problem in a nutshell: it looks like, when I use  
> MPI_Type_create_darray with an argument array_of_gsizes where  
> array_of_gsizes[0]>array_of_gsizes[1], the datatype returned goes  
> through MPI_Type_commit() just fine, but then it causes  
> MPI_File_set_view to crash!! Any idea as to why this is happening?

Hi.  From your description (and confirmed by your backtrace), it
sounds like this is a ROMIO bug.

Do you happen to have a small test case for this crash?

Thanks
==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                  B29D F333 664A 4280 315B


Re: [OMPI users] ADIOI_GEN_DELETE

2008-11-12 Thread Robert Latham
On Thu, Oct 23, 2008 at 12:41:45AM -0200, Davi Vercillo C. Garcia (ダヴィ) wrote:
> Hi,
> 
> I'm trying to run a code using OpenMPI and I'm getting this error:
> 
> ADIOI_GEN_DELETE (line 22): **io No such file or directory
> 
> I don't know why this occurs, I only know this happens when I use more
> than one process.

Hey, sorry, I don't check in here very often, but I'm the "ROMIO guy"
around these parts. This is a harmless warning message. 

You see this with more than one process because one process "won" and
deleted the file, and the other N-1 processes then try to delete a
file that doesn't exist.  

If you ignore errors from MPI_File_delete, then you won't see this
error :>

MPI_FILE_DELETE is not a collective operation, so you can also just call
this from one process.
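
A rough sketch of that (the file name here is just for illustration):

    int rank, err;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* MPI_File_delete is not collective, so let one rank do it and
     * shrug off a "no such file" error */
    if (rank == 0) {
        err = MPI_File_delete("output.dat", MPI_INFO_NULL);
        (void) err;   /* a missing file isn't worth aborting over */
    }
    /* make sure the delete has happened before anyone re-creates it */
    MPI_Barrier(MPI_COMM_WORLD);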

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                  B29D F333 664A 4280 315B


Re: [OMPI users] bug in MPI_File_get_position_shared ?

2008-09-15 Thread Robert Latham
On Sat, Aug 16, 2008 at 08:05:14AM -0400, Jeff Squyres wrote:
> On Aug 13, 2008, at 7:06 PM, Yvan Fournier wrote:
>
>> I seem to have encountered a bug in MPI-IO, in which
>> MPI_File_get_position_shared hangs when called by multiple processes  
>> in
>> a communicator. It can be illustrated by the following simple test  
>> case,
>> in which a file is simply created with C IO, and opened with MPI-IO.
>> (defining or undefining MY_MPI_IO_BUG on line 5 enables/disables the
>> bug). From the MPI2 documentation, It seems that all processes should 
>> be
>> able to call MPI_File_get_position_shared, but if more than one  
>> process
>> uses it, it fails. Setting the shared pointer helps, but this should  
>> not
>> be necessary, and the code still hangs (in more complete code, after
>> writing data).
>>
>> I encounter the same problem with Open MPI 1.2.6 and MPICH2 1.0.7, so
>> I may have misread the documentation, but I suspect a ROMIO bug.
>
> Bummer.  :-(
>
> It would be best to report this directly to the ROMIO maintainers via 
> romio-ma...@mcs.anl.gov.  They lurk on this list, but they may not be 
> paying attention to every mail.

Hi, that would be me, and yup, as you can see I don't check in too
often.  

Just to wrap this up, I'm glad you found workarounds.  Shared file
pointers have a certain seductive quality about them, but they are a
pain in the neck to implement in the library.  

You will almost assuredly scale to larger numbers of processors and
achieve higher I/O bandwidth if you do just a little bit of work.
Keep track of file offsets on your own and, instead of doing
independent I/O, use MPI_File_read_at_all or MPI_File_write_at_all.
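
A minimal sketch of that approach (assumes MPI_Init has been called;
the file name and amount of data are made up for illustration):

    #define COUNT 1000
    int rank;
    double buf[COUNT];           /* this rank's data */
    MPI_File fh;
    MPI_Offset offset;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* each rank computes its own offset -- no shared file pointer */
    offset = (MPI_Offset)rank * COUNT * sizeof(double);

    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_write_at_all(fh, offset, buf, COUNT, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);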

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                  B29D F333 664A 4280 315B


Re: [OMPI users] Parallel I/O with MPI-1

2008-07-24 Thread Robert Latham
On Wed, Jul 23, 2008 at 09:47:56AM -0400, Robert Kubrick wrote:
> HDF5 supports parallel I/O through MPI-I/O. I've never used it, but I  
> think the API is easier than direct MPI-I/O, maybe even easier than raw 
> read/writes given its support for hierarchal objects and metadata.

In addition to the API provided by parallel HDF5 and parallel-NetCDF,
these high level libraries offer a self-describing portable file
format.  Pretty nice when collaborating with others. Plus there are a
host of viewers for these file formats, so that's another thing you
don't have to worry about.

> HDF5 supports multiple storage models and it supports MPI-IO.
> HDF5 has an open interface to access raw storage. This enables HDF5  
> files to be written to a variety of media, including sequential files, 
> families of files, memory, Unix sockets (i.e., a network).
> New "Virtual File" drivers can be added to support new storage access  
> mechanisms.
> HDF5 also supports MPI-IO with Parallel HDF5. When building HDF5,  
> parallel support is included by configuring with the --enable-parallel 
> option. A tutorial for Parallel HDF5 is included with the HDF5 Tutorial 
> at:
>   /HDF5/Tutor/

It's a very good tutorial. Do read the parallel I/O chapter closely,
especially the parts about enabling collective I/O via property lists
and transfer templates.  For many HDF5 workloads today, collective I/O
is the key to getting good performance (this was not always the case
back in the bad old days of MPICH1 and LAM, but has been since at
least the HDF5-1.6 series).
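
The property-list step I mean looks roughly like this (a sketch; the
dataset, dataspaces, and buffer are assumed to exist already, and the
file is assumed to have been opened with an MPI-IO file access
property list):

    /* #include <hdf5.h> */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);  /* ask for collective I/O */
    H5Dwrite(dset_id, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);
    H5Pclose(dxpl);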

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                  B29D F333 664A 4280 315B


Re: [OMPI users] Parallel I/O with MPI-1

2008-07-24 Thread Robert Latham
On Wed, Jul 23, 2008 at 02:24:03PM +0200, Gabriele Fatigati wrote:
> >You could always effect your own parallel IO (e.g., use MPI sends and
> receives to coordinate parallel reads and writes), but >why?  It's already
> done in the MPI-IO implementation.
> 
> Just a moment: you're saying that i can do fwrite without any lock? OpenMPI
> does this?

You use MPI to describe your I/O regions.  In fact, these I/O regions
can even overlap (something that you can't do efficiently with
lock-based approaches).  Even better, if you do your I/O
"collectively" the MPI library will optimize the heck out of your
accesses.
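
To make that concrete, here's a sketch (assumes MPI_Init has been
called; the array shape and file name are made up) where each process
writes one row of a global array with no locking at all:

    #define N 1024
    int rank, nprocs;
    double row[N];               /* this process's row */
    MPI_File fh;
    MPI_Datatype filetype;
    int gsizes[2], lsizes[2], starts[2];

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    gsizes[0] = nprocs;  gsizes[1] = N;   /* whole array in the file */
    lsizes[0] = 1;       lsizes[1] = N;   /* the piece this rank owns */
    starts[0] = rank;    starts[1] = 0;

    MPI_Type_create_subarray(2, gsizes, lsizes, starts, MPI_ORDER_C,
                             MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File_open(MPI_COMM_WORLD, "array.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    /* the file view describes this process's I/O region */
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);
    /* collective write: the library can optimize across processes */
    MPI_File_write_all(fh, row, N, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    MPI_Type_free(&filetype);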

When I was learning all this way back when, it took me a long time to
get all the details straight (memory types, file views, tiling,
independent vs. collective), but a few readings of the I/O chapter of
"Using MPI-2" set me straight. 

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                  B29D F333 664A 4280 315B


Re: [OMPI users] Problem with NFS + PVFS2 + OpenMPI

2008-05-29 Thread Robert Latham
On Thu, May 29, 2008 at 04:48:49PM -0300, Davi Vercillo C. Garcia wrote:
> > Oh, I see you want to use ordered i/o in your application.  PVFS
> > doesn't support that mode.  However, since you know how much data each
> > process wants to write, a combination of MPI_Scan (to compute each
> > processes offset) and MPI_File_write_at_all (to carry out the
> > collective i/o) will give you the same result with likely better
> > performance (and has the nice side effect of working with pvfs).
> 
> I don't understand very well this... what do I need to change in my code ?

MPI_File_write_ordered has an interesting property (which you probably
know since you use it, but i'll spell it out anyway):  writes end up
in the file in rank-order, but are not necessarily carried out in
rank-order.   

Once each process knows the offsets and lengths of the writes the
other processes will do, that process can write its data.  Observe that
rank 0 can write immediately.  Rank 1 only needs to know how much data
rank 0 will write, and so on.

Rank N can compute its offset by knowing how much data the preceding
N-1 processes want to write.  The most efficient way to collect this is
to use MPI_Scan and compute a running sum of the data sizes:

http://www.mpi-forum.org/docs/mpi-11-html/node84.html#Node84

Once you've computed these offsets, MPI_File_write_at_all has enough
information to carry out a collective write of the data.
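
In code, the whole thing is only a few lines (a sketch; "buf" and its
size stand in for whatever each rank actually writes):

    char buf[4096];                     /* this rank's data */
    long long mylen = sizeof(buf);      /* bytes this rank will write */
    long long end = 0;
    MPI_Offset offset;
    MPI_File fh;

    /* inclusive prefix sum: end = sum of mylen over ranks 0..myrank */
    MPI_Scan(&mylen, &end, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
    offset = (MPI_Offset)(end - mylen); /* where this rank's data starts */

    MPI_File_open(MPI_COMM_WORLD, "ordered.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at_all(fh, offset, buf, (int)mylen, MPI_BYTE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);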

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                  B29D F333 664A 4280 315B


Re: [OMPI users] Problem with NFS + PVFS2 + OpenMPI

2008-05-29 Thread Robert Latham
On Thu, May 29, 2008 at 04:24:18PM -0300, Davi Vercillo C. Garcia wrote:
> Hi,
> 
> I'm trying to run my program in my environment and some problems are
> happening. My environment is based on PVFS2 over NFS (PVFS is mounted
> over NFS partition), OpenMPI and Ubuntu. My program uses MPI-IO and
> BZ2 development libraries. When I tried to run, a message appears:
> 
> File locking failed in ADIOI_Set_lock. If the file system is NFS, you
> need to use NFS version 3, ensure that the lockd daemon is running on
> all the machines, and mount the directory with the 'noac' option (no
> attribute caching).
> [campogrande05.dcc.ufrj.br:05005] MPI_ABORT invoked on rank 0 in
> communicator MPI_COMM_WORLD with errorcode 1
> mpiexec noticed that job rank 1 with PID 5008 on node campogrande04
> exited on signal 15 (Terminated).

Hi.

NFS has some pretty sloppy consistency semantics.  If you want
parallel I/O to NFS you have to turn off some caches (the 'noac'
option in your error message) and work pretty hard to flush
client-side caches (which ROMIO does for you using fcntl locks).  If
you do this, note that your performance will be really bad, but you'll
get correct results.

Your nfs-exported PVFS volumes will give you pretty decent serial i/o
performance since NFS caching only helps in that case.

I'd suggest, though, that you try using straight PVFS for your MPI-IO
application, as long as the parallel clients have access to all of
the pvfs servers (if tools like pvfs2-ping and pvfs2-ls work, then you
do).  You'll get better performance for a variety of reasons and can
continue to keep your NFS-exported PVFS volumes up at the same time. 

Oh, I see you want to use ordered i/o in your application.  PVFS
doesn't support that mode.  However, since you know how much data each
process wants to write, a combination of MPI_Scan (to compute each
process's offset) and MPI_File_write_at_all (to carry out the
collective i/o) will give you the same result with likely better
performance (and has the nice side effect of working with pvfs).

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                  B29D F333 664A 4280 315B


Re: [OMPI users] MPI_Request and attributes

2007-11-05 Thread Robert Latham
On Fri, Nov 02, 2007 at 12:18:54PM +0100, Oleg Morajko wrote:
> Is there any standard way of attaching/retrieving attributes to MPI_Request
> object?
> 
> Eg. Typically there are dynamic user data created when starting the
> asynchronous operation and freed when it completes. It would be convenient
> to associate them with the request object itself to simplify the code.

You might find generalized requests offer what you want if you don't
mind spawning threads.  You don't get to hook an attribute onto an
MPI_Request object, but you do get a void * pointer which the
implementation then associates with your user-defined request.  This
void * could point to a structure containing the original MPI_Request and
your user data.
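
A rough sketch of the idea (the struct and its fields are made up for
illustration; a worker thread would do the real work):

    #include <mpi.h>
    #include <stdlib.h>

    struct my_state {
        MPI_Request inner;      /* the original request, if you want it */
        void       *user_data;  /* what you'd have hung off an attribute */
    };

    static int query_fn(void *extra, MPI_Status *status)
    {
        MPI_Status_set_elements(status, MPI_BYTE, 0);
        MPI_Status_set_cancelled(status, 0);
        status->MPI_SOURCE = MPI_UNDEFINED;
        status->MPI_TAG = MPI_UNDEFINED;
        return MPI_SUCCESS;
    }

    static int free_fn(void *extra)
    {
        free(extra);            /* called when the request is freed */
        return MPI_SUCCESS;
    }

    static int cancel_fn(void *extra, int complete)
    {
        return MPI_SUCCESS;     /* nothing cancellable in this toy */
    }

    /* ...when starting the asynchronous operation: */
    struct my_state *st = malloc(sizeof(*st));
    MPI_Request greq;
    MPI_Grequest_start(query_fn, free_fn, cancel_fn, st, &greq);
    /* hand "greq" back to the caller; the thread doing the real work
     * calls MPI_Grequest_complete(greq) when it finishes */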

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                  B29D F333 664A 4280 315B


Re: [OMPI users] mpiio romio etc

2007-09-14 Thread Robert Latham
On Fri, Sep 14, 2007 at 02:31:51PM -0400, Jeff Squyres wrote:
> Ok.  Maybe we'll just make a hard-coded string somewhere "ROMIO from  
> MPICH2 vABC, on AA/BB/" or somesuch.  That'll at least give some  
> indication of what version you've got.

That sort-of reminds me:  ROMIO (well, all of MPICH2) is going to move
to SVN "one of these days".  Once we've done that, you'll be able to
sync up with both MPICH2 releases and our development branch.  I think
it wouldn't be a problem for us to tag ROMIO whenever you sync up with
it. 

==rob


-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                  B29D F333 664A 4280 315B


Re: [OMPI users] mpiio romio etc

2007-09-14 Thread Robert Latham
On Fri, Sep 14, 2007 at 02:16:46PM -0400, Jeff Squyres wrote:
> Rob -- is there a public constant/symbol somewhere where we can  
> access some form of ROMIO's version number?  If so, we can also make  
> that query-able via ompi_info.

There really isn't.  We used to have a VERSION variable in
configure.in, but more often than not it would be out of date.

When you sync with ROMIO, you could update a datestamp maybe?  Just
throwing out ideas. 

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                  B29D F333 664A 4280 315B


Re: [OMPI users] mpiio romio etc

2007-09-14 Thread Robert Latham
On Fri, Sep 07, 2007 at 10:18:55AM -0400, Brock Palen wrote:
> Is there a way to find out which ADIO options romio was built with?

Not easily.  You can use 'nm' and look at the symbols :>

> Also does OpenMPI's romio come with pvfs2 support included? What  
> about Luster or GPFS.

OpenMPI has shipped with PVFS v2 support for a long time.  Not sure
how you enable it, though.  --with-filesystems=ufs+nfs+pvfs2 might
work for OpenMPI as it does for MPICH2.

All versions of ROMIO support Lustre and GPFS the same way: with the
"generic unix filesystem" (UFS) driver.  Weikuan Yu at ORNL has been
working on a native "AD_LUSTRE" driver and some improvements to ROMIO
collective I/O.   Likely to be in the next ROMIO release.

For GPFS, the only optimized MPI-IO implementation is IBM's MPI for
AIX.  You're likely to see decent performance with the UFS driver,
though.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                  B29D F333 664A 4280 315B


Re: [OMPI users] DataTypes with "holes" for writing files

2007-07-13 Thread Robert Latham
On Tue, Jul 10, 2007 at 04:36:01PM +, jody wrote:
> Error: Unsupported datatype passed to ADIOI_Count_contiguous_blocks
> [aim-nano_02:9] MPI_ABORT invoked on rank 0 in communicator
> MPI_COMM_WORLD with errorcode 1

Hi Jody:

OpenMPI uses an old version of ROMIO.  You get this error because the
ADIOI_Count_contiguous_blocks routine in this version of ROMIO does
not understand all MPI datatypes.   

You can verify that this is the case by building your test against
MPICH2, which should succeed.  

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                  B29D F333 664A 4280 315B


Re: [OMPI users] nfs romio

2007-07-05 Thread Robert Latham
On Mon, Jul 02, 2007 at 12:49:27PM -0500, Adams, Samuel D Contr AFRL/HEDR wrote:
> Anyway, so if anyone can tell me how I should configure my NFS server,
> or OpenMPI to make ROMIO work properly, I would appreciate it.   

Well, as Jeff said, the only safe way to run NFS servers for ROMIO is
by disabling all caching, which in turn will dramatically slow down
performance.  

Since NFS is performing so slowly for you, I'd suggest taking this
opportunity to deploy a parallel file system.  PVFS, Lustre, or GPFS
might make fine choices. 

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                  B29D F333 664A 4280 315B


Re: [OMPI users] MPI_Type_create_subarray fails!

2007-02-02 Thread Robert Latham
On Tue, Jan 30, 2007 at 04:55:09PM -0500, Ivan de Jesus Deras Tabora wrote:
> Then I find all the references to the MPI_Type_create_subarray and
> create a little program just to test that part of the code, the code I
> created is:
...
> After running this little program using mpirun, it raises the same error.

This small program runs fine under MPICH2.  Either you have found a
bug in OpenMPI (passing it a datatype it should be able to handle), or
a bug in MPICH2 (passing it a datatype it handled, but should have
complained about).  

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                  B29D F333 664A 4280 315B


Re: [OMPI users] external32 i/o not implemented?

2007-01-09 Thread Robert Latham
On Tue, Jan 09, 2007 at 02:53:24PM -0700, Tom Lund wrote:
> Rob,
>Thank you for your informative reply.  I had no luck finding the 
> external32 data representation in any of several mpi implementations and 
> thus I do need to devise an alternative strategy.  Do you know of a good 
> reference explaining how to combine HDF5 with mpi?

Sure.  Start here: http://hdf.ncsa.uiuc.edu/HDF5/PHDF5/ 

You might also find the Parallel-NetCDF project (disclaimer: I work on
that project) interesting:
http://www.mcs.anl.gov/parallel-netcdf/

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                  B29D F333 664A 4280 315B


Re: [OMPI users] external32 i/o not implemented?

2007-01-09 Thread Robert Latham
On Mon, Jan 08, 2007 at 02:32:14PM -0700, Tom Lund wrote:
> Rainer,
>Thank you for taking time to reply to my querry.  Do I understand 
> correctly that external32 data representation for i/o is not 
> implemented?  I am puzzled since the MPI-2 standard clearly indicates 
> the existence of external32 and has lots of words regarding how nice 
> this feature is for file interoperability.  So do both Open MPI and 
> MPIch2 not adhere to the standard in this regard?  If this is really the 
> case, how difficult is it to define a custom data representation that is 
> 32-bit big endian on all platforms?  Do you know of any documentation 
> that explains how to do this?
>Thanks again.

Hi Tom

You do understand correctly.  I do not know of an MPI-IO
implementation that supports external32.  

When you say "custom data representation" do you mean an MPI-IO
user-defined data representation?  

An alternate approach would be to use a higher level library like
parallel-netcdf or HDF5 (configured for parallel i/o).  Those
libraries already define a file format and implement all the necessary
data conversion routines, and they have a wealth of ancillary tools and
programs to work with their respective file formats.  Additionally,
those higher-level libraries will offer you more features than MPI-IO
such as the ability to define attributes on variables and datafiles.
Even better, there is the potential that these libraries might offer
some clever optimizations for your workload, saving you the effort.
Further, you can use those higher-level libraries on top of any MPI-IO
implementation, not just OpenMPI or MPICH2.  

This is a little bit of a diversion from your original question, but
to sum it up, I'd say one potential answer to the lack of external32
is to use a higher level library and sidestep the issue of MPI-IO data
representations altogether. 

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                  B29D F333 664A 4280 315B


Re: [OMPI users] pvfs2 and romio

2006-08-31 Thread Robert Latham
On Mon, Aug 14, 2006 at 10:57:34AM -0400, Brock Palen wrote:
> We will be evaluating pvfs2 (www.pvfs.org) in the future.  Is their  
> any special considerations to take to get romio support with openmpi  
> with pvfs2 ?

Hi

Since I wrote the ad_pvfs2 driver for ROMIO, and spend a lot of time
on PVFS2 in general, I've got a special interest in this thread :>  I
hope your evaluation went well.

I don't know how well the PVFS2 support in OpenMPI has tracked
"upstream".  The last official ROMIO release was 1.2.5.1 (and that was
... gosh, 3 years ago at least. sorry!).  In the meantime, ROMIO's
PVFS2 driver has seen a lot of changes.  The two codes (ROMIO in
OpenMPI vs ROMIO in MPICH2) are laid out differently enough that it's
hard to compare directly (too bad 'diff' isn't smarter about renamed
files), but I think OpenMPI has got at least the biggest bug fixes. 

Do follow up on this thread to let us (or at least me) know how well
OpenMPI works with PVFS2.  If you run into problems, I may be able to
provide a patch. 

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                 B29D F333 664A 4280 315B


Re: [OMPI users] MPI_Info for MPI_Open_port

2006-07-18 Thread Robert Latham
On Tue, Jul 11, 2006 at 12:14:51PM -0400, Abhishek Agarwal wrote:
> Hello,
> 
> Is there a way of providing a specific port number in MPI_Info when using a 
> MPI_Open_port command so that clients know which port number to connect.

The other replies have covered this pretty well, but if you are
dead-set on using a TCP port (and not an MPI port), would MPI_Comm_join
work for you?

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                 B29D F333 664A 4280 315B


[OMPI users] MPI_Comm_connect and singleton init

2006-03-14 Thread Robert Latham
Hello
In playing around with process management routines, I found another
issue.  This one might very well be operator error, or something
implementation specific.

I've got two processes (a and b), linked with openmpi, but started
independently (no mpiexec).  

- A starts up and calls MPI_Init
- A calls MPI_Open_port, prints out the port name to stdout, then
  calls MPI_Comm_accept and blocks.
- B takes as a command line argument the port
  name printed out by A.  It calls MPI_Init and then passes that
  port name to MPI_Comm_connect
- B gets the following error:

[leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Pack data mismatch
in file ../../../orte/dps/dps_unpack.c at line 121
[leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Pack data mismatch
in file ../../../orte/dps/dps_unpack.c at line 95
[leela.mcs.anl.gov:04177] *** An error occurred in MPI_Comm_connect
[leela.mcs.anl.gov:04177] *** on communicator MPI_COMM_WORLD
[leela.mcs.anl.gov:04177] *** MPI_ERR_UNKNOWN: unknown error
[leela.mcs.anl.gov:04177] *** MPI_ERRORS_ARE_FATAL (goodbye)
[leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Not found in file
../../../../../orte/mca/pls/base/pls_base_proxy.c at line 183

- A is still waiting for someone to connect to it.

Did I pass MPI port strings between programs the correct way, or is
MPI_Publish_name/MPI_Lookup_name the preferred way to pass around this
information?
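
For reference, the sequence above boils down to this (condensed, error
checking omitted):

    /* process A */
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm client;
    MPI_Open_port(MPI_INFO_NULL, port);
    printf("%s\n", port);                  /* hand this string to B */
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);

    /* process B: argv[1] is the port string printed by A */
    MPI_Comm server;
    MPI_Comm_connect(argv[1], MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);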

Thanks
==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                 B29D F333 664A 4280 315B


[OMPI users] comm_join and singleton init

2006-03-14 Thread Robert Latham
Hi

I've got a bit of an odd bug here.  I've been playing around with MPI
process management routines and I noticed the following behavior with
openmpi-1.0.1:

Two processes (a and b), linked with ompi, but started independently
(no mpiexec, just started the programs directly).

- a and b: call MPI_Init
- a: open a unix network socket on 'fd'
- b: connect to a's socket
- a and b: call MPI_Comm_join over 'fd'
- a and b: call MPI_Intercomm_merge, get intracommunicator.

These steps all work fine. 

Now the odd part: a and b call MPI_Comm_rank and MPI_Comm_size over
the intracommunicator.  Both (correctly) think Comm_size is two, but
both also think (incorrectly) that they are rank 1.  
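
The relevant calls are essentially (condensed; 'fd' is the
already-connected unix socket):

    MPI_Comm inter, intra;
    int rank, size;

    MPI_Comm_join(fd, &inter);
    MPI_Intercomm_merge(inter, 0, &intra);

    MPI_Comm_size(intra, &size);   /* both report 2, correctly */
    MPI_Comm_rank(intra, &rank);   /* both report 1, which can't be right */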

==rob

-- 
Rob Latham
Mathematics and Computer Science Division     A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                 B29D F333 664A 4280 315B