[OMPI users] orted seg fault when using MPI_Comm_spawn on more than one host

2015-01-26 Thread Evan

Hi,

I am using OpenMPI 1.8.4 on an Ubuntu 14.04 machine and 5 Ubuntu 12.04 
machines.  I am using ssh to launch MPI jobs and I'm able to run simple 
programs like 'mpirun -np 8 --host localhost,pachy1 hostname' and get 
the expected output (pachy1 being an entry in my /etc/hosts file).


I started using MPI_Comm_spawn in my app with the intent of NOT calling 
mpirun to launch the program that calls MPI_Comm_spawn (my attempt at 
using the singleton MPI_INIT pattern described in section 10.5.2 of the 
MPI 3.0 standard).  The app needs to launch an MPI job of a given size 
from a given hostfile, where the job needs to report some info back to 
the app, so it seemed MPI_Comm_spawn was my best bet.  The app is only 
rarely going to be used this way, which is why mpirun is not used to 
launch the app that acts as the parent in the MPI_Comm_spawn operation.  
This pattern works fine if the only entries in the hostfile are 
'localhost'.  However, if I add a host that isn't local, I get a 
segmentation fault from the orted process.


In any case, I distilled my example down as small as I could.  I've 
attached the C code of the master and the hostfile I'm using. Here's the 
output:


evan@lasarti:~/devel/toy_progs/mpi_spawn$ ./master 
~/mpi/test_distributed.hostfile
[lasarti:32020] [[21014,1],0] FORKING HNP: orted --hnp --set-sid 
--report-uri 14 --singleton-died-pipe 15 -mca state_novm_select 1 -mca 
ess_base_jobid 1377173504

[lasarti:32022] *** Process received signal ***
[lasarti:32022] Signal: Segmentation fault (11)
[lasarti:32022] Signal code: Address not mapped (1)
[lasarti:32022] Failing at address: (nil)
[lasarti:32022] [ 0] 
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f07af039340]
[lasarti:32022] [ 1] 
/opt/openmpi-1.8.4/lib/libopen-pal.so.6(opal_hwloc191_hwloc_get_obj_by_depth+0x32)[0x7f07aea227c2]
[lasarti:32022] [ 2] 
/opt/openmpi-1.8.4/lib/libopen-pal.so.6(opal_hwloc_base_get_nbobjs_by_type+0x90)[0x7f07ae9f5430]
[lasarti:32022] [ 3] 
/opt/openmpi-1.8.4/lib/openmpi/mca_rmaps_round_robin.so(orte_rmaps_rr_byobj+0x134)[0x7f07ab2fb154]
[lasarti:32022] [ 4] 
/opt/openmpi-1.8.4/lib/openmpi/mca_rmaps_round_robin.so(+0x12c6)[0x7f07ab2fa2c6]
[lasarti:32022] [ 5] 
/opt/openmpi-1.8.4/lib/libopen-rte.so.7(orte_rmaps_base_map_job+0x21a)[0x7f07af299f7a]
[lasarti:32022] [ 6] 
/opt/openmpi-1.8.4/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x6e4)[0x7f07ae9e7034]
[lasarti:32022] [ 7] 
/opt/openmpi-1.8.4/lib/libopen-rte.so.7(orte_daemon+0xdff)[0x7f07af27a86f]
[lasarti:32022] [ 8] orted(main+0x47)[0x400877]
[lasarti:32022] [ 9] 
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f07aec84ec5]
[lasarti:32022] [10] orted[0x4008cb]
[lasarti:32022] *** End of error message ***

If I launch 'master' using mpirun, I don't get a segmentation fault, 
but it doesn't seem to launch processes on anything other than 
localhost, no matter what hostfile I give it.


For what it's worth, I fully expected to debug some path issues 
regarding the binary I wanted to launch with MPI_Comm_spawn once I ran 
this distributed, but at first glance this error doesn't appear to have 
anything to do with that.  I'm sure this is something silly I'm doing 
wrong, but I don't really know how to debug it further given this error.


Evan

P.S. I'm only including the zipped config.log since the "ompi_info -v ompi full 
--parsable" command I got from http://www.open-mpi.org/community/help/ 
doesn't seem to work anymore.



#include "mpi.h"
#include 

int main(int argc, char **argv) {
  int rc;
  MPI_Init(&argc, &argv);

  MPI_Info the_info;
  rc = MPI_Info_create(&the_info);
  assert(rc == MPI_SUCCESS);

  // I tried both (with appropriately different argv[1])...same result.
#if 1
  rc = MPI_Info_set(the_info, "hostfile", argv[1]);
  assert(rc == MPI_SUCCESS);
#else
  rc = MPI_Info_set(the_info, "host", argv[1]);
  assert(rc == MPI_SUCCESS);
#endif

  MPI_Comm the_group;
  rc = MPI_Comm_spawn("hostname",
 MPI_ARGV_NULL,
 8,
 the_info,
 0,
 MPI_COMM_WORLD,
 &the_group,
 MPI_ERRCODES_IGNORE);
  assert(rc == MPI_SUCCESS);

  MPI_Finalize();
  return 0;
}
localhost
pachy1


config.log.tar.bz2
Description: application/bzip


Re: [OMPI users] orted seg fault when using MPI_Comm_spawn on more than one host

2015-02-03 Thread Evan Samanas
Hi Ralph,

Good to know you've reproduced it.  I was experiencing this using both the
hostfile and host key.  A simple comm_spawn was working for me as well, but
it was only launching locally, and I'm pretty sure each node only has 4
slots given past behavior (the mpirun -np 8 example I gave in my first
email launches on both hosts).  Is there a way to specify the hosts I want
to launch on without the hostfile or host key so I can test remote launch?

And to the "hostname" response...no wonder it was hanging!  I just
constructed that as a basic example.  In my real use I'm launching
something that calls MPI_Init.
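
For reference, a minimal child along these lines might look like the
following sketch (the file name child.c, the payload, and the tag are
illustrative; the parent would post a matching MPI_Recv on the
intercommunicator returned by MPI_Comm_spawn):

#include "mpi.h"
#include <stdio.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  // A spawned job sees its parent through an intercommunicator.
  MPI_Comm parent;
  MPI_Comm_get_parent(&parent);
  if (parent == MPI_COMM_NULL) {
    fprintf(stderr, "not launched via MPI_Comm_spawn\n");
    MPI_Finalize();
    return 1;
  }

  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0) {
    int payload = 42;  // illustrative "report back" value
    MPI_Send(&payload, 1, MPI_INT, 0, 0, parent);
  }

  MPI_Comm_disconnect(&parent);
  MPI_Finalize();
  return 0;
}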

Evan


Re: [OMPI users] orted seg fault when using MPI_Comm_spawn on more than one host

2015-02-03 Thread Evan Samanas
Setting these environment variables did indeed change the way mpirun maps
things, and I didn't have to specify a hostfile.  However, setting these
for my MPI_Comm_spawn code still resulted in the same segmentation fault.
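
Equivalently, the singleton itself could set them before MPI_Init, as in
this sketch (the hostfile path is hypothetical):

#include "mpi.h"
#include <stdlib.h>

int main(int argc, char **argv) {
  // Must be in the environment before MPI_Init so the forked HNP inherits them.
  setenv("OMPI_MCA_rmaps_base_mapping_policy", "node", 1);
  setenv("OMPI_MCA_orte_default_hostfile",
         "/home/evan/mpi/test_distributed.hostfile", 1);  // hypothetical path

  MPI_Init(&argc, &argv);
  // ... MPI_Comm_spawn as in master.c ...
  MPI_Finalize();
  return 0;
}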

Evan

On Tue, Feb 3, 2015 at 10:09 AM, Ralph Castain  wrote:

> If you add the following to your environment, you should run on multiple
> nodes:
>
> OMPI_MCA_rmaps_base_mapping_policy=node
> OMPI_MCA_orte_default_hostfile=
>
> The first tells OMPI to map-by node. The second passes in your default
> hostfile so you don't need to specify it as an Info key.
>
> HTH
> Ralph


Re: [OMPI users] orted seg fault when using MPI_Comm_spawn on more than one host

2015-02-03 Thread Evan Samanas
Yes, I did.  I replaced the info argument of MPI_Comm_spawn with
MPI_INFO_NULL.
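
That is, the spawn call from master.c becomes:

  rc = MPI_Comm_spawn("hostname", MPI_ARGV_NULL, 8, MPI_INFO_NULL,
                      0, MPI_COMM_WORLD, &the_group, MPI_ERRCODES_IGNORE);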

On Tue, Feb 3, 2015 at 5:54 PM, Ralph Castain  wrote:

> When running your comm_spawn code, did you remove the Info key code? You
> wouldn't need to provide a hostfile or hosts any more, which is why it
> should resolve that problem.
>
> I agree that providing either hostfile or host as an Info key will cause
> the program to segfault - I'm working on that issue.


Re: [OMPI users] orted seg fault when using MPI_Comm_spawn on more than one host

2015-02-04 Thread Evan Samanas
Indeed, I simply commented out all the MPI_Info stuff, which you
essentially did by passing a dummy argument.  I'm still not able to get it
to succeed.

So here we go, my results defy logic.  I'm sure this could be my
fault...I've only been an occasional user of OpenMPI and MPI in general
over the years and I've never used MPI_Comm_spawn before this project. I
tested simple_spawn like so:
mpicc simple_spawn.c -o simple_spawn
./simple_spawn

When my default hostfile points to a file that just lists localhost, this
test completes successfully.  If it points to my hostfile with localhost
and 5 remote hosts, here's the output:
evan@lasarti:~/devel/toy_progs/mpi_spawn$ mpicc simple_spawn.c -o
simple_spawn
evan@lasarti:~/devel/toy_progs/mpi_spawn$ ./simple_spawn
[pid 5703] starting up!
0 completed MPI_Init
Parent [pid 5703] about to spawn!
[lasarti:05703] [[14661,1],0] FORKING HNP: orted --hnp --set-sid
--report-uri 14 --singleton-died-pipe 15 -mca state_novm_select 1 -mca
ess_base_jobid 960823296
[lasarti:05705] *** Process received signal ***
[lasarti:05705] Signal: Segmentation fault (11)
[lasarti:05705] Signal code: Address not mapped (1)
[lasarti:05705] Failing at address: (nil)
[lasarti:05705] [ 0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fc185dcf340]
[lasarti:05705] [ 1]
/opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-rte.so.7(orte_rmaps_base_compute_bindings+0x650)[0x7fc186033bb0]
[lasarti:05705] [ 2]
/opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-rte.so.7(orte_rmaps_base_map_job+0x939)[0x7fc18602fb99]
[lasarti:05705] [ 3]
/opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x6e4)[0x7fc18577dcc4]
[lasarti:05705] [ 4]
/opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-rte.so.7(orte_daemon+0xdf8)[0x7fc186010438]
[lasarti:05705] [ 5] orted(main+0x47)[0x400887]
[lasarti:05705] [ 6]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fc185a1aec5]
[lasarti:05705] [ 7] orted[0x4008db]
[lasarti:05705] *** End of error message ***

You can see from the message that this particular run IS from the latest
snapshot, though the failure happens on v1.8.4 as well.  I didn't bother
installing the snapshot on the remote nodes though.  Should I do that?  It
looked to me like this error happened well before we got to a remote node,
so that's why I didn't.

Your thoughts?

Evan



On Tue, Feb 3, 2015 at 7:40 PM, Ralph Castain  wrote:

> I confess I am sorely puzzled. I replaced the Info key with MPI_INFO_NULL,
> but still had to pass a bogus argument to master since you still have the
> Info_set code in there - otherwise, info_set segfaults due to a NULL
> argv[1]. Doing that (and replacing "hostname" with an MPI example code)
> makes everything work just fine.
>
> I've attached one of our example comm_spawn codes that we test against -
> it also works fine with the current head of the 1.8 code base. I confess
> that some changes have been made since 1.8.4 was released, and it is
> entirely possible that this was a problem in 1.8.4 and has since been fixed.
>
> So I'd suggest trying with the nightly 1.8 tarball and seeing if it works
> for you. You can download it from here:
>
> http://www.open-mpi.org/nightly/v1.8/
>
> HTH
> Ralph
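
For reference, here is a defensive alternative to the bogus argument Ralph
describes: guard the Info setup so that a missing argv[1] falls back to
MPI_INFO_NULL (a sketch against master.c above; MPI_Comm_spawn accepts
MPI_INFO_NULL when no placement hints are needed):

  MPI_Info the_info = MPI_INFO_NULL;
  if (argc > 1) {  // only build the Info key when a hostfile argument is given
    rc = MPI_Info_create(&the_info);
    assert(rc == MPI_SUCCESS);
    rc = MPI_Info_set(the_info, "hostfile", argv[1]);
    assert(rc == MPI_SUCCESS);
  }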

Re: [OMPI users] orted seg fault when using MPI_Comm_spawn on more than one host

2015-02-06 Thread Evan Samanas
Hi Ralph,

Thanks for addressing this issue.

I tried downloading your fork from that pull request, and the seg fault
appears to be gone.  However, I didn't install it on my remote machine
before testing, and I got this error:

bash: /opt/ompi-release-cmr-singlespawn/bin/orted: No such file or directory

(along with the usual complaints about ORTE not being able to start one of
the daemons).

On both machines I have openmpi installed to a directory in /opt, and
/opt/openmpi is a symlink to whatever installation I want to use...then my
paths point to the symlink.  I went to the remote machine and simply
changed the name of the directory to match the other one and I just got a
version mismatch error...a much more expected error. I'm not familiar with
OMPI source, but does this have to do with the prefix issue you mentioned
in the pull request? Should it handle symlinks?  Apologies if I'm misguided.

Evan

On Thu, Feb 5, 2015 at 9:51 AM, Ralph Castain  wrote:

> Okay, I tracked this down - thanks for your patience! I have a fix pending
> review. You can track it here:
>
> https://github.com/open-mpi/ompi-release/pull/179

[OMPI users] Problems Using PVFS2 with OpenMPI

2010-01-12 Thread Evan Smyth

I am unable to use PVFS2 with OpenMPI in a simple test program. My
configuration is given below. I'm running on RHEL5 with GigE (probably not
important).

OpenMPI 1.4 (had same issue with 1.3.3) is configured with
./configure --prefix=/work/rd/evan/archives/openmpi/openmpi/1.4/enable_pvfs \
--enable-mpi-threads --with-io-romio-flags="--with-filesystems=pvfs2+ufs+nfs"

PVFS 2.8.1 is configured to install in the default location (/usr/local) with
./configure --with-mpi=/work/rd/evan/archives/openmpi/openmpi/1.4/enable_pvfs

I build and install these (in this order) and set up my PVFS2 space using
instructions at pvfs.org. I am able to use this space using the
/usr/local/bin/pvfs2-ls types of commands. I am simply running a 2-server
config (2 data servers and the same 2 hosts are metadata servers). As I say,
manually, this all seems fine (even when I'm not root). It may be relevant that
I am *not* using the kernel interface for PVFS2 as I am just trying to get a
better understanding of how this works.

It is perhaps relevant that I have not had to explicitly tell OpenMPI where I
installed PVFS. I have told PVFS where I installed OpenMPI, though. This does
seem slightly odd but there does not appear to be a way of telling OpenMPI this
information. Perhaps it is not needed.

In any event, I then build my test program against this OpenMPI, and in that
program I have the following call sequence (where i is 0 and mntPoint is the
path to my pvfs2 mount point -- I also tried prefixing "pvfs2:" to the front
of this, as I read somewhere that that was optional).

    sprintf(aname, "%s/%d.fdm", mntPoint, i);
    for (int j = 0; j < numFloats; j++) buf[j] = (float)i;
    int retval = MPI_SUCCESS;
    if (MPI_SUCCESS == (retval = MPI_File_open(MPI_COMM_SELF, aname,
            MPI_MODE_RDWR | MPI_MODE_CREATE | MPI_MODE_UNIQUE_OPEN,
            MPI_INFO_NULL, &fh)))
    {
        MPI_File_write(fh, (void *)buf, numFloats, MPI_FLOAT,
                       MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
    } else {
        int errBufferLen;
        char errBuffer[MPI_MAX_ERROR_STRING];
        MPI_Error_string(retval, errBuffer, &errBufferLen);
        fprintf(stdout, "%d: open error on %s with code %s\n", rank,
                aname, errBuffer);
    }

Which will only execute on one of my ranks (the way I'm running it). No matter
what I try, the MPI_File_open call fails with an MPI_ERR_ACCESS error code.
This suggests a permission problem, but I am able to manually cp and rm from
the pvfs2 space without problem, so I am not at all clear on what the
permission problem is. My access flags look fine to me (the MPI_MODE_UNIQUE_OPEN flag
makes no difference in this case as I'm only opening a single file anyway). If
I write this file to shared NFS storage, all is "fine" (obviously, I do not
consider that a permanent solution, though).

Does anyone have any idea why this is not working? Alternately or in addition,
does anyone have step-by-step instructions for how to build and set up PVFS2
with OpenMPI as well as an example program because this is the first time I've
attempted this so I may well be doing something wrong.

Thanks in advance,
Evan


Re: [OMPI users] Problems Using PVFS2 with OpenMPI

2010-01-14 Thread Evan Smyth
I had been using an older variant of the needed flag for building romio 
(because the newer one was failing, as the preceding suggests). I made this 
change and built with the correct romio flag. I next needed to fix the way 
pvfs2 builds so that it uses -fPIC. Interestingly, about 95% of pvfs2 builds 
with this flag by default, but the final 5% does not. It needs to. With that 
fixed, built, and installed, I was able to rebuild openmpi correctly. My test 
program now works like a charm. I will give the *precise* steps I needed to 
build pvfs2 2.8.1 with openmpi 1.4 here for the record...


1. Determine where openmpi will be installed. I'm not certain that it needs to 
actually be installed there for this to work. If so, you will need to install 
openmpi twice. The first time, it clearly need not be built entirely correctly 
for pvfs2 (it can't be, because step 2 is a prerequisite for that), but 
building something without the "--with-io-romio-flags=..." should probably do, 
if this actually must be installed at all. I'm betting it is not required but, 
as I say, I have not verified this. It certainly works if it has been 
pre-installed as I just indicated.


2. Build pvfs2 correctly. (I get conflicting info on whether the 
"--with-mpi=..." is needed, but FWIW this is how I built it, and it installs 
into /usr/local, which is its default location...)


cd 
setenv CFLAGS -fPIC
./configure --with-mpi=/work/rd/evan/archives/openmpi/openmpi/1.4/enable_pvfs \ 
--enable-verbose-build

make all

make install
exit

3. Build openmpi correctly. This is straightforward at this point. Also, the 
--enable-mpi-threads flag is not required for pvfs2 to work, but I happen to 
also want it.


cd 

./configure --prefix=/work/rd/evan/archives/openmpi/openmpi/1.4/enable_pvfs \ 
--enable-mpi-threads --with-io-romio-flags="--with-file-system=pvfs2+ufs+nfs"

make all

make install
exit

... and that's it. Hopefully, the next person who needs to figure this out will 
be helped by these instructions.


Evan

This seems to have done the trick.

Edgar Gabriel wrote:
I don't know whether it's relevant for this problem or not, but a couple 
of weeks ago we also found that we had to apply the following patch to 
compile ROMIO with OpenMPI over pvfs2. There is an additional header, 
pvfs2-compat.h, included in the ROMIO version of MPICH but somehow 
missing in the OpenMPI version:


ompi/mca/io/romio/romio/adio/ad_pvfs2/ad_pvfs2.h
--- a/ompi/mca/io/romio/romio/adio/ad_pvfs2/ad_pvfs2.h  Thu Sep 03
11:55:51 2009 -0500
+++ b/ompi/mca/io/romio/romio/adio/ad_pvfs2/ad_pvfs2.h  Mon Sep 21
10:16:27 2009 -0500
@@ -11,6 +11,10 @@
  #include "adio.h"
  #ifdef HAVE_PVFS2_H
  #include "pvfs2.h"
+#endif
+
+#ifdef PVFS2_VERSION_MAJOR
+#include "pvfs2-compat.h"
  #endif


Thanks
Edgar


Rob Latham wrote:

On Tue, Jan 12, 2010 at 02:15:54PM -0800, Evan Smyth wrote:

OpenMPI 1.4 (had same issue with 1.3.3) is configured with
./configure --prefix=/work/rd/evan/archives/openmpi/openmpi/1.4/enable_pvfs \
--enable-mpi-threads --with-io-romio-flags="--with-filesystems=pvfs2+ufs+nfs"
PVFS 2.8.1 is configured to install in the default location (/usr/local) with
./configure --with-mpi=/work/rd/evan/archives/openmpi/openmpi/1.4/enable_pvfs

In addition to Jeff's request for the build logs, do you have
'pvfs2-config' in your path?   
 

I build and install these (in this order) and setup my PVFS2 space using
instructions at pvfs.org. I am able to use this space using the
/usr/local/bin/pvfs2-ls types of commands. I am simply running a 2-server
config (2 data servers and the same 2 hosts are metadata servers). As I say,
manually, this all seems fine (even when I'm not root). It may be
relevant that I am *not* using the kernel interface for PVFS2 as I
am just trying to get a
better understanding of how this works.

That's a good piece of information.  I run in that configuration
often, so we should be able to make this work.


It is perhaps relevant that I have not had to explicitly tell
OpenMPI where I installed PVFS. I have told PVFS where I installed
OpenMPI, though. This does seem slightly odd but there does not
appear to be a way of telling OpenMPI this information. Perhaps it
is not needed.

PVFS needs an MPI library only to build MPI-based testcases.  The
servers, client libraries, and utilities do not use MPI.


In any event, I then build my test program against this OpenMPI and
in that program I have the following call sequence (i is 0 and where
mntPoint is the path to my pvfs2 mount point -- I also tried
prefixing a "pvfs2:" in the front of this as I read somewhere that
that was optional).

In this case, since you do not have the PVFS file system mounted, the
'pvfs2:' prefix is mandatory.  Otherwise, the MPI-IO library will try
to look for a directory that does not exist.
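
Concretely, the open in the earlier test program would then build the name
with the prefix, as in this sketch (mntPoint is the same PVFS2 path as
before):

    // Name the file system explicitly, since no kernel mount exists.
    sprintf(aname, "pvfs2:%s/%d.fdm", mntPoint, i);
    retval = MPI_File_open(MPI_COMM_SELF, aname,
                           MPI_MODE_RDWR | MPI_MODE_CREATE,
                           MPI_INFO_NULL, &fh);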


Which will only execute on one of my 

[OMPI users] openmpi equivalent to mpich serv_p4 daemon

2007-01-19 Thread Evan Smyth
I had been using MPICH and its serv_p4 daemon to speed startup times. 
I've decided to try OpenMPI (primarily for the fault-tolerance features) 
and would like to know what the equivalent of the serv_p4 daemon is.


It appears as though the orted daemon may be what I am after, but I don't 
quite understand it. I used to run serv_p4 with a specific port number 
and then pass a -p4ssport  flag to mpirun. The daemon would remain 
running on each node, and each new mpirun job would communicate directly 
through a port with the already-running instance of the daemon on that 
machine, saving mpirun from having to launch an rsh. This was great for 
reducing startup and run times due to rsh issues. The orted daemon does 
support a -persistent flag, which seems relevant, but I cannot find a 
real usage example.


I expect that most of the readers will find this to be a trivial problem 
but I'm hoping someone can give me an openmpi equivalent usage example.


Thanks in advance,

Evan


--
--
Evan Smyth
e...@dreamworks.com
Dreamworks Animation
818.695.4105, Riverside 146