Re: [OMPI users] Running with native ugni on a Cray XC
Howard,

I searched through the https://github.com/open-mpi/ompi/wiki and didn't find any recent documentation on the master/v2.x series Cray support. Can you give me some pointers? I am trying to get MPI_Comm_spawn() support on a Cray XC40 using Open MPI's master branch.
Re: [OMPI users] Running with native ugni on a Cray XC
Hi Nick,

No, you have to use mpirun in this case. You need to ask for a larger batch allocation than the initial mpirun requires, but you do need to ask for a batch allocation. Also note that mpirun doesn't currently work with nativized Slurm; it's on my todo list to fix.

Howard

--
sent from my smart phonr so no good type.
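A sketch of the pattern Howard describes, assuming a PBS-style batch system as typically found on aprun-based Cray machines (the directive, node counts, and executable name are illustrative placeholders, not from the thread): reserve more resources than the initial mpirun uses, so MPI_Comm_spawn has room for the children.

```shell
#!/bin/bash
# Hypothetical batch script sketch. Ask for more nodes than the initial
# launch needs; the spawned processes land in the unused remainder of
# the allocation. Exact directives vary by site and scheduler.
#PBS -l nodes=4

cd "$PBS_O_WORKDIR"

# Start only the manager process with mpirun (not aprun, per the thread);
# MPI_Comm_spawn inside ./manager can then use the remaining allocation.
mpirun -np 1 ./manager
```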
Re: [OMPI users] Running with native ugni on a Cray XC
Howard,

I have one more question. Is it possible to use MPI_Comm_spawn when launching an Open MPI job with aprun? I'm getting this error when I try:

nradclif@kay:/lus/scratch/nradclif> aprun -n 1 -N 1 ./manager
[nid00036:21772] [[14952,0],0] ORTE_ERROR_LOG: Not available in file dpm_orte.c at line 1190
[36:21772] *** An error occurred in MPI_Comm_spawn
[36:21772] *** reported by process [979894272,0]
[36:21772] *** on communicator MPI_COMM_SELF
[36:21772] *** MPI_ERR_UNKNOWN: unknown error
[36:21772] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[36:21772] ***    and potentially your MPI job)
aborting job:
N/A

Nick Radcliffe
Software Engineer
Cray, Inc.
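For context, the manager side of such a test typically boils down to a single MPI_Comm_spawn call. A minimal sketch (the `./worker` executable name and the child count are illustrative assumptions, not taken from the original post):

```c
#include <mpi.h>

/* Minimal "manager" sketch: spawn two copies of a hypothetical ./worker
 * executable and receive an intercommunicator for talking to them. */
int main(int argc, char *argv[])
{
    MPI_Comm intercomm;
    int errcodes[2];

    MPI_Init(&argc, &argv);

    /* This is the call that fails with ORTE_ERROR_LOG under aprun;
     * per Howard's reply, it needs an mpirun launch inside a batch
     * allocation larger than what the initial mpirun consumes. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &intercomm, errcodes);

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}
```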
Re: [OMPI users] Running with native ugni on a Cray XC
Hi Nick,

I will endeavor to put together a wiki for the master/v2.x series specific to Cray systems (sans those customers who choose to neither 1) use a Cray-supported eslogin setup nor 2) permit users to directly log in to and build apps on service nodes) that explains best practices for using Open MPI on Cray XE/XK/XC systems.

A significant amount of work went into master, and now the v2.x release stream, to rationalize support for Open MPI on Cray XE/XK/XC systems using either aprun or native Slurm launch.

General advice for all on this mailing list: do not use the Open MPI 1.8.x release series with direct ugni access enabled on Cray XE/XK/XC. Rather, use master or, as soon as a release is available, v2.x. Note that if you are using CCM, the performance of Open MPI 1.8.x over the Cray IAA (simulated ibverbs) is pretty good; I suggest this as the preferred route for using the 1.8.x release stream on Cray XE/XK/XC.

Howard
Re: [OMPI users] Running with native ugni on a Cray XC
Thanks Howard, using master worked for me.

Nick Radcliffe
Software Engineer
Cray, Inc.
Re: [OMPI users] Running with native ugni on a Cray XC
Hi Nick,

Use master, not 1.8.x, for Cray XC. Also, for config, do not pay attention to the cray/lanl platform files; just do a plain configure. Also, if using nativized Slurm, launch with srun, not mpirun.

Howard

--
sent from my smart phonr so no good type.
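Put together, Howard's recommendation amounts to roughly the following, as a sketch rather than a verified recipe: the install prefix, parallelism level, and the srun example are placeholders, and exact configure options may differ per site.

```shell
# Sketch: build Open MPI from the master branch on a Cray XC with a
# plain configure -- no cray/lanl platform file -- per the advice above.
git clone https://github.com/open-mpi/ompi.git
cd ompi
./autogen.pl
./configure --prefix=$HOME/openmpi-master-install
make -j 8 install

# Under nativized Slurm, launch with srun rather than mpirun:
srun -n 2 -N 2 ./osu_latency
```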
[OMPI users] Running with native ugni on a Cray XC
Hi,

I'm trying to build and run Open MPI 1.8.5 with native ugni on a Cray XC. The build works, but I'm getting this error when I run:

nradclif@kay:/lus/scratch/nradclif> aprun -n 2 -N 1 ./osu_latency
[nid00014:28784] [db_pmi.c:174:pmi_commit_packed] PMI_KVS_Put: Operation failed
[nid00014:28784] [db_pmi.c:457:commit] PMI_KVS_Commit: Operation failed
[nid00012:12788] [db_pmi.c:174:pmi_commit_packed] PMI_KVS_Put: Operation failed
[nid00012:12788] [db_pmi.c:457:commit] PMI_KVS_Commit: Operation failed
# OSU MPI Latency Test
# Size          Latency (us)
osu_latency: btl_ugni_endpoint.c:87: mca_btl_ugni_ep_connect_start: Assertion `0' failed.
[nid00012:12788] *** Process received signal ***
[nid00012:12788] Signal: Aborted (6)
[nid00012:12788] Signal code: (-6)
[nid00012:12788] [ 0] /lib64/libpthread.so.0(+0xf850)[0x2b42b850]
[nid00012:12788] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x2b66b885]
[nid00012:12788] [ 2] /lib64/libc.so.6(abort+0x181)[0x2b66ce61]
[nid00012:12788] [ 3] /lib64/libc.so.6(__assert_fail+0xf0)[0x2b664740]
[nid00012:12788] [ 4] /lus/scratch/nradclif/openmpi_install/lib/libmpi.so.1(mca_btl_ugni_ep_connect_progress+0x6c9)[0x2aff9869]
[nid00012:12788] [ 5] /lus/scratch/nradclif/openmpi_install/lib/libmpi.so.1(+0x5ae32)[0x2af46e32]
[nid00012:12788] [ 6] /lus/scratch/nradclif/openmpi_install/lib/libmpi.so.1(mca_btl_ugni_sendi+0x8bd)[0x2affaf7d]
[nid00012:12788] [ 7] /lus/scratch/nradclif/openmpi_install/lib/libmpi.so.1(+0x1f0c17)[0x2b0dcc17]
[nid00012:12788] [ 8] /lus/scratch/nradclif/openmpi_install/lib/libmpi.so.1(mca_pml_ob1_isend+0xa8)[0x2b0dd488]
[nid00012:12788] [ 9] /lus/scratch/nradclif/openmpi_install/lib/libmpi.so.1(ompi_coll_tuned_barrier_intra_two_procs+0x11b)[0x2b07e84b]
[nid00012:12788] [10] /lus/scratch/nradclif/openmpi_install/lib/libmpi.so.1(PMPI_Barrier+0xb6)[0x2af8a7c6]
[nid00012:12788] [11] ./osu_latency[0x401114]
[nid00012:12788] [12] /lib64/libc.so.6(__libc_start_main+0xe6)[0x2b657c36]
[nid00012:12788] [13] ./osu_latency[0x400dd9]
[nid00012:12788] *** End of error message ***
osu_latency: btl_ugni_endpoint.c:87: mca_btl_ugni_ep_connect_start: Assertion `0' failed.

Here's how I build:

export FC=ftn        (I'm not using Fortran, but the configure fails if it can't find a Fortran compiler)
./configure --prefix=/lus/scratch/nradclif/openmpi_install --enable-mpi-fortran=none --with-platform=contrib/platform/lanl/cray_xe6/debug-lustre
make install

I didn't modify the debug-lustre file, but I did change cray-common to remove the hard-coding, e.g., rather than using the gemini-specific path "with_pmi=/opt/cray/pmi/2.1.4-1..8596.8.9.gem", I used "with_pmi=/opt/cray/pmi/default".

I've tried running different executables with different numbers of ranks/nodes, but they all seem to run into problems with PMI_KVS_Put.

Any ideas what could be going wrong?

Thanks for any help,
Nick

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27197.php