Hi,

On one node, ./IOR runs with OpenMPI, but on two nodes it fails with
"[connect/btl_openib_connect_udcm.c:1575:udcm_wait_for_send_completion] send failed with verbs status 2".

One-node run:


[root@vcn03 C]# mpirun --allow-run-as-root -np 1 -host vcn03 ./IOR
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

Local host: vcn03
Device name: mlx5_0
Device vendor ID: 0x02c9
Device vendor part ID: 4114

Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
[vcn03][[33605,1],0][connect/btl_openib_connect_udcm.c:1235:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
IOR-2.10.3: MPI Coordinated Test of Parallel I/O

Run began: Tue Mar 13 11:50:15 2018
Command line used: ./IOR
Machine: Linux vcn03

Summary:
api = POSIX
test filename = testFile
access = single-shared-file
ordering in a file = sequential offsets
ordering inter file= no tasks offsets
clients = 1 (1 per node)
repetitions = 1
xfersize = 262144 bytes
blocksize = 1 MiB
aggregate filesize = 1 MiB

Operation  Max (MiB)  Min (MiB)  Mean (MiB)  Std Dev  Max (OPs)  Min (OPs)  Mean (OPs)  Std Dev  Mean (s)
---------  ---------  ---------  ----------  -------  ---------  ---------  ----------  -------  --------
write         312.36     312.36      312.36     0.00    1249.44    1249.44     1249.44     0.00   0.00320 EXCEL
read          996.42     996.42      996.42     0.00    3985.69    3985.69     3985.69     0.00   0.00100 EXCEL

Max Write: 312.36 MiB/sec (327.53 MB/sec)
Max Read: 996.42 MiB/sec (1044.82 MB/sec)

Run finished: Tue Mar 13 11:50:15 2018


Two-node run:

[root@vcn03 C]# mpirun --allow-run-as-root -np 2 -host vcn03,vcn04 ./IOR
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

Local host: vcn04
Device name: mlx5_0
Device vendor ID: 0x02c9
Device vendor part ID: 4114

Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
[vcn03][[33640,1],0][connect/btl_openib_connect_udcm.c:1235:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
[vcn04][[33640,1],1][connect/btl_openib_connect_udcm.c:1235:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
mlx5: vcn04: got completion with error:
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 78006802 0a00016f 00005bd2
[vcn04][[33640,1],1][connect/btl_openib_connect_udcm.c:1575:udcm_wait_for_send_completion] send failed with verbs status 2
[vcn04:28705] *** An error occurred in MPI_Send
[vcn04:28705] *** reported by process [2204631041,1]
[vcn04:28705] *** on communicator MPI_COMM_WORLD
[vcn04:28705] *** MPI_ERR_OTHER: known error not in list
[vcn04:28705] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[vcn04:28705] *** and potentially your MPI job)
[vcn03:05349] 1 more process has sent help message help-mpi-btl-openib.txt / no device params found
[vcn03:05349] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[root@vcn03 C]#
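
For reference, the "2" in "send failed with verbs status 2" is an ibv_wc_status value from libibverbs. Value 2 is IBV_WC_LOC_QP_OP_ERR (a local QP operation error), which would be consistent with the earlier "error modifing QP to RTR" message, although the log alone does not prove the two are linked. A small lookup sketch follows; the helper name wc_status_name is made up for illustration and only the first few enum values are listed.

```shell
# Map an ibv_wc_status number (as printed in "verbs status N") to its
# libibverbs enum name. Hypothetical helper; only values 0-5 are listed.
wc_status_name() {
    case "$1" in
        0) echo "IBV_WC_SUCCESS" ;;
        1) echo "IBV_WC_LOC_LEN_ERR" ;;
        2) echo "IBV_WC_LOC_QP_OP_ERR" ;;
        3) echo "IBV_WC_LOC_EEC_OP_ERR" ;;
        4) echo "IBV_WC_LOC_PROT_ERR" ;;
        5) echo "IBV_WC_WR_FLUSH_ERR" ;;
        *) echo "unknown ($1), see enum ibv_wc_status in <infiniband/verbs.h>" ;;
    esac
}

wc_status_name 2
```

The full list of values is in the ibv_wc_status enum in <infiniband/verbs.h>.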
________________________________________
From: devel [devel-boun...@lists.open-mpi.org] on behalf of Pharthiphan Asokan 
[paso...@ddn.com]
Sent: Tuesday, March 13, 2018 9:13 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] How to Build OpenMPI to support FDR over SR-IOV

Hi Jeff,

After adding the OpenMPI directories to PATH and LD_LIBRARY_PATH, I no longer see the "orted: command not found" issue.

[root@vcn03 pasokan]# mpirun --allow-run-as-root -np 4 -host vcn03,vcn03,vcn04,vcn04 /mnt/lustre_client/pasokan/a.out
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

Local host: vcn03
Device name: mlx5_0
Device vendor ID: 0x02c9
Device vendor part ID: 4114

Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
[vcn04][[33859,1],2][connect/btl_openib_connect_udcm.c:1235:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
[vcn03][[33859,1],0][connect/btl_openib_connect_udcm.c:1235:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
[vcn03][[33859,1],1][connect/btl_openib_connect_udcm.c:1235:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
[vcn04][[33859,1],3][connect/btl_openib_connect_udcm.c:1235:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
Hello world from processor vcn03, rank 0 out of 4 processors
Hello world from processor vcn03, rank 1 out of 4 processors
Hello world from processor vcn04, rank 2 out of 4 processors
Hello world from processor vcn04, rank 3 out of 4 processors
[vcn03:05070] 3 more processes have sent help message help-mpi-btl-openib.txt / no device params found
[vcn03:05070] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[root@vcn03 pasokan]#


However, IOR still does not run when compiled with OpenMPI: it fails with a
segmentation fault. This used to be straightforward on bare metal, but not
under KVM + SR-IOV.
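
One way to narrow down whether the segmentation fault comes from the openib BTL is to run once with that component excluded. The following is a minimal sketch, assuming the nodes can also reach each other over TCP. The helper name no_openib_cmd is hypothetical, and it only prints the command line instead of running it. The "--mca btl ^openib" form is Open MPI's usual syntax for excluding a BTL component.

```shell
# Print (do not run) an mpirun command line that excludes the openib BTL,
# so Open MPI has to pick its other transports (e.g. TCP between nodes).
# Hypothetical helper: arguments are host list, process count, then the program.
no_openib_cmd() {
    hosts="$1"
    np="$2"
    shift 2
    echo "mpirun --allow-run-as-root --mca btl ^openib -np $np -host $hosts $*"
}

no_openib_cmd vcn03,vcn04 2 ./IOR
```

If IOR runs this way, the problem is more likely in the openib path (QP setup on the SR-IOV virtual function) than in IOR itself. Traffic will not use the FDR fabric in this mode, so this is a diagnostic step only.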

________________________________________
From: Pharthiphan Asokan
Sent: Tuesday, March 13, 2018 8:42 PM
To: Open MPI Developers
Subject: RE: [OMPI devel] How to Build OpenMPI to support FDR over SR-IOV

Thanks Jeff,

OpenMPI is installed here

[root@vcn03 C]# cd /mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/
bin/ etc/ include/ lib/ share/
[root@vcn03 C]#

Why does exporting these variables not take effect?

export PATH=/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/bin:$PATH
export LD_LIBRARY_PATH=/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/lib:$LD_LIBRARY_PATH
export INCLUDE=/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/include:$INCLUDE

But, as you said, providing --prefix
/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/ works:

[root@vcn03 C]# mpirun --prefix /mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/ --allow-run-as-root -np 2 -host vcn03,vcn04 hostname
vcn04
vcn03
[root@vcn03 C]#
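
A likely reason the exports alone had no effect: export only changes the shell it is typed in, while mpirun starts orted on vcn04 through a separate, non-interactive remote shell that does not inherit those variables. Besides --prefix, the usual fix is to set the paths in a shell startup file that non-interactive shells read on every node. Which file that is depends on the shell and distribution, so treat that part as an assumption. A minimal sketch of such a snippet follows; the function name ompi_env is made up for illustration.

```shell
# Prepend an Open MPI installation to PATH and LD_LIBRARY_PATH.
# Hypothetical helper; intended for a startup file read on every node.
ompi_env() {
    prefix="${1%/}"                      # drop a trailing slash, if any
    PATH="$prefix/bin${PATH:+:$PATH}"
    LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}

ompi_env /mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/
```

According to the FAQ entry on mpirun --prefix, invoking mpirun by its absolute path has the same effect as passing --prefix with that installation directory.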


However, my actual issue is that IOR does not run when compiled with OpenMPI
in an SR-IOV environment:

[root@vcn03 C]# pwd
/mnt/lustre_client/pasokan/IOR-July12/src/C
[root@vcn03 C]#
[root@vcn03 C]# export PATH=/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/bin:$PATH
[root@vcn03 C]# export LD_LIBRARY_PATH=/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/lib:$LD_LIBRARY_PATH
[root@vcn03 C]# export INCLUDE=/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/include:$INCLUDE
[root@vcn03 C]#
[root@vcn03 C]# gmake posix mpiio
mpicc -o IOR IOR.o utilities.o parse_options.o \
aiori-POSIX.o aiori-noMPIIO.o aiori-noHDF5.o aiori-noNCMPI.o \
-lm
mpicc -o IOR IOR.o utilities.o parse_options.o \
aiori-POSIX.o aiori-MPIIO.o aiori-noHDF5.o aiori-noNCMPI.o \
-lm
[root@vcn03 C]# ./IOR
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

Local host: vcn03
Device name: mlx5_0
Device vendor ID: 0x02c9
Device vendor part ID: 4114

Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
[vcn03][[34068,1],0][connect/btl_openib_connect_udcm.c:1235:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
Segmentation fault
[root@vcn03 C]#


Please help!

________________________________________
From: devel [devel-boun...@lists.open-mpi.org] on behalf of Jeff Squyres 
(jsquyres) [jsquy...@cisco.com]
Sent: Tuesday, March 13, 2018 8:20 PM
To: Open MPI Developers List
Subject: Re: [OMPI devel] How to Build OpenMPI to support FDR over SR-IOV

On Mar 13, 2018, at 2:08 AM, Pharthiphan Asokan <paso...@ddn.com> wrote:
>
> [root@vcn03 C]# mpirun --allow-run-as-root -np 2 -host vcn03,vcn04 hostname
> bash: orted: command not found

This is the key ^^

These FAQ items may help:

* https://www.open-mpi.org/faq/?category=running#run-prereqs
* https://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
* https://www.open-mpi.org/faq/?category=running#mpirun-prefix

--
Jeff Squyres
jsquy...@cisco.com

_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel