Here it is:
~/openmpi/bin/mpirun -np 2 -hostfile hostfile --mca btl openib,self,sm
--mca btl_openib_cpc_include rdmacm  --mca btl_openib_rroce_enable 1
./sendrecv

This is what I got:

--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
  Local host:   chguo-msr-linux1
  Local device: mlx5_0
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.
  Process 1 ([[45408,1],0]) is on host: chguo-msr-linux1
  Process 2 ([[45408,1],1]) is on host: chguo-msr-linux02
  BTLs attempted: self
Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
[chguo-msr-linux1:12690] *** An error occurred in MPI_Send
[chguo-msr-linux1:12690] *** reported by process
[140379686961153,140376711102464]
[chguo-msr-linux1:12690] *** on communicator MPI_COMM_WORLD
[chguo-msr-linux1:12690] *** MPI_ERR_INTERN: internal error
[chguo-msr-linux1:12690] *** MPI_ERRORS_ARE_FATAL (processes in this
communicator will now abort,
[chguo-msr-linux1:12690] ***    and potentially your MPI job)
[chguo-msr-linux1:12684] 1 more process has sent help message
help-mpi-btl-openib.txt / error in device init
[chguo-msr-linux1:12684] Set MCA parameter "orte_base_help_aggregate" to 0
to see all help / error messages
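
For reference, the change to mca-btl-openib-device-params.ini that I describe in my earlier mail below is an entry of roughly this shape. This is only a sketch: the section name and the tuning fields are illustrative, based on the usual format of that file; the actual change was adding vendor part ID 4117.

```ini
# Illustrative sketch only -- the section name and the non-ID fields are
# assumptions based on the file's usual format. The real change described
# in the mail below is adding vendor part ID 4117 (ConnectX-4 LX).
[Mellanox ConnectX4-LX]
vendor_id = 0x2c9
vendor_part_id = 4117
use_eager_rdma = 1
mtu = 4096
```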


On Tue, Jun 13, 2017 at 5:05 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:

> Hi,
>
> Please include your full command line.
>
> Josh
>
> On Mon, Jun 12, 2017 at 6:17 PM, Chuanxiong Guo <chuanxiong....@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have two servers with Mellanox CX4-LX (50GbE Ethernet) NICs connected
>> back-to-back. I am running Ubuntu 14.04. I have MVAPICH2 working, and I
>> can confirm (by packet capture) that both RoCE and RoCEv2 work well.
>>
>> But I still cannot make Open MPI work with RoCE. I am using Open MPI
>> 2.1.1. It looks like this version of Open MPI does not recognize the
>> CX4-LX, so I have added vendor part ID 4117 to
>> mca-btl-openib-device-params.ini, and I have also updated
>> opal/mca/common/verbs/common_verbs_port.c to support the CX4-LX, which
>> reports speed 64 and width 1.
>>
>> But I am still getting:
>>
>> "WARNING: There was an error initializing an OpenFabrics device.
>>   Local host:   chguo-msr-linux1
>>
>>   Local device: mlx5_0
>> "
>> Any hints on what is missing?
>>
>> Thanks,
>> CX
>>
>>
>> _______________________________________________
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>>
>
>
