Re: [OMPI users] OpenMPI + InfiniBand

2016-12-26 Thread gilles
 Sergei,

thanks for confirming you are now able to use Open MPI

fwiw, orted is remotely started by the selected plm component.

it can be ssh if you run without a batch manager, the tm interface with 
PBS/Torque, srun with SLURM, etc.

that should explain why exporting PATH and LD_LIBRARY_PATH is not enough 
in your environment,

not to mention that your .bashrc or equivalent might reset/unset these 
variables for non-interactive shells.
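
as a quick check (just a sketch, any reasonably high verbosity level will 
do), you can ask the plm framework to report which launcher it selects:

mpirun --mca plm_base_verbose 10 -np 2 hostname

under Torque you should see the tm component being picked (assuming Open 
MPI was built with tm support), otherwise the rsh/ssh one.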

Cheers,

Gilles

- Original Message -

Hi Gilles!
 

this looks like a very different issue, orted cannot be remotely 
started.
...

a better option (as long as you do not plan to relocate Open MPI 
install dir) is to configure with

--enable-mpirun-prefix-by-default


Yes, that was the problem with orted.
I checked the PATH and LD_LIBRARY_PATH variables and both are set, 
but that was not enough!

So I added --enable-mpirun-prefix-by-default to configure, and the 
recompiled version now works properly even when --prefix isn't specified.

When the Ethernet transport is used, everything works both with and 
without --enable-mpirun-prefix-by-default.

Thank you!

Best regards,
Sergei.




Re: [OMPI users] OpenMPI + InfiniBand

2016-12-26 Thread Sergei Hrushev
Hi Gilles!


> this looks like a very different issue, orted cannot be remotely started.
> ...
>
> a better option (as long as you do not plan to relocate Open MPI install
> dir) is to configure with
>
> --enable-mpirun-prefix-by-default
>

Yes, that was the problem with orted.
I checked the PATH and LD_LIBRARY_PATH variables and both are set, but that
was not enough!

So I added --enable-mpirun-prefix-by-default to configure, and the recompiled
version now works properly even when --prefix isn't specified.

When the Ethernet transport is used, everything works both with and without
--enable-mpirun-prefix-by-default.

Thank you!

Best regards,
Sergei.

Re: [OMPI users] OpenMPI + InfiniBand

2016-12-23 Thread r...@open-mpi.org
Also check to ensure you are using the same version of OMPI on all nodes - this 
message usually means that a different version was used on at least one node.

> On Dec 23, 2016, at 1:58 AM, gil...@rist.or.jp wrote:
> 
>  Serguei,
> 
>  
> this looks like a very different issue, orted cannot be remotely started.
> 
>  
> that typically occurs if orted cannot find some dependencies
> 
> (the Open MPI libs and/or the compiler runtime)
> 
>  
> for example, from a node, "ssh <node> orted" should not fail because of 
> unresolved dependencies.
> 
> a simple trick is to replace
> 
> mpirun ...
> 
> with
> 
> `which mpirun` ...
> 
>  
> a better option (as long as you do not plan to relocate Open MPI install dir) 
> is to configure with
> 
> --enable-mpirun-prefix-by-default
> 
>  
> Cheers,
> 
>  
> Gilles
> 
> - Original Message -
> 
> Hi All !
> As there have been no positive changes with the "UDCM + IPoIB" problem since 
> my previous post, 
> we installed IPoIB on the cluster and the "No OpenFabrics connection..." error 
> no longer appears.
> But now OpenMPI reports another problem:
> 
> In app ERROR OUTPUT stream:
> 
> [node2:14142] [[37935,0],0] ORTE_ERROR_LOG: Data unpack had inadequate space 
> in file base/plm_base_launch_support.c at line 1035
> 
> In app OUTPUT stream:
> 
> --
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
> 
> * not finding the required libraries and/or binaries on
>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>   settings, or configure OMPI with --enable-orterun-prefix-by-default
> 
> * lack of authority to execute on one or more specified nodes.
>   Please verify your allocation and authorities.
> 
> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>   Please check with your sys admin to determine the correct location to use.
> 
> *  compilation of the orted with dynamic libraries when static are required
>   (e.g., on Cray). Please check your configure cmd line and consider using
>   one of the contrib/platform definitions for your system type.
> 
> * an inability to create a connection back to mpirun due to a
>   lack of common network interfaces and/or no route found between
>   them. Please check network connectivity (including firewalls
>   and network routing requirements).
> --
> 
> When I'm trying to run the task using single node - all works properly.
> But when I specify "run on 2 nodes", the problem appears.
> 
> I tried to run ping using IPoIB addresses and all hosts are resolved 
> properly, 
> ping requests and replies are going over IB without any problems.
> So all nodes (including head) see each other via IPoIB.
> But MPI app fails.
> 
> Same test task works perfect on all nodes being run with Ethernet transport 
> instead of InfiniBand.
> 
> P.S. We use Torque resource manager to enqueue MPI tasks.
> 
> Best regards,
> Sergei.

Re: [OMPI users] OpenMPI + InfiniBand

2016-12-23 Thread gilles
 Serguei,

this looks like a very different issue, orted cannot be remotely started.

that typically occurs if orted cannot find some dependencies

(the Open MPI libs and/or the compiler runtime)

for example, from a node, "ssh <node> orted" should not fail because 
of unresolved dependencies.
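
an illustrative way to check this (node2 below is just an example hostname) 
is to run ldd on orted through a non-interactive ssh session:

ssh node2 'ldd $(which orted)'

any "not found" line, or which returning nothing at all, points at a 
PATH / LD_LIBRARY_PATH problem on that node.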

a simple trick is to replace

mpirun ...

with

`which mpirun` ...
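
for example (the process count and ./my_app are placeholders):

`which mpirun` -np 4 ./my_app

since mpirun is then invoked via its absolute path, Open MPI derives the 
install prefix from it and uses the same tree to launch orted on the 
remote nodes.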

a better option (as long as you do not plan to relocate Open MPI install 
dir) is to configure with

--enable-mpirun-prefix-by-default
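
for example (the prefix below is only illustrative):

./configure --prefix=/opt/openmpi-1.10.2 --enable-mpirun-prefix-by-default --with-verbs
make all install

with that option, every mpirun behaves as if --prefix had been passed, so 
the remote orted gets the right PATH and LD_LIBRARY_PATH automatically.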

Cheers,

Gilles

- Original Message -

Hi All !

As there have been no positive changes with the "UDCM + IPoIB" problem 
since my previous post,
we installed IPoIB on the cluster and the "No OpenFabrics connection..." 
error no longer appears.
But now OpenMPI reports another problem:

In app ERROR OUTPUT stream:

[node2:14142] [[37935,0],0] ORTE_ERROR_LOG: Data unpack had 
inadequate space in file base/plm_base_launch_support.c at line 1035

In app OUTPUT stream:


--
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-
default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_
tmpdir_base).
  Please check with your sys admin to determine the correct location 
to use.

*  compilation of the orted with dynamic libraries when static are 
required
  (e.g., on Cray). Please check your configure cmd line and consider 
using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).

--

When I'm trying to run the task using single node - all works 
properly.
But when I specify "run on 2 nodes", the problem appears.

I tried to run ping using IPoIB addresses and all hosts are resolved 
properly,
ping requests and replies are going over IB without any problems.
So all nodes (including head) see each other via IPoIB.
But MPI app fails.

Same test task works perfect on all nodes being run with Ethernet 
transport instead of InfiniBand.

P.S. We use Torque resource manager to enqueue MPI tasks.

Best regards,
Sergei.




Re: [OMPI users] OpenMPI + InfiniBand

2016-12-22 Thread Sergei Hrushev
Hi All !

As there have been no positive changes with the "UDCM + IPoIB" problem since my
previous post,
we installed IPoIB on the cluster and the "No OpenFabrics connection..." error
no longer appears.
But now OpenMPI reports another problem:

In app ERROR OUTPUT stream:

[node2:14142] [[37935,0],0] ORTE_ERROR_LOG: Data unpack had inadequate
space in file base/plm_base_launch_support.c at line 1035

In app OUTPUT stream:

--
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp
(--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--

When I run the task on a single node, everything works properly.
But when I specify "run on 2 nodes", the problem appears.

I tried ping using the IPoIB addresses: all hosts resolve properly,
and ping requests and replies go over IB without any problems.
So all nodes (including the head) see each other via IPoIB.
But the MPI app fails.

The same test task works perfectly on all nodes when run with the Ethernet
transport instead of InfiniBand.

P.S. We use Torque resource manager to enqueue MPI tasks.

Best regards,
Sergei.

Re: [OMPI users] OpenMPI + InfiniBand

2016-11-02 Thread Sergei Hrushev
Hi Nathan!

UDCM does not require IPoIB. It should be working for you. Can you build
> Open MPI with --enable-debug and run with -mca btl_base_verbose 100 and
> create a gist with the output.
>
>
Ok, done:

https://gist.github.com/hsa-online/30bb27a90bb7b225b233cc2af11b3942


Best regards,
Sergei.

Re: [OMPI users] OpenMPI + InfiniBand

2016-11-01 Thread Nathan Hjelm

UDCM does not require IPoIB. It should be working for you. Can you build Open 
MPI with --enable-debug and run with -mca btl_base_verbose 100 and create a 
gist with the output.
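
Something along these lines (the prefix, app name and process count are only 
an example):

./configure --enable-debug --with-verbs --prefix=/opt/openmpi-debug
make install
mpirun -np 2 -mca btl_base_verbose 100 ./your_mpi_app 2>&1 | tee btl_verbose.log

and paste btl_verbose.log into the gist.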

-Nathan

On Nov 01, 2016, at 07:50 AM, Sergei Hrushev  wrote:


I haven't worked with InfiniBand for years, but I do believe that yes: you need 
IPoIB enabled on your IB devices to get the RDMA CM support to work.


Yes, I saw too that RDMA CM requires IP, but in my case OpenMPI reports that UD 
CM can't be used too.
Is it also require IPoIB?

Is it possible to read more about UD CM somewhere?



Re: [OMPI users] OpenMPI + InfiniBand

2016-11-01 Thread Sergei Hrushev
>
> I actually just filed a Github issue to ask this exact question:
>
> https://github.com/open-mpi/ompi/issues/2326
>
>
Good idea, thanks!

Re: [OMPI users] OpenMPI + InfiniBand

2016-11-01 Thread Jeff Squyres (jsquyres)
I actually just filed a Github issue to ask this exact question:

https://github.com/open-mpi/ompi/issues/2326


> On Nov 1, 2016, at 9:49 AM, Sergei Hrushev  wrote:
> 
> 
> I haven't worked with InfiniBand for years, but I do believe that yes: you 
> need IPoIB enabled on your IB devices to get the RDMA CM support to work.
> 
> 
> Yes, I saw too that RDMA CM requires IP, but in my case OpenMPI reports that 
> UD CM can't be used too.
> Is it also require IPoIB?
> 
> Is it possible to read more about UD CM somewhere?
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] OpenMPI + InfiniBand

2016-11-01 Thread Sergei Hrushev
>
>
> I haven't worked with InfiniBand for years, but I do believe that yes: you
> need IPoIB enabled on your IB devices to get the RDMA CM support to work.
>
>
Yes, I also saw that RDMA CM requires IP, but in my case OpenMPI reports
that UDCM can't be used either.
Does it also require IPoIB?

Is it possible to read more about UDCM somewhere?

Re: [OMPI users] OpenMPI + InfiniBand

2016-11-01 Thread Jeff Squyres (jsquyres)
On Nov 1, 2016, at 2:40 AM, Sergei Hrushev  wrote:
> 
> Yes, I tried to get this info already.
> And I saw in log that rdmacm wants IP address on port.
> So my question in topc start message was:
> Is it enough for OpenMPI to have RDMA only or IPoIB should also be
> installed?

Sorry; I joined the thread late.

I haven't worked with InfiniBand for years, but I do believe that yes: you need 
IPoIB enabled on your IB devices to get the RDMA CM support to work.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] OpenMPI + InfiniBand

2016-11-01 Thread Sergei Hrushev
Hi John !

I'm experimenting now with the head node and a single compute node; all the
rest of the cluster is switched off.

can you run :
>
> ibhosts
>

# ibhosts
Ca  : 0x7cfe900300bddec0 ports 1 "MT25408 ConnectX Mellanox
Technologies"
Ca  : 0xe41d2d030050caf0 ports 1 "MT25408 ConnectX Mellanox
Technologies"


>
> ibstat
>

# ibstat
CA 'mlx4_0'
CA type: MT4099
Number of ports: 1
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0xe41d2d030050caf0
System image GUID: 0xe41d2d030050caf3
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 1
LMC: 0
SM lid: 3
Capability mask: 0x0251486a
Port GUID: 0xe41d2d030050caf1
Link layer: InfiniBand


>
>
ibdiagnet
>
>
# ibdiagnet
# cat ibdiagnet.log
-W- Topology file is not specified.
Reports regarding cluster links will use direct routes.
-I- Using port 1 as the local port.
-I- Discovering ... 3 nodes (1 Switches & 2 CA-s) discovered.


-I---
-I- Bad Guids/LIDs Info
-I---
-I- No bad Guids were found

-I---
-I- Links With Logical State = INIT
-I---
-I- No bad Links (with logical state = INIT) were found

-I---
-I- General Device Info
-I---

-I---
-I- PM Counters Info
-I---
-I- No illegal PM counters values were found

-I---
-I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)
-I---
-I-PKey:0x7fff Hosts:2 full:2 limited:0

-I---
-I- IPoIB Subnets Check
-I---
-I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps
SL:0x00
-W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps

-I---
-I- Bad Links Info
-I- No bad link were found
-I---

-I- Done. Run time was 2 seconds.


>
> Lord help me for being so naive, but do you have a subnet manager running?
>

It seems so, yes (I even have a standby):

# service --status-all | grep opensm
 [ + ]  opensm

# cat ibdiagnet.sm

ibdiagnet fabric SM report

  SM - master
MT25408/P1 lid=0x0003 guid=0x7cfe900300bddec1 dev=4099 priority:0

  SM - standby
The Local Device : MT25408/P1 lid=0x0001 guid=0xe41d2d030050caf1
dev=4099 priority:0

Best regards,
Sergei.

Re: [OMPI users] OpenMPI + InfiniBand

2016-11-01 Thread John Hearns via users
Sergei,
can you run :

ibhosts

ibstat

ibdiagnet


Lord help me for being so naive, but do you have a subnet manager running?



On 1 November 2016 at 06:40, Sergei Hrushev  wrote:

> Hi Jeff !
>
> What does "ompi_info | grep openib" show?
>>
>>
> $ ompi_info | grep openib
>  MCA btl: openib (MCA v2.0.0, API v2.0.0, Component
> v1.10.2)
>
> Additionally, Mellanox provides alternate support through their MXM
>> libraries, if you want to try that.
>>
>
> Yes, I know.
> But we already have a hybrid cluster with OpenMPI, OpenMP, CUDA, Torque
> and many other libraries installed,
> and because it works perfect over Ethernet interconnect my idea was to add
> InfiniBand support with minimum
> of changes. Mainly because we already have some custom-written software
> for OpenMPI.
>
>
>> If that shows that you have the openib BTL plugin loaded, try running
>> with "mpirun --mca btl_base_verbose 100 ..."  That will provide additional
>> output about why / why not each point-to-point plugin is chosen.
>>
>>
> Yes, I tried to get this info already.
> And I saw in log that rdmacm wants IP address on port.
> So my question in topc start message was:
>
> Is it enough for OpenMPI to have RDMA only or IPoIB should also be
> installed?
>
> The mpirun output is:
>
> [node1:02674] mca: base: components_register: registering btl components
> [node1:02674] mca: base: components_register: found loaded component openib
> [node1:02674] mca: base: components_register: component openib register
> function successful
> [node1:02674] mca: base: components_register: found loaded component sm
> [node1:02674] mca: base: components_register: component sm register
> function successful
> [node1:02674] mca: base: components_register: found loaded component self
> [node1:02674] mca: base: components_register: component self register
> function successful
> [node1:02674] mca: base: components_open: opening btl components
> [node1:02674] mca: base: components_open: found loaded component openib
> [node1:02674] mca: base: components_open: component openib open function
> successful
> [node1:02674] mca: base: components_open: found loaded component sm
> [node1:02674] mca: base: components_open: component sm open function
> successful
> [node1:02674] mca: base: components_open: found loaded component self
> [node1:02674] mca: base: components_open: component self open function
> successful
> [node1:02674] select: initializing btl component openib
> [node1:02674] openib BTL: rdmacm IP address not found on port
> [node1:02674] openib BTL: rdmacm CPC unavailable for use on mlx4_0:1;
> skipped
> [node1:02674] select: init of component openib returned failure
> [node1:02674] mca: base: close: component openib closed
> [node1:02674] mca: base: close: unloading component openib
> [node1:02674] select: initializing btl component sm
> [node1:02674] select: init of component sm returned failure
> [node1:02674] mca: base: close: component sm closed
> [node1:02674] mca: base: close: unloading component sm
> [node1:02674] select: initializing btl component self
> [node1:02674] select: init of component self returned success
> [node1:02674] mca: bml: Using self btl to [[16642,1],0] on node node1
> [node1:02674] mca: base: close: component self closed
> [node1:02674] mca: base: close: unloading component self
>
> Best regards,
> Sergei.
>
>

Re: [OMPI users] OpenMPI + InfiniBand

2016-11-01 Thread Sergei Hrushev
Hi Jeff !

What does "ompi_info | grep openib" show?
>
>
$ ompi_info | grep openib
 MCA btl: openib (MCA v2.0.0, API v2.0.0, Component v1.10.2)

Additionally, Mellanox provides alternate support through their MXM
> libraries, if you want to try that.
>

Yes, I know.
But we already have a hybrid cluster with OpenMPI, OpenMP, CUDA, Torque and
many other libraries installed,
and because it works perfectly over the Ethernet interconnect, my idea was to
add InfiniBand support with a minimum of changes, mainly because we already
have some custom-written software for OpenMPI.


> If that shows that you have the openib BTL plugin loaded, try running with
> "mpirun --mca btl_base_verbose 100 ..."  That will provide additional
> output about why / why not each point-to-point plugin is chosen.
>
>
Yes, I already tried to get this info.
And I saw in the log that rdmacm wants an IP address on the port.
So my question in the topic's first message was:

Is RDMA alone enough for OpenMPI, or should IPoIB also be installed?

The mpirun output is:

[node1:02674] mca: base: components_register: registering btl components
[node1:02674] mca: base: components_register: found loaded component openib
[node1:02674] mca: base: components_register: component openib register
function successful
[node1:02674] mca: base: components_register: found loaded component sm
[node1:02674] mca: base: components_register: component sm register
function successful
[node1:02674] mca: base: components_register: found loaded component self
[node1:02674] mca: base: components_register: component self register
function successful
[node1:02674] mca: base: components_open: opening btl components
[node1:02674] mca: base: components_open: found loaded component openib
[node1:02674] mca: base: components_open: component openib open function
successful
[node1:02674] mca: base: components_open: found loaded component sm
[node1:02674] mca: base: components_open: component sm open function
successful
[node1:02674] mca: base: components_open: found loaded component self
[node1:02674] mca: base: components_open: component self open function
successful
[node1:02674] select: initializing btl component openib
[node1:02674] openib BTL: rdmacm IP address not found on port
[node1:02674] openib BTL: rdmacm CPC unavailable for use on mlx4_0:1;
skipped
[node1:02674] select: init of component openib returned failure
[node1:02674] mca: base: close: component openib closed
[node1:02674] mca: base: close: unloading component openib
[node1:02674] select: initializing btl component sm
[node1:02674] select: init of component sm returned failure
[node1:02674] mca: base: close: component sm closed
[node1:02674] mca: base: close: unloading component sm
[node1:02674] select: initializing btl component self
[node1:02674] select: init of component self returned success
[node1:02674] mca: bml: Using self btl to [[16642,1],0] on node node1
[node1:02674] mca: base: close: component self closed
[node1:02674] mca: base: close: unloading component self

Best regards,
Sergei.

Re: [OMPI users] OpenMPI + InfiniBand

2016-10-31 Thread Jeff Squyres (jsquyres)
What does "ompi_info | grep openib" show?

Additionally, Mellanox provides alternate support through their MXM libraries, 
if you want to try that.

If that shows that you have the openib BTL plugin loaded, try running with 
"mpirun --mca btl_base_verbose 100 ..."  That will provide additional output 
about why / why not each point-to-point plugin is chosen.
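
For example (the host names and the ring_c test program are just placeholders 
for your own setup):

mpirun -np 2 --host node1,node2 --mca btl_base_verbose 100 ./ring_c

The verbose output will state, for each BTL component, whether it initialized 
or why it was skipped.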


> On Oct 30, 2016, at 10:35 PM, Sergei Hrushev  wrote:
> 
> Hi Gilles!
> 
> 
> is there any reason why you configure with --with-verbs-libdir=/usr/lib ?
> as far as i understand, --with-verbs should be enough, and /usr/lib
> nor /usr/local/lib should ever be used in the configure command line
> (and btw, are you running on a 32 bits system ? should the 64 bits
> libs be in /usr/lib64 ?)
> 
> I'm on Ubuntu 16.04 x86_64 and it has /usr/lib and /usr/lib32.
> As I understand /usr/lib is assumed to be /usr/lib64.
> So the library path is correct.
>  
> 
> make sure you
> ulimit -l unlimited
> before you invoke mpirun, and this value is correctly propagated to
> the remote nodes
> /* the failure could be a side effect of a low ulimit -l */
>  
> Yes, ulimit -l returns "unlimited".
> So this is also correct.
> 
> Best regards,
> Sergei.
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] OpenMPI + InfiniBand

2016-10-30 Thread Sergei Hrushev
Hi Gilles!


> is there any reason why you configure with --with-verbs-libdir=/usr/lib ?
> as far as i understand, --with-verbs should be enough, and /usr/lib
> nor /usr/local/lib should ever be used in the configure command line
> (and btw, are you running on a 32 bits system ? should the 64 bits
> libs be in /usr/lib64 ?)
>

I'm on Ubuntu 16.04 x86_64 and it has /usr/lib and /usr/lib32.
As I understand it, /usr/lib plays the role of /usr/lib64 there.
So the library path is correct.


>
> make sure you
> ulimit -l unlimited
> before you invoke mpirun, and this value is correctly propagated to
> the remote nodes
> /* the failure could be a side effect of a low ulimit -l */
>

Yes, ulimit -l returns "unlimited".
So this is also correct.

Best regards,
Sergei.

Re: [OMPI users] OpenMPI + InfiniBand

2016-10-30 Thread Sergei Hrushev
>
> Sorry - shoot down my idea. Over to someone else (me hides head in shame)
>
>
No problem, thanks for trying!

Re: [OMPI users] OpenMPI + InfiniBand

2016-10-28 Thread Gilles Gouaillardet
Sergei,

is there any reason why you configure with --with-verbs-libdir=/usr/lib ?
as far as i understand, --with-verbs should be enough, and neither /usr/lib
nor /usr/local/lib should ever be used in the configure command line
(and btw, are you running on a 32-bit system ? shouldn't the 64-bit
libs be in /usr/lib64 ?)

make sure you run
ulimit -l unlimited
before you invoke mpirun, and that this value is correctly propagated to
the remote nodes
/* the failure could be a side effect of a low ulimit -l */
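
a quick way to verify both points (node1 is just an example hostname) is to
compare the interactive and the non-interactive limits:

ulimit -l
ssh node1 'ulimit -l'

both should print "unlimited"; if the remote one does not, the memlock limit
usually has to be raised in /etc/security/limits.conf (or its equivalent) on
the compute nodes.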

Cheers,

Gilles


On Fri, Oct 28, 2016 at 6:48 PM, Sergei Hrushev  wrote:
> Hello, All !
>
> We have a problem with OpenMPI version 1.10.2 on a cluster with newly
> installed Mellanox InfiniBand adapters.
> OpenMPI was re-configured and re-compiled using: --with-verbs
> --with-verbs-libdir=/usr/lib
>
> And our test MPI task returns proper results but it seems OpenMPI continues
> to use existing 1Gbit Ethernet network instead of InfiniBand.
>
> An output file contains these lines:
> --
> No OpenFabrics connection schemes reported that they were able to be
> used on a specific port.  As such, the openib BTL (OpenFabrics
> support) will be disabled for this port.
>
>   Local host:   node1
>   Local device: mlx4_0
>   Local port:   1
>   CPCs attempted:   rdmacm, udcm
> --
>
> InfiniBand network itself seems to be working:
>
> $ ibstat mlx4_0 shows:
>
> CA 'mlx4_0'
> CA type: MT4099
> Number of ports: 1
> Firmware version: 2.35.5100
> Hardware version: 0
> Node GUID: 0x7cfe900300bddec0
> System image GUID: 0x7cfe900300bddec3
> Port 1:
> State: Active
> Physical state: LinkUp
> Rate: 56
> Base lid: 3
> LMC: 0
> SM lid: 3
> Capability mask: 0x0251486a
> Port GUID: 0x7cfe900300bddec1
> Link layer: InfiniBand
>
> ibping also works.
> ibnetdiscover shows the correct topology of  IB network.
>
> Cluster works under Ubuntu 16.04 and we use drivers from OS (OFED is not
> installed).
>
> Is it enough for OpenMPI to have RDMA only or IPoIB should also be
> installed?
> What else can be checked?
>
> Thanks a lot for any help!
>
>


Re: [OMPI users] OpenMPI + InfiniBand

2016-10-28 Thread John Hearns via users
Sorry - shoot down my idea. Over to someone else (me hides head in shame)

On 28 October 2016 at 11:28, Sergei Hrushev  wrote:

> Sergei,   what does the command  "ibv_devinfo" return please?
>>
>> I had a recent case like this, but on Qlogic hardware.
>> Sorry if I am mixing things up.
>>
>>
> An output of ibv_devinfo from cluster's 1st node is:
>
> $ ibv_devinfo -d mlx4_0
> hca_id: mlx4_0
> transport:  InfiniBand (0)
> fw_ver: 2.35.5100
> node_guid:  7cfe:9003:00bd:dec0
> sys_image_guid: 7cfe:9003:00bd:dec3
> vendor_id:  0x02c9
> vendor_part_id: 4099
> hw_ver: 0x0
> board_id:   MT_1100120019
> phys_port_cnt:  1
> port:   1
> state:  PORT_ACTIVE (4)
> max_mtu:4096 (5)
> active_mtu: 4096 (5)
> sm_lid: 3
> port_lid:   3
> port_lmc:   0x00
> link_layer: InfiniBand
>
>

Re: [OMPI users] OpenMPI + InfiniBand

2016-10-28 Thread Sergei Hrushev
>
> Sergei,   what does the command  "ibv_devinfo" return please?
>
> I had a recent case like this, but on Qlogic hardware.
> Sorry if I am mixing things up.
>
>
An output of ibv_devinfo from cluster's 1st node is:

$ ibv_devinfo -d mlx4_0
hca_id: mlx4_0
transport:  InfiniBand (0)
fw_ver: 2.35.5100
node_guid:  7cfe:9003:00bd:dec0
sys_image_guid: 7cfe:9003:00bd:dec3
vendor_id:  0x02c9
vendor_part_id: 4099
hw_ver: 0x0
board_id:   MT_1100120019
phys_port_cnt:  1
port:   1
state:  PORT_ACTIVE (4)
max_mtu:4096 (5)
active_mtu: 4096 (5)
sm_lid: 3
port_lid:   3
port_lmc:   0x00
link_layer: InfiniBand

Re: [OMPI users] OpenMPI + InfiniBand

2016-10-28 Thread John Hearns via users
Sergei,   what does the command  "ibv_devinfo" return please?

I had a recent case like this, but on Qlogic hardware.
Sorry if I am mixing things up.

On 28 October 2016 at 10:48, Sergei Hrushev  wrote:

> Hello, All !
>
> We have a problem with OpenMPI version 1.10.2 on a cluster with newly
> installed Mellanox InfiniBand adapters.
> OpenMPI was re-configured and re-compiled using: --with-verbs
> --with-verbs-libdir=/usr/lib
>
> And our test MPI task returns proper results but it seems OpenMPI
> continues to use existing 1Gbit Ethernet network instead of InfiniBand.
>
> An output file contains these lines:
> --
> No OpenFabrics connection schemes reported that they were able to be
> used on a specific port.  As such, the openib BTL (OpenFabrics
> support) will be disabled for this port.
>
>   Local host:   node1
>   Local device: mlx4_0
>   Local port:   1
>   CPCs attempted:   rdmacm, udcm
> --
>
> InfiniBand network itself seems to be working:
>
> $ ibstat mlx4_0 shows:
>
> CA 'mlx4_0'
> CA type: MT4099
> Number of ports: 1
> Firmware version: 2.35.5100
> Hardware version: 0
> Node GUID: 0x7cfe900300bddec0
> System image GUID: 0x7cfe900300bddec3
> Port 1:
> State: Active
> Physical state: LinkUp
> Rate: 56
> Base lid: 3
> LMC: 0
> SM lid: 3
> Capability mask: 0x0251486a
> Port GUID: 0x7cfe900300bddec1
> Link layer: InfiniBand
>
> ibping also works.
> ibnetdiscover shows the correct topology of  IB network.
>
> Cluster works under Ubuntu 16.04 and we use drivers from OS (OFED is not
> installed).
>
> Is it enough for OpenMPI to have RDMA only or IPoIB should also be
> installed?
> What else can be checked?
>
> Thanks a lot for any help!
>
>

[OMPI users] OpenMPI + InfiniBand

2016-10-28 Thread Sergei Hrushev
Hello, All !

We have a problem with OpenMPI version 1.10.2 on a cluster with newly
installed Mellanox InfiniBand adapters.
OpenMPI was re-configured and re-compiled using: --with-verbs
--with-verbs-libdir=/usr/lib

Our test MPI task returns proper results, but it seems OpenMPI continues
to use the existing 1 Gbit Ethernet network instead of InfiniBand.

An output file contains these lines:
--
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:   node1
  Local device: mlx4_0
  Local port:   1
  CPCs attempted:   rdmacm, udcm
--

InfiniBand network itself seems to be working:

$ ibstat mlx4_0 shows:

CA 'mlx4_0'
CA type: MT4099
Number of ports: 1
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0x7cfe900300bddec0
System image GUID: 0x7cfe900300bddec3
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 3
LMC: 0
SM lid: 3
Capability mask: 0x0251486a
Port GUID: 0x7cfe900300bddec1
Link layer: InfiniBand

ibping also works.
ibnetdiscover shows the correct topology of the IB network.

The cluster runs Ubuntu 16.04 and we use the drivers from the OS (OFED is not
installed).

Is RDMA alone enough for OpenMPI, or should IPoIB also be installed?
What else can be checked?

Thanks a lot for any help!

Re: [OMPI users] openmpi+infiniband

2013-07-31 Thread christian schmitt
Sorry for this.

This was a trial-and-error problem.
It was a mismatch of OFED versions and kernel updates.

I have now installed a fresh CentOS 6.4 (with the default kernel, no kernel
update), then installed the official Mellanox OFED driver and compiled
OpenMPI (without extra options). Now it works fine.

Before, the mismatch came from first installing CentOS 6.3 and then updating
to 6.4; maybe some files of the old OFED driver survived.

Thank you

Christian


On 07/30/2013 04:40 PM, Reuti wrote:
> Am 30.07.2013 um 15:01 schrieb christian schmitt:
> 
>> I´m trying to get openmpi(1.6.5) running with/over infiniband.
>> My system is a centOS 6.3. I have installed the Mellanox OFED driver
>> (2.0) and everything seems working. ibhosts shows all hosts and the switch.
>> A "hca_self_test.ofed" shows:
>>
>>  Performing Adapter Device Self Test 
>> Number of CAs Detected . 1
>> PCI Device Check ... PASS
>> Kernel Arch  x86_64
>> Host Driver Version  MLNX_OFED_LINUX-2.0-2.0.5
>> (OFED-2.0-2.0.5): 2.6.32-279.el6.x86_64
>> Host Driver RPM Check .. PASS
>> Firmware on CA #0 VPI .. v2.11.500
>> Firmware Check on CA #0 (VPI) .. PASS
>> Host Driver Initialization . PASS"
>> Number of CA Ports Active .. 1
>> Port State of Port #1 on CA #0 (VPI). UP 4X QDR (InfiniBand)
>> Error Counter Check on CA #0 (VPI).. PASS
>> Kernel Syslog Check  PASS
>> Node GUID on CA #0 (VPI) ... 00:02:c9:03:00:1f:a4:e0
>>
>>
>> A "ompi_info | grep openib" shows:
>> MCA btl: openib (MCA v2.0, API v2.0, Component v1.6.5)
>>
>> So I now compiled openmpi with the option "--with-openib" and tried to
>> run the intel MPI test.
> 
> What's the "intel MPI test" - is this an application from Intel's MPI library 
> which is included as source and you recompiled it with Open MPI?
> 
> -- Reuti
> 
> 
>> But it still uses the Ethernet interface to
>> communicate. Only when I configure ipoib (ib0) and start my job with
>> "--mca btl ^openib --mca btl_tcp_if_include ib0" it runs with
>> infiniband. But when I´m right, it should work without the ib0 interface.
>> I´m quiet new to infiniband so maybe I forgot something.
>> I'm grateful for any information that help me solving this problem.
>>
>> Thank you,
>>
>> Christian
> 
> 
> 


Re: [OMPI users] openmpi+infiniband

2013-07-30 Thread christian schmitt
Hello,

Thank you for this. When I start the MPI test with the option "--mca btl
openib,sm,self" I can start it on one node. But I can't start it on two
nodes. The error then is:

 schmitt$ /amd/software/openmpi-1.6.5/cltest/bin/mpirun -n 2 -H
cluster1,cluster2 /worklocal/schmitt/imb/3.2.4/src/IMB-MPI1 SENDRECV
--
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[49963,1],0]) is on host: cluster1.gsc.ce.tu-darmstadt.de
  Process 2 ([[49963,1],1]) is on host: cluster2
  BTLs attempted: self sm

Your MPI job is now going to abort; sorry.
--
--
MPI_INIT has failed because at least one MPI process is unreachable
from another.  This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used.  Your MPI job will now abort.

You may wish to try to narrow down the problem;

 * Check the output of ompi_info to see which BTL/MTL plugins are
   available.
 * Run your application with MPI_THREAD_SINGLE.
 * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
   if using MTL-based communications) to see exactly which
   communication plugins were considered and/or discarded.
--
[cluster1.gsc.ce.tu-darmstadt.de:29116] *** An error occurred in MPI_Init
[cluster1.gsc.ce.tu-darmstadt.de:29116] *** on a NULL communicator
[cluster1.gsc.ce.tu-darmstadt.de:29116] *** Unknown error
[cluster1.gsc.ce.tu-darmstadt.de:29116] *** MPI_ERRORS_ARE_FATAL: your
MPI job will now abort
--
An MPI process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly.  You should
double check that everything has shut down cleanly.

  Reason: Before MPI_INIT completed
  Local host: cluster1.gsc.ce.tu-darmstadt.de
  PID:29116
--
--
mpirun has exited due to process rank 1 with PID 5194 on
node cluster2 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--
[cluster1.gsc.ce.tu-darmstadt.de:29113] 1 more process has sent help
message help-mca-bml-r2.txt / unreachable proc
[cluster1.gsc.ce.tu-darmstadt.de:29113] Set MCA parameter
"orte_base_help_aggregate" to 0 to see all help / error messages
[cluster1.gsc.ce.tu-darmstadt.de:29113] 1 more process has sent help
message help-mpi-runtime / mpi_init:startup:pml-add-procs-fail
[cluster1.gsc.ce.tu-darmstadt.de:29113] 1 more process has sent help
message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
[cluster1.gsc.ce.tu-darmstadt.de:29113] 1 more process has sent help
message help-mpi-runtime.txt / ompi mpi abort:cannot guarantee all killed

It seems like MPI doesn't know how to communicate between the nodes.
Any ideas?


Christian Schmitt
Network and Systemadministrator
Technische Universität Darmstadt
Graduate School of Computational Engineering
Dolivostraße 15, S4 10/326
64293 Darmstadt

Office: +49 (0)6151 / 16-4265
Fax:+49 (0)6151 / 16-4459

schm...@gsc.tu-darmstadt.de

http://www.graduate-school-ce.de/

On 07/30/2013 04:34 PM, Gus Correa wrote:
> Hi Christian
> 
> If I understand you right, you want to use Open MPI with
> Infiniband, not Ethernet, right?
> 
> If that is the case, try
> '-mca btl openib,sm,self'
> in your mpiexec command line.
> 
> I don't think ipoib is required for Open MPI.
> 
> See these FAQ (FAQ is the best OpenMPI documentation):
> http://www.open-mpi.org/faq/?category=openfabrics#ib-btl
> 
> I hope this helps,
> Gus Correa
> 
> On 07/30/2013 09:01 AM, christian schmitt wrote:
>> Hallo,
>>
>> I´m trying to get openmpi(1.6.5) 

Re: [OMPI users] openmpi+infiniband

2013-07-30 Thread Reuti
Am 30.07.2013 um 15:01 schrieb christian schmitt:

> I´m trying to get openmpi(1.6.5) running with/over infiniband.
> My system is a centOS 6.3. I have installed the Mellanox OFED driver
> (2.0) and everything seems working. ibhosts shows all hosts and the switch.
> A "hca_self_test.ofed" shows:
> 
>  Performing Adapter Device Self Test 
> Number of CAs Detected . 1
> PCI Device Check ... PASS
> Kernel Arch  x86_64
> Host Driver Version  MLNX_OFED_LINUX-2.0-2.0.5
> (OFED-2.0-2.0.5): 2.6.32-279.el6.x86_64
> Host Driver RPM Check .. PASS
> Firmware on CA #0 VPI .. v2.11.500
> Firmware Check on CA #0 (VPI) .. PASS
> Host Driver Initialization . PASS"
> Number of CA Ports Active .. 1
> Port State of Port #1 on CA #0 (VPI). UP 4X QDR (InfiniBand)
> Error Counter Check on CA #0 (VPI).. PASS
> Kernel Syslog Check  PASS
> Node GUID on CA #0 (VPI) ... 00:02:c9:03:00:1f:a4:e0
> 
> 
> A "ompi_info | grep openib" shows:
> MCA btl: openib (MCA v2.0, API v2.0, Component v1.6.5)
> 
> So I now compiled openmpi with the option "--with-openib" and tried to
> run the intel MPI test.

What's the "intel MPI test" - is this an application from Intel's MPI library 
which is included as source and you recompiled it with Open MPI?

-- Reuti


> But it still uses the Ethernet interface to
> communicate. Only when I configure ipoib (ib0) and start my job with
> "--mca btl ^openib --mca btl_tcp_if_include ib0" it runs with
> infiniband. But when I´m right, it should work without the ib0 interface.
> I´m quiet new to infiniband so maybe I forgot something.
> I'm grateful for any information that help me solving this problem.
> 
> Thank you,
> 
> Christian
> 




Re: [OMPI users] openmpi+infiniband

2013-07-30 Thread Gus Correa

Hi Christian

If I understand you right, you want to use Open MPI with
Infiniband, not Ethernet, right?

If that is the case, try
'-mca btl openib,sm,self'
in your mpiexec command line.
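
For example (the host names and the IMB benchmark are placeholders taken from
elsewhere in this thread):

mpiexec -n 2 -H cluster1,cluster2 -mca btl openib,sm,self ./IMB-MPI1 SENDRECV

With the BTL list pinned this way, the job will fail loudly if openib cannot
be used, instead of silently falling back to TCP.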

I don't think ipoib is required for Open MPI.

See these FAQ (FAQ is the best OpenMPI documentation):
http://www.open-mpi.org/faq/?category=openfabrics#ib-btl

I hope this helps,
Gus Correa

On 07/30/2013 09:01 AM, christian schmitt wrote:

Hallo,

I´m trying to get openmpi(1.6.5) running with/over infiniband.
My system is a centOS 6.3. I have installed the Mellanox OFED driver
(2.0) and everything seems working. ibhosts shows all hosts and the switch.
A "hca_self_test.ofed" shows:

 Performing Adapter Device Self Test 
Number of CAs Detected . 1
PCI Device Check ... PASS
Kernel Arch  x86_64
Host Driver Version  MLNX_OFED_LINUX-2.0-2.0.5
(OFED-2.0-2.0.5): 2.6.32-279.el6.x86_64
Host Driver RPM Check .. PASS
Firmware on CA #0 VPI .. v2.11.500
Firmware Check on CA #0 (VPI) .. PASS
Host Driver Initialization . PASS"
Number of CA Ports Active .. 1
Port State of Port #1 on CA #0 (VPI). UP 4X QDR (InfiniBand)
Error Counter Check on CA #0 (VPI).. PASS
Kernel Syslog Check  PASS
Node GUID on CA #0 (VPI) ... 00:02:c9:03:00:1f:a4:e0


A "ompi_info | grep openib" shows:
  MCA btl: openib (MCA v2.0, API v2.0, Component v1.6.5)

So I now compiled openmpi with the option "--with-openib" and tried to
run the intel MPI test. But it still uses the Ethernet interface to
communicate. Only when I configure ipoib (ib0) and start my job with
"--mca btl ^openib --mca btl_tcp_if_include ib0" it runs with
infiniband. But when I´m right, it should work without the ib0 interface.
I´m quiet new to infiniband so maybe I forgot something.
I'm grateful for any information that help me solving this problem.

Thank you,

Christian




[OMPI users] openmpi+infiniband

2013-07-30 Thread christian schmitt
Hello,

I'm trying to get OpenMPI (1.6.5) running over InfiniBand.
My system is CentOS 6.3. I have installed the Mellanox OFED driver
(2.0) and everything seems to be working. ibhosts shows all hosts and the
switch.
A "hca_self_test.ofed" shows:

 Performing Adapter Device Self Test 
Number of CAs Detected . 1
PCI Device Check ... PASS
Kernel Arch  x86_64
Host Driver Version  MLNX_OFED_LINUX-2.0-2.0.5
(OFED-2.0-2.0.5): 2.6.32-279.el6.x86_64
Host Driver RPM Check .. PASS
Firmware on CA #0 VPI .. v2.11.500
Firmware Check on CA #0 (VPI) .. PASS
Host Driver Initialization . PASS"
Number of CA Ports Active .. 1
Port State of Port #1 on CA #0 (VPI). UP 4X QDR (InfiniBand)
Error Counter Check on CA #0 (VPI).. PASS
Kernel Syslog Check  PASS
Node GUID on CA #0 (VPI) ... 00:02:c9:03:00:1f:a4:e0


A "ompi_info | grep openib" shows:
 MCA btl: openib (MCA v2.0, API v2.0, Component v1.6.5)

So I have now compiled OpenMPI with the option "--with-openib" and tried to
run the Intel MPI test. But it still uses the Ethernet interface to
communicate. Only when I configure IPoIB (ib0) and start my job with
"--mca btl ^openib --mca btl_tcp_if_include ib0" does it run over
InfiniBand. But if I'm right, it should work without the ib0 interface.
I'm quite new to InfiniBand, so maybe I forgot something.
I'm grateful for any information that helps me solve this problem.

Thank you,

Christian


Re: [OMPI users] [openMPI-infiniband] openMPI in IB network when openSM with LASH is running

2007-11-29 Thread Jeff Squyres

On Nov 29, 2007, at 12:08 AM, Keshetti Mahesh wrote:


There is work starting literally right about now to allow Open MPI to
use the RDMA CM and/or the IBCM for creating OpenFabrics connections
(IB or iWARP).


when this is expected to be completed?



It is not planned to be released until the v1.3 series is released.

See

http://www.open-mpi.org/community/lists/users/2007/11/4535.php
https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3

--
Jeff Squyres
Cisco Systems



Re: [OMPI users] [openMPI-infiniband] openMPI in IB network when openSM with LASH is running

2007-11-29 Thread Keshetti Mahesh
> There is work starting literally right about now to allow Open MPI to
> use the RDMA CM and/or the IBCM for creating OpenFabrics connections
> (IB or iWARP).

When is this expected to be completed?

-Mahesh