Re: [OMPI users] OpenMPI + InfiniBand
Hi Gilles!

> this looks like a very different issue, orted cannot be remotely started.
> ...
> a better option (as long as you do not plan to relocate Open MPI install
> dir) is to configure with
> --enable-mpirun-prefix-by-default

Yes, that was a problem with orted. I checked the PATH and LD_LIBRARY_PATH
variables and both are set, but that was not enough! So I added
--enable-mpirun-prefix-by-default to configure, and even when --prefix is not
specified the recompiled version works properly. When the Ethernet transport
is used, everything works both with and without --enable-mpirun-prefix-by-default.

Thank you!

Best regards,
Sergei.

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
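For anyone hitting the same orted startup failure, the rebuild that fixed it here can be sketched roughly as follows. This is a sketch, not the exact command line from the thread: the install prefix is illustrative, and only the flags mentioned in the thread are assumed.

```shell
# Rebuild Open MPI so that mpirun forwards its own install prefix to the
# remote orted daemons, removing the dependence on each node's login-shell
# PATH / LD_LIBRARY_PATH. The prefix below is illustrative.
./configure --prefix=/opt/openmpi-1.10.2 \
            --with-verbs \
            --enable-mpirun-prefix-by-default
make -j"$(nproc)"
sudo make install
```

With this flag, mpirun behaves as if --prefix had been given on every invocation, which is why the recompiled version works even without an explicit --prefix.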
Re: [OMPI users] OpenMPI + InfiniBand
Hi All!

As there have been no positive changes with the "UDCM + IPoIB" problem since
my previous post, we installed IPoIB on the cluster, and the "No OpenFabrics
connection..." error no longer appears. But now Open MPI reports another
problem.

In the app's ERROR OUTPUT stream:

[node2:14142] [[37935,0],0] ORTE_ERROR_LOG: Data unpack had inadequate space in file base/plm_base_launch_support.c at line 1035

In the app's OUTPUT stream:

--
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on one or more nodes.
  Please check your PATH and LD_LIBRARY_PATH settings, or configure OMPI
  with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a lack of
  common network interfaces and/or no route found between them. Please
  check network connectivity (including firewalls and network routing
  requirements).
--

When I run the task on a single node, everything works properly. But when I
specify "run on 2 nodes", the problem appears. I tried pinging the IPoIB
addresses: all hosts resolve properly, and ping requests and replies go over
IB without any problems. So all nodes (including the head node) see each
other via IPoIB, but the MPI app fails. The same test task works perfectly on
all nodes when run with the Ethernet transport instead of InfiniBand.

P.S. We use the Torque resource manager to enqueue MPI tasks.

Best regards,
Sergei.
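Since the error text points at remote daemon startup, one way to narrow it down is to check what a non-interactive shell on a remote node actually sees. This is a generic sketch, not a command from the thread; the hostname node2 and passwordless ssh between nodes are assumptions.

```shell
# orted must be resolvable in a *non-interactive* remote shell; .bashrc
# guards such as '[ -z "$PS1" ] && return' can hide PATH edits from
# ssh-spawned commands even when an interactive login looks fine.
ssh node2 'echo "PATH=$PATH"; which orted; echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"'

# ORTE also needs to write session files; confirm the temp dir is writable:
ssh node2 'd=${TMPDIR:-/tmp}; touch "$d/.orte_probe" && rm "$d/.orte_probe" && echo "$d is writable"'
```

If `which orted` fails here but succeeds in an interactive login, the --enable-mpirun-prefix-by-default (or --enable-orterun-prefix-by-default) configure option is the usual remedy.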
Re: [OMPI users] OpenMPI + InfiniBand
Hi Nathan!

> UDCM does not require IPoIB. It should be working for you. Can you build
> Open MPI with --enable-debug and run with -mca btl_base_verbose 100 and
> create a gist with the output.

Ok, done:
https://gist.github.com/hsa-online/30bb27a90bb7b225b233cc2af11b3942

Best regards,
Sergei.
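The debug build and verbose run Nathan asked for can be sketched as below. The source directory, host list, and test binary name are illustrative assumptions; only --enable-debug and btl_base_verbose 100 come from the thread.

```shell
# Reconfigure with debugging enabled so the openib BTL logs why each
# connection manager (rdmacm, udcm) is accepted or rejected.
./configure --enable-debug --with-verbs --enable-mpirun-prefix-by-default
make -j"$(nproc)" && sudo make install

# Capture the BTL selection trace across two nodes into a log file.
mpirun -np 2 --host node1,node2 \
       --mca btl_base_verbose 100 ./mpi_test 2>&1 | tee btl_verbose.log
```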
Re: [OMPI users] OpenMPI + InfiniBand
> I actually just filed a Github issue to ask this exact question:
> https://github.com/open-mpi/ompi/issues/2326

Good idea, thanks!
Re: [OMPI users] OpenMPI + InfiniBand
> I haven't worked with InfiniBand for years, but I do believe that yes: you
> need IPoIB enabled on your IB devices to get the RDMA CM support to work.

Yes, I saw too that RDMA CM requires IP, but in my case Open MPI reports that
UD CM can't be used either. Does it also require IPoIB? Is it possible to
read more about UD CM somewhere?
Re: [OMPI users] OpenMPI + InfiniBand
Hi John!

I'm experimenting now with a head node and a single compute node; the rest of
the cluster is switched off.

> can you run:
> ibhosts

# ibhosts
Ca : 0x7cfe900300bddec0 ports 1 "MT25408 ConnectX Mellanox Technologies"
Ca : 0xe41d2d030050caf0 ports 1 "MT25408 ConnectX Mellanox Technologies"

> ibstat

# ibstat
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 1
        Firmware version: 2.35.5100
        Hardware version: 0
        Node GUID: 0xe41d2d030050caf0
        System image GUID: 0xe41d2d030050caf3
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 1
                LMC: 0
                SM lid: 3
                Capability mask: 0x0251486a
                Port GUID: 0xe41d2d030050caf1
                Link layer: InfiniBand

> ibdiagnet

# ibdiagnet
# cat ibdiagnet.log
-W- Topology file is not specified. Reports regarding cluster links will use direct routes.
-I- Using port 1 as the local port.
-I- Discovering ... 3 nodes (1 Switches & 2 CA-s) discovered.
-I---
-I- Bad Guids/LIDs Info
-I---
-I- No bad Guids were found
-I---
-I- Links With Logical State = INIT
-I---
-I- No bad Links (with logical state = INIT) were found
-I---
-I- General Device Info
-I---
-I---
-I- PM Counters Info
-I---
-I- No illegal PM counters values were found
-I---
-I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)
-I---
-I- PKey:0x7fff Hosts:2 full:2 limited:0
-I---
-I- IPoIB Subnets Check
-I---
-I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps SL:0x00
-W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps
-I---
-I- Bad Links Info
-I- No bad link were found
-I---
-I- Done. Run time was 2 seconds.

> Lord help me for being so naive, but do you have a subnet manager running?

It seems so, yes (I even have a standby one):

# service --status-all | grep opensm
 [ + ]  opensm

# cat ibdiagnet.sm
ibdiagnet fabric SM report
SM - master
  MT25408/P1 lid=0x0003 guid=0x7cfe900300bddec1 dev=4099 priority:0
SM - standby
  The Local Device : MT25408/P1 lid=0x0001 guid=0xe41d2d030050caf1 dev=4099 priority:0

Best regards,
Sergei.
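Besides the ibdiagnet SM report, the master subnet manager can be queried directly with sminfo from the infiniband-diags package. This is an additional check suggested here, not one run in the thread.

```shell
# sminfo sends a SubnGet(SMInfo) to the subnet manager and reports its lid,
# guid, activity count, priority and state (state 3 = MASTER). A second
# opensm instance would appear as standby, matching the ibdiagnet.sm report.
sminfo
```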
Re: [OMPI users] OpenMPI + InfiniBand
Hi Jeff!

> What does "ompi_info | grep openib" show?

$ ompi_info | grep openib
                 MCA btl: openib (MCA v2.0.0, API v2.0.0, Component v1.10.2)

> Additionally, Mellanox provides alternate support through their MXM
> libraries, if you want to try that.

Yes, I know. But we already have a hybrid cluster with Open MPI, OpenMP,
CUDA, Torque and many other libraries installed, and because it works
perfectly over the Ethernet interconnect, my idea was to add InfiniBand
support with a minimum of changes, mainly because we already have some
custom-written software for Open MPI.

> If that shows that you have the openib BTL plugin loaded, try running with
> "mpirun --mca btl_base_verbose 100 ..." That will provide additional
> output about why / why not each point-to-point plugin is chosen.

Yes, I already tried to get this info, and I saw in the log that rdmacm wants
an IP address on the port. Hence my question in the opening message of this
topic: is it enough for Open MPI to have RDMA only, or should IPoIB also be
installed?

The mpirun output is:

[node1:02674] mca: base: components_register: registering btl components
[node1:02674] mca: base: components_register: found loaded component openib
[node1:02674] mca: base: components_register: component openib register function successful
[node1:02674] mca: base: components_register: found loaded component sm
[node1:02674] mca: base: components_register: component sm register function successful
[node1:02674] mca: base: components_register: found loaded component self
[node1:02674] mca: base: components_register: component self register function successful
[node1:02674] mca: base: components_open: opening btl components
[node1:02674] mca: base: components_open: found loaded component openib
[node1:02674] mca: base: components_open: component openib open function successful
[node1:02674] mca: base: components_open: found loaded component sm
[node1:02674] mca: base: components_open: component sm open function successful
[node1:02674] mca: base: components_open: found loaded component self
[node1:02674] mca: base: components_open: component self open function successful
[node1:02674] select: initializing btl component openib
[node1:02674] openib BTL: rdmacm IP address not found on port
[node1:02674] openib BTL: rdmacm CPC unavailable for use on mlx4_0:1; skipped
[node1:02674] select: init of component openib returned failure
[node1:02674] mca: base: close: component openib closed
[node1:02674] mca: base: close: unloading component openib
[node1:02674] select: initializing btl component sm
[node1:02674] select: init of component sm returned failure
[node1:02674] mca: base: close: component sm closed
[node1:02674] mca: base: close: unloading component sm
[node1:02674] select: initializing btl component self
[node1:02674] select: init of component self returned success
[node1:02674] mca: bml: Using self btl to [[16642,1],0] on node node1
[node1:02674] mca: base: close: component self closed
[node1:02674] mca: base: close: unloading component self

Best regards,
Sergei.
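The decisive lines in verbose logs like this are the CPC (connection pseudo-component) decisions for openib. As a quick way to pull them out of a long capture, a filter along these lines works; the sample log file written below simply repeats a few lines from the output above for illustration.

```shell
# Build a small sample log (stand-in for a real btl_base_verbose capture).
cat > btl.log <<'EOF'
[node1:02674] select: initializing btl component openib
[node1:02674] openib BTL: rdmacm IP address not found on port
[node1:02674] openib BTL: rdmacm CPC unavailable for use on mlx4_0:1; skipped
[node1:02674] select: init of component openib returned failure
EOF

# Keep only the connection-manager verdicts and component init results.
grep -E 'rdmacm|udcm|returned (success|failure)' btl.log
```

On a real log this immediately shows which of rdmacm/udcm was rejected and why, without scrolling through the component registration noise.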
Re: [OMPI users] OpenMPI + InfiniBand
Hi Gilles!

> is there any reason why you configure with --with-verbs-libdir=/usr/lib ?
> as far as i understand, --with-verbs should be enough, and /usr/lib
> nor /usr/local/lib should ever be used in the configure command line
> (and btw, are you running on a 32 bits system ? should the 64 bits
> libs be in /usr/lib64 ?)

I'm on Ubuntu 16.04 x86_64, which has /usr/lib and /usr/lib32. As I
understand it, /usr/lib there plays the role of /usr/lib64, so the library
path is correct.

> make sure you
> ulimit -l unlimited
> before you invoke mpirun, and this value is correctly propagated to
> the remote nodes
> /* the failure could be a side effect of a low ulimit -l */

Yes, ulimit -l returns "unlimited", so this is also correct.

Best regards,
Sergei.
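Whether the locked-memory limit actually propagates can be checked through the launcher itself, since an interactive `ulimit -l` on the head node says nothing about the environment the remote daemons give the ranks. A generic sketch, with illustrative hostnames:

```shell
# Print the locked-memory limit as seen by processes started the same way
# the MPI ranks are. Under Torque the limits of pbs_mom apply, not those of
# a login shell, so /etc/security/limits.conf alone may not be enough.
mpirun -np 2 --host node1,node2 bash -c 'echo "$(hostname): $(ulimit -l)"'
```

If any line prints a finite value instead of "unlimited", the limit is being clamped somewhere along the launch path on that node.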
Re: [OMPI users] OpenMPI + InfiniBand
> Sorry - shoot down my idea. Over to someone else (me hides head in shame)

No problem, thanks for trying!
Re: [OMPI users] OpenMPI + InfiniBand
> Sergei, what does the command "ibv_devinfo" return please?
> I had a recent case like this, but on Qlogic hardware.
> Sorry if I am mixing things up.

The output of ibv_devinfo from the cluster's 1st node is:

$ ibv_devinfo -d mlx4_0
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.35.5100
        node_guid:                      7cfe:9003:00bd:dec0
        sys_image_guid:                 7cfe:9003:00bd:dec3
        vendor_id:                      0x02c9
        vendor_part_id:                 4099
        hw_ver:                         0x0
        board_id:                       MT_1100120019
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 3
                        port_lid:               3
                        port_lmc:               0x00
                        link_layer:             InfiniBand
[OMPI users] OpenMPI + InfiniBand
Hello, All!

We have a problem with Open MPI version 1.10.2 on a cluster with newly
installed Mellanox InfiniBand adapters. Open MPI was re-configured and
re-compiled using:

  --with-verbs --with-verbs-libdir=/usr/lib

Our test MPI task returns proper results, but it seems Open MPI continues to
use the existing 1 Gbit Ethernet network instead of InfiniBand. The output
file contains these lines:

--
No OpenFabrics connection schemes reported that they were able to be used on
a specific port. As such, the openib BTL (OpenFabrics support) will be
disabled for this port.

  Local host:           node1
  Local device:         mlx4_0
  Local port:           1
  CPCs attempted:       rdmacm, udcm
--

The InfiniBand network itself seems to be working. "ibstat mlx4_0" shows:

CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 1
        Firmware version: 2.35.5100
        Hardware version: 0
        Node GUID: 0x7cfe900300bddec0
        System image GUID: 0x7cfe900300bddec3
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 3
                LMC: 0
                SM lid: 3
                Capability mask: 0x0251486a
                Port GUID: 0x7cfe900300bddec1
                Link layer: InfiniBand

ibping also works, and ibnetdiscover shows the correct topology of the IB
network.

The cluster runs Ubuntu 16.04 and we use the drivers from the OS (OFED is not
installed).

Is it enough for Open MPI to have RDMA only, or should IPoIB also be
installed? What else can be checked?

Thanks a lot for any help!
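When Open MPI silently falls back to Ethernet like this, it can help to force a single transport so the failure is loud instead of hidden. A generic debugging sketch, not from the thread; the test binary name is illustrative:

```shell
# Force InfiniBand: if the openib BTL really is unusable, this run aborts
# with an explicit error instead of quietly using the tcp BTL.
mpirun -np 2 --mca btl openib,sm,self ./mpi_test

# Force TCP over Ethernet for comparison, to confirm the fallback path:
mpirun -np 2 --mca btl tcp,sm,self ./mpi_test
```

Comparing bandwidth/latency between the two runs (e.g. with a simple ping-pong benchmark) also makes it obvious which fabric actually carried the traffic.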