Re: [OMPI users] Infiniband Question

2010-02-05 Thread Jeff Squyres
Yep -- it's normal.

Those IP addresses are used for bootstrapping/startup, not for MPI traffic.  In 
particular, that "HNP URI" stuff is used by Open MPI's underlying run-time 
environment.  It's not used by the MPI layer at all.
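
To illustrate, the --hnp-uri string in the quoted listing looks like plain contact information for mpirun (the "HNP"): a process name followed by one TCP endpoint per IP interface. Here is a minimal Python sketch that pulls it apart. The layout is assumed from the listing itself, not taken from Open MPI documentation, and parse_hnp_uri is a made-up helper, not an Open MPI tool.

```python
# Hypothetical helper: split an ORTE-style HNP URI into its parts.
# Assumed layout (inferred from the process listing, not from Open MPI docs):
#   "<jobid>.<vpid>;<scheme>://<host>:<port>[;<scheme>://<host>:<port>...]"
from urllib.parse import urlsplit


def parse_hnp_uri(uri):
    """Return (process_name, [(scheme, host, port), ...])."""
    # Drop the surrounding quotes, then separate the name from the contacts.
    name, *contacts = uri.strip('"').split(";")
    endpoints = []
    for contact in contacts:
        parts = urlsplit(contact)
        endpoints.append((parts.scheme, parts.hostname, parts.port))
    return name, endpoints


name, endpoints = parse_hnp_uri(
    '"945881088.0;tcp://192.168.20.252:39440;tcp://192.168.21.252:39440"'
)
print(name)
for scheme, host, port in endpoints:
    print(scheme, host, port)
```

Both endpoints are the same host (.252) on the same port, once on the eth0 subnet and once on the ib0 subnet, so they only tell the remote orted daemons where to reach mpirun. They say nothing about which transport MPI messages take. To rule TCP out for MPI traffic, one option is to restrict the BTL list explicitly, e.g. "mpirun --mca btl openib,sm,self -np 4 mdrun_mpi ...", which should make the job abort rather than fall back to TCP if the openib BTL cannot be used.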


On Feb 5, 2010, at 2:32 PM, Mike Hanby wrote:

> Howdy,
> 
> When running a Gromacs job using OpenMPI 1.4.1 on Infiniband enabled nodes, 
> I'm seeing the following process listing:
> 
> \_ -bash /opt/gridengine/default/spool/compute-0-3/job_scripts/97037
> \_ mpirun -np 4 mdrun_mpi -v -np 4 -s production-Npt-323K_4CPU -o 
> production-Npt-323K_4CPU -c production-Npt-323K_4CPU -x 
> production-Npt-323K_4CPU -g production-Npt-323K_4CPU.log
> \_ /opt/gridengine/bin/lx26-amd64/qrsh -inherit -nostdin -V 
> compute-0-4.local  orted -mca ess env -mca orte_ess_jobid 945881088
> -mca orte_ess_vpid 1 -mca orte_ess_num_procs 4 --hnp-uri 
> "945881088.0;tcp://192.168.20.252:39440;tcp://192.168.21.252:39440"
> \_ /opt/gridengine/bin/lx26-amd64/qrsh -inherit -nostdin -V 
> compute-0-2.local  orted -mca ess env -mca orte_ess_jobid 945881088
> -mca orte_ess_vpid 2 -mca orte_ess_num_procs 4 --hnp-uri 
> "945881088.0;tcp://192.168.20.252:39440;tcp://192.168.21.252:39440"
> \_ /opt/gridengine/bin/lx26-amd64/qrsh -inherit -nostdin -V 
> compute-0-1.local  orted -mca ess env -mca orte_ess_jobid 945881088
> -mca orte_ess_vpid 3 -mca orte_ess_num_procs 4 --hnp-uri 
> "945881088.0;tcp://192.168.20.252:39440;tcp://192.168.21.252:39440"
> \_ mdrun_mpi -v -np 4 -s production-Npt-323K_4CPU -o 
> production-Npt-323K_4CPU -c production-Npt-323K_4CPU
> -x production-Npt-323K_4CPU -g production-Npt-323K_4CPU.log
> 
> Is it normal for these tcp addresses to be listed if the job is using 
> Infiniband?
> 
> The 192.168.20.x subnet is the eth0 GigE network,
> and the 192.168.21.x subnet is the ib0 IPoIB (IP over InfiniBand) network.
> 
> Or is this job actually using TCP/IP over InfiniBand / GigE?
> 
> I'm running mpirun without any special fabric includes / excludes.
> 
> ompi_info lists openib as a valid fabric:
> $ ompi_info |grep openib
>  MCA btl: openib (MCA v2.0, API v2.0, Component v1.4.1)
> 
> Thanks for any insight,
> 
> Mike
> =
> Mike Hanby
> mha...@uab.edu
> Information Systems Specialist II
> IT HPCS / Research Computing
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 


-- 
Jeff Squyres
jsquy...@cisco.com





[OMPI users] Infiniband Question

2010-02-05 Thread Mike Hanby
Howdy,

When running a Gromacs job using OpenMPI 1.4.1 on Infiniband enabled nodes, I'm 
seeing the following process listing:

\_ -bash /opt/gridengine/default/spool/compute-0-3/job_scripts/97037
\_ mpirun -np 4 mdrun_mpi -v -np 4 -s production-Npt-323K_4CPU -o 
production-Npt-323K_4CPU -c production-Npt-323K_4CPU -x 
production-Npt-323K_4CPU -g production-Npt-323K_4CPU.log
\_ /opt/gridengine/bin/lx26-amd64/qrsh -inherit -nostdin -V 
compute-0-4.local  orted -mca ess env -mca orte_ess_jobid 945881088
-mca orte_ess_vpid 1 -mca orte_ess_num_procs 4 --hnp-uri 
"945881088.0;tcp://192.168.20.252:39440;tcp://192.168.21.252:39440"
\_ /opt/gridengine/bin/lx26-amd64/qrsh -inherit -nostdin -V 
compute-0-2.local  orted -mca ess env -mca orte_ess_jobid 945881088
-mca orte_ess_vpid 2 -mca orte_ess_num_procs 4 --hnp-uri 
"945881088.0;tcp://192.168.20.252:39440;tcp://192.168.21.252:39440"
\_ /opt/gridengine/bin/lx26-amd64/qrsh -inherit -nostdin -V 
compute-0-1.local  orted -mca ess env -mca orte_ess_jobid 945881088
-mca orte_ess_vpid 3 -mca orte_ess_num_procs 4 --hnp-uri 
"945881088.0;tcp://192.168.20.252:39440;tcp://192.168.21.252:39440"
\_ mdrun_mpi -v -np 4 -s production-Npt-323K_4CPU -o 
production-Npt-323K_4CPU -c production-Npt-323K_4CPU
-x production-Npt-323K_4CPU -g production-Npt-323K_4CPU.log

Is it normal for these tcp addresses to be listed if the job is using 
Infiniband?

The 192.168.20.x subnet is the eth0 GigE network,
and the 192.168.21.x subnet is the ib0 IPoIB (IP over InfiniBand) network.

Or is this job actually using TCP/IP over InfiniBand / GigE?

I'm running mpirun without any special fabric includes / excludes.

ompi_info lists openib as a valid fabric:
$ ompi_info |grep openib
 MCA btl: openib (MCA v2.0, API v2.0, Component v1.4.1)

Thanks for any insight,

Mike
=
Mike Hanby
mha...@uab.edu
Information Systems Specialist II
IT HPCS / Research Computing





Re: [OMPI users] infiniband question

2009-09-17 Thread Jeff Squyres
Correct, you don't need DAPL.  Can you send all the information listed  
here:


http://www.open-mpi.org/community/help/


On Sep 17, 2009, at 9:17 AM, Yann JOBIC wrote:


Hi,

I'm new to InfiniBand.
I installed the rdma_cm, rdma_ucm and ib_uverbs kernel modules.

When I run an Open MPI ring test program, I get:
[Lidia][0,1,1][btl_openib_endpoint.c:992:mca_btl_openib_endpoint_qp_init_query]
Set MTU to IBV value 4 (2048 bytes)
[Lidia][0,1,1][btl_openib_endpoint.c:992:mca_btl_openib_endpoint_qp_init_query]
Set MTU to IBV value 4 (2048 bytes)
[Lilou][0,1,0][btl_openib_endpoint.c:992:mca_btl_openib_endpoint_qp_init_query]
Set MTU to IBV value 4 (2048 bytes)
[Lilou][0,1,0][btl_openib_endpoint.c:992:mca_btl_openib_endpoint_qp_init_query]
Set MTU to IBV value 4 (2048 bytes)

And then, the program hangs.

I thought I only needed RDMA communication and didn't need the DAPL library
(with the IPoIB module).

Am I wrong?

Thanks,

Yann



--
___

Yann JOBIC
HPC engineer
Polytech Marseille DME
IUSTI-CNRS UMR 6595
Technopôle de Château Gombert
5 rue Enrico Fermi
13453 Marseille cedex 13
Tel : (33) 4 91 10 69 39
  ou  (33) 4 91 10 69 43
Fax : (33) 4 91 10 69 69





--
Jeff Squyres
jsquy...@cisco.com




[OMPI users] infiniband question

2009-09-17 Thread Yann JOBIC

Hi,

I'm new to InfiniBand.
I installed the rdma_cm, rdma_ucm and ib_uverbs kernel modules.

When I run an Open MPI ring test program, I get:
[Lidia][0,1,1][btl_openib_endpoint.c:992:mca_btl_openib_endpoint_qp_init_query] 
Set MTU to IBV value 4 (2048 bytes)
[Lidia][0,1,1][btl_openib_endpoint.c:992:mca_btl_openib_endpoint_qp_init_query] 
Set MTU to IBV value 4 (2048 bytes)
[Lilou][0,1,0][btl_openib_endpoint.c:992:mca_btl_openib_endpoint_qp_init_query] 
Set MTU to IBV value 4 (2048 bytes)
[Lilou][0,1,0][btl_openib_endpoint.c:992:mca_btl_openib_endpoint_qp_init_query] 
Set MTU to IBV value 4 (2048 bytes)


And then, the program hangs.

I thought I only needed RDMA communication and didn't need the DAPL library
(with the IPoIB module).


Am I wrong?

Thanks,

Yann



--
___

Yann JOBIC
HPC engineer
Polytech Marseille DME
IUSTI-CNRS UMR 6595
Technopôle de Château Gombert
5 rue Enrico Fermi
13453 Marseille cedex 13
Tel : (33) 4 91 10 69 39
 ou  (33) 4 91 10 69 43
Fax : (33) 4 91 10 69 69