[OMPI devel] Re: example/Hello_c.c: mpirun fails on two physical nodes.
Hi,

Thanks Jeff; I still have not figured out what is happening, even with that FAQ entry.

1. SSH remote login works:
[root@bb-nsi-ib04 examples]# ssh ib03 hostname
bb-nsi-ib03.bb01.*.com
[root@bb-nsi-ib04 examples]#

2. The following command returns immediately with nothing printed:
[root@bb-nsi-ib04 examples]# mpirun -host ib03 hostname
[root@bb-nsi-ib04 examples]#

3. The following command executes successfully:
[root@bb-nsi-ib04 examples]# ssh ib03 mpirun
--------------------------------------------------------------------------
mpirun could not find anything to do.

It is possible that you forgot to specify how many processes to run
via the "-np" argument.
--------------------------------------------------------------------------
[root@bb-nsi-ib04 examples]#

So does this mean the non-interactive shell profile is not configured correctly? Yet step 3 executes successfully...

Hoping for a response!

BR
Yanfei Wang

-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
Sent: 25 March 2014 22:09
To: Open MPI Developers
Subject: Re: [OMPI devel] example/Hello_c.c: mpirun fails on two physical nodes.

Try this FAQ entry:

http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems

On Mar 25, 2014, at 6:54 AM, "Wang,Yanfei(SYS)" wrote:

> Hi,
>
> I am new to Open MPI programming and am having some trouble getting an MPI program to run; I hope someone can help.
>
> The examples/hello_c program works with 2 processes on the local machine, but does not work across two separate physical nodes.
>
> Two physical nodes:
> [root@bb-nsi-ib04 examples]# mpirun -hostfile hosts -np 2 hello_c
> The command returns instantly with nothing printed.
>
> Local machine:
> [root@bb-nsi-ib04 examples]# mpirun -np 2 hello_c
> Hello, world, I am 0 of 2, (Open MPI v1.7.5, package: Open MPI root@bb-nsi-ib04.bb01.*.com Distribution, ident: 1.7.5, Mar 20, 2014, 108)
> Hello, world, I am 1 of 2, (Open MPI v1.7.5, package: Open MPI root@bb-nsi-ib04.bb01.*.com Distribution, ident: 1.7.5, Mar 20, 2014, 108)
> [root@bb-nsi-ib04 examples]#
>
> The peer machine is also fine:
> [root@bb-nsi-ib03 examples]# mpirun -np 2 hello_c
> Hello, world, I am 0 of 2, (Open MPI v1.7.5, package: Open MPI root@bb-nsi-ib03.bb01.*.com Distribution, ident: 1.7.5, Mar 20, 2014, 108)
> Hello, world, I am 1 of 2, (Open MPI v1.7.5, package: Open MPI root@bb-nsi-ib03.bb01.*.com Distribution, ident: 1.7.5, Mar 20, 2014, 108)
> [root@bb-nsi-ib03 examples]#
> The command runs successfully and prints two messages.
>
> Configuration details:
> Open MPI version: 1.7.5
> Hostfile:
> [root@bb-nsi-ib04 examples]# cat hosts
> ib03 slots=1
> ib04 slots=1
> [root@bb-nsi-ib04 examples]#
> /etc/hosts:
> [root@bb-nsi-ib04 examples]# cat /etc/hosts
> 192.168.71.3 ib03
> 192.168.71.4 ib04
> SSH:
> The public RSA key has been distributed to both machines, ib03 and ib04, and passwordless ssh login works; I am sure of that.
>
> I am confused by this problem; can anyone help? There is no log output and no error message, so could someone tell me how to diagnose it?
>
> BR
>
> Yanfei Wang
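Since step 3 shows that a non-interactive "ssh ib03 mpirun" does find mpirun, the other usual suspect from that FAQ entry is the remote non-interactive environment (PATH/LD_LIBRARY_PATH). A minimal check along those lines, with /usr/local/openmpi standing in for wherever Open MPI is actually installed on these nodes, might look like:

[root@bb-nsi-ib04 examples]# ssh ib03 which mpirun orted
[root@bb-nsi-ib04 examples]# ssh ib03 env | grep -E '^PATH=|^LD_LIBRARY_PATH='

If either variable is missing for non-interactive shells, exporting them in a file that non-interactive shells read (e.g. ~/.bashrc on these nodes), or pointing mpirun at the installation explicitly, is the usual remedy:

[root@bb-nsi-ib04 examples]# mpirun --prefix /usr/local/openmpi -host ib03 hostname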
Re: [OMPI devel] Re: example/Hello_c.c: mpirun fails on two physical nodes.
Can you please configure OMPI with --enable-debug, and then execute

mpirun -mca plm_base_verbose 10 -host ib03 hostname

This will provide debug information about the problem.

Thanks
Ralph

On Tue, Mar 25, 2014 at 9:51 PM, Wang,Yanfei(SYS) wrote:

> Hi,
>
> Thanks Jeff; I still have not figured out what is happening, even with that FAQ entry.
>
> 1. SSH remote login works:
> [root@bb-nsi-ib04 examples]# ssh ib03 hostname
> bb-nsi-ib03.bb01.*.com
> [root@bb-nsi-ib04 examples]#
>
> 2. The following command returns immediately with nothing printed:
> [root@bb-nsi-ib04 examples]# mpirun -host ib03 hostname
> [root@bb-nsi-ib04 examples]#
>
> 3. The following command executes successfully:
> [root@bb-nsi-ib04 examples]# ssh ib03 mpirun
> --------------------------------------------------------------------------
> mpirun could not find anything to do.
>
> It is possible that you forgot to specify how many processes to run
> via the "-np" argument.
> --------------------------------------------------------------------------
> [root@bb-nsi-ib04 examples]#
>
> So does this mean the non-interactive shell profile is not configured correctly? Yet step 3 executes successfully...
>
> Hoping for a response!
>
> BR
> Yanfei Wang
>
> [...]
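For reference, the debug build and verbose run that Ralph asks for could look like the following; the source directory and install prefix are placeholders, and the same build would also need to be installed on ib03, since mpirun starts an orted daemon there:

[root@bb-nsi-ib04 openmpi-1.7.5]# ./configure --prefix=/opt/openmpi-debug --enable-debug
[root@bb-nsi-ib04 openmpi-1.7.5]# make -j8 all install
[root@bb-nsi-ib04 examples]# /opt/openmpi-debug/bin/mpirun -mca plm_base_verbose 10 -host ib03 hostname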
Re: [OMPI devel] Loading Open MPI from MPJ Express (Java) fails
1) By heterogeneous, do you mean derived datatypes? MPJ Express's buffering layer handles this: it flattens the data into a ByteBuffer, so the native device does not have to worry about derived datatypes (those are handled in the upper layers; see the sketch below). Interestingly, Java users would intuitively use MPI.OBJECT when there is heterogeneous data to send (but yes, MPI.OBJECT is a slow path...). The same currently goes for user-defined Op functions: they are handled in the upper layers, i.e. with MPJ Express's own algorithms rather than native MPI's (though the communication itself is native).

2) API changes: do you plan to document the changes as something like an mpiJava 1.3 specification?

3) New benchmark results: I ran the benchmarks again with various configurations:

i) Open MPI 1.7.4 C
ii) MVAPICH2.2 C
iii) MPJ Express (using Open MPI - with arrays)
iv) Open MPI's Java bindings (with a large user array -- the unoptimized case)
v) Open MPI's Java bindings (with arrays, where the size of the user array equals the data point, to be fair)
vi) MPJ Express (using MVAPICH2.2 - with arrays)
vii) Open MPI's Java bindings (using MPI.newBuffer, i.e. a ByteBuffer)
viii) MPJ Express (using Open MPI - with ByteBuffer; this is from MPJ Express's device layer and shows how MPJ Express could perform if we add MPI.newBuffer-like functionality in the future)
ix) MPJ Express (using MVAPICH2.2 - with ByteBuffer) --> I currently don't understand why this performs better than Open MPI.

Bibrak Qamar
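To make the flattening in point 1 concrete, here is a rough, hypothetical sketch (not MPJ Express source; class and method names are invented) of packing a section of a double[] into a direct ByteBuffer so that a native device can treat it as count*8 raw bytes, i.e. MPI_BYTE:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class FlattenSketch {

    // Copy count doubles starting at offset into a direct buffer that is
    // reachable from JNI without further copying.
    static ByteBuffer flatten(double[] src, int offset, int count) {
        ByteBuffer buf = ByteBuffer.allocateDirect(count * 8)   // 8 bytes per double
                                   .order(ByteOrder.nativeOrder());
        buf.asDoubleBuffer().put(src, offset, count);
        return buf;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0, 4.0};
        ByteBuffer raw = flatten(data, 1, 2);   // bytes for {2.0, 3.0}
        System.out.println(raw.capacity() + " bytes ready for the native device");
    }
}

The real buffering layer obviously also has to handle every primitive type, derived datatypes, and the unpack on the receive side, but the direct-buffer trick is what lets the native device stay datatype-agnostic.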
On Mon, Mar 24, 2014 at 10:16 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> On Mar 14, 2014, at 9:29 AM, Bibrak Qamar wrote:
>
> > It works for Open MPI, but for MPICH3 I have to comment out the dlopen. Is there any way to tell the compiler to use dlopen if it is Open MPI (mpicc) and otherwise keep it commented out? Or something else?
>
> In Open MPI's mpi.h we have defined OPEN_MPI. You can therefore use #if defined(OPEN_MPI).
>
> > Yes, there are some places where we need to be in sync with the internals of the native MPI implementation. These are in section 8.1.2 of MPI 2.1 (http://www.mpi-forum.org/docs/mpi-2.1/mpi21-report.pdf), for example MPI_TAG_UB. For the pure Java devices of MPJ Express we have always used Integer.MAX_VALUE.
> >
> > Datatypes?
> >
> > MPJ Express uses an internal buffering layer to buffer the user data into a ByteBuffer. In this way, for the native device we end up using the MPI_BYTE datatype most of the time. ByteBuffer simplifies matters since it is directly accessible from native code.
>
> Does that mean you can't do heterogeneous? (Not really a huge deal, since most people don't run heterogeneously, but something to consider.)
>
> > With our current implementation there is one exception to this, namely Reduce, Allreduce, and Reduce_scatter, where the native MPI implementation needs to know which Java datatype it is going to process. The same goes for MPI.Op.
>
> And Accumulate and the other Op-using functions, right?
>
> > On "Are your bindings similar in style/signature to ours?"
>
> No, they use the real datatypes.
>
> > I checked, and there are differences. MPJ Express (and FastMPJ as well) implements the mpiJava 1.2 specification. There is also the MPJ API (which is very close to the mpiJava 1.2 API).
> >
> > Example 1: Getting the rank and size of COMM_WORLD
> >
> > MPJ Express (the mpiJava 1.2 API):
> > public int Size() throws MPIException;
> > public int Rank() throws MPIException;
> >
> > MPJ API:
> > public int size() throws MPJException;
> > public int rank() throws MPJException;
> >
> > Open MPI's Java bindings:
> > public final int getRank() throws MPIException;
> > public final int getSize() throws MPIException;
>
> Right -- we *started* with the old ideas, but then made the conscious choice to update the Java bindings in a few ways:
>
> - make them more like modern Java conventions (e.g., camel case, use verbs, etc.)
> - get rid of MPI.OBJECT
> - use modern, efficient Java practices
> - basically, we didn't want to be bound by any Java decisions that were made long ago and aren't necessarily relevant any more
> - and to be clear: we couldn't find many existing Java MPI codes, so compatibility with existing Java MPI codes was not a big concern
>
> One thing we didn't do was use bounce buffers for small messages, which shows up in your benchmarks. We're considering adding that optimization, and others.
>
> > Example 2: Point-to-point communication
> >
> > MPJ Express (the mpiJava 1.2 API):
> > public void Send(Object buf, int offset, int count, Datatype datatype, int dest, int tag) throws MPIException
> > public Status Recv(Object buf, int offset, int count, Datatype datatype, int source, int tag) throws MPIException
> >
> > MPJ API:
> > public void send(Object buf, int offset, int count, Datatype datatype, int dest, int tag) throws MPJException
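For readers comparing the two styles quoted above, here is a minimal toy program written in the Open MPI Java bindings' style (camel-case getters, no offset argument), with the mpiJava 1.2 equivalents noted in comments. It is an illustration only: the Open MPI signatures for Example 2 are cut off above, so the exact send/recv forms used here are an assumption to be checked against the bindings' javadoc.

import mpi.MPI;
import mpi.MPIException;

public class StylePing {
    public static void main(String[] args) throws MPIException {
        MPI.Init(args);
        int rank = MPI.COMM_WORLD.getRank();   // mpiJava 1.2: MPI.COMM_WORLD.Rank()
        int[] data = new int[4];

        if (rank == 0) {
            for (int i = 0; i < data.length; i++) data[i] = i;
            // mpiJava 1.2: Send(data, 0, data.length, MPI.INT, 1, 99) -- note the offset
            MPI.COMM_WORLD.send(data, data.length, MPI.INT, 1, 99);
        } else if (rank == 1) {
            // mpiJava 1.2: Recv(data, 0, data.length, MPI.INT, 0, 99) returns a Status
            MPI.COMM_WORLD.recv(data, data.length, MPI.INT, 0, 99);
            System.out.println("rank 1 received " + data.length + " ints");
        }
        MPI.Finalize();
    }
}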