Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration
Hi. I know this is an old thread, but I'm curious if there are any tutorials describing how to set this up? Is this still available on newer open mpi versions? Thanks, Brian On Fri, Jan 4, 2008 at 7:57 AM, Ralph Castain wrote: > Hi Elena > > I'm copying this to the user list just to correct a mis-statement on my part > in an earlier message that went there. I had stated that a singleton could > comm_spawn onto other nodes listed in a hostfile by setting an environmental > variable that pointed us to the hostfile. > > This is incorrect in the 1.2 code series. That series does not allow > singletons to read a hostfile at all. Hence, any comm_spawn done by a > singleton can only launch child processes on the singleton's local host. > > This situation has been corrected for the upcoming 1.3 code series. For the > 1.2 series, though, you will have to do it via an mpirun command line. > > Sorry for the confusion - I sometimes have too many code families to keep > straight in this old mind! > > Ralph > > > On 1/4/08 5:10 AM, "Elena Zhebel" wrote: > >> Hello Ralph, >> >> Thank you very much for the explanations. >> But I still do not get it running... >> >> For the case >> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe >> everything works. >> >> For the case >> ./my_master.exe >> it does not. >> >> I did: >> - create my_hostfile and put it in the $HOME/.openmpi/components/ >> my_hostfile : >> bollenstreek slots=2 max_slots=3 >> octocore01 slots=8 max_slots=8 >> octocore02 slots=8 max_slots=8 >> clstr000 slots=2 max_slots=3 >> clstr001 slots=2 max_slots=3 >> clstr002 slots=2 max_slots=3 >> clstr003 slots=2 max_slots=3 >> clstr004 slots=2 max_slots=3 >> clstr005 slots=2 max_slots=3 >> clstr006 slots=2 max_slots=3 >> clstr007 slots=2 max_slots=3 >> - setenv OMPI_MCA_rds_hostfile_path my_hostfile (I put it in .tcshrc and >> then source .tcshrc) >> - in my_master.cpp I did >> MPI_Info info1; >> MPI_Info_create(&info1); >> char* hostname = >> "clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02"; >> MPI_Info_set(info1, "host", hostname); >> >> _intercomm = intracomm.Spawn("./childexe", argv1, _nProc, info1, 0, >> MPI_ERRCODES_IGNORE); >> >> - After I call the executable, I've got this error message >> >> bollenstreek: > ./my_master >> number of processes to run: 1 >> -- >> Some of the requested hosts are not included in the current allocation for >> the application: >> ./childexe >> The requested hosts were: >> clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02 >> >> Verify that you have mapped the allocated resources properly using the >> --host specification. >> -- >> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file >> base/rmaps_base_support_fns.c at line 225 >> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file >> rmaps_rr.c at line 478 >> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file >> base/rmaps_base_map_job.c at line 210 >> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file >> rmgr_urm.c at line 372 >> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file >> communicator/comm_dyn.c at line 608 >> >> Did I miss something? >> Thanks for help! 
>> >> Elena >> >> >> -Original Message- >> From: Ralph H Castain [mailto:r...@lanl.gov] >> Sent: Tuesday, December 18, 2007 3:50 PM >> To: Elena Zhebel; Open MPI Users >> Cc: Ralph H Castain >> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration >> >> >> >> >> On 12/18/07 7:35 AM, "Elena Zhebel" wrote: >> >>> Thanks a lot! Now it works! >>> The solution is to use mpirun -n 1 -hostfile my.hosts *.exe and pass >> MPI_Info >>> Key to the Spawn function! >>> >>> One more question: is it necessary to start my "master" program with >>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe ? >> >> No, it isn't necessary - assuming that my_master_host is the first host >> listed in your hostfile! If you are only executing one my_master.exe (i.e., >> you gave -n 1 to mpirun), then we will automatically map that process onto >> the first host in your hostfile. >> >> If you want my_master.exe to go on someone other than the first host in the >> file, then you have to give us the -host option. >> >>> >>> Are there other possibilities for easy start? >>> I would say just to run ./my_master.exe , but then the master process >> doesn't >>> know about the available in the network hosts. >> >> You can set the hostfile parameter in your environment instead of on the >> command line. Just set OMPI_MCA_rds_hostfile_path = my.hosts. >> >> You can then just run ./my_master.exe on the host where you want the master >> to reside - everything should work the same. >> >> Just as an FYI: the name of that environmental variable is going to chang
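For anyone finding this thread now, the pattern discussed above boils down to a parent program that passes a "host" key via MPI_Info to MPI_Comm_spawn. Below is a minimal C sketch of that pattern; the host names and the ./childexe binary are placeholders taken from the thread, and the listed hosts must already be part of the allocation (e.g. appear in the hostfile given to mpirun on the 1.2 series) for the spawn to succeed.

/* spawn_parent.c - minimal sketch of the spawn pattern discussed above.
 * Host names and "./childexe" are placeholders; on the 1.2 series this
 * parent must itself be started through mpirun with a hostfile that
 * contains these hosts. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;
    MPI_Info info;
    int nchildren = 4;

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info);
    /* comma-separated list of hosts the children may be placed on */
    MPI_Info_set(info, "host", "clstr002,clstr003,octocore01,octocore02");

    MPI_Comm_spawn("./childexe", MPI_ARGV_NULL, nchildren, info,
                   0 /* root rank */, MPI_COMM_SELF, &intercomm,
                   MPI_ERRCODES_IGNORE);
    printf("spawned %d child processes\n", nchildren);

    MPI_Info_free(&info);
    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}

On the 1.2 series this parent would be started as, for example, mpirun -n 1 -hostfile my_hostfile ./spawn_parent, so that the hosts named in the MPI_Info key are part of the current allocation.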
[OMPI users] application with mxm hangs on startup
Hello! I've built openmpi 1.6.1rc3 with support for MXM. But when I try to launch an application using this mtl, it hangs and I can't figure out why. If I launch it with np below 128 then everything works fine, since mxm isn't used. I've tried setting the threshold to 0 and launching 2 processes, with the same result: it hangs on startup. What could be causing this problem? Here is the command I execute: /opt/openmpi/1.6.1/mxm-test/bin/mpirun \ -np $NP \ -hostfile hosts_fdr2 \ --mca mtl mxm \ --mca btl ^tcp \ --mca mtl_mxm_np 0 \ -x OMP_NUM_THREADS=$NT \ -x LD_LIBRARY_PATH \ --bind-to-core \ -npernode 16 \ --mca coll_fca_np 0 -mca coll_fca_enable 0 \ ./IMB-MPI1 -npmin $NP Allreduce Reduce Barrier Bcast Allgather Allgatherv I'm performing the tests on nodes with Intel SB processors and FDR. Openmpi was configured with the following parameters: CC=icc CXX=icpc F77=ifort FC=ifort ./configure --prefix=/opt/openmpi/1.6.1rc3/mxm-test --with-mxm=/opt/mellanox/mxm --with-fca=/opt/mellanox/fca --with-knem=/usr/share/knem I'm using the latest ofed from mellanox: 1.5.3-3.1.0 on centos 6.1 with the default kernel: 2.6.32-131.0.15. The compilation with the default mxm (1.0.601) failed, so I installed the latest version from mellanox: 1.1.1227. Best regards, Pavel Mezentsev.
Re: [OMPI users] Measuring latency
Hello, Intel MPI Benchmarks suite (http://software.intel.com/en-us/articles/intel-mpi-benchmarks/) will probably measure more things about your MPI environment than you'd ever need to know :) NetPIPE (http://www.scl.ameslab.gov/netpipe/) also has an MPI version. It measures point-to-point bandwidth and latency and has the option to test the effect of using unaligned memory buffers. Kind regards, Hristo On 21.08.2012, at 23:32, Maginot Junior wrote: > Hello. > How do you suggest me to measure the latency between master em slaves > in my cluster? Is there any tool that I can use to test the > performance of my environment? > Thanks > > > -- > Maginot Júnior -- Hristo Iliev, Ph.D. -- High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany)
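If all you need is a single rough number rather than a full benchmark suite, the idea behind these tools can be sketched in a few lines: rank 0 bounces a small message off rank 1 many times and divides the average round-trip time by two. The sketch below makes no attempt at the warm-up, message-size sweeps and statistics the suites above provide, and the host names in the example run line are placeholders.

/* pingpong.c - rough point-to-point latency sketch; see IMB, NetPIPE or
 * the OSU micro-benchmarks mentioned in this thread for serious numbers. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int reps = 10000;
    char byte = 0;
    int rank, i;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    /* half of the average round-trip time is the one-way latency */
    if (rank == 0)
        printf("latency: %.2f us\n", (t1 - t0) / reps / 2.0 * 1e6);

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and run as, say, mpirun -np 2 -H masterhost,slavehost ./pingpong, it gives a rough one-way latency between the two hosts.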
Re: [OMPI users] Measuring latency
That's fine. In that case, you just compile it with your MPI implementation and do something like this: mpiexec -np 2 -H masterhostname,slavehostname ./osu_latency There may be some all-to-all latency tools too. I don't really remember. Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu On 08/21/2012 03:41 PM, Maginot Junior wrote: > Sorry for the typo, what I meant was "and" not "em". > Thank you for the quick response, I will take a look at your suggestion
Re: [OMPI users] Measuring latency
On Tue, Aug 21, 2012 at 6:34 PM, Lloyd Brown wrote: > I'm not really familiar enough to know what you mean by "em slaves", but > for general testing of bandwidth and latency, I usually use the "OSU > Micro-benchmarks" (see http://mvapich.cse.ohio-state.edu/benchmarks/). > > Lloyd Brown > Systems Administrator > Fulton Supercomputing Lab > Brigham Young University > http://marylou.byu.edu > > On 08/21/2012 03:32 PM, Maginot Junior wrote: >> Hello. >> How do you suggest me to measure the latency between master em slaves >> in my cluster? Is there any tool that I can use to test the >> performance of my environment? >> Thanks >> >> >> -- >> Maginot Júnior Sorry for the typo, what I meant was "and" not "em". Thank you for the quick response, I will take a look at your suggestion. Regards -- Maginot Jr.
Re: [OMPI users] Measuring latency
I'm not really familiar enough to know what you mean by "em slaves", but for general testing of bandwidth and latency, I usually use the "OSU Micro-benchmarks" (see http://mvapich.cse.ohio-state.edu/benchmarks/). Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu On 08/21/2012 03:32 PM, Maginot Junior wrote: > Hello. > How do you suggest me to measure the latency between master em slaves > in my cluster? Is there any tool that I can use to test the > performance of my environment? > Thanks > > > -- > Maginot Júnior
[OMPI users] Measuring latency
Hello. How do you suggest me to measure the latency between master em slaves in my cluster? Is there any tool that I can use to test the performance of my environment? Thanks -- Maginot Júnior
Re: [OMPI users] "Connection to lifeline lost" when developing a new rsh agent
Have you looked thru the code in orte/mca/plm/rsh/plm_rsh_module.c? It is executing a tree-like spawn pattern by default, but there isn't anything magic about what ssh is doing. However, there are things done to prep the remote shell (setting paths etc.), and the tree spawn passes some additional parameters. It would be worth your while to read thru it to see if just replacing ssh is going to be enough for your environment. The OOB output is telling you that the connection is being attempted, but being rejected for some reason during the return "ACK". Not sure why that would be happening, unless the remote daemon died during the connection handshake. --debug-daemons doesn't do anything but (a) turn on the debug output, and (b) cause ssh to leave the session open by telling the orted not to "daemonize" itself. The --leave-session-attached option does (b) without all the debug output. On Aug 21, 2012, at 8:15 AM, Yann RADENAC wrote: > > Le 20/08/2012 15:56, Ralph Castain wrote : > > You might try adding "-mca plm_base_verbose 5 --debug-daemons" to watch the > > debug output from the daemons as they are launched. > > There seems to be an interference here: my problem is "solved" by enabling > option --debug-daemons with a verbose level > 0 !! > > This command fails (3 processes on 3 different machines): > > mpirun --mca orte_rsh_agent xos-createProcess --leave-session-attached -np > 3 -host `xreservation -a $XOS_RSVID` mpi/hello_world_MPI > > > This command works !!! > (just adding the debug-daemons with verbose level > 0) : > > mpirun --mca orte_rsh_agent xos-createProcess --leave-session-attached -mca > plm_base_verbose 5 --debug-daemons -np 3 -host `xreservation -a $XOS_RSVID` > mpi/hello_world_MPI > > > Anyway, this is just a workaround, and requiring the users to set the > debug-daemons option is not acceptable. > > So what ssh is doing, and also the debug-daemons, that my agent > xos-createProcess is not doing? > > > >> The lifeline is a socket connection between the daemons and mpirun. For some >> reason, the socket from your remote daemon back to mpirun is being closed, >> which the remote daemon interprets as "lifeline lost" and terminates itself. >> You could try setting the verbosity on the OOB to get the debug output from >> it (see "ompi_info --param oob tcp" for the settings), though it's likely to >> just tell you that the socket closed. > > By the way, no firewall is running on any of my machines. > > Using the oob_tcp options: > > mpirun --mca orte_rsh_agent xos-createProcess --leave-session-attached -mca > oob_tcp_debug 1 -mca oob_tcp_verbose 2 -np 3 -host `xreservation -a > $XOS_RSVID` mpi/hello_world_MPI > > > On the machine running the mpirun, the process is still waiting (polling) and > standard error output is: > > [paradent-26.rennes.grid5000.fr:27762] [[1338,0],0]-[[1338,0],2] accepted: > 172.16.97.26 - 172.16.97.6 nodelay 1 sndbuf 262142 rcvbuf 262142 flags > 0802 > [paradent-26.rennes.grid5000.fr:27762] [[1338,0],0]-[[1338,0],2] > mca_oob_tcp_recv_handler: rejected connection from [[1338,0],2] connection > state 4 > > > > On the remote machine running the orted, orted fails and standard error > output is: > > [paradent-6.rennes.grid5000.fr:10391] [[1338,0],2] routed:binomial: > Connection to lifeline [[1338,0],0] lost > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] "Connection to lifeline lost" when developing a new rsh agent
Le 20/08/2012 15:56, Ralph Castain wrote: > You might try adding "-mca plm_base_verbose 5 --debug-daemons" to watch the debug output from the daemons as they are launched. There seems to be some interference here: my problem is "solved" by enabling the --debug-daemons option with a verbose level > 0 !! This command fails (3 processes on 3 different machines): mpirun --mca orte_rsh_agent xos-createProcess --leave-session-attached -np 3 -host `xreservation -a $XOS_RSVID` mpi/hello_world_MPI This command works!!! (just adding --debug-daemons with verbose level > 0): mpirun --mca orte_rsh_agent xos-createProcess --leave-session-attached -mca plm_base_verbose 5 --debug-daemons -np 3 -host `xreservation -a $XOS_RSVID` mpi/hello_world_MPI Anyway, this is just a workaround, and requiring users to set the --debug-daemons option is not acceptable. So what are ssh and --debug-daemons doing that my agent xos-createProcess is not? > The lifeline is a socket connection between the daemons and mpirun. For some reason, the socket from your remote daemon back to mpirun is being closed, which the remote daemon interprets as "lifeline lost" and terminates itself. > You could try setting the verbosity on the OOB to get the debug output from it (see "ompi_info --param oob tcp" for the settings), though it's likely to just tell you that the socket closed. By the way, no firewall is running on any of my machines. Using the oob_tcp options: mpirun --mca orte_rsh_agent xos-createProcess --leave-session-attached -mca oob_tcp_debug 1 -mca oob_tcp_verbose 2 -np 3 -host `xreservation -a $XOS_RSVID` mpi/hello_world_MPI On the machine running the mpirun, the process is still waiting (polling) and standard error output is: [paradent-26.rennes.grid5000.fr:27762] [[1338,0],0]-[[1338,0],2] accepted: 172.16.97.26 - 172.16.97.6 nodelay 1 sndbuf 262142 rcvbuf 262142 flags 0802 [paradent-26.rennes.grid5000.fr:27762] [[1338,0],0]-[[1338,0],2] mca_oob_tcp_recv_handler: rejected connection from [[1338,0],2] connection state 4 On the remote machine running the orted, orted fails and standard error output is: [paradent-6.rennes.grid5000.fr:10391] [[1338,0],2] routed:binomial: Connection to lifeline [[1338,0],0] lost
Re: [OMPI users] MPI_Irecv: Confusion with "count" input parameter
Hello, Devendra, Sending and receiving messages in MPI are atomic operations - they complete only when the whole message has been sent or received. MPI_Test only tells you whether the operation has completed - there is no indication like "30% of the message was sent/received, stay tuned for more". On the sender side, the message is constructed by taking bytes from various locations in memory, specified by the type map of the MPI datatype used. Then on the receiver side the message is deconstructed back into memory by placing the received bytes according to the type map of the MPI datatype provided. The combination of receive datatype and receive count gives you a certain number of bytes (that is, the type size obtainable with MPI_Type_size times "count"). If the message is shorter, that means that some elements of the receive buffer will not be filled, which is OK - you can test exactly how many elements were filled with MPI_Get_count on the status of the receive operation. If the message is longer, however, there won't be enough room to put all the data that the message is carrying and an overflow error would occur. This is best shown by example. Imagine that in one process you issue: MPI_Send(data, 80, MPI_BYTE, ...); This will send a message containing 80 elements of type byte. Now on the receiver side you issue: MPI_Irecv(data, 160, MPI_BYTE, ..., &request); What will happen is that the message will be received in its entirety, since 80 times the size of MPI_BYTE is less than or equal to 160 times the size of MPI_BYTE. Calling MPI_Test on "request" will produce true in the completion flag and you will get back a status variable (unless you provided MPI_STATUS_IGNORE), and then you can call: MPI_Get_count(&status, MPI_BYTE, &count); Now "count" will contain 80 - the actual number of elements received. But if the receive operation was instead: MPI_Irecv(data, 40, MPI_BYTE, ..., &request); since 40 times the size of MPI_BYTE is less than the size of the message, there will not be enough space to receive the entire message and an overflow error would occur. The MPI_Irecv itself only initiates the receive operation and will not return an error. Rather, you will obtain the overflow error in the MPI_ERROR field of the status argument returned by MPI_Test (the test call itself will return MPI_SUCCESS). Since MPI operations are atomic, you cannot send a message of 160 elements and then receive it with two separate receives of size 80 - this is very important and it is initially difficult to grasp for people who come to MPI from traditional Unix network programming. I would recommend that you head to http://www.mpi-forum.org/ and download from there the PDF of the latest MPI 2.2 standard (or order the printed book). Unlike many other standard documents, this one is actually readable by normal people and contains many useful explanations and examples. Read through the entire section 3.2 to get a better idea of how messaging works in MPI. Hope that helps to clarify things, Hristo On 21.08.2012, at 10:01, devendra rai wrote: > Hello Jeff and Hristo, > > Now I am completely confused: > > So, let's say, the complete reception requires 8192 bytes. > And, I have: > > MPI_Irecv( > (void*)this->receivebuffer,/* the receive buffer */ > this->receive_packetsize, /* 80 */ > MPI_BYTE, /* The data type expected > */ > this->transmittingnode,/* The node from which to > receive */ > this->uniquetag, /* Tag */ > MPI_COMM_WORLD, /* Communicator */ > &Irecv_request /* request handle */ > ); > > > That means, the MPI_Test will tell me that the reception is complete when > I have received the first 80 bytes. Correct? > > Next, let's say that I have a receive buffer with a capacity of 160 bytes, > then, will an overflow error occur here? Even if I have decided to receive a > large payload in chunks of 80 bytes? > > I am sorry, the manual and the API reference were too vague for me. > > Thanks a lot > > Devendra > From: "Iliev, Hristo" > To: Open MPI Users > Cc: devendra rai > Sent: Tuesday, 21 August 2012, 9:48 > Subject: Re: [OMPI users] MPI_Irecv: Confusion with "count" input > parameter > > Jeff, > > >> Or is it the number of elements that are expected to be received, and > >> hence MPI_Test will tell me that the receive is not complete until > >> "count" number of elements have not been received? > > > > Yes. > > Answering "Yes" to this question might further the confusion. The "count" > argument specifies the *capacity* of the receive buffer and the receive > operation (blocking or not) will complete successfully for any matching > message with size up to "count", even for an empty message with 0 elements, > and will produce an overflow error if the received message was longer and data truncation has to occur.
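To make the 80-byte/160-byte example above concrete, here is a small self-contained program that can be run with two processes. It is only a sketch: the tag value, the buffer sizes and the busy-wait loop around MPI_Test are arbitrary choices made for illustration.

/* count_demo.c - the sender ships 80 bytes, the receiver posts a buffer
 * with room for 160, and MPI_Get_count on the completed request reports
 * how many elements actually arrived (80). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char buf[160] = {0};
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(buf, 80, MPI_BYTE, 1, 42, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Request request;
        MPI_Status status;
        int done = 0, count;

        /* 160 is only the capacity of the receive buffer */
        MPI_Irecv(buf, 160, MPI_BYTE, 0, 42, MPI_COMM_WORLD, &request);
        while (!done)
            MPI_Test(&request, &done, &status); /* completes once the 80-byte message arrives */

        MPI_Get_count(&status, MPI_BYTE, &count);
        printf("received %d bytes\n", count);   /* prints 80 */
    }

    MPI_Finalize();
    return 0;
}

Shrinking the receive count below 80 would instead trigger the overflow (truncation) error described above.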
Re: [OMPI users] MPI_Irecv: Confusion with "count" input parameter
Hi Devendra MPI has no way of knowing how big your receive buffer is - that's why you have to pass the "count" argument, to tell MPI how many items of your data type (in your case many bytes) it may copy to your receive buffer. When data arrives that is longer than the number you specified in the "count" argument, the data will be cut off after count bytes (and an error will be returned). Any shorter amount of data will be copied to your receive buffer and the call to MPI_Recv will terminate successfully. It is your responsibility to pass the correct value of "count". If you expect data of 160 bytes you have to allocate a buffer with a size greater or equal to 160 and you have to set your "count" parameter to the size you allocated. If you want to receive data in chunks, you have to send it in chunks. I hope this helps Jody On Tue, Aug 21, 2012 at 10:01 AM, devendra rai wrote: > Hello Jeff and Hristo, > > Now I am completely confused: > > So, let's say, the complete reception requires 8192 bytes. And, I have: > > MPI_Irecv( > (void*)this->receivebuffer,/* the receive buffer */ > this->receive_packetsize, /* 80 */ > MPI_BYTE, /* The data type > expected */ > this->transmittingnode,/* The node from which to > receive */ > this->uniquetag, /* Tag */ > MPI_COMM_WORLD, /* Communicator */ > &Irecv_request /* request handle */ > ); > > > That means, the the MPI_Test will tell me that the reception is complete > when I have received the first 80 bytes. Correct? > > Next, let[s say that I have a receive buffer with a capacity of 160 bytes, > then, will overflow error occur here? Even if I have decided to receive a > large payload in chunks of 80 bytes? > > I am sorry, the manual and the API reference was too vague for me. > > Thanks a lot > > Devendra > > From: "Iliev, Hristo" > To: Open MPI Users > Cc: devendra rai > Sent: Tuesday, 21 August 2012, 9:48 > Subject: Re: [OMPI users] MPI_Irecv: Confusion with <> inputy > parameter > > Jeff, > >>> Or is it the number of elements that are expected to be received, and >>> hence MPI_Test will tell me that the receive is not complete untill "count" >>> number of elements have not been received? >> >> Yes. > > Answering "Yes" this question might further the confusion there. The "count" > argument specifies the *capacity* of the receive buffer and the receive > operation (blocking or not) will complete successfully for any matching > message with size up to "count", even for an empty message with 0 elements, > and will produce an overflow error if the received message was longer and > data truncation has to occur. > > On 20.08.2012, at 16:32, Jeff Squyres wrote: > >> On Aug 20, 2012, at 5:51 AM, devendra rai wrote: >> >>> Is it the number of elements that have been received *thus far* in the >>> buffer? >> >> No. >> >>> Or is it the number of elements that are expected to be received, and >>> hence MPI_Test will tell me that the receive is not complete untill "count" >>> number of elements have not been received? >> >> Yes. >> >>> Here's the reason why I have a problem (and I think I may be completely >>> stupid here, I'd appreciate your patience): >> [snip] >>> Does anyone see what could be going wrong? >> >> Double check that the (sender_rank, tag, communicator) tuple that you >> issued in the MPI_Irecv matches the (rank, tag, communicator) tuple from the >> sender (tag and communicator are arguments on the sending side, and rank is >> the rank of the sender in that communicator). 
>> >> When receives block like this without completing, it usually >> means a mismatch between the tuples. >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ > > -- > Hristo Iliev, Ph.D. -- High Performance Computing, > RWTH Aachen University, Center for Computing and Communication > Seffenter Weg 23, D 52074 Aachen (Germany) > Tel: +49 241 80 24367 -- Fax/UMS: +49 241 80 624367
Re: [OMPI users] MPI_Irecv: Confusion with "count" input parameter
Hello Jeff and Hristo, Now I am completely confused: So, let's say, the complete reception requires 8192 bytes. And, I have: MPI_Irecv( (void*)this->receivebuffer,/* the receive buffer */ this->receive_packetsize, /* 80 */ MPI_BYTE, /* The data type expected */ this->transmittingnode, /* The node from which to receive */ this->uniquetag, /* Tag */ MPI_COMM_WORLD, /* Communicator */ &Irecv_request /* request handle */ ); That means, the MPI_Test will tell me that the reception is complete when I have received the first 80 bytes. Correct? Next, let's say that I have a receive buffer with a capacity of 160 bytes, then, will an overflow error occur here? Even if I have decided to receive a large payload in chunks of 80 bytes? I am sorry, the manual and the API reference were too vague for me. Thanks a lot Devendra From: "Iliev, Hristo" To: Open MPI Users Cc: devendra rai Sent: Tuesday, 21 August 2012, 9:48 Subject: Re: [OMPI users] MPI_Irecv: Confusion with "count" input parameter Jeff, >> Or is it the number of elements that are expected to be received, and hence >> MPI_Test will tell me that the receive is not complete until "count" number >> of elements have not been received? > > Yes. Answering "Yes" to this question might further the confusion. The "count" argument specifies the *capacity* of the receive buffer and the receive operation (blocking or not) will complete successfully for any matching message with size up to "count", even for an empty message with 0 elements, and will produce an overflow error if the received message was longer and data truncation has to occur. On 20.08.2012, at 16:32, Jeff Squyres wrote: > On Aug 20, 2012, at 5:51 AM, devendra rai wrote: > >> Is it the number of elements that have been received *thus far* in the >> buffer? > > No. > >> Or is it the number of elements that are expected to be received, and hence >> MPI_Test will tell me that the receive is not complete until "count" number >> of elements have not been received? > > Yes. > >> Here's the reason why I have a problem (and I think I may be completely >> stupid here, I'd appreciate your patience): > [snip] >> Does anyone see what could be going wrong? > > Double check that the (sender_rank, tag, communicator) tuple that you issued > in the MPI_Irecv matches the (rank, tag, communicator) tuple from the sender > (tag and communicator are arguments on the sending side, and rank is the rank > of the sender in that communicator). > > When receives block like this without completing, it usually means > a mismatch between the tuples. > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ -- Hristo Iliev, Ph.D. -- High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241 80 24367 -- Fax/UMS: +49 241 80 624367
Re: [OMPI users] MPI_Irecv: Confusion with "count" input parameter
Jeff, >> Or is it the number of elements that are expected to be received, and hence >> MPI_Test will tell me that the receive is not complete until "count" number >> of elements have not been received? > > Yes. Answering "Yes" to this question might further the confusion. The "count" argument specifies the *capacity* of the receive buffer and the receive operation (blocking or not) will complete successfully for any matching message with size up to "count", even for an empty message with 0 elements, and will produce an overflow error if the received message was longer and data truncation has to occur. On 20.08.2012, at 16:32, Jeff Squyres wrote: > On Aug 20, 2012, at 5:51 AM, devendra rai wrote: > >> Is it the number of elements that have been received *thus far* in the >> buffer? > > No. > >> Or is it the number of elements that are expected to be received, and hence >> MPI_Test will tell me that the receive is not complete until "count" number >> of elements have not been received? > > Yes. > >> Here's the reason why I have a problem (and I think I may be completely >> stupid here, I'd appreciate your patience): > [snip] >> Does anyone see what could be going wrong? > > Double check that the (sender_rank, tag, communicator) tuple that you issued > in the MPI_Irecv matches the (rank, tag, communicator) tuple from the sender > (tag and communicator are arguments on the sending side, and rank is the rank > of the sender in that communicator). > > When receives block like this without completing, it usually means > a mismatch between the tuples. > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ -- Hristo Iliev, Ph.D. -- High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241 80 24367 -- Fax/UMS: +49 241 80 624367