Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration

2012-08-21 Thread Brian Budge
Hi.  I know this is an old thread, but I'm curious whether there are any
tutorials describing how to set this up. Is this still available in
newer Open MPI versions?

Thanks,
  Brian

On Fri, Jan 4, 2008 at 7:57 AM, Ralph Castain  wrote:
> Hi Elena
>
> I'm copying this to the user list just to correct a mis-statement on my part
> in an earlier message that went there. I had stated that a singleton could
> comm_spawn onto other nodes listed in a hostfile by setting an environmental
> variable that pointed us to the hostfile.
>
> This is incorrect in the 1.2 code series. That series does not allow
> singletons to read a hostfile at all. Hence, any comm_spawn done by a
> singleton can only launch child processes on the singleton's local host.
>
> This situation has been corrected for the upcoming 1.3 code series. For the
> 1.2 series, though, you will have to do it via an mpirun command line.
>
> Sorry for the confusion - I sometimes have too many code families to keep
> straight in this old mind!
>
> Ralph
>
>
> On 1/4/08 5:10 AM, "Elena Zhebel"  wrote:
>
>> Hello Ralph,
>>
>> Thank you very much for the explanations.
>> But I still do not get it running...
>>
>> For the case
>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe
>> everything works.
>>
>> For the case
>> ./my_master.exe
>> it does not.
>>
>> I did:
>> - create my_hostfile and put it in $HOME/.openmpi/components/
>>   my_hostfile:
>> bollenstreek slots=2 max_slots=3
>> octocore01 slots=8  max_slots=8
>> octocore02 slots=8  max_slots=8
>> clstr000 slots=2 max_slots=3
>> clstr001 slots=2 max_slots=3
>> clstr002 slots=2 max_slots=3
>> clstr003 slots=2 max_slots=3
>> clstr004 slots=2 max_slots=3
>> clstr005 slots=2 max_slots=3
>> clstr006 slots=2 max_slots=3
>> clstr007 slots=2 max_slots=3
>> - setenv OMPI_MCA_rds_hostfile_path my_hostfile (I put it in .tcshrc and
>> then sourced .tcshrc)
>> - in my_master.cpp I did
>>   MPI_Info info1;
>>   MPI_Info_create(&info1);
>>   char* hostname =
>> "clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02";
>>   MPI_Info_set(info1, "host", hostname);
>>
>>   _intercomm = intracomm.Spawn("./childexe", argv1, _nProc, info1, 0,
>> MPI_ERRCODES_IGNORE);
>>
>> - After I ran the executable, I got this error message:
>>
>> bollenstreek: > ./my_master
>> number of processes to run: 1
>> --
>> Some of the requested hosts are not included in the current allocation for
>> the application:
>>   ./childexe
>> The requested hosts were:
>>   clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02
>>
>> Verify that you have mapped the allocated resources properly using the
>> --host specification.
>> --
>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>> base/rmaps_base_support_fns.c at line 225
>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>> rmaps_rr.c at line 478
>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>> base/rmaps_base_map_job.c at line 210
>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>> rmgr_urm.c at line 372
>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
>> communicator/comm_dyn.c at line 608
>>
>> Did I miss something?
>> Thanks for the help!
>>
>> Elena
>>
>>
>> -Original Message-
>> From: Ralph H Castain [mailto:r...@lanl.gov]
>> Sent: Tuesday, December 18, 2007 3:50 PM
>> To: Elena Zhebel; Open MPI Users 
>> Cc: Ralph H Castain
>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration
>>
>>
>>
>>
>> On 12/18/07 7:35 AM, "Elena Zhebel"  wrote:
>>
>>> Thanks a lot! Now it works!
>>> The solution is to use mpirun -n 1 -hostfile my.hosts *.exe and pass an
>>> MPI_Info key to the Spawn function!
>>>
>>> One more question: is it necessary to start my "master" program with
>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe ?
>>
>> No, it isn't necessary - assuming that my_master_host is the first host
>> listed in your hostfile! If you are only executing one my_master.exe (i.e.,
>> you gave -n 1 to mpirun), then we will automatically map that process onto
>> the first host in your hostfile.
>>
>> If you want my_master.exe to go on a host other than the first one in the
>> file, then you have to give us the -host option.
>>
>>>
>>> Are there other possibilities for an easy start?
>>> I would like to just run ./my_master.exe, but then the master process
>>> doesn't know about the hosts available in the network.
>>
>> You can set the hostfile parameter in your environment instead of on the
>> command line. Just set OMPI_MCA_rds_hostfile_path = my.hosts.
>>
>> You can then just run ./my_master.exe on the host where you want the master
>> to reside - everything should work the same.
>>
>> Just as an FYI: the name of that environmental variable is going to change
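
For readers landing on this thread: here is a minimal C sketch of the
comm_spawn pattern under discussion. The child binary ./childexe and the host
names are borrowed from Elena's example quoted above; treat it as an
illustration under those assumptions, not a verified recipe (and recall that
singleton spawn onto remote hosts needs the 1.3-series behavior described
above):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;
    MPI_Info info;
    int errcodes[2];

    MPI_Init(&argc, &argv);
    MPI_Info_create(&info);
    /* the "host" info key requests placement of the children on these hosts
       (host names are hypothetical - taken from the example above) */
    MPI_Info_set(info, "host", "octocore01,octocore02");
    MPI_Comm_spawn("./childexe", MPI_ARGV_NULL, 2, info, 0,
                   MPI_COMM_SELF, &intercomm, errcodes);
    MPI_Info_free(&info);
    /* ... communicate with the children over intercomm ... */
    MPI_Finalize();
    return 0;
}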

[OMPI users] application with mxm hangs on startup

2012-08-21 Thread Pavel Mezentsev
Hello!

I've built Open MPI 1.6.1rc3 with MXM support, but when I try to launch
an application using this MTL, it hangs, and I can't figure out why.

If I launch it with np below 128, everything works fine, since MXM isn't
used. I've tried setting the threshold to 0 and launching 2 processes, with
the same result: it hangs on startup.
What could be causing this problem?

Here is the command I execute:
/opt/openmpi/1.6.1/mxm-test/bin/mpirun \
-np $NP \
-hostfile hosts_fdr2 \
--mca mtl mxm \
--mca btl ^tcp \
--mca mtl_mxm_np 0 \
-x OMP_NUM_THREADS=$NT \
-x LD_LIBRARY_PATH \
--bind-to-core \
-npernode 16 \
--mca coll_fca_np 0 -mca coll_fca_enable 0 \
./IMB-MPI1 -npmin $NP Allreduce Reduce Barrier Bcast
Allgather Allgatherv

I'm performing the tests on nodes with Intel Sandy Bridge processors and FDR
InfiniBand. Open MPI was configured with the following parameters:
CC=icc CXX=icpc F77=ifort FC=ifort ./configure
--prefix=/opt/openmpi/1.6.1rc3/mxm-test --with-mxm=/opt/mellanox/mxm
--with-fca=/opt/mellanox/fca --with-knem=/usr/share/knem
I'm using the latest OFED from Mellanox (1.5.3-3.1.0) on CentOS 6.1 with the
default kernel (2.6.32-131.0.15).
Compilation with the default MXM (1.0.601) failed, so I installed the latest
version from Mellanox: 1.1.1227.

Best regards, Pavel Mezentsev.


Re: [OMPI users] Measuring latency

2012-08-21 Thread Iliev, Hristo
Hello,

The Intel MPI Benchmarks suite 
(http://software.intel.com/en-us/articles/intel-mpi-benchmarks/) will probably 
measure more things about your MPI environment than you'd ever need to know :)

NetPIPE (http://www.scl.ameslab.gov/netpipe/) also has an MPI version. It 
measures point-to-point bandwidth and latency and has the option to test the 
effect of using unaligned memory buffers.
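
If you only need a rough point-to-point number without installing anything, a
minimal ping-pong sketch such as the one below will do. It assumes exactly two
ranks and is far less careful than the benchmarks above:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int reps = 1000;
    char byte = 0;
    int rank, i;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {
            /* rank 0 bounces a single byte off rank 1 */
            MPI_Send(&byte, 1, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("half round-trip latency: %g us\n",
               (t1 - t0) / (2.0 * reps) * 1e6);

    MPI_Finalize();
    return 0;
}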

Kind regards,

Hristo

On 21.08.2012, at 23:32, Maginot Junior  wrote:

> Hello.
> How do you suggest I measure the latency between master em slaves
> in my cluster? Is there any tool that I can use to test the
> performance of my environment?
> Thanks
> 
> 
> --
> Maginot Júnior
> 

--
Hristo Iliev, Ph.D. -- High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)






Re: [OMPI users] Measuring latency

2012-08-21 Thread Lloyd Brown
That's fine.  In that case, you just compile it with your MPI
implementation and do something like this:

mpiexec -np 2 -H masterhostname,slavehostname ./osu_latency

There may be some all-to-all latency tools too.  I don't really remember.

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

On 08/21/2012 03:41 PM, Maginot Junior wrote:
> Sorry for the typo; what I meant was "and", not "em".
> Thank you for the quick response; I will take a look at your suggestion.


Re: [OMPI users] Measuring latency

2012-08-21 Thread Maginot Junior
On Tue, Aug 21, 2012 at 6:34 PM, Lloyd Brown  wrote:
> I'm not really familiar enough to know what you mean by "em slaves", but
> for general testing of bandwidth and latency, I usually use the "OSU
> Micro-benchmarks" (see http://mvapich.cse.ohio-state.edu/benchmarks/).
>
> Lloyd Brown
> Systems Administrator
> Fulton Supercomputing Lab
> Brigham Young University
> http://marylou.byu.edu
>
> On 08/21/2012 03:32 PM, Maginot Junior wrote:
>> Hello.
>> How do you suggest I measure the latency between master em slaves
>> in my cluster? Is there any tool that I can use to test the
>> performance of my environment?
>> Thanks
>>
>>
>> --
>> Maginot Júnior
>>

Sorry for the typo; what I meant was "and", not "em".
Thank you for the quick response; I will take a look at your suggestion.


regards

--
Maginot Jr.



Re: [OMPI users] Measuring latency

2012-08-21 Thread Lloyd Brown
I'm not really familiar enough to know what you mean by "em slaves", but
for general testing of bandwidth and latency, I usually use the "OSU
Micro-benchmarks" (see http://mvapich.cse.ohio-state.edu/benchmarks/).

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

On 08/21/2012 03:32 PM, Maginot Junior wrote:
> Hello.
> How do you suggest I measure the latency between master em slaves
> in my cluster? Is there any tool that I can use to test the
> performance of my environment?
> Thanks
> 
> 
> --
> Maginot Júnior
> 


[OMPI users] Measuring latency

2012-08-21 Thread Maginot Junior
Hello.
How do you suggest I measure the latency between master em slaves
in my cluster? Is there any tool that I can use to test the
performance of my environment?
Thanks


--
Maginot Júnior



Re: [OMPI users] "Connection to lifeline lost" when developing a new rsh agent

2012-08-21 Thread Ralph Castain
Have you looked thru the code in orte/mca/plm/rsh/plm_rsh_module.c? It is 
executing a tree-like spawn pattern by default, but there isn't anything magic 
about what ssh is doing. However, there are things done to prep the remote 
shell (setting paths etc.), and the tree spawn passes some additional 
parameters. It would be worth your while to read thru it to see if just 
replacing ssh is going to be enough for your environment.

The OOB output is telling you that the connection is being attempted, but being 
rejected for some reason during the return "ACK". Not sure why that would be 
happening, unless the remote daemon died during the connection handshake.

--debug-daemons doesn't do anything but (a) turn on the debug output, and (b) 
cause ssh to leave the session open by telling the orted not to "daemonize" 
itself. The --leave-session-attached option does (b) without all the debug 
output.


On Aug 21, 2012, at 8:15 AM, Yann RADENAC  wrote:

> 
> On 20/08/2012 15:56, Ralph Castain wrote:
> > You might try adding "-mca plm_base_verbose 5 --debug-daemons" to watch the 
> > debug output from the daemons as they are launched.
> 
> There seems to be interference here: my problem is "solved" by enabling 
> the --debug-daemons option with a verbose level > 0!
> 
> This command fails (3 processes on 3 different machines):
> 
> mpirun  --mca orte_rsh_agent xos-createProcess --leave-session-attached   -np 
> 3   -host `xreservation -a $XOS_RSVID` mpi/hello_world_MPI
> 
> 
> This command works !!!
> (just adding --debug-daemons with a verbose level > 0):
> 
> mpirun  --mca orte_rsh_agent xos-createProcess --leave-session-attached  -mca 
> plm_base_verbose 5 --debug-daemons -np 3   -host `xreservation -a $XOS_RSVID` 
> mpi/hello_world_MPI
> 
> 
> Anyway, this is just a workaround, and requiring users to set the 
> --debug-daemons option is not acceptable.
> 
> So what are ssh and --debug-daemons doing that my agent 
> xos-createProcess is not doing?
> 
> 
> 
>> The lifeline is a socket connection between the daemons and mpirun. For some 
>> reason, the socket from your remote daemon back to mpirun is being closed, 
>> which the remote daemon interprets as "lifeline lost" and terminates itself. 
>> You could try setting the verbosity on the OOB to get the debug output from 
>> it (see "ompi_info --param oob tcp" for the settings), though it's likely to 
>> just tell you that the socket closed.
> 
> By the way, no firewall is running on any of my machines.
> 
> Using the oob_tcp options:
> 
> mpirun  --mca orte_rsh_agent xos-createProcess --leave-session-attached  -mca 
> oob_tcp_debug 1 -mca oob_tcp_verbose 2 -np 3   -host `xreservation -a 
> $XOS_RSVID` mpi/hello_world_MPI
> 
> 
> On the machine running the mpirun, the process is still waiting (polling) and 
> standard error output is:
> 
> [paradent-26.rennes.grid5000.fr:27762] [[1338,0],0]-[[1338,0],2] accepted: 
> 172.16.97.26 - 172.16.97.6 nodelay 1 sndbuf 262142 rcvbuf 262142 flags 
> 0802
> [paradent-26.rennes.grid5000.fr:27762] [[1338,0],0]-[[1338,0],2] 
> mca_oob_tcp_recv_handler: rejected connection from [[1338,0],2] connection 
> state 4
> 
> 
> 
> On the remote machine running the orted, orted fails and standard error 
> output is:
> 
> [paradent-6.rennes.grid5000.fr:10391] [[1338,0],2] routed:binomial: 
> Connection to lifeline [[1338,0],0] lost
> 




Re: [OMPI users] "Connection to lifeline lost" when developing a new rsh agent

2012-08-21 Thread Yann RADENAC


On 20/08/2012 15:56, Ralph Castain wrote:
> You might try adding "-mca plm_base_verbose 5 --debug-daemons" to
> watch the debug output from the daemons as they are launched.


There seems to be interference here: my problem is "solved" by 
enabling the --debug-daemons option with a verbose level > 0!


This command fails (3 processes on 3 different machines):

mpirun  --mca orte_rsh_agent xos-createProcess --leave-session-attached 
  -np 3   -host `xreservation -a $XOS_RSVID` mpi/hello_world_MPI



This command works !!!
(just adding --debug-daemons with a verbose level > 0):

mpirun  --mca orte_rsh_agent xos-createProcess --leave-session-attached 
 -mca plm_base_verbose 5 --debug-daemons -np 3   -host `xreservation -a 
$XOS_RSVID` mpi/hello_world_MPI



Anyway, this is just a workaround, and requiring users to set the 
--debug-daemons option is not acceptable.


So what are ssh and --debug-daemons doing that my agent 
xos-createProcess is not doing?





> The lifeline is a socket connection between the daemons and mpirun. For some
> reason, the socket from your remote daemon back to mpirun is being closed,
> which the remote daemon interprets as "lifeline lost" and terminates itself.
> You could try setting the verbosity on the OOB to get the debug output from
> it (see "ompi_info --param oob tcp" for the settings), though it's likely to
> just tell you that the socket closed.


By the way, no firewall is running on any of my machines.

Using the oob_tcp options:

mpirun  --mca orte_rsh_agent xos-createProcess --leave-session-attached 
 -mca oob_tcp_debug 1 -mca oob_tcp_verbose 2 -np 3   -host 
`xreservation -a $XOS_RSVID` mpi/hello_world_MPI



On the machine running the mpirun, the process is still waiting 
(polling) and standard error output is:


[paradent-26.rennes.grid5000.fr:27762] [[1338,0],0]-[[1338,0],2] 
accepted: 172.16.97.26 - 172.16.97.6 nodelay 1 sndbuf 262142 rcvbuf 
262142 flags 0802
[paradent-26.rennes.grid5000.fr:27762] [[1338,0],0]-[[1338,0],2] 
mca_oob_tcp_recv_handler: rejected connection from [[1338,0],2] 
connection state 4




On the remote machine running the orted, orted fails and standard error 
output is:


[paradent-6.rennes.grid5000.fr:10391] [[1338,0],2] routed:binomial: 
Connection to lifeline [[1338,0],0] lost




Re: [OMPI users] MPI_Irecv: Confusion with <<count>> input parameter

2012-08-21 Thread Iliev, Hristo
Hello, Devendra,

Sending and receiving messages in MPI are atomic operations - they complete 
only when the whole message has been sent or received. MPI_Test only tells you 
whether the operation has completed - there is no indication like "30% of the 
message was sent/received, stay tuned for more".

On the sender side, the message is constructed by taking bytes from various 
locations in memory, specified by the type map of the MPI datatype used. Then 
on the receiver side the message is deconstructed back into memory by placing 
the received bytes according to the type map of the MPI datatype provided. The 
combination of receive datatype and receive count gives you a certain number of 
bytes (that is the type size obtainable by MPI_Type_size times "count"). If the 
message is shorter, some elements of the receive buffer will not be filled, 
which is OK - you can test exactly how many elements were filled with 
MPI_Get_count on the status of the receive operation. If the message is 
longer, however, there won't be enough space to put all the data that the 
message is carrying and an overflow error will occur.

This works best with an example. Imagine that in one process you issue:

MPI_Send(data, 80, MPI_BYTE, ...);

This will send a message containing 80 elements of type byte. Now on the 
receiver side you issue:

MPI_Irecv(data, 160, MPI_BYTE, ..., &request);

What will happen is that the message will be received in its entirety since 80 
times the size of MPI_BYTE is less than or equal to 160 times the size of 
MPI_BYTE. Calling MPI_Test on "request" will produce true in the completion 
flag and you will get back a status variable (unless you provided 
MPI_STATUS_IGNORE) and then you can call:

MPI_Get_count(&status, MPI_BYTE, &count);

Now "count" will contain 80 - the actual number of elements received.

But if the receive operation was instead:

MPI_Irecv(data, 40, MPI_BYTE, ..., &request);

since 40 times the size of MPI_BYTE is less than the size of the message, there 
will not be enough space to receive the entire message and an overflow error 
will occur. MPI_Irecv itself only initiates the receive operation and will 
not return an error. Rather, you will obtain the overflow error in the MPI_ERROR 
field of the status argument returned by MPI_Test (the test call itself will 
return MPI_SUCCESS).
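
Putting the first case together, a complete sketch could look as follows
(it assumes exactly two ranks; error checking is omitted and the buffer
contents are irrelevant here):

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char data[160];
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(data, 0, sizeof(data));

    if (rank == 0) {
        /* send a message of 80 elements of type byte */
        MPI_Send(data, 80, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Request request;
        MPI_Status status;
        int flag = 0, count;

        /* the receive buffer has a capacity of 160 elements */
        MPI_Irecv(data, 160, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &request);
        do {
            MPI_Test(&request, &flag, &status);
        } while (!flag);
        MPI_Get_count(&status, MPI_BYTE, &count);
        printf("received %d elements\n", count);  /* prints 80 */
    }

    MPI_Finalize();
    return 0;
}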

Since MPI operations are atomic, you cannot send a message of 160 elements and 
then receive it with two separate receives of size 80 - this is very important 
and often difficult to grasp initially for people who come to MPI from 
traditional Unix network programming.

I would recommend that you head to http://www.mpi-forum.org/ and download the 
PDF of the latest MPI 2.2 standard from there (or order the printed book). 
Unlike many other standards documents, this one is actually readable by normal 
people and contains many useful explanations and examples. Read through the 
entire section 3.2 to get a better idea of how messaging works in MPI.

Hope that helps to clarify things,

Hristo

On 21.08.2012, at 10:01, devendra rai  wrote:

> Hello Jeff and Hristo,
> 
> Now I am completely confused:
> 
> So, let's say the complete reception requires 8192 bytes, and I have:
> 
> MPI_Irecv(
>     (void*)this->receivebuffer, /* the receive buffer */
>     this->receive_packetsize,   /* 80 */
>     MPI_BYTE,                   /* the data type expected */
>     this->transmittingnode,     /* the node from which to receive */
>     this->uniquetag,            /* tag */
>     MPI_COMM_WORLD,             /* communicator */
>     &Irecv_request              /* request handle */
> );
> 
> 
> That means that MPI_Test will tell me that the reception is complete when 
> I have received the first 80 bytes. Correct?
> 
> Next, let's say that I have a receive buffer with a capacity of 160 bytes; 
> will an overflow error occur here, even if I have decided to receive a 
> large payload in chunks of 80 bytes?
> 
> I am sorry, the manual and the API reference were too vague for me.
> 
> Thanks a lot
> 
> Devendra
> From: "Iliev, Hristo" 
> To: Open MPI Users  
> Cc: devendra rai  
> Sent: Tuesday, 21 August 2012, 9:48
> Subject: Re: [OMPI users] MPI_Irecv: Confusion with <<count>> input 
> parameter
> 
> Jeff,
> 
> >> Or is it the number of elements that are expected to be received, and 
> >> hence MPI_Test will tell me that the receive is not complete until 
> >> "count" number of elements have been received?
> > 
> > Yes.
> 
> Answering "Yes" this question might further the confusion there. The "count" 
> argument specifies the *capacity* of the receive buffer and the receive 
> operation (blocking or not) will complete successfully for any matching 
> message with size up to "count", even for an empty message with 0 elements, 
> and will produce an ove

Re: [OMPI users] MPI_Irecv: Confusion with <<count>> input parameter

2012-08-21 Thread jody
Hi Devendra

MPI has no way of knowing how big your receive buffer is -
that's why you have to pass the "count" argument, to tell MPI
how many items of your data type (in your case, bytes)
it may copy to your receive buffer.

When data arrives that is longer than the number you
specified in the "count" argument, the data will be cut off after
count bytes (and an error will be returned).
Any shorter amount of data will be copied to your receive buffer
and the call to MPI_Recv will terminate successfully.

It is your responsibility to pass the correct value of "count".

If you expect data of 160 bytes, you have to allocate a buffer
with a size greater than or equal to 160, and you have to set your
"count" parameter to the size you allocated.

If you want to receive data in chunks, you have to send it in chunks.
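
For example, a sketch of sending 160 bytes as two 80-byte chunks (the sizes
are made up and exactly two ranks are assumed):

#include <mpi.h>
#include <string.h>

int main(int argc, char **argv)
{
    enum { CHUNK = 80, NCHUNKS = 2 };  /* 160 bytes, in 80-byte chunks */
    char payload[CHUNK * NCHUNKS];
    int rank, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        memset(payload, 'x', sizeof(payload));
        /* the sender splits the payload into separate messages */
        for (i = 0; i < NCHUNKS; i++)
            MPI_Send(payload + i * CHUNK, CHUNK, MPI_BYTE,
                     1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* the receiver posts one matching receive per chunk */
        for (i = 0; i < NCHUNKS; i++)
            MPI_Recv(payload + i * CHUNK, CHUNK, MPI_BYTE,
                     0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}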

I hope this helps
  Jody


On Tue, Aug 21, 2012 at 10:01 AM, devendra rai  wrote:
> Hello Jeff and Hristo,
>
> Now I am completely confused:
>
> So, let's say the complete reception requires 8192 bytes, and I have:
>
> MPI_Irecv(
>     (void*)this->receivebuffer, /* the receive buffer */
>     this->receive_packetsize,   /* 80 */
>     MPI_BYTE,                   /* the data type expected */
>     this->transmittingnode,     /* the node from which to receive */
>     this->uniquetag,            /* tag */
>     MPI_COMM_WORLD,             /* communicator */
>     &Irecv_request              /* request handle */
> );
>
>
> That means that MPI_Test will tell me that the reception is complete
> when I have received the first 80 bytes. Correct?
>
> Next, let's say that I have a receive buffer with a capacity of 160 bytes;
> will an overflow error occur here, even if I have decided to receive a
> large payload in chunks of 80 bytes?
>
> I am sorry, the manual and the API reference were too vague for me.
>
> Thanks a lot
>
> Devendra
> 
> From: "Iliev, Hristo" 
> To: Open MPI Users 
> Cc: devendra rai 
> Sent: Tuesday, 21 August 2012, 9:48
> Subject: Re: [OMPI users] MPI_Irecv: Confusion with <<count>> input
> parameter
>
> Jeff,
>
>>> Or is it the number of elements that are expected to be received, and
>>> hence MPI_Test will tell me that the receive is not complete until "count"
>>> number of elements have been received?
>>
>> Yes.
>
> Answering "Yes" this question might further the confusion there. The "count"
> argument specifies the *capacity* of the receive buffer and the receive
> operation (blocking or not) will complete successfully for any matching
> message with size up to "count", even for an empty message with 0 elements,
> and will produce an overflow error if the received message was longer and
> data truncation has to occur.
>
> On 20.08.2012, at 16:32, Jeff Squyres  wrote:
>
>> On Aug 20, 2012, at 5:51 AM, devendra rai wrote:
>>
>>> Is it the number of elements that have been received *thus far* in the
>>> buffer?
>>
>> No.
>>
>>> Or is it the number of elements that are expected to be received, and
>>> hence MPI_Test will tell me that the receive is not complete until "count"
>>> number of elements have been received?
>>
>> Yes.
>>
>>> Here's the reason why I have a problem (and I think I may be completely
>>> stupid here, I'd appreciate your patience):
>> [snip]
>>> Does anyone see what could be going wrong?
>>
>> Double check that the (sender_rank, tag, communicator) tuple that you
>> issued in the MPI_Irecv matches the (rank, tag, communicator) tuple from the
>> sender (tag and communicator are arguments on the sending side, and rank is
>> the rank of the sender in that communicator).
>>
>> When receives block without completing like this, it usually
>> means a mismatch between the tuples.
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>
> --
> Hristo Iliev, Ph.D. -- High Performance Computing,
> RWTH Aachen University, Center for Computing and Communication
> Seffenter Weg 23,  D 52074  Aachen (Germany)
> Tel: +49 241 80 24367 -- Fax/UMS: +49 241 80 624367
>
>
>
>
>


Re: [OMPI users] MPI_Irecv: Confusion with <<count>> input parameter

2012-08-21 Thread devendra rai
Hello Jeff and Hristo,

Now I am completely confused:

So, let's say the complete reception requires 8192 bytes, and I have:

MPI_Irecv(
    (void*)this->receivebuffer, /* the receive buffer */
    this->receive_packetsize,   /* 80 */
    MPI_BYTE,                   /* the data type expected */
    this->transmittingnode,     /* the node from which to receive */
    this->uniquetag,            /* tag */
    MPI_COMM_WORLD,             /* communicator */
    &Irecv_request              /* request handle */
);



That means that MPI_Test will tell me that the reception is complete when I 
have received the first 80 bytes. Correct?

Next, let's say that I have a receive buffer with a capacity of 160 bytes; 
will an overflow error occur here, even if I have decided to receive a large 
payload in chunks of 80 bytes?

I am sorry, the manual and the API reference were too vague for me.

Thanks a lot

Devendra



 From: "Iliev, Hristo" 
To: Open MPI Users  
Cc: devendra rai  
Sent: Tuesday, 21 August 2012, 9:48
Subject: Re: [OMPI users] MPI_Irecv: Confusion with <<count>> input 
parameter
 
Jeff,

>> Or is it the number of elements that are expected to be received, and hence 
>> MPI_Test will tell me that the receive is not complete until "count" number 
>> of elements have been received?
> 
> Yes.

Answering "Yes" this question might further the confusion there. The "count" 
argument specifies the *capacity* of the receive buffer and the receive 
operation (blocking or not) will complete successfully for any matching message 
with size up to "count", even for an empty message with 0 elements, and will 
produce an overflow error if the received message was longer and data 
truncation has to occur.

On 20.08.2012, at 16:32, Jeff Squyres  wrote:

> On Aug 20, 2012, at 5:51 AM, devendra rai wrote:
> 
>> Is it the number of elements that have been received *thus far* in the 
>> buffer?
> 
> No.
> 
>> Or is it the number of elements that are expected to be received, and hence 
>> MPI_Test will tell me that the receive is not complete until "count" number 
>> of elements have been received?
> 
> Yes.
> 
>> Here's the reason why I have a problem (and I think I may be completely 
>> stupid here, I'd appreciate your patience):
> [snip]
>> Does anyone see what could be going wrong?
> 
> Double check that the (sender_rank, tag, communicator) tuple that you issued 
> in the MPI_Irecv matches the (rank, tag, communicator) tuple from the sender 
> (tag and communicator are arguments on the sending side, and rank is the rank 
> of the sender in that communicator).
> 
> When receives block without completing like this, it usually means 
> a mismatch between the tuples.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 

--
Hristo Iliev, Ph.D. -- High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241 80 24367 -- Fax/UMS: +49 241 80 624367

Re: [OMPI users] MPI_Irecv: Confusion with <<count>> input parameter

2012-08-21 Thread Iliev, Hristo
Jeff,

>> Or is it the number of elements that are expected to be received, and hence 
>> MPI_Test will tell me that the receive is not complete until "count" number 
>> of elements have been received?
> 
> Yes.

Answering "Yes" this question might further the confusion there. The "count" 
argument specifies the *capacity* of the receive buffer and the receive 
operation (blocking or not) will complete successfully for any matching message 
with size up to "count", even for an empty message with 0 elements, and will 
produce an overflow error if the received message was longer and data 
truncation has to occur.

On 20.08.2012, at 16:32, Jeff Squyres  wrote:

> On Aug 20, 2012, at 5:51 AM, devendra rai wrote:
> 
>> Is it the number of elements that have been received *thus far* in the 
>> buffer?
> 
> No.
> 
>> Or is it the number of elements that are expected to be received, and hence 
>> MPI_Test will tell me that the receive is not complete until "count" number 
>> of elements have been received?
> 
> Yes.
> 
>> Here's the reason why I have a problem (and I think I may be completely 
>> stupid here, I'd appreciate your patience):
> [snip]
>> Does anyone see what could be going wrong?
> 
> Double check that the (sender_rank, tag, communicator) tuple that you issued 
> in the MPI_Irecv matches the (rank, tag, communicator) tuple from the sender 
> (tag and communicator are arguments on the sending side, and rank is the rank 
> of the sender in that communicator).
> 
> When receives block without completing like this, it usually means 
> a mismatch between the tuples.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 

--
Hristo Iliev, Ph.D. -- High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241 80 24367 -- Fax/UMS: +49 241 80 624367



