Re: [OMPI devel] [bug/fix] correction of a small bug in conf file parsing

2013-06-21 Thread Piotr Lesnicki

Hi,

1) you are right, an EOF in the middle of a comment section must be
   handled explicitly; otherwise it surfaces later as an unrelated error,
   just as it currently does. By the way, here it ends with
   [no-options-error], which has no corresponding message in
   'help-opal-wrapper.txt'.

   I have attached a patch to correct this problem.

2) indeed, other lex files have similar patterns to 'keyval_lex.l', so
   we should correct them also. I will take a look at them.

Thanks,

Piotr


On 21/06/2013 00:17, Jeff Squyres (jsquyres) wrote:

Piotr --

Many thanks for the patch.  Sorry, our lex is quite a bit rusty, and it took us 
quite a while to look at this.  :-\

I have a few questions:

1. What happens if the file ends while in a comment?  E.g., if the last line of
the file is "/* Hello".

2. Does this same kind of fixup need to be applied to the 5 other flex files in 
the OMPI source tree?



On May 30, 2013, at 11:30 AM, Piotr Lesnicki wrote:


Hi,

The parser of key/value configuration files (like
'openmpi-mca-params.conf') has some small bugs:

- a parsing error occurs when there is no newline at the end of the
  file (and the error shows up while reading the next conf file)
- error messages display wrong line numbers
- error messages show nothing meaningful when a newline replaces an
  expected token

I attached a patch to the lex production rules of the keyval
parser to correct this.
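
The first bug in the list above (no newline at the end of the file) comes down to treating end of input as a valid line terminator. A minimal sketch of that idea in plain C follows; it is not the keyval lexer itself, and the helper names `line_length` and `count_lines` are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>

/* Length of the line starting at buf[pos]. The end of the buffer is
 * treated as a terminator equivalent to '\n'. (Hypothetical helper.) */
static size_t line_length(const char *buf, size_t len, size_t pos)
{
    size_t n = 0;
    while (pos + n < len && buf[pos + n] != '\n') {
        n++;
    }
    return n;
}

/* Count logical lines so that a final line without a trailing '\n'
 * is still counted as a complete line. (Hypothetical helper.) */
static int count_lines(const char *buf, size_t len)
{
    int lines = 0;
    size_t pos = 0;
    while (pos < len) {
        pos += line_length(buf, len, pos);
        if (pos < len) {
            pos++;              /* skip the '\n' that ended this line */
        }
        lines++;
    }
    return lines;
}
```

With this approach, a file ending in "b=2" and a file ending in "b=2\n" both yield the same number of lines, which is the behaviour the patch aims for in the lexer.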



# steps to reproduce (all versions):
$ cp $OPAL_PREFIX/etc/openmpi-mca-params.conf .
$ (head -n -1 openmpi-mca-params.conf ; tail -n1 openmpi-mca-params.conf | tr -d '\n') > params.conf
$ export OMPI_MCA_mca_param_files=$PWD/params.conf
$ mpicc -v
[berlin73:00360] keyval parser: error 1 reading file /home_nfs/lesnickp/tmp/params.conf at line 160:
  #
[berlin73:00360] keyval parser: error 1 reading file /home_nfs/lesnickp/local/openmpi-1.6.3/share/openmpi/mpicc-wrapper-data.txt at line 1:
  # There can be multiple blocks of configuration data, chosen by
[...]


--
Piotr LESNICKI
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





diff -r 8d3bf35f2294 -r 0c52ab670ae9 opal/util/keyval/keyval_lex.l
--- a/opal/util/keyval/keyval_lex.l	Mon Jun 17 20:02:40 2013 +0200
+++ b/opal/util/keyval/keyval_lex.l	Fri Jun 21 11:28:29 2013 +0200
@@ -66,6 +66,7 @@
 <comment>[^*\n]*   ; /* Eat up non '*'s */
 <comment>"*"+[^*/\n]*  ; /* Eat '*'s not followed by a '/' */
 <comment>\n { opal_util_keyval_yynewlines++; }
+<comment><<EOF>>{ BEGIN(INITIAL); return OPAL_UTIL_KEYVAL_PARSE_DONE; }
 <comment>"*"+"/"{ BEGIN(INITIAL); /* Done with Block Comment */ }

 {WHITE}*"="{WHITE}* { BEGIN(VALUE); return OPAL_UTIL_KEYVAL_PARSE_EQUAL; }
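
The behaviour the patch adds (an explicit result when the input ends inside a block comment) can be sketched in hand-written C, so that it compiles without running flex. This is only an illustration of the idea, not the generated lexer: the `scan_result` codes and `skip_block_comment` are hypothetical names and do not correspond to the OPAL token values.

```c
#include <stddef.h>

/* Hypothetical result codes for this sketch (not the OPAL token values). */
enum scan_result { SCAN_DONE = 0, SCAN_EOF_IN_COMMENT = 1 };

/*
 * Skip a block comment whose opening delimiter has already been consumed.
 * On return, *pos is just past the closing delimiter, or at the end of the
 * input if the comment was never closed. *newlines is incremented for each
 * '\n' seen, so later error messages can report the right line number.
 */
static enum scan_result skip_block_comment(const char *buf, size_t len,
                                           size_t *pos, int *newlines)
{
    while (*pos < len) {
        char c = buf[(*pos)++];
        if (c == '\n') {
            (*newlines)++;
        } else if (c == '*' && *pos < len && buf[*pos] == '/') {
            (*pos)++;           /* consume the '/' */
            return SCAN_DONE;   /* comment closed normally */
        }
    }
    /* Input ended while still inside the comment: say so explicitly
     * instead of letting an unrelated error show up later. */
    return SCAN_EOF_IN_COMMENT;
}
```

A caller that receives `SCAN_EOF_IN_COMMENT` can decide whether to treat the file as finished or to report an unterminated comment; the patch above takes the first option by returning to the initial state and signalling that parsing is done.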



Re: [OMPI devel] [bug/fix] correction of a small bug in conf file parsing

2013-06-21 Thread Jeff Squyres (jsquyres)
On Jun 21, 2013, at 7:40 AM, Piotr Lesnicki  wrote:

> 1) you are right, an EOF in the middle of a comment section must be
>   handled explicitly; otherwise it surfaces later as an unrelated error,
>   just as it currently does. By the way, here it ends with
>   [no-options-error], which has no corresponding message in
>   'help-opal-wrapper.txt'.
> 
>   I have attached a patch to correct this problem.

Sweet.  Sounds like this was a long-standing problem.

> 2) indeed, other lex files have similar patterns to 'keyval_lex.l', so
>   we should correct them also. I will take a look at them.

Thanks!

Can you submit a combined patch when ready?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] [bug/fix] correction of a small bug in conf file parsing

2013-06-21 Thread Piotr Lesnicki

On 21/06/2013 15:03, Jeff Squyres (jsquyres) wrote:
> Can you submit a combined patch when ready?

OK, I'll make a combined lex patch.




Re: [OMPI devel] Problem when using struct types at specific offsets

2013-06-21 Thread Thomas Jahns
Hello,

On 04/08/2013 04:08 PM, Thomas Jahns wrote:
> a colleague of mine has investigated a difficult problem we traced to OpenMPI
> giving incorrectly delivered data on some struct datatypes which use specific
> offsets (on the stack in our case but the problem can be reproduced when using
> specifically chosen slices of an array). Our library is used to aggregate
> several MPI communications in a generic and transparent manner and therefore
> we need to be able to handle any combination of properly aligned offsets and
> component types.
> 
> The attached example program contains the necessary steps to reproduce the
> problem:
> 
> 1. create the struct types in question
> 2. send/recv the data
> 3. compare to reference (said comparison works on several MPICH2 versions)

our IT service provider has applied the patch to Open MPI 1.6.4 and the C
test case I provided now works, but the original code, which uses a larger
number of struct datatypes, still fails.

Has anyone already discovered a potential problem with the fix provided in
r28319? I'm asking because developing a C test case is quite a lot of work,
and the failure is not easily reproducible with every Fortran compiler because
it depends on the stack layout.

Regards, Thomas
-- 
Thomas Jahns
DKRZ GmbH, Department: Application software

Deutsches Klimarechenzentrum
Bundesstraße 45a
D-20146 Hamburg

Phone: +49-40-460094-151
Fax: +49-40-460094-270
Email: Thomas Jahns 





[OMPI devel] RGET issue when send is less than receive

2013-06-21 Thread Rolf vandeVaart
I ran into a hang in a test in which the sender sends less data than the 
receiver is expecting.  For example, the following shows the receiver expecting 
twice what the sender is sending.

Rank 0: MPI_Send(buf, BUFSIZE, MPI_INT, 1, 99, MPI_COMM_WORLD)
Rank 1: MPI_Recv(buf, BUFSIZE*2, MPI_INT, 0, 99, MPI_COMM_WORLD)

This is also reproducible using one of the Intel tests and adjusting the eager 
value for the openib BTL.

  mpirun -np 2 -host frick,frack -mca btl_openib_eager_limit 56 MPI_Send_overtake_c

In most cases, this works just fine.  However, when the PML protocol used is 
the RGET protocol, the test hangs.   Below is a proposed fix for this issue.
I believe we want to be checking against req_bytes_packed rather than 
req_bytes_expected as req_bytes_expected is what the user originally told us.
Otherwise, with the current code, we never send a FIN message back to the 
sender.

Any thoughts?

[rvandevaart@sm065 ompi-trunk]$ svn diff ompi/mca/pml/ob1/pml_ob1_recvreq.c
Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
===
--- ompi/mca/pml/ob1/pml_ob1_recvreq.c (revision 28633)
+++ ompi/mca/pml/ob1/pml_ob1_recvreq.c (working copy)
@@ -335,7 +335,7 @@
     /* is receive request complete */
     OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, frag->rdma_length);
-    if (recvreq->req_bytes_expected <= recvreq->req_bytes_received) {
+    if (recvreq->req_recv.req_bytes_packed <= recvreq->req_bytes_received) {
         mca_pml_ob1_send_fin(recvreq->req_recv.req_base.req_proc,
                              bml_btl,
                              frag->rdma_hdr.hdr_rget.hdr_des,
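
The effect of the one-line change above can be illustrated with a simplified model of the receive request. This is a sketch under assumptions, not the ob1 code: the struct is a hypothetical stand-in that keeps only the three counters named in the message, and it assumes req_bytes_packed holds the length the sender actually sent while req_bytes_expected holds the size the receiver posted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for the ob1 receive request. Only the
 * three counters discussed in this message are modelled. */
struct recv_request_sketch {
    size_t bytes_expected;  /* size the receiver posted (assumption) */
    size_t bytes_packed;    /* size the sender actually sent (assumption) */
    size_t bytes_received;  /* bytes delivered so far */
};

/* Check used before the proposed fix: waits for the posted size. */
static bool complete_vs_expected(const struct recv_request_sketch *r)
{
    return r->bytes_expected <= r->bytes_received;
}

/* Check after the proposed fix: waits for the size that was sent. */
static bool complete_vs_packed(const struct recv_request_sketch *r)
{
    return r->bytes_packed <= r->bytes_received;
}
```

With a posted size of 8, a sent size of 4, and 4 bytes received, the first check never becomes true, so no FIN would be sent; the second check is satisfied as soon as the sender's data has arrived.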



---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---


Re: [OMPI devel] Problem when using struct types at specific offsets

2013-06-21 Thread George Bosilca
Thomas,

I'm not aware of any other issue with the datatypes.

There might be an easy way to see what the issue with your application is. If
you can debug your application and know exactly which datatype has problems,
attach with gdb and call ompi_datatype_dump(type), where type is the datatype
causing problems. With the resulting output it should be pretty easy to
build a test case and/or identify the problem.

  George.






Re: [OMPI devel] RGET issue when send is less than receive

2013-06-21 Thread Nathan Hjelm
I thought I fixed this problem a while back (though looking at the code, it's
possible I never committed the fix). I will have to look through my local
repository and see what happened to that fix. Your fix might not work correctly
since an RGET can be broken up into multiple get operations. It may work; I
would just need to test it to make sure.

-Nathan




Re: [OMPI devel] RGET issue when send is less than receive

2013-06-21 Thread George Bosilca
The number of bytes received is atomically updated in the completion callback,
and the completion test is clearly spelled out in the
recv_request_pml_complete_check function (minus the lock part, of course). Rolf,
I think your patch is correct.

That being said, req_bytes_expected is a special value, one that should only be
used to check for truncation. Otherwise, req_bytes_packed is the value we
should compare against.

  George.
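
The two roles described above (an atomically updated byte counter checked on each completion, and req_bytes_expected used only for truncation) can be sketched with C11 atomics. This is a rough model and not the ob1 implementation: `rget_sketch`, `is_truncated`, and `get_completed` are hypothetical names, and the sketch assumes each completed get reports the length of its own fragment.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified request state for this sketch. */
struct rget_sketch {
    size_t bytes_expected;          /* posted receive size */
    size_t bytes_packed;            /* size announced by the sender */
    atomic_size_t bytes_received;   /* updated by completion callbacks */
};

/* Truncation: the sender announced more than the receiver posted.
 * This is the only place bytes_expected is consulted. */
static bool is_truncated(const struct rget_sketch *r)
{
    return r->bytes_packed > r->bytes_expected;
}

/* Called once per completed get operation. Adds the fragment length
 * atomically and reports whether the running total has reached
 * bytes_packed, mirroring the comparison in the proposed patch. */
static bool get_completed(struct rget_sketch *r, size_t frag_len)
{
    size_t before = atomic_fetch_add(&r->bytes_received, frag_len);
    return r->bytes_packed <= before + frag_len;
}
```

Because the counter accumulates across calls, the same comparison covers an RGET served by a single get and one split into several gets: the check only succeeds once the fragments add up to the size the sender announced.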





Re: [OMPI devel] RGET issue when send is less than receive

2013-06-21 Thread Nathan Hjelm
Found my original fix (still don't know why I never pushed it), and I think
George is correct. This should work in both the single and multiple get cases.

-Nathan



Re: [OMPI devel] RGET issue when send is less than receive

2013-06-21 Thread Jeff Squyres (jsquyres)
Does this need to go to v1.6?





Re: [OMPI devel] RGET issue when send is less than receive

2013-06-21 Thread Nathan Hjelm
I don't think so. The Mellanox change that caused this issue should not be in 
1.6.

-Nathan
