I do not see any resetting of sequence numbers.  It has been a long time since 
I have looked at the matching code, so I don't know whether the out-of-order 
handling has been taken out.  If it has not, the sequence number has to be dealt 
with in some manner; otherwise there will be a gap in the arriving sequence 
numbers, and the matching logic will prevent any further progress.
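
To make the concern concrete, here is a sketch of strictly in-order matching 
(illustrative only, not the actual ob1 matching code; the names peer_state_t, 
next_expected_seq, and frag_seq are made up): once a sequence number is 
consumed or dropped without the counter being adjusted, every later fragment 
from that peer waits forever for the gap to close.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer state; not the real ob1 structures. */
typedef struct {
    uint16_t next_expected_seq;   /* next sequence number this peer may match */
} peer_state_t;

/* A fragment can only be matched when its sequence number is exactly the
 * expected one; anything newer is parked until the gap closes.  If a
 * sequence number disappears (e.g. a matched frag is dropped by a cancel
 * without touching the counter), this never returns true again for that
 * peer and matching makes no further progress. */
static bool try_match(peer_state_t *peer, uint16_t frag_seq)
{
    if (frag_seq != peer->next_expected_seq) {
        return false;              /* gap: hold the fragment, no progress */
    }
    peer->next_expected_seq++;     /* consume the sequence number */
    return true;
}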

Rich

-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf 
Of George Bosilca
Sent: Thursday, July 26, 2012 10:06 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it 
has a matched recv frag

Takahiro,

Indeed, we were way too lax on canceling the requests. I modified your patch to 
correctly deal with the MEMCHECKER macro (removing the call from the branch that 
will require a completion function). The modified patch is attached below. I 
will commit asap.

  Thanks,
    george.


Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
===================================================================
--- ompi/mca/pml/ob1/pml_ob1_recvreq.c  (revision 26870)
+++ ompi/mca/pml/ob1/pml_ob1_recvreq.c  (working copy)
@@ -3,7 +3,7 @@
  * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
  *                         University Research and Technology
  *                         Corporation.  All rights reserved.
- * Copyright (c) 2004-2009 The University of Tennessee and The University
+ * Copyright (c) 2004-2012 The University of Tennessee and The University
  *                         of Tennessee Research Foundation.  All rights
  *                         reserved.
  * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart,
@@ -15,6 +15,7 @@
  * Copyright (c) 2012      NVIDIA Corporation.  All rights reserved.
  * Copyright (c) 2011-2012 Los Alamos National Security, LLC. All rights
  *                         reserved.
+ * Copyright (c) 2012      FUJITSU LIMITED.  All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -97,36 +98,26 @@
     mca_pml_ob1_recv_request_t* request = (mca_pml_ob1_recv_request_t*)ompi_request;
     mca_pml_ob1_comm_t* comm = request->req_recv.req_base.req_comm->c_pml_comm;
 
-    if( true == ompi_request->req_complete ) { /* way to late to cancel this one */
-        /*
-         * Receive request completed, make user buffer accessable.
-         */
-        MEMCHECKER(
-            memchecker_call(&opal_memchecker_base_mem_defined,
-                            request->req_recv.req_base.req_addr,
-                            request->req_recv.req_base.req_count,
-                            request->req_recv.req_base.req_datatype);
-        );
+    if( true == request->req_match_received ) { /* way to late to cancel this one */
+        assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG ); /* not matched isn't it */
         return OMPI_SUCCESS;
     }
 
     /* The rest should be protected behind the match logic lock */
     OPAL_THREAD_LOCK(&comm->matching_lock);
-    if( OMPI_ANY_TAG == ompi_request->req_status.MPI_TAG ) { /* the match has not been already done */
-       if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
-          opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
-       } else {
-          mca_pml_ob1_comm_proc_t* proc = comm->procs + request->req_recv.req_base.req_peer;
-          opal_list_remove_item(&proc->specific_receives, (opal_list_item_t*)request);
-       }
-       PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
-                                &(request->req_recv.req_base), PERUSE_RECV );
-       /**
-        * As now the PML is done with this request we have to force the pml_complete
-        * to true. Otherwise, the request will never be freed.
-        */
-       request->req_recv.req_base.req_pml_complete = true;
+    if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
+        opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
+    } else {
+        mca_pml_ob1_comm_proc_t* proc = comm->procs + request->req_recv.req_base.req_peer;
+        opal_list_remove_item(&proc->specific_receives, (opal_list_item_t*)request);
     }
+    PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
+                             &(request->req_recv.req_base), PERUSE_RECV );
+    /**
+     * As now the PML is done with this request we have to force the pml_complete
+     * to true. Otherwise, the request will never be freed.
+     */
+    request->req_recv.req_base.req_pml_complete = true;
     OPAL_THREAD_UNLOCK(&comm->matching_lock);
 
     OPAL_THREAD_LOCK(&ompi_request_lock);
@@ -138,7 +129,7 @@
     MCA_PML_OB1_RECV_REQUEST_MPI_COMPLETE(request);
     OPAL_THREAD_UNLOCK(&ompi_request_lock);
     /*
-     * Receive request cancelled, make user buffer accessable.
+     * Receive request cancelled, make user buffer accessible.
      */
     MEMCHECKER(
         memchecker_call(&opal_memchecker_base_mem_defined,
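
For anyone who wants to reproduce the hang without the attachment, a minimal 
sketch along the lines of the cancelled.c described below (an approximation of 
the scenario, not the actual attached program) could look like this:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, buf = 0, cancelled = 0;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Irecv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
    }
    /* Give the eager send a chance to arrive and be matched on rank 1. */
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 1) {
        MPI_Cancel(&req);
        MPI_Wait(&req, &status);
        MPI_Test_cancelled(&status, &cancelled);
        printf("MPI_Test_cancelled: %d\n", cancelled);
        if (cancelled) {
            /* If the cancel was accepted even though the fragment had already
             * been matched, this receive never sees the message: deadlock. */
            MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        }
    }

    MPI_Finalize();
    return 0;
}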

On Jul 26, 2012, at 13:41 , Kawashima, Takahiro wrote:

> Hi Open MPI developers,
> 
> I found a small bug in Open MPI.
> 
> See attached program cancelled.c.
> In this program, rank 1 tries to cancel an MPI_Irecv and calls MPI_Recv 
> instead if the cancellation succeeds. This program should terminate 
> whether the cancellation succeeds or not, but it leads to a deadlock in 
> MPI_Recv after printing "MPI_Test_cancelled: 1".
> I confirmed it works fine with MPICH2.
> 
> The problem is in the mca_pml_ob1_recv_request_cancel function in 
> ompi/mca/pml/ob1/pml_ob1_recvreq.c. It accepts the cancellation unless 
> the request has been completed. I think it should not accept the 
> cancellation if the request has already been matched. If it wants to 
> accept the cancellation, it must push the recv frag back onto the 
> unexpected message queue and redo the matching. Furthermore, the receive 
> buffer must be reverted if the received message has been partially 
> written to the receive buffer in a pipeline protocol.
> 
> Attached patch cancel-recv.patch is a sample fix for this bug against 
> Open MPI trunk. Though the patch has 65 lines, the main modifications are 
> adding one if-statement and deleting one if-statement; the other lines 
> are just for indent alignment.
> I cannot confirm that the MEMCHECKER part is correct. Could anyone review 
> it before committing?
> 
> Regards,
> 
> Takahiro Kawashima,
> MPI development team,
> Fujitsu
> <cancelled.c><cancel-recv.patch><License.txt>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
