Rich, There is no matching in this case. Canceling a receive operation is possible only up to the moment the request has been matched. Up to this point the sequence numbers of the peers are not used, so removing a non-matched request has no impact on the sequence number.
george. On Jul 26, 2012, at 16:31 , Richard Graham wrote: > I do not see any resetting of sequence numbers. It has been a long time > since I have looked at the matching code, so don't know if the out-of-order > handling has been taken out. If not, the sequence number has to be dealt > with in some manner, or else there will be a gap in the arriving sequence > numbers, and the matching logic will prevent any further progress. > > Rich > > -----Original Message----- > From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On > Behalf Of George Bosilca > Sent: Thursday, July 26, 2012 10:06 AM > To: Open MPI Developers > Subject: Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if > it has a matched recv frag > > Takahiro, > > Indeed we were way to lax on canceling the requests. I modified your patch to > correctly deal with the MEMCHECK macro (remove the call from the branch that > will requires a completion function). The modified patch is attached below. I > will commit asap. > > Thanks, > george. > > > Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c > =================================================================== > --- ompi/mca/pml/ob1/pml_ob1_recvreq.c (revision 26870) > +++ ompi/mca/pml/ob1/pml_ob1_recvreq.c (working copy) > @@ -3,7 +3,7 @@ > * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana > * University Research and Technology > * Corporation. All rights reserved. > - * Copyright (c) 2004-2009 The University of Tennessee and The University > + * Copyright (c) 2004-2012 The University of Tennessee and The > + University > * of Tennessee Research Foundation. All rights > * reserved. > * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart, @@ > -15,6 +15,7 @@ > * Copyright (c) 2012 NVIDIA Corporation. All rights reserved. > * Copyright (c) 2011-2012 Los Alamos National Security, LLC. All rights > * reserved. > + * Copyright (c) 2012 FUJITSU LIMITED. All rights reserved. > * $COPYRIGHT$ > * > * Additional copyrights may follow > @@ -97,36 +98,26 @@ > mca_pml_ob1_recv_request_t* request = > (mca_pml_ob1_recv_request_t*)ompi_request; > mca_pml_ob1_comm_t* comm = > request->req_recv.req_base.req_comm->c_pml_comm; > > - if( true == ompi_request->req_complete ) { /* way to late to cancel this > one */ > - /* > - * Receive request completed, make user buffer accessable. > - */ > - MEMCHECKER( > - memchecker_call(&opal_memchecker_base_mem_defined, > - request->req_recv.req_base.req_addr, > - request->req_recv.req_base.req_count, > - request->req_recv.req_base.req_datatype); > - ); > + if( true == request->req_match_received ) { /* way to late to cancel > this one */ > + assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG ); /* > + not matched isn't it */ > return OMPI_SUCCESS; > } > > /* The rest should be protected behind the match logic lock */ > OPAL_THREAD_LOCK(&comm->matching_lock); > - if( OMPI_ANY_TAG == ompi_request->req_status.MPI_TAG ) { /* the match > has not been already done */ > - if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) { > - opal_list_remove_item( &comm->wild_receives, > (opal_list_item_t*)request ); > - } else { > - mca_pml_ob1_comm_proc_t* proc = comm->procs + > request->req_recv.req_base.req_peer; > - opal_list_remove_item(&proc->specific_receives, > (opal_list_item_t*)request); > - } > - PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q, > - &(request->req_recv.req_base), PERUSE_RECV ); > - /** > - * As now the PML is done with this request we have to force the > pml_complete > - * to true. Otherwise, the request will never be freed. > - */ > - request->req_recv.req_base.req_pml_complete = true; > + if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) { > + opal_list_remove_item( &comm->wild_receives, > (opal_list_item_t*)request ); > + } else { > + mca_pml_ob1_comm_proc_t* proc = comm->procs + > request->req_recv.req_base.req_peer; > + opal_list_remove_item(&proc->specific_receives, > + (opal_list_item_t*)request); > } > + PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q, > + &(request->req_recv.req_base), PERUSE_RECV ); > + /** > + * As now the PML is done with this request we have to force the > pml_complete > + * to true. Otherwise, the request will never be freed. > + */ > + request->req_recv.req_base.req_pml_complete = true; > OPAL_THREAD_UNLOCK(&comm->matching_lock); > > OPAL_THREAD_LOCK(&ompi_request_lock); > @@ -138,7 +129,7 @@ > MCA_PML_OB1_RECV_REQUEST_MPI_COMPLETE(request); > OPAL_THREAD_UNLOCK(&ompi_request_lock); > /* > - * Receive request cancelled, make user buffer accessable. > + * Receive request cancelled, make user buffer accessible. > */ > MEMCHECKER( > memchecker_call(&opal_memchecker_base_mem_defined, > > On Jul 26, 2012, at 13:41 , Kawashima, Takahiro wrote: > >> Hi Open MPI developers, >> >> I found a small bug in Open MPI. >> >> See attached program cancelled.c. >> In this program, rank 1 tries to cancel a MPI_Irecv and calls a >> MPI_Recv instead if the cancellation succeeds. This program should >> terminate whether the cancellation succeeds or not. But it leads a >> deadlock in MPI_Recv after printing "MPI_Test_cancelled: 1". >> I confirmed it works fine with MPICH2. >> >> The problem is in mca_pml_ob1_recv_request_cancel function in >> ompi/mca/pml/ob1/pml_ob1_recvreq.c. It accepts the cancellation unless >> the request has been completed. I think it should not accept the >> cancellation if the request has been matched. If it want to accept the >> cancellation, it must push the recv frag to the unexpected message >> queue back and redo matching. Furthermore, the receive buffer must be >> reverted if the received message has been written to the receive >> buffer partially in a pipeline protocol. >> >> Attached patch cancel-recv.patch is a sample fix for this bug for Open >> MPI trunk. Though this patch has 65 lines, main modifications are >> adding one if-statement and deleting one if-statement. Other lines are >> just for indent alignment. >> I cannot confirm the MEMCHECKER part is correct. Could anyone review >> it before committing? >> >> Regards, >> >> Takahiro Kawashima, >> MPI development team, >> Fujitsu >> <cancelled.c><cancel-recv.patch><License.txt>_________________________ >> ______________________ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > _______________________________________________ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel > > _______________________________________________ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel