I'm jumping into the middle of this conversation and probably don't have all the right context, so forgive me if this is a stupid question: did you set MPI_ERRORS_RETURN on the communicator in question?

On Dec 14, 2011, at 10:43 AM, Hugo Daniel Meyer wrote:

> Hello George and all.
>
> Sorry for the late answer, but I was tracing the code to see where
> MPI_ERROR is set. I took a look at ompi_request_default_wait to see what
> happens with the request.
>
> I've noticed that all requests that are not immediately completed go to
> ompi_request_wait_completion. But I don't know exactly where the execution
> jumps when I inject a failure into the receiver of the message. After the
> failure, the sender does not return from ompi_request_wait_completion to
> ompi_request_default_wait, and I don't know where to catch the point at
> which req->req_status.MPI_ERROR is set. Do you know where the execution
> jumps, or at least into which error handler?
>
> Thanks in advance.
>
> Hugo
>
> 2011/12/9 George Bosilca <bosi...@eecs.utk.edu>
>
> On Dec 9, 2011, at 06:59 , Hugo Daniel Meyer wrote:
>
>> Hello George and all.
>>
>> I've been adapting some of the code to copy the request, and now I think
>> that it is working OK. I'm storing the request as you do in the pessimist
>> protocol, but I'm only logging received messages, as my approach is a
>> pessimistic log based on the receiver.
>>
>> I do have a question: how do you detect when you have to resend a message,
>> or at least repost it?
>
> The error in the status attached to the request will be set in case of
> failure. As the MPI error handler is triggered right before returning above
> the MPI layer, at the level where you placed your interception you have all
> the freedom you need to handle the faults.
>
> george.
>
>>
>> Thanks for the help.
>>
>> Hugo
>>
>> 2011/11/19 Hugo Daniel Meyer <meyer.h...@gmail.com>
>>
>>
>> 2011/11/18 George Bosilca <bosi...@eecs.utk.edu>
>>
>> On Nov 18, 2011, at 11:50 , Hugo Daniel Meyer wrote:
>>
>>>
>>> 2011/11/18 George Bosilca <bosi...@eecs.utk.edu>
>>>
>>> On Nov 18, 2011, at 11:14 , Hugo Daniel Meyer wrote:
>>>
>>>> 2011/11/18 George Bosilca <bosi...@eecs.utk.edu>
>>>>
>>>> On Nov 18, 2011, at 07:29 , Hugo Daniel Meyer wrote:
>>>>
>>>>> Hello again.
>>>>>
>>>>> I was tracing through the PML_OB1 files. I started following an
>>>>> MPI_Ssend(), trying to find where a message is stored (on the sender) if
>>>>> it is not sent until the receiver posts the recv, but I didn't find that
>>>>> place.
>>>>
>>>> Right, you can't find this as the message is not stored on the sender. The
>>>> pointer to the send request is sent encapsulated in the matching header,
>>>> and the receiver will provide it back once the message has been matched
>>>> (this means the data is now ready to flow).
>>>>
>>>> So, what you're saying is that the sender only sends the header, and when
>>>> the receiver posts the recv it sends the header back so that the sender
>>>> starts sending the data? Am I getting it right? If so, the data stays
>>>> on the sender, but where is it stored?
>>>
>>> If we consider rendezvous messages, the data remains in the sender
>>> buffer (aka the buffer provided by the upper level to the MPI_Send
>>> function).
>>>
>>> Yes, so I will only need to save the headers of the messages (where the
>>> status is incomplete), and then maybe just call the upper-level MPI_Send
>>> again. A question here: the headers are not marked as pending (at least I
>>> think so), so my only approach might be to create a list of pending
>>> headers, store the pointer to the send there, then try to identify its
>>> corresponding upper-level MPI_Send and retry it in case of failure. Is
>>> this a correct approach?
>>
>> Look in the mca/vprotocol/base to see how we deal with the send requests in
>> our message logging protocol. We hijack the send request list, and replace
>> them with our own, allowing us to chain all active requests. This makes
>> tracking active requests very simple, and minimizes the impact on the
>> overall code.
>>
>> george.
>>
>>
>> Ok George.
>> I will take a look there and then let you know how it goes.
>>
>> Thanks.
>>
>> Hugo
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/