I'm jumping into the middle of this conversation and probably don't have all 
the right context, so forgive me if this is a stupid question: did you set 
MPI_ERRORS_RETURN on the communicator in question?
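
In case it helps, here's a minimal sketch of what I mean (plain MPI; the
out-of-range destination rank is just a contrived way to force a returned
error):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* The default handler is MPI_ERRORS_ARE_FATAL, which aborts the
           job before you ever see an error code.  With MPI_ERRORS_RETURN
           the error comes back to the caller instead. */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        int size, buf = 42;
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int rc = MPI_Send(&buf, 1, MPI_INT, size /* invalid rank */, 0,
                          MPI_COMM_WORLD);
        if (rc != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len;
            MPI_Error_string(rc, msg, &len);
            fprintf(stderr, "MPI_Send failed: %s\n", msg);
        }

        MPI_Finalize();
        return 0;
    }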


On Dec 14, 2011, at 10:43 AM, Hugo Daniel Meyer wrote:

> Hello George and all.
> 
> Sorry for the late answer, but I was doing some tracing to see where 
> MPI_ERROR is set. I took a look at ompi_request_default_wait and tried to 
> see what happens with the request.
> 
> Well, I've noticed that all requests that are not immediately completed go 
> to ompi_request_wait_completion. But I don't know exactly where the 
> execution jumps when I inject a failure into the receiver of the message. 
> After the failure, the sender does not return from 
> ompi_request_wait_completion to ompi_request_default_wait, and I don't know 
> where to catch the moment when req->req_status.MPI_ERROR is set. Do you 
> know where the execution jumps, or at least which error handler is involved?
> 
> Thanks in advance.
> 
> Hugo
> 
> 2011/12/9 George Bosilca <bosi...@eecs.utk.edu>
> 
> On Dec 9, 2011, at 06:59 , Hugo Daniel Meyer wrote:
> 
>> Hello George and all.
>> 
>> I've been adapting some of the code to copy the request, and now I think 
>> it is working OK. I'm storing the request as you do in the pessimist 
>> protocol, but I'm only logging received messages, as my approach is a 
>> pessimistic receiver-based log. 
>> 
>> I do have a question: how do you detect when you have to resend a message, 
>> or at least repost it? 
> 
> The error in the status attached to the request will be set in case of 
> failure. As the MPI error handler is triggered right before returning above 
> the MPI layer, at the level where you placed your interception you have all 
> the freedom you need to handle the faults.
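> 
> Something like this, conceptually (a rough sketch against the internal 
> request; function and field names as I recall them from 
> ompi/request/request.h, error handling elided):
> 
>     #include "ompi/request/request.h"  /* internal OMPI header */
> 
>     /* Rough sketch for your interception level: wait until the request
>        completes, then inspect the error the PML stored in its status. */
>     static int wait_and_check(ompi_request_t *req)
>     {
>         ompi_request_wait_completion(req);
>         if (req->req_status.MPI_ERROR != MPI_SUCCESS) {
>             /* fault detected: this is where you can replay the
>                logged message / repost the operation */
>             return req->req_status.MPI_ERROR;
>         }
>         return MPI_SUCCESS;
>     }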
> 
>   george.
> 
>> 
>> Thanks for the help.
>> 
>> Hugo
>> 
>> 2011/11/19 Hugo Daniel Meyer <meyer.h...@gmail.com>
>> 
>> 
>> 2011/11/18 George Bosilca <bosi...@eecs.utk.edu>
>> 
>> On Nov 18, 2011, at 11:50 , Hugo Daniel Meyer wrote:
>> 
>>> 
>>> 2011/11/18 George Bosilca <bosi...@eecs.utk.edu>
>>> 
>>> On Nov 18, 2011, at 11:14 , Hugo Daniel Meyer wrote:
>>> 
>>>> 2011/11/18 George Bosilca <bosi...@eecs.utk.edu>
>>>> 
>>>> On Nov 18, 2011, at 07:29 , Hugo Daniel Meyer wrote:
>>>> 
>>>>> Hello again.
>>>>> 
>>>>> I was doing some tracing in the PML_OB1 files. I started to follow an 
>>>>> MPI_Ssend(), trying to find where a message is stored (in the sender) 
>>>>> if it is not sent until the receiver posts the recv, but I didn't find 
>>>>> that place. 
>>>> 
>>>> Right, you can't find this as the message is not stored on the sender. The 
>>>> pointer to the send request is sent encapsulated in the matching header, 
>>>> and the receiver will provide it back once the message has been matched 
>>>> (this means the data is now ready to flow).
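>>>> 
>>>> In a simplified form, the exchange looks like this (a sketch only; the 
>>>> real header layouts are in ompi/mca/pml/ob1/pml_ob1_hdr.h):
>>>> 
>>>>     #include <stdint.h>
>>>> 
>>>>     /* Sender -> receiver: matching info plus a pointer back to the
>>>>        sender's send request.  No payload travels yet. */
>>>>     struct rndv_hdr {
>>>>         /* ...tag, source, communicator, sequence number... */
>>>>         uint64_t msg_length;
>>>>         void    *src_req;   /* the sender's send request */
>>>>     };
>>>> 
>>>>     /* Receiver -> sender, once the message has been matched: the
>>>>        echoed src_req tells the sender which request may now push
>>>>        its data out of the user's buffer. */
>>>>     struct ack_hdr {
>>>>         void *src_req;      /* echoed back to the sender */
>>>>         void *dst_req;      /* the matched recv request */
>>>>     };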
>>>> 
>>>> So, what you're saying is that the sender only sends the header, and 
>>>> when the receiver posts the recv it sends the header back so the sender 
>>>> can start sending the data? Am I getting it right? If so, the data stays 
>>>> in the sender, but where is it stored?
>>> 
>>> If we consider rendez-vous messages, the data remains in the sender 
>>> buffer (aka the buffer provided by the upper level to the MPI_Send 
>>> function).
>>> 
>>> Yes, so I will only need to save the headers of the messages (where the 
>>> status is incomplete), and then maybe just call the upper-level MPI_Send 
>>> again. A question here: the headers are not marked as pending (at least I 
>>> think so), so my only approach might be to create a list of pending 
>>> headers, store the pointer to the send there, then identify its 
>>> corresponding upper-level MPI_Send and retry it in case of failure. Is 
>>> this a correct approach? 
>> 
>> Look in mca/vprotocol/base to see how we deal with the send requests in 
>> our message-logging protocol. We hijack the send request list and replace 
>> the requests with our own, allowing us to chain all active requests. This 
>> makes the tracking of active requests very simple and minimizes the impact 
>> on the overall code.
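>> 
>> Boiled down, the idea is this (a sketch, with a hypothetical 
>> pml_send_request_t standing in for the PML's real send request type):
>> 
>>     /* Placeholder for the PML's real send request (hypothetical). */
>>     typedef struct { int opaque; } pml_send_request_t;
>> 
>>     /* Every send request we hand out is grown with a link, so all
>>        in-flight sends sit on one chained list. */
>>     typedef struct logged_send_req {
>>         pml_send_request_t      super;  /* the PML's request, unchanged */
>>         struct logged_send_req *next;   /* chain of active sends */
>>     } logged_send_req_t;
>> 
>>     static logged_send_req_t *active_sends = NULL;
>> 
>>     static void track_send(logged_send_req_t *req)
>>     {
>>         req->next = active_sends;   /* pushed when the send starts */
>>         active_sends = req;
>>     }
>> 
>>     /* On completion the request is unlinked; on a failure you simply
>>        walk active_sends and replay. */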
>> 
>>   george.
>> 
>> 
>> Ok George.
>> I will take a look there and then let you know how it goes.
>> 
>> Thanks.
>> 
>> Hugo 
>> 
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

