Hello,
I am developing a new BTL component (Open MPI v1.3.2) for a new 3D-torus interconnect. During a simple transfer of a 16362-byte message between two nodes using MPI_Send() and MPI_Recv(), I observe the following sequence of calls:

The sender:
-----------

1. prepare_src() size: 16304 reserve: 32
   -> alloc() size: 16336
   -> ompi_convertor_pack(): 16304
2. send()
3. component_progress()
   -> send cb ()
   -> free()
4. component_progress()
   -> recv cb ()
      -> prepare_src() size: 58 reserve: 32
         -> alloc() size: 90
         -> ompi_convertor_pack(): 58
      -> free() size: 90    <-- send() is missing!
5. NO PROGRESS
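
As a sanity check on the sizes in the sender trace, here is a minimal
sketch of the arithmetic, assuming each alloc() size is simply the
reserve plus the packed payload. The helper names are hypothetical and
are not part of Open MPI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an Open MPI function): bytes the BTL must
 * allocate for one fragment, assuming alloc size = reserve + payload. */
static size_t frag_alloc_size(size_t reserve, size_t payload)
{
    return reserve + payload;
}

/* Hypothetical helper: payload bytes still to be sent after `sent`
 * bytes of a `msg_len`-byte message have already been packed. */
static size_t remaining_payload(size_t msg_len, size_t sent)
{
    assert(sent <= msg_len);
    return msg_len - sent;
}
```

With the numbers from the trace, frag_alloc_size(32, 16304) gives 16336,
remaining_payload(16362, 16304) gives 58, and frag_alloc_size(32, 58)
gives 90, so the second prepare_src() does cover the rest of the message.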

The receiver:
-------------

1. component_progress()
   -> recv cb ()
      -> alloc() size: 32
      -> send()
2. component_progress()
   -> send cb ()
   -> free() size: 32
3. component_progress() forever!

The problem is that, after calling prepare_src() for the 2nd fragment,
the sender's recv cb calls free() instead of send(). The 2nd fragment
is therefore never transmitted, and the receiver waits for it
indefinitely.
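
To pin down exactly which descriptor is freed without being sent, one
option is a small tracking table inside the BTL under development. The
following is only a sketch: the names (trace_prepared, trace_sent,
trace_freed) and the fixed-size table are my own assumptions, not an
Open MPI API, and the code is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical debug instrumentation; none of this is Open MPI API. */
typedef enum { DES_UNUSED = 0, DES_PREPARED, DES_SENT } des_state_t;

#define TRACE_SLOTS 64

typedef struct {
    const void *des;     /* descriptor address used as the key */
    des_state_t state;   /* last lifecycle event seen */
} des_slot_t;

static des_slot_t trace_tab[TRACE_SLOTS];

/* Look up the slot tracking `des`, or NULL if it is not tracked. */
static des_slot_t *trace_find(const void *des)
{
    for (size_t i = 0; i < TRACE_SLOTS; i++) {
        if (trace_tab[i].state != DES_UNUSED && trace_tab[i].des == des) {
            return &trace_tab[i];
        }
    }
    return NULL;
}

/* Call at the end of alloc()/prepare_src().
 * Returns 0 on success, -1 if the table is full. */
static int trace_prepared(const void *des)
{
    assert(des != NULL);
    for (size_t i = 0; i < TRACE_SLOTS; i++) {
        if (trace_tab[i].state == DES_UNUSED) {
            trace_tab[i].des = des;
            trace_tab[i].state = DES_PREPARED;
            return 0;
        }
    }
    return -1;
}

/* Call at the start of send().
 * Returns 0 if the descriptor was tracked, -1 otherwise. */
static int trace_sent(const void *des)
{
    des_slot_t *slot = trace_find(des);
    if (slot == NULL) {
        return -1;
    }
    slot->state = DES_SENT;
    return 0;
}

/* Call at the start of free().
 * Returns 1 if the descriptor was never passed to send(), 0 if it was
 * sent first, and -1 if it was not tracked at all. */
static int trace_freed(const void *des)
{
    des_slot_t *slot = trace_find(des);
    if (slot == NULL) {
        return -1;
    }
    int unsent = (slot->state == DES_PREPARED);
    slot->des = NULL;
    slot->state = DES_UNUSED;
    return unsent;
}
```

A return value of 1 from trace_freed() marks the situation in step 4 of
the sender trace, so a breakpoint or a printed backtrace at that point
would show which code path released the descriptor.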

I have found that mca_pml_ob1_recv_frag_callback_ack() is the
corresponding recv cb. Before I dive into the ob1 code, could you
tell me under which conditions this cb calls free() instead of send()?
That would give me an idea of where to look for errors in my BTL
component.

Thank you very much in advance.

Sebastian Rinke
