Re: [OMPI devel] BTL receive callback
I am curious if you are indeed using a new interconnect (new hardware and protocol) or if it is requirements of the 3D-torus network that are not addressed by the openib btl that are driving the need for a new btl?

It is the first one.

Sebastian.

On 07/21/09 11:55, Sebastian Rinke wrote:
[original message snipped]

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] BTL receive callback
Hello Sebastian,

It sounds like you are using the openib BTL as a starting point, which is a good place to start. I am curious whether you are indeed using a new interconnect (new hardware and protocol), or whether it is requirements of the 3D-torus network not addressed by the openib BTL that are driving the need for a new BTL.

-DON

On 07/21/09 11:55, Sebastian Rinke wrote:
[original message snipped]
Re: [OMPI devel] BTL receive callback
Thank you for your hint. I found that prepare_src() didn't return the correct size. It did:

    ompi_convertor_pack(...,&max_data);
    *size = max_data;

However, after ompi_convertor_pack(), max_data == 0, thus *size == 0, and free() is called without a prior send() in pml_ob1_sendreq.c:1064.

I took this order from prepare_src() in btl_openib.c, so it seems that it doesn't cause any problems there, but for me it does.

Thanks for your help.

Sebastian.

Quoting George Bosilca:
[George's reply and the original message snipped]
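Sebastian's diagnosis comes down to one rule: the size prepare_src() hands back must be the number of bytes the convertor actually packed, because the PML treats a size of 0 as "nothing to send" and frees the descriptor. The C sketch below models only that pack step. `toy_convertor_t`, `toy_pack()` and `toy_prepare_src()` are hypothetical stand-ins written for illustration, not Open MPI functions; the real ompi_convertor_pack() also works on an iovec array and a `&max_data` argument, but its exact in/out semantics should be checked against the Open MPI 1.3 sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec */

/* Hypothetical stand-in for a convertor over a contiguous user buffer. */
typedef struct {
    const unsigned char *src;  /* user message */
    size_t total;              /* message length in bytes */
    size_t offset;             /* bytes packed so far */
} toy_convertor_t;

/* Hypothetical pack routine, NOT the Open MPI API: copies at most
 * iov->iov_len bytes and reports the number actually copied in *max_data. */
static int toy_pack(toy_convertor_t *conv, struct iovec *iov, size_t *max_data)
{
    size_t left = conv->total - conv->offset;
    size_t n = iov->iov_len < left ? iov->iov_len : left;

    memcpy(iov->iov_base, conv->src + conv->offset, n);
    conv->offset += n;
    assert(conv->offset <= conv->total);
    *max_data = n;
    return conv->offset == conv->total;  /* 1 once the whole message is packed */
}

/* Sketch of the pack step inside a BTL's prepare_src(). `seg` must hold
 * reserve + size bytes; the return value is what *size would be set to. */
static size_t toy_prepare_src(toy_convertor_t *conv, unsigned char *seg,
                              size_t reserve, size_t size)
{
    struct iovec iov;
    size_t max_data = size;

    iov.iov_base = seg + reserve;  /* payload lands after the header space */
    iov.iov_len = max_data;        /* with iov_len == 0 nothing is packed */
    (void)toy_pack(conv, &iov, &max_data);
    return max_data;               /* 0 here is the case Sebastian hit */
}
```

Packing the 16362 B message as 16304 + 58 bytes reports both sizes correctly. A further call after the convertor is exhausted reports 0, which is the zero-size case that, per Sebastian's report, leads ob1 to call free() without send().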
Re: [OMPI devel] BTL receive callback
Based on your code, the only reason I can imagine for the second send never being triggered is that the request is considered complete at that point. I can't imagine how free is called without a prior send. If I look at the code in pml_ob1_sendreq.c:1061, free is only called when the send fails, and it is always preceded by a send. Can you please check the return values of ompi_convertor_pack and prepare_src?

george.

On Jul 21, 2009, at 11:55, Sebastian Rinke wrote:
[original message snipped]
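George's reading (free() only after a failed send()) and Sebastian's later finding (free() right after prepare_src() reported a size of 0) describe two different exits from the same scheduling step. The toy C decision function below makes that distinction explicit. It is a simplified model written for this thread with hypothetical names, not the actual logic in pml_ob1_sendreq.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outcomes of one scheduling step in this toy model. */
typedef enum {
    FRAG_SENT,              /* send() issued and accepted */
    FRAG_FREED_EMPTY,       /* free() with no send(): zero bytes packed */
    FRAG_FREED_SEND_FAILED, /* send() attempted, failed, then free() */
    FRAG_NO_DESCRIPTOR      /* prepare_src() returned nothing to free */
} frag_outcome_t;

/* Simplified model of the PML side, NOT ob1 source code.
 * have_des:    prepare_src() returned a descriptor
 * packed_size: the *size value prepare_src() reported
 * send_ok:     result the BTL's send() would return */
static frag_outcome_t schedule_fragment(bool have_des, size_t packed_size, bool send_ok)
{
    /* Model convention: callers pass 0 when no descriptor came back. */
    assert(have_des || packed_size == 0);
    if (!have_des)
        return FRAG_NO_DESCRIPTOR;
    if (packed_size == 0)
        return FRAG_FREED_EMPTY;
    if (!send_ok)
        return FRAG_FREED_SEND_FAILED;
    return FRAG_SENT;
}
```

Only the FRAG_FREED_SEND_FAILED path matches George's description; FRAG_FREED_EMPTY is the path the thread ends up identifying.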
[OMPI devel] BTL receive callback
Hello,

I am developing a new BTL component (Open MPI v1.3.2) for a new 3D-torus interconnect. During a simple transfer of a 16362 B message between two nodes with MPI_Send() and MPI_Recv(), I encounter the following:

The sender:
1. prepare_src() size: 16304 reserve: 32
   -> alloc() size: 16336
   -> ompi_convertor_pack(): 16304
2. send()
3. component_progress()
   -> send cb ()
   -> free()
4. component_progress()
   -> recv cb ()
   -> prepare_src() size: 58 reserve: 32
   -> alloc() size: 90
   -> ompi_convertor_pack(): 58
   -> free() size: 90
   Send is missing !!!
5. NO PROGRESS

The receiver:
1. component_progress()
   -> recv cb ()
   -> alloc() size: 32
   -> send()
2. component_progress()
   -> send cb ()
   -> free() size: 32
3. component_progress() forever !!!

The problem is that after prepare_src() for the 2nd fragment, the sender calls free() instead of send() in its recv cb. Thus, the 2nd fragment is never transmitted, and the receiver keeps waiting for it.

I have found that mca_pml_ob1_recv_frag_callback_ack() is the corresponding recv cb. Before diving into the ob1 code, could you tell me under which conditions this cb calls free() instead of send(), so that I get an idea of where to look for errors in my BTL component?

Thank you very much in advance.

Sebastian Rinke
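The numbers in the sender trace are mutually consistent, which suggests the problem lies in the pack result rather than in the fragment sizing: each alloc() size is the payload plus the 32-byte reserve (presumably the header space the PML asks prepare_src() to leave at the front of the segment), and the two payloads add up to the 16362 B message. The helper names below are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Numbers copied from the sender trace in the message above. */
enum {
    MSG_SIZE      = 16362,  /* bytes passed to MPI_Send() */
    RESERVE       = 32,     /* header space requested via prepare_src() */
    FRAG1_PAYLOAD = 16304,  /* packed into the first fragment */
    FRAG2_PAYLOAD = 58      /* should be packed into the second fragment */
};

/* Bytes alloc() must provide: reserved header space plus packed payload. */
static size_t segment_size(size_t payload, size_t reserve)
{
    return payload + reserve;
}

/* Payload still owed to the receiver after `sent` bytes have been packed. */
static size_t remaining_payload(size_t msg_size, size_t sent)
{
    return msg_size - sent;
}

/* Cross-check the trace. A second pack reporting 0 instead of 58 would
 * leave 58 bytes outstanding, matching the stalled receiver. */
void check_trace(void)
{
    assert(segment_size(FRAG1_PAYLOAD, RESERVE) == 16336);
    assert(segment_size(FRAG2_PAYLOAD, RESERVE) == 90);
    assert(remaining_payload(MSG_SIZE, FRAG1_PAYLOAD) == FRAG2_PAYLOAD);
    assert(remaining_payload(MSG_SIZE, FRAG1_PAYLOAD + FRAG2_PAYLOAD) == 0);
}
```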