Re: [OMPI devel] BTL receive callback

2009-07-23 Thread Sebastian Rinke


Quoting Don Kerr:

I am curious if you are indeed using a new interconnect (new
hardware and protocol) or if it is requirements of the 3D-torus
network that are not addressed by the openib btl that are driving
the need for a new btl?


It is the first one.

Sebastian.



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

Re: [OMPI devel] BTL receive callback

2009-07-22 Thread Don Kerr

Hello Sebastian,

It sounds like you are using the openib btl as a starting point, which is a
good place to start. I am curious whether you are indeed using a new
interconnect (new hardware and protocol), or whether requirements of the
3D-torus network that the openib btl does not address are driving the
need for a new btl?


-DON



Re: [OMPI devel] BTL receive callback

2009-07-21 Thread Sebastian Rinke

Thank you for your hint. I found that prepare_src() didn't
return the correct size, i.e. it did

ompi_convertor_pack(...,&max_data);
*size = max_data;

However, after ompi_convertor_pack(), max_data == 0 and thus *size == 0,
so free() is called without a prior send() in pml_ob1_sendreq.c:1064.

I took this order of calls from prepare_src() in btl_openib.c.
It does not seem to cause any problems there, but for me it does.

Thanks for your help.
Sebastian.


Re: [OMPI devel] BTL receive callback

2009-07-21 Thread George Bosilca

Based on your code, the only reason I can imagine for the second send
to never be triggered is that the request is considered completed at
that point.

I can't imagine how the free is called without a prior send. If I look
at the code at pml_ob1_sendreq.c:1061, the free is only called when the
send fails, but it is always preceded by a send.

Can you check the return values of ompi_convertor_pack() and
prepare_src(), please?


  george.


[OMPI devel] BTL receive callback

2009-07-21 Thread Sebastian Rinke

Hello,
I am developing a new BTL component (Open MPI v1.3.2) for a new
3D-torus interconnect. During a simple message transfer of 16362 B
between two nodes with MPI_Send() and MPI_Recv(), I encounter the
following:


The sender:
-----------

1. prepare_src() size: 16304 reserve: 32
   -> alloc() size: 16336
   -> ompi_convertor_pack(): 16304
2. send()
3. component_progress()
   -> send cb ()
   -> free()
4. component_progress()
   -> recv cb ()
      -> prepare_src() size: 58 reserve: 32
         -> alloc() size: 90
         -> ompi_convertor_pack(): 58
      -> free() size: 90  Send is missing !!!
5. NO PROGRESS

The receiver:
-------------

1. component_progress()
   -> recv cb ()
      -> alloc() size: 32
      -> send()
2. component_progress()
   -> send cb ()
   -> free() size: 32
3. component_progress() for ever !!!

The problem is that after prepare_src() for the 2nd fragment, the
sender calls free() instead of send() in its recv cb. Thus, the 2nd
fragment is not being transmitted.
As a consequence, the receiver waits for the 2nd fragment.

I have found that mca_pml_ob1_recv_frag_callback_ack() is the
corresponding recv cb. Before diving into the ob1 code,
could you tell me under which conditions this cb calls free() instead
of send(), so that I can get an idea of where to look for errors in my
BTL component?

Thank you very much in advance.

Sebastian Rinke