Sorry for the huge delay in replies -- it's summer / vacation season, and I 
think we (as a community) are a little behind in answering some of these 
emails.  :-(

It's been quite a while since I have been in the depths of BTL internals; I'm 
afraid I don't remember the details offhand.

When I was writing the usnic BTL, I found it useful to attach a debugger to
the sending and/or receiving processes and step through both my BTL code and
the OB1 PML code to see what was happening.  I frequently found that either
my BTL wasn't correctly accounting for network conditions, or it wasn't
passing up the information that OB1 expected (e.g., it passed the wrong
length, or the wrong ID number, or ...something else).  You can follow what
happens in OB1 when your BTL invokes the cbfunc -- does it find a
corresponding MPI_Request, and does it mark it complete?  Or does it treat
your incoming fragment as an unexpected message for some reason, and put it
on the unexpected queue?  Look for that kind of stuff.
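
One common way to make attaching easier is to have each process print its PID
and then spin until a debugger releases it.  The following is a minimal
sketch, not anything Open MPI provides: the environment variable
MYCOMP_WAIT_FOR_DEBUGGER, the flag mycomp_debugger_attached, and the function
mycomp_wait_for_debugger() are all hypothetical names chosen for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical environment variable; Open MPI does not define it. */
#define ATTACH_ENV_VAR "MYCOMP_WAIT_FOR_DEBUGGER"

/* Set to non-zero from the debugger to release the process.
 * volatile so the compiler re-reads it on every loop iteration. */
volatile int mycomp_debugger_attached = 0;

/*
 * If ATTACH_ENV_VAR is set, print this process's PID and spin until
 * mycomp_debugger_attached becomes non-zero.
 * Returns 1 if the wait path was taken, 0 if it was skipped.
 */
int mycomp_wait_for_debugger(void)
{
    if (NULL == getenv(ATTACH_ENV_VAR)) {
        return 0;               /* normal runs are unaffected */
    }

    fprintf(stderr, "PID %d waiting for debugger attach\n", (int) getpid());
    fflush(stderr);

    while (0 == mycomp_debugger_attached) {
        sleep(1);               /* poll once per second */
    }
    return 1;
}
```

Assuming you call this early (for example, at the top of the test's main()
or in your component's init function) and export the variable for the run,
you can then attach with "gdb -p <PID>", run
"set var mycomp_debugger_attached = 1", set breakpoints in your BTL's
progress function and in mca_pml_ob1_recv_frag_callback_match, and
"continue".  On multi-node runs you would likely want to print the host name
as well.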

-- 
Jeff Squyres
jsquy...@cisco.com

________________________________________
From: devel <devel-boun...@lists.open-mpi.org> on behalf of Michele Martinelli 
via devel <devel@lists.open-mpi.org>
Sent: Saturday, July 23, 2022 9:04 AM
To: devel@lists.open-mpi.org
Cc: Michele Martinelli
Subject: [OMPI devel] How to progress MPI_Recv using custom BTL for NIC under 
development

Hi,

I'm trying to develop a BTL for a custom NIC. I studied the btl.h file
to understand the flow of calls that my component is expected to
implement. To test my development I'm using a simple program (which works
like a charm with the TCP BTL). The code is a simple MPI_Send + MPI_Recv:

       MPI_Init(NULL, NULL);
       int world_rank;
       MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
       int world_size;
       MPI_Comm_size(MPI_COMM_WORLD, &world_size);
       int ping_pong_count = 1;
       int partner_rank = (world_rank + 1) % 2;
       printf("MY RANK: %d PARTNER: %d\n", world_rank, partner_rank);
       if (world_rank == 0) {
           ping_pong_count++;
           MPI_Send(&ping_pong_count, 1, MPI_INT, partner_rank, 0,
                    MPI_COMM_WORLD);
           printf("%d sent and incremented ping_pong_count %d to %d\n",
                  world_rank, ping_pong_count, partner_rank);
       } else {
           MPI_Recv(&ping_pong_count, 1, MPI_INT, partner_rank, 0,
                    MPI_COMM_WORLD, MPI_STATUS_IGNORE);
           printf("%d received ping_pong_count %d from %d\n",
                  world_rank, ping_pong_count, partner_rank);
       }
       MPI_Finalize();

I see that in my component's BTL code the functions called during the
MPI_Send phase are:

  1. mca_btl_mycomp_add_procs
  2. mca_btl_mycomp_prepare_src
  3. mca_btl_mycomp_send (where I set the return to 1, so the send phase
     should be finished)

I see then the print inside the test:

     0 sent and incremented ping_pong_count 2 to 1

and this should conclude the MPI_Send phase.
Then I implemented in the btl_mycomp_component_progress function a call to:

     mca_btl_active_message_callback_t *reg =
         mca_btl_base_active_message_trigger + tag;
     reg->cbfunc(&my_btl->super, &desc);

I saw the same code in all the other BTLs and I thought this was enough
to "unlock" the MPI_Recv "polling". But actually I see my test hangs,
probably "waiting" for something that never happens (?).

I also took a look at the ob1 mca_pml_ob1_recv_frag_callback_match
function (which I assume is what reg->cbfunc points to), and it seems to
reach the end of the function, actually matching my frag.

So my question is: how can I tell the framework that I have finished my
work, so that MPI_Recv can return to the user application? What am I
doing wrong?
Is there a way to find out where my code is waiting, and what for?


Best
