Hi,

I'm trying to develop a BTL for a custom NIC. I studied btl.h to understand which calls my component is expected to implement. To test my development I'm using a simple program (which works like a charm with the TCP BTL) that does a single MPI_Send + MPI_Recv:

      MPI_Init(NULL, NULL);
      int world_rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
      int world_size;
      MPI_Comm_size(MPI_COMM_WORLD, &world_size);
      int ping_pong_count = 1;
      int partner_rank = (world_rank + 1) % 2;
      printf("MY RANK: %d PARTNER: %d\n", world_rank, partner_rank);
      if (world_rank == 0) {
        ping_pong_count++;
        MPI_Send(&ping_pong_count, 1, MPI_INT, partner_rank, 0, MPI_COMM_WORLD);
        printf("%d sent and incremented ping_pong_count %d to %d\n",
               world_rank, ping_pong_count, partner_rank);
      } else {
        MPI_Recv(&ping_pong_count, 1, MPI_INT, partner_rank, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("%d received ping_pong_count %d from %d\n",
               world_rank, ping_pong_count, partner_rank);
      }
      MPI_Finalize();

I see that in my component's BTL code the functions called during the MPI_Send phase are:

 1. mca_btl_mycomp_add_procs
 2. mca_btl_mycomp_prepare_src
 3. mca_btl_mycomp_send (where I set the return to 1, so the send phase
    should be finished)

I then see the output from the test:

    0 sent and incremented ping_pong_count 2 to 1

and this should conclude the MPI_Send phase.
Then, in the btl_mycomp_component_progress function, I implemented a call to:

    mca_btl_active_message_callback_t *reg = mca_btl_base_active_message_trigger + tag;
    reg->cbfunc(&my_btl->super, &desc);

I saw the same code in all the other BTLs and thought this was enough to "unlock" the MPI_Recv polling. However, my test hangs, apparently waiting for something that never happens.
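
To make clear what I expect that call to do, here is a minimal, self-contained sketch of the tag-indexed dispatch. Every sketch_* name is a stand-in I made up for illustration; none of them is the real Open MPI type or function, and the real callback signature may differ between releases.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types: simplified models, NOT the real Open MPI definitions. */
#define SKETCH_TAG_MAX 256

typedef struct sketch_btl { int id; } sketch_btl_t;

typedef struct sketch_recv_desc {
    unsigned char tag;      /* active-message tag carried by the fragment */
    const void   *payload;  /* received bytes */
    size_t        len;      /* payload length in bytes */
} sketch_recv_desc_t;

typedef void (*sketch_recv_cb_t)(sketch_btl_t *btl,
                                 const sketch_recv_desc_t *desc,
                                 void *cbdata);

typedef struct sketch_am_callback {
    sketch_recv_cb_t cbfunc;  /* handler registered by the upper layer */
    void            *cbdata;  /* opaque pointer handed back to the handler */
} sketch_am_callback_t;

/* One slot per tag; plays the role of the trigger table in my snippet. */
static sketch_am_callback_t sketch_trigger[SKETCH_TAG_MAX];

/* The upper layer registers its handler for a tag. */
static void sketch_register(unsigned char tag, sketch_recv_cb_t cb, void *cbdata)
{
    sketch_trigger[tag].cbfunc = cb;
    sketch_trigger[tag].cbdata = cbdata;
}

/* Called from the progress loop for each received fragment.
 * Returns 1 if a handler ran, 0 if no handler is registered for the tag. */
static int sketch_deliver(sketch_btl_t *btl, const sketch_recv_desc_t *desc)
{
    assert(btl != NULL && desc != NULL);
    sketch_am_callback_t *reg = sketch_trigger + desc->tag;
    if (NULL == reg->cbfunc) {
        return 0;
    }
    reg->cbfunc(btl, desc, reg->cbdata);
    return 1;
}

/* Example handler: adds up the number of payload bytes delivered. */
static void sketch_count_bytes(sketch_btl_t *btl,
                               const sketch_recv_desc_t *desc,
                               void *cbdata)
{
    (void) btl;
    *(size_t *) cbdata += desc->len;
}
```

In this model, registering sketch_count_bytes for a tag and then calling sketch_deliver with a fragment carrying that tag runs the handler once. That is the behaviour I assumed my progress function gets from the real table.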

I also looked at the ob1 function mca_pml_ob1_recv_frag_callback_match (which I assume is what reg->cbfunc points to); it appears to run to the end and actually matches my frag.

So my question is: how can I tell the framework that my work is finished, so that the call can return to the user application? What am I doing wrong?
Is there a way to find out where my code is blocked and what it is waiting for?


Best
