It seems I misunderstood some things ... add_procs is always invoked, regardless of the cutoff value. The cutoff determines whether process info is retrieved via the modex "on demand" or at init time.
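To make the "on demand" vs. "at init time" distinction concrete, here is a toy sketch. It is not Open MPI code: the names toy_peer_table_t, toy_init and toy_get_peer_info are hypothetical, and the rule "a world size below the cutoff is fetched at init" is an assumption made only for this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PEERS 64

/* Hypothetical stand-in for a runtime's table of peer information. */
typedef struct {
    size_t world_size;            /* number of peers in the job */
    bool   loaded[TOY_MAX_PEERS]; /* true once a peer's info has been fetched */
    size_t fetch_count;           /* number of fetches performed so far */
} toy_peer_table_t;

/* Fetch one peer's info if it has not been fetched yet
 * (this plays the role of the modex lookup in the toy model). */
static void toy_get_peer_info(toy_peer_table_t *t, size_t rank)
{
    assert(rank < t->world_size);
    if (!t->loaded[rank]) {
        t->loaded[rank] = true;
        t->fetch_count++;
    }
}

/* Init: fetch every peer up front only when the job is smaller than the
 * cutoff; otherwise leave the table empty so peers are fetched on first use.
 * The "<" comparison is an assumption of this sketch. */
static void toy_init(toy_peer_table_t *t, size_t world_size, size_t cutoff)
{
    assert(world_size <= TOY_MAX_PEERS);
    t->world_size = world_size;
    t->fetch_count = 0;
    for (size_t i = 0; i < TOY_MAX_PEERS; i++)
        t->loaded[i] = false;
    if (world_size < cutoff) {
        for (size_t r = 0; r < world_size; r++)
            toy_get_peer_info(t, r);
    }
}
```

In this model, toy_init(&t, 2, 1024) fetches both peers immediately, while toy_init(&t, 2, 0) fetches nothing until toy_get_peer_info() is called for a given rank.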
Please someone correct me and/or elaborate if needed.

Cheers,

Gilles

On Monday, May 16, 2016, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

> I cannot reproduce this behavior.
>
> Note that mca_btl_tcp_add_procs is invoked once per tcp component (e.g. once
> per physical NIC),
>
> so you might want to explicitly select one NIC:
>
> mpirun --mca btl_tcp_if_include xxx ...
>
> My printf output is the same regardless of the mpi_add_procs_cutoff value.
>
>
> Cheers,
>
>
> Gilles
> On 5/16/2016 12:22 AM, dpchoudh . wrote:
>
> Sorry, I accidentally pressed 'Send' before I was done writing the last
> mail. What I wanted to ask was: what is the parameter mpi_add_procs_cutoff,
> and why does adding it seem to make a difference in the code path but not in
> the end result of the program? How would it help me debug my problem?
>
> Thank you
> Durga
>
> The surgeon general advises you to eat right, exercise regularly and quit
> ageing.
>
> On Sun, May 15, 2016 at 11:17 AM, dpchoudh . <dpcho...@gmail.com> wrote:
>
>> Hello Gilles
>>
>> Setting -mca mpi_add_procs_cutoff 1024 indeed makes a difference to the
>> output, as follows:
>>
>> With -mca mpi_add_procs_cutoff 1024:
>> reachable = 0x1
>> (Note that add_procs was called once and the value of 'reachable' is
>> correct.)
>>
>> Without -mca mpi_add_procs_cutoff 1024:
>> reachable = 0x0
>> reachable = NULL
>> reachable = NULL
>> (Note that add_procs() was called three times and the value of
>> 'reachable' seems wrong.)
>>
>> The program does run correctly in either case. The program listing is
>> below (note that I have removed output from the program itself in the above
>> report).
>>
>> The code that prints 'reachable' is as follows:
>>
>>     if (reachable == NULL)
>>         printf("reachable = NULL\n");
>>     else
>>     {
>>         int i;
>>         printf("reachable = ");
>>         /* print each word of the bitmap in hex, to match the 0x prefix */
>>         for (i = 0; i < reachable->array_size; i++)
>>             printf("\t0x%llx", (unsigned long long) reachable->bitmap[i]);
>>         printf("\n\n");
>>     }
>>     return OPAL_SUCCESS;
>>
>> And the code for the test program is as follows:
>>
>> #include <mpi.h>
>> #include <stdio.h>
>> #include <string.h>
>> #include <stdlib.h>
>>
>> int main(int argc, char *argv[])
>> {
>>     int world_size, world_rank, name_len;
>>     char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
>>     MPI_Get_processor_name(hostname, &name_len);
>>     printf("Hello world from processor %s, rank %d out of %d processors\n",
>>            hostname, world_rank, world_size);
>>     if (world_rank == 1)
>>     {
>>         /* rank 1 receives the string from rank 0 */
>>         MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>         printf("%s received %s, rank %d\n", hostname, buf, world_rank);
>>     }
>>     else
>>     {
>>         /* every other rank sends the string to rank 1 */
>>         strcpy(buf, "haha!");
>>         MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
>>         printf("%s sent %s, rank %d\n", hostname, buf, world_rank);
>>     }
>>     MPI_Barrier(MPI_COMM_WORLD);
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> On Sun, May 15, 2016 at 10:49 AM, Gilles Gouaillardet
>> <gilles.gouaillar...@gmail.com> wrote:
>>
>>> At first glance, that seems a bit odd...
>>> Are you sure you print the reachable bitmap correctly?
>>> I would suggest you add some instrumentation to understand what happens
>>> (e.g., printf before opal_bitmap_set_bit() and other places that prevent
>>> this from happening).
>>>
>>> One more thing ...
>>> The default behavior on master is now
>>> mpirun --mca mpi_add_procs_cutoff 0 ...
>>> You might want to try
>>> mpirun --mca mpi_add_procs_cutoff 1024 ...
>>> and see if things make more sense.
>>> If it helps, and iirc, there is a parameter so a btl can report that it
>>> does not support cutoff.
>>>
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Sunday, May 15, 2016, dpchoudh . <dpcho...@gmail.com> wrote:
>>>
>>>> Hello Gilles
>>>>
>>>> Thanks for jumping in to help again. Actually, I had already tried some
>>>> of your suggestions before asking for help.
>>>>
>>>> I have several interconnects that can run both openib and tcp BTL. To
>>>> simplify things, I explicitly selected TCP:
>>>>
>>>> mpirun -np 2 -hostfile ~/hostfile -mca pml ob1 -mca btl self,tcp
>>>> ./mpitest
>>>>
>>>> where mpitest is a small program that does MPI_Send()/MPI_Recv() on a
>>>> small string, and then does an MPI_Barrier(). The program does work as
>>>> expected.
>>>>
>>>> I put a printf on the last line of mca_btl_tcp_add_procs() to print the
>>>> value of 'reachable'. What I saw was that the value was always 0 when it
>>>> was invoked for Send()/Recv(), and the pointer itself was NULL when invoked
>>>> for Barrier().
>>>>
>>>> Next I looked at pml_ob1_add_procs(), where the call chain starts, and
>>>> found that it initializes and passes an opal_bitmap_t reachable down the
>>>> call chain, but the resulting value is not used later in the code (the
>>>> memory is simply freed later).
>>>>
>>>> That, coupled with the fact that I am trying to imitate what the other
>>>> BTL implementations are doing, yet in mca_bml_r2_endpoint_add_btl() my BTL
>>>> is not being picked up, left me puzzled. Please note that the interconnect
>>>> that I am developing for is on a different cluster (than where I ran the
>>>> above test for TCP BTL).
>>>>
>>>> Thanks again
>>>> Durga
>>>>
>>>> On Sun, May 15, 2016 at 10:20 AM, Gilles Gouaillardet
>>>> <gilles.gouaillar...@gmail.com> wrote:
>>>>
>>>>> did you check the add_procs callbacks?
>>>>> (e.g. mca_btl_tcp_add_procs() for the tcp btl)
>>>>> this is where the reachable bitmap is set, and I guess this is what
>>>>> you are looking for.
>>>>>
>>>>> keep in mind that if several btl can be used, the one with the higher
>>>>> exclusivity is used
>>>>> (e.g. tcp is never used if openib is available)
>>>>> you can simply force your btl and self, and the ob1 pml, so you do not
>>>>> have to worry about other btl exclusivity.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>>
>>>>> On Sunday, May 15, 2016, dpchoudh . <dpcho...@gmail.com> wrote:
>>>>>
>>>>>> Hello all
>>>>>>
>>>>>> I have been struggling with this issue for a while and figured it
>>>>>> might be a good idea to ask for help.
>>>>>>
>>>>>> Where (in the code path) is the connectivity map created?
>>>>>>
>>>>>> I can see that it is *used* in mca_bml_r2_endpoint_add_btl(), but
>>>>>> obviously I am not setting it up right, because this routine is not
>>>>>> finding the BTL corresponding to my interconnect.
>>>>>>
>>>>>> Thanks in advance
>>>>>> Durga
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/devel/2016/05/18975.php
>>>>>
>>>
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2016/05/18977.php
>>>
>
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/05/18979.php
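For anyone instrumenting an add_procs callback the way the thread describes, the following is a small self-contained sketch of dumping a reachability bitmap in hex and testing a single bit. It does not use the Open MPI headers: toy_bitmap_t, toy_bitmap_is_set and toy_bitmap_format are hypothetical names, the struct only mirrors the two fields (bitmap, array_size) that the snippet in the thread reads, and the 64-bit word type is an assumption.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bitmap with the two fields the thread's printf code reads. */
typedef struct {
    uint64_t *bitmap;     /* array of words (64-bit is an assumption) */
    int       array_size; /* number of words in the array */
} toy_bitmap_t;

/* Return true if the given bit (e.g. a peer index) is set. */
static bool toy_bitmap_is_set(const toy_bitmap_t *bm, int bit)
{
    assert(bit >= 0 && bit / 64 < bm->array_size);
    return (bm->bitmap[bit / 64] >> (bit % 64)) & 1u;
}

/* Format the bitmap as "reachable = 0x... 0x..." (or "reachable = NULL")
 * into out. Returns the number of characters written, or -1 if out is
 * too small. Hex conversion keeps the digits consistent with the 0x prefix. */
static int toy_bitmap_format(const toy_bitmap_t *bm, char *out, size_t len)
{
    if (bm == NULL) {
        int n = snprintf(out, len, "reachable = NULL");
        return (n >= 0 && (size_t)n < len) ? n : -1;
    }
    int used = snprintf(out, len, "reachable =");
    if (used < 0 || (size_t)used >= len)
        return -1;
    for (int i = 0; i < bm->array_size; i++) {
        size_t room = len - (size_t)used;
        int n = snprintf(out + used, room, " 0x%" PRIx64, bm->bitmap[i]);
        if (n < 0 || (size_t)n >= room)
            return -1;
        used += n;
    }
    return used;
}
```

Formatting into a buffer rather than calling printf directly makes the helper easy to check in isolation; the caller can then pass the result to puts() or a logging routine of its choice.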