it seems I misunderstood some things ...

add_procs is always invoked, regardless of the cutoff value.
the cutoff only controls whether process info is retrieved via the modex
"on demand" or at init time.

Please someone correct me and/or elaborate if needed

Cheers,

Gilles

On Monday, May 16, 2016, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

> i cannot reproduce this behavior.
>
> note mca_btl_tcp_add_procs is invoked once per tcp component (e.g. once
> per physical NIC)
>
> so you might want to explicitly select one nic
>
> mpirun --mca btl_tcp_if_include xxx ...
>
> my printf output is the same regardless of the mpi_add_procs_cutoff value
>
>
> Cheers,
>
>
> Gilles
> On 5/16/2016 12:22 AM, dpchoudh . wrote:
>
> Sorry, I accidentally pressed 'Send' before I was done writing the last
> mail. What I wanted to ask was what is the parameter mpi_add_procs_cutoff
> and why adding it seems to make a difference in the code path but not in
> the end result of the program? How would it help me debug my problem?
>
> Thank you
> Durga
>
> The surgeon general advises you to eat right, exercise regularly and quit
> ageing.
>
> On Sun, May 15, 2016 at 11:17 AM, dpchoudh . <dpcho...@gmail.com> wrote:
>
>> Hello Gilles
>>
>> Setting -mca mpi_add_procs_cutoff 1024 indeed makes a difference to the
>> output, as follows:
>>
>> With -mca mpi_add_procs_cutoff 1024:
>> reachable =     0x1
>> (Note that add_procs was called once and the value of 'reachable' is
>> correct.)
>>
>> Without -mca mpi_add_procs_cutoff 1024
>> reachable =     0x0
>> reachable = NULL
>> reachable = NULL
>> (Note that add_procs() was called three times and the value of
>> 'reachable' seems wrong.)
>>
>> The program does run correctly in either case. The program listing is as
>> below (note that I have removed output from the program itself in the above
>> reporting.)
>>
>> The code that prints 'reachable' is as follows:
>>
>> if (reachable == NULL)
>>     printf("reachable = NULL\n");
>> else
>> {
>>     int i;
>>     printf("reachable = ");
>>     /* bitmap[] holds 64-bit words; print each one in hex to match the 0x prefix */
>>     for (i = 0; i < reachable->array_size; i++)
>>         printf("\t0x%llx", (unsigned long long) reachable->bitmap[i]);
>>     printf("\n\n");
>> }
>> return OPAL_SUCCESS;
>>
>> And the code for the test program is as follows:
>>
>> #include <mpi.h>
>> #include <stdio.h>
>> #include <string.h>
>> #include <stdlib.h>
>>
>> int main(int argc, char *argv[])
>> {
>>     int world_size, world_rank, name_len;
>>     char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
>>     MPI_Get_processor_name(hostname, &name_len);
>>     printf("Hello world from processor %s, rank %d out of %d processors\n",
>>            hostname, world_rank, world_size);
>>     if (world_rank == 1)
>>     {
>>         MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>         printf("%s received %s, rank %d\n", hostname, buf, world_rank);
>>     }
>>     else if (world_rank == 0)  /* only rank 0 sends, matching the single MPI_Recv */
>>     {
>>         strcpy(buf, "haha!");
>>         MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
>>         printf("%s sent %s, rank %d\n", hostname, buf, world_rank);
>>     }
>>     MPI_Barrier(MPI_COMM_WORLD);
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>>
>>
>> The surgeon general advises you to eat right, exercise regularly and quit
>> ageing.
>>
>> On Sun, May 15, 2016 at 10:49 AM, Gilles Gouaillardet <
>> gilles.gouaillar...@gmail.com> wrote:
>>
>>> At first glance, that seems a bit odd...
>>> are you sure you correctly print the reachable bitmap ?
>>> I would suggest you add some instrumentation to understand what happens
>>> (e.g., printf before opal_bitmap_set_bit() and other places that prevent
>>> this from happening)
>>>
>>> one more thing ...
>>> now, master default behavior is
>>> mpirun --mca mpi_add_procs_cutoff 0 ...
>>> you might want to try
>>> mpirun --mca mpi_add_procs_cutoff 1024 ...
>>> and see if things make more sense.
>>> if it helps, and iirc, there is a parameter so a btl can report it does
>>> not support cutoff.
>>>
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Sunday, May 15, 2016, dpchoudh . <dpcho...@gmail.com> wrote:
>>>
>>>> Hello Gilles
>>>>
>>>> Thanks for jumping in to help again. Actually, I had already tried some
>>>> of your suggestions before asking for help.
>>>>
>>>> I have several interconnects that can run both openib and tcp BTL. To
>>>> simplify things, I explicitly mentioned TCP:
>>>>
>>>> mpirun -np 2 -hostfile ~/hostfile -mca pml ob1 -mca btl self,tcp
>>>> ./mpitest
>>>>
>>>> where mpitest is a small program that does MPI_Send()/MPI_Recv() on a
>>>> small string, and then does an MPI_Barrier(). The program does work as
>>>> expected.
>>>>
>>>> I put a printf on the last line of mca_btl_tcp_add_procs() to print the
>>>> value of 'reachable'. What I saw was that the value was always 0 when it
>>>> was invoked for Send()/Recv() and the pointer itself was NULL when invoked
>>>> for Barrier()
>>>>
>>>> Next I looked at mca_pml_ob1_add_procs(), where the call chain starts, and
>>>> found that it initializes and passes an opal_bitmap_t reachable down the
>>>> call chain, but the resulting value is not used later in the code (the
>>>> memory is simply freed later).
>>>>
>>>> That, coupled with the fact that I am trying to imitate what the other
>>>> BTL implementations are doing, yet in mca_bml_r2_endpoint_add_btl() my BTL
>>>> is not being picked up, left me puzzled. Please note that the interconnect
>>>> that I am developing for is on a different cluster (than where I ran the
>>>> above test for TCP BTL.)
>>>>
>>>> Thanks again
>>>> Durga
>>>>
>>>> The surgeon general advises you to eat right, exercise regularly and
>>>> quit ageing.
>>>>
>>>> On Sun, May 15, 2016 at 10:20 AM, Gilles Gouaillardet <
>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>
>>>>> did you check the add_procs callbacks ?
>>>>> (e.g. mca_btl_tcp_add_procs() for the tcp btl)
>>>>> this is where the reachable bitmap is set, and I guess this is what
>>>>> you are looking for.
>>>>>
>>>>> keep in mind that if several btls can be used, the one with the highest
>>>>> exclusivity is used
>>>>> (e.g. tcp is never used if openib is available)
>>>>> you can simply force your btl and self, and the ob1 pml, so you do not
>>>>> have to worry about other btl exclusivity.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>>
>>>>> On Sunday, May 15, 2016, dpchoudh . <dpcho...@gmail.com> wrote:
>>>>>
>>>>>> Hello all
>>>>>>
>>>>>> I have been struggling with this issue for a while and figured it
>>>>>> might be a good idea to ask for help.
>>>>>>
>>>>>> Where (in the code path) is the connectivity map created?
>>>>>>
>>>>>> I can see that it is *used* in mca_bml_r2_endpoint_add_btl(), but
>>>>>> obviously I am not setting it up right, because this routine is not
>>>>>> finding the BTL corresponding to my interconnect.
>>>>>>
>>>>>> Thanks in advance
>>>>>> Durga
>>>>>>
>>>>>> The surgeon general advises you to eat right, exercise regularly and
>>>>>> quit ageing.
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/devel/2016/05/18975.php
>>>>>
>>>>
>>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2016/05/18977.php
>>>
>>
>>
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/05/18979.php
>
>
>
