Re: [OMPI devel] Process connectivity map

2016-05-15 Thread Gilles Gouaillardet
Did you check the add_procs callbacks?
(e.g. mca_btl_tcp_add_procs() for the tcp BTL)
This is where the reachable bitmap is set, and I guess this is what you are
looking for.
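To make the add_procs contract concrete, here is a minimal, self-contained sketch of the pattern. It is not Open MPI code: `toy_bitmap_t`, `toy_bitmap_set_bit()`, `toy_proc_t` and `toy_btl_add_procs()` are hypothetical stand-ins for opal_bitmap_t, opal_bitmap_set_bit(), opal_proc_t and a callback such as mca_btl_tcp_add_procs(), and reachability is reduced to a placeholder flag. The idea shown is that the BTL sets bit i for every peer procs[i] it can reach. Because this thread shows add_procs being invoked with reachable == NULL, the sketch guards against that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for opal_bitmap_t: bit i lives in word i / 64. */
typedef struct {
    uint64_t *bitmap;   /* storage words */
    int array_size;     /* number of 64-bit words */
} toy_bitmap_t;

/* Set one bit; returns 0 on success, -1 if the bit is out of range. */
static int toy_bitmap_set_bit(toy_bitmap_t *bm, int bit)
{
    assert(bm != NULL);
    if (bit < 0 || bit / 64 >= bm->array_size) {
        return -1;
    }
    bm->bitmap[bit / 64] |= (uint64_t) 1 << (bit % 64);
    return 0;
}

/* Test one bit; out-of-range bits read as unset. */
static bool toy_bitmap_is_set_bit(const toy_bitmap_t *bm, int bit)
{
    assert(bm != NULL);
    if (bit < 0 || bit / 64 >= bm->array_size) {
        return false;
    }
    return (bm->bitmap[bit / 64] >> (bit % 64)) & 1u;
}

/* Hypothetical peer descriptor; a real BTL would inspect the address
 * information the peer published instead of a ready-made flag. */
typedef struct {
    bool has_my_interconnect;
} toy_proc_t;

/* Sketch of an add_procs callback: mark every peer this BTL can reach.
 * Returns the number of peers marked, or -1 on error. */
static int toy_btl_add_procs(size_t nprocs, const toy_proc_t *procs,
                             toy_bitmap_t *reachable)
{
    int nreached = 0;
    for (size_t i = 0; i < nprocs; i++) {
        if (!procs[i].has_my_interconnect) {
            continue;   /* this BTL cannot reach procs[i] */
        }
        /* a real BTL would create the endpoint for procs[i] here */
        if (reachable != NULL && toy_bitmap_set_bit(reachable, (int) i) != 0) {
            return -1;  /* bitmap too small for nprocs */
        }
        nreached++;
    }
    return nreached;
}
```

A real callback would also fill in an endpoint per reachable peer and return an OPAL status code; the sketch returns the number of peers it marked so that its behavior is easy to check.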

Keep in mind that if several BTLs can be used, the one with the highest
exclusivity is used
(e.g. tcp is never used if openib is available).
You can simply force your BTL and self, plus the ob1 PML, so you do not have
to worry about the exclusivity of other BTLs.
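The exclusivity rule can be modeled in a few lines. This is a simplified, hypothetical sketch, not the actual bml/r2 code: among the BTLs that report a peer as reachable, only those at the highest exclusivity level are kept for that peer. `toy_btl_t` and `select_btls()` are made-up names, and the exclusivity numbers in the usage check are illustrative, not Open MPI's real defaults.

```c
#include <stddef.h>

/* Hypothetical BTL description for this model only. */
typedef struct {
    const char *name;       /* e.g. "tcp" or "openib" */
    unsigned exclusivity;   /* higher value wins */
    int reaches_peer;       /* nonzero if add_procs marked the peer */
} toy_btl_t;

/* Store in `selected` the BTLs that reach the peer at the highest
 * exclusivity level; returns how many were stored (ties are kept). */
static size_t select_btls(const toy_btl_t *btls, size_t n,
                          const toy_btl_t **selected)
{
    unsigned best = 0;
    int found = 0;
    size_t count = 0;

    /* first pass: highest exclusivity among BTLs that reach the peer */
    for (size_t i = 0; i < n; i++) {
        if (btls[i].reaches_peer && (!found || btls[i].exclusivity > best)) {
            best = btls[i].exclusivity;
            found = 1;
        }
    }
    /* second pass: keep only reachable BTLs at that level */
    for (size_t i = 0; i < n; i++) {
        if (found && btls[i].reaches_peer && btls[i].exclusivity == best) {
            selected[count++] = &btls[i];
        }
    }
    return count;
}
```

In this model, giving tcp a lower exclusivity than openib means tcp is only selected for a peer that openib cannot reach, which is why restricting the run to your BTL plus self removes the competition entirely.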

Cheers,

Gilles

On Sunday, May 15, 2016, dpchoudh .  wrote:

> Hello all
>
> I have been struggling with this issue for a while and figured it might be
> a good idea to ask for help.
>
> Where (in the code path) is the connectivity map created?
>
> I can see that it is *used* in mca_bml_r2_endpoint_add_btl(), but
> obviously I am not setting it up right, because this routine is not finding
> the BTL corresponding to my interconnect.
>
> Thanks in advance
> Durga
>
> The surgeon general advises you to eat right, exercise regularly and quit
> ageing.
>


Re: [OMPI devel] Process connectivity map

2016-05-15 Thread dpchoudh .
Hello Gilles

Thanks for jumping in to help again. Actually, I had already tried some of
your suggestions before asking for help.

I have several interconnects that can run both the openib and tcp BTLs. To
simplify things, I explicitly selected TCP:

mpirun -np 2 -hostfile ~/hostfile -mca pml ob1 -mca btl self,tcp ./mpitest

where mpitest is a small program that does MPI_Send()/MPI_Recv() on a small
string, and then does an MPI_Barrier(). The program does work as expected.

I put a printf on the last line of mca_btl_tcp_add_procs() to print the value
of 'reachable'. The value was always 0 when it was invoked for Send()/Recv(),
and the pointer itself was NULL when it was invoked for Barrier().

Next I looked at mca_pml_ob1_add_procs(), where the call chain starts, and
found that it initializes an opal_bitmap_t named reachable and passes it down
the call chain, but the resulting value is never used afterwards (the memory
is simply freed).
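On the consumer side, a simplified and hypothetical model (not the actual ob1 or bml/r2 code) is that whoever owns the bitmap walks its bits after add_procs returns and registers the BTL only for peers whose bit is set. In that model, a BTL whose add_procs never sets a bit is silently skipped, which would look exactly like a BTL that is never picked up in mca_bml_r2_endpoint_add_btl(). `collect_reachable()` is a made-up name, and the bitmap is assumed to be an array of 64-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical consumer: walk the bitmap filled by a BTL's add_procs and
 * record the indices of the peers whose bit is set. */
static size_t collect_reachable(const uint64_t *words, size_t nprocs,
                                size_t *peers_out)
{
    size_t count = 0;
    for (size_t p = 0; p < nprocs; p++) {
        if ((words[p / 64] >> (p % 64)) & 1u) {
            /* a real BML would add this BTL to peer p's endpoint here */
            peers_out[count++] = p;
        }
    }
    return count;
}
```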

That, combined with the fact that I am imitating what the other BTL
implementations do and yet my BTL is still not picked up in
mca_bml_r2_endpoint_add_btl(), left me puzzled. Please note that the
interconnect I am developing for is on a different cluster from the one
where I ran the TCP BTL test above.

Thanks again
Durga



Re: [OMPI devel] Process connectivity map

2016-05-15 Thread Gilles Gouaillardet
At first glance, that seems a bit odd...
Are you sure you are printing the reachable bitmap correctly?
I would suggest you add some instrumentation to understand what happens
(e.g., a printf before opal_bitmap_set_bit() and at the other places that
could prevent it from being called).

One more thing: on master, the default behavior is now equivalent to
mpirun --mca mpi_add_procs_cutoff 0 ...
You might want to try
mpirun --mca mpi_add_procs_cutoff 1024 ...
and see if things make more sense.
If it helps, and iirc, there is a parameter a BTL can use to report that it
does not support the cutoff.
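A rough, hypothetical model of why the cutoff changes the call pattern (not the actual Open MPI logic): if the world size is within mpi_add_procs_cutoff, every peer is handed to add_procs in a single eager call with a bitmap; otherwise peers are added lazily, one per call, the first time a process talks to them, and the NULL 'reachable' pointers reported in this thread suggest no bitmap is passed on that path. `plan_add_procs()` and `toy_add_plan_t` are made-up names, the `<=` comparison is an assumption, and the model ignores any extra call made at init time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of how a BTL's add_procs would be invoked. */
typedef struct {
    size_t calls;          /* number of add_procs invocations */
    size_t procs_per_call; /* peers passed in each invocation */
    bool bitmap_passed;    /* false models the reachable == NULL case */
} toy_add_plan_t;

/* Model: small jobs add every peer eagerly in one call with a bitmap;
 * larger jobs add each peer lazily on first contact. Whether Open MPI
 * compares with < or <= here is an assumption of this sketch. */
static toy_add_plan_t plan_add_procs(size_t world_size, size_t cutoff,
                                     size_t peers_contacted)
{
    toy_add_plan_t plan;
    if (world_size <= cutoff) {
        plan.calls = 1;
        plan.procs_per_call = world_size;
        plan.bitmap_passed = true;
    } else {
        plan.calls = peers_contacted;
        plan.procs_per_call = 1;
        plan.bitmap_passed = false;
    }
    return plan;
}
```

With two processes, a cutoff of 1024 gives one eager call covering both ranks, while a cutoff of 0 gives one single-peer call per peer contacted. Either way the peers end up connected, which is consistent with the program's end result not changing.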


Cheers,

Gilles



Re: [OMPI devel] Process connectivity map

2016-05-15 Thread dpchoudh .
Hello Gilles

Setting -mca mpi_add_procs_cutoff 1024 indeed makes a difference to the
output, as follows:

With -mca mpi_add_procs_cutoff 1024:
reachable = 0x1
(Note that add_procs() was called once and the value of 'reachable' is
correct.)

Without -mca mpi_add_procs_cutoff 1024:
reachable = 0x0
reachable = NULL
reachable = NULL
(Note that add_procs() was called three times and the value of 'reachable'
seems wrong.)

The program runs correctly in either case. The program listing is below
(note that I have removed the program's own output from the report above).

The code that prints 'reachable' is as follows:

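if (reachable == NULL) {
    printf("reachable = NULL\n");
} else {
    int i;
    printf("reachable = ");
    /* bitmap[] holds 64-bit words: print them in hex, since "%llu"
     * would print decimal digits after the "0x" prefix */
    for (i = 0; i < reachable->array_size; i++) {
        printf("\t0x%llx", (unsigned long long) reachable->bitmap[i]);
    }
    printf("\n\n");
}
return OPAL_SUCCESS;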

And the code for the test program is as follows:

#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int world_size, world_rank, name_len;
    char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Get_processor_name(hostname, &name_len);
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           hostname, world_rank, world_size);

    if (world_size < 2) {
        /* the exchange below needs ranks 0 and 1 */
        fprintf(stderr, "run with at least 2 processes\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    if (world_rank == 1) {
        /* 6 chars = "haha!" plus its terminating NUL */
        MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("%s received %s, rank %d\n", hostname, buf, world_rank);
    } else if (world_rank == 0) {
        /* only rank 0 sends, so extra ranks post no unmatched sends */
        strcpy(buf, "haha!");
        MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
        printf("%s sent %s, rank %d\n", hostname, buf, world_rank);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}




Re: [OMPI devel] Process connectivity map

2016-05-15 Thread dpchoudh .
Sorry, I accidentally pressed 'Send' before I had finished writing my last
mail. What I wanted to ask was: what does the parameter mpi_add_procs_cutoff
do, and why does setting it change the code path but not the end result of
the program? How would it help me debug my problem?

Thank you
Durga
