[OMPI devel] Valgrind warning in MPI_Win_allocate[_shared]()

2014-09-28 Thread Lisandro Dalcin
Just built 1.8.3 for another round of testing with mpi4py. I'm getting
the following valgrind warning:

==4718== Conditional jump or move depends on uninitialised value(s)
==4718==at 0xD0D9F4C: component_select (osc_sm_component.c:333)
==4718==by 0x4CF44F6: ompi_osc_base_select (osc_base_init.c:73)
==4718==by 0x4C68B69: ompi_win_allocate (win.c:182)
==4718==by 0x4CBB8C2: PMPI_Win_allocate (pwin_allocate.c:79)
==4718==by 0x400898: main (in /home/dalcinl/Devel/BUGS-MPI/openmpi/a.out)

The offending code is in ompi/mca/osc/sm/osc_sm_component.c; it seems you
forgot to initialize "blocking_fence" to a default (true or false) value,
so it is left uninitialized when the corresponding info key is not set.

bool blocking_fence;
int flag;

if (OMPI_SUCCESS != ompi_info_get_bool(info, "blocking_fence",
                                       &blocking_fence, &flag)) {
    goto error;
}

if (blocking_fence) {
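
For reference, a minimal sketch of one possible fix (an assumption on my
part, not necessarily the patch that went upstream): give the variable a
default and only act on it when the info key was actually found, since
ompi_info_get_bool() appears to leave the output value untouched when the
key is absent.

/* Sketch only: default to a non-blocking fence unless the info key
 * "blocking_fence" is present and set to true. */
bool blocking_fence = false;
int flag = 0;

if (OMPI_SUCCESS != ompi_info_get_bool(info, "blocking_fence",
                                       &blocking_fence, &flag)) {
    goto error;
}

if (flag && blocking_fence) {
    /* ... select the blocking-fence code path ... */
}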


-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


Re: [OMPI devel] Different behaviour with MPI_IN_PLACE in MPI_Reduce_scatter() and MPI_Ireduce_scatter()

2014-09-28 Thread Lisandro Dalcin
On 22 April 2014 03:02, George Bosilca  wrote:
> Btw, the proposed validator was incorrect: the first printf, instead of
>
>  printf("[%d] rbuf[%d]=%2d  expected:%2d\n", rank, 0, recvbuf[i], size);
>
> should be
>
>  printf("[%d] rbuf[%d]=%2d  expected:%2d\n", rank, 0, recvbuf[0], size);
>

I'm testing this with 1.8.3 after fixing my incorrect printf, and I
still get different results (and the nonblocking NBCOLL=1 one is wrong)
when using one process (for two or more processes everything is OK).

$ mpicc -DNBCOLL=0 ireduce_scatter.c && mpiexec -n 1 ./a.out
[0] rbuf[0]= 1  expected: 1

$ mpicc -DNBCOLL=1 ireduce_scatter.c && mpiexec -n 1 ./a.out
[0] rbuf[0]=60  expected: 1
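
The ireduce_scatter.c reproducer is not shown inline here, so the following
is only a hypothetical reconstruction of what it presumably does: every rank
contributes a buffer of "size" ones via MPI_IN_PLACE, each rank receives a
single element, and rbuf[0] should therefore equal the communicator size.
Names and details are assumptions, not the original attachment.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef NBCOLL
#define NBCOLL 0
#endif

int main(int argc, char *argv[])
{
    int i, rank, size;
    int *recvbuf, *recvcounts;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    recvbuf = malloc(size * sizeof(int));
    recvcounts = malloc(size * sizeof(int));
    for (i = 0; i < size; i++) {
        recvbuf[i] = 1;     /* each rank contributes 1 in every slot */
        recvcounts[i] = 1;  /* each rank receives a single element   */
    }

#if NBCOLL
    MPI_Request req;
    MPI_Ireduce_scatter(MPI_IN_PLACE, recvbuf, recvcounts,
                        MPI_INT, MPI_SUM, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
#else
    MPI_Reduce_scatter(MPI_IN_PLACE, recvbuf, recvcounts,
                       MPI_INT, MPI_SUM, MPI_COMM_WORLD);
#endif

    printf("[%d] rbuf[%d]=%2d  expected:%2d\n", rank, 0, recvbuf[0], size);

    free(recvbuf);
    free(recvcounts);
    MPI_Finalize();
    return 0;
}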


-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-09-28 Thread Lisandro Dalcin
On 25 September 2014 20:50, Nathan Hjelm  wrote:
> On Tue, Aug 26, 2014 at 07:03:24PM +0300, Lisandro Dalcin wrote:
>> I finally managed to track down some issues in mpi4py's test suite
>> using Open MPI 1.8+. The code below should be enough to reproduce the
>> problem. Run it under valgrind to make sense of my following
>> diagnostics.
>>
>> In this code I'm creating a 2D, periodic Cartesian topology out of
>> COMM_SELF. In this case, the process in COMM_SELF has 4 logical in/out
>> links to itself. So we have size=1 but indegree=outdegree=4. However,
>> in ompi/mca/coll/basic/coll_basic_module.c, "size * 2" requests are
>> being allocated to manage communication:
>>
>> if (OMPI_COMM_IS_INTER(comm)) {
>>     size = ompi_comm_remote_size(comm);
>> } else {
>>     size = ompi_comm_size(comm);
>> }
>> basic_module->mccb_num_reqs = size * 2;
>> basic_module->mccb_reqs = (ompi_request_t**)
>>     malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs);
>>
>> I guess you also have to special-case topologies and allocate
>> indegree+outdegree requests (not sure about the exact number, just
>> guessing).
>>
>
> I wish this were possible, but the topology information is not available
> at that point. We may be able to change that but I don't see the work
> completing anytime soon. I committed an alternative fix as r32796 and
> CMR'd it to 1.8.3. I can confirm that the attached reproducer no longer
> produces a SEGV. Let me know if you run into any more issues.
>

Did your fix get in for 1.8.3? I'm still getting the segfault.
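
For reference, a minimal sketch of the kind of reproducer described above
(hypothetical; the original attachment is not shown here): a 2D periodic
Cartesian topology built from MPI_COMM_SELF, where size == 1 but
indegree == outdegree == 4, followed by a neighbor collective.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Comm cart;
    int dims[2]    = {1, 1};
    int periods[2] = {1, 1};
    int sendbuf = 42, recvbuf[4] = {0, 0, 0, 0};

    MPI_Init(&argc, &argv);

    /* One process with four logical in/out links to itself. */
    MPI_Cart_create(MPI_COMM_SELF, 2, dims, periods, 0, &cart);

    /* The exchange needs 4 incoming + 4 outgoing requests, more than the
     * "size * 2" requests preallocated by the basic coll module. */
    MPI_Neighbor_allgather(&sendbuf, 1, MPI_INT,
                           recvbuf, 1, MPI_INT, cart);

    printf("recvbuf = %d %d %d %d\n",
           recvbuf[0], recvbuf[1], recvbuf[2], recvbuf[3]);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}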



-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


Re: [OMPI devel] Different behaviour with MPI_IN_PLACE in MPI_Reduce_scatter() and MPI_Ireduce_scatter()

2014-09-28 Thread George Bosilca
Lisandro,

Good catch. Indeed, MPI_Ireduce_scatter was not covering the case where
MPI_IN_PLACE was used over a communicator with a single participant. I
pushed a patch and scheduled it for 1.8.4. Check
https://svn.open-mpi.org/trac/ompi/ticket/4924 for more info.

Thanks,
  George.


On Sun, Sep 28, 2014 at 6:29 AM, Lisandro Dalcin  wrote:

> On 22 April 2014 03:02, George Bosilca  wrote:
> > Btw, the proposed validator was incorrect: the first printf, instead of
> >
> >  printf("[%d] rbuf[%d]=%2d  expected:%2d\n", rank, 0, recvbuf[i], size);
> >
> > should be
> >
> >  printf("[%d] rbuf[%d]=%2d  expected:%2d\n", rank, 0, recvbuf[0], size);
> >
>
> I'm testing this with 1.8.3 after fixing my incorrect printf, and I
> still get different results (and the nonblocking NBCOLL=1 one is wrong)
> when using one process (for two or more processes everything is OK).
>
> $ mpicc -DNBCOLL=0 ireduce_scatter.c && mpiexec -n 1 ./a.out
> [0] rbuf[0]= 1  expected: 1
>
> $ mpicc -DNBCOLL=1 ireduce_scatter.c && mpiexec -n 1 ./a.out
> [0] rbuf[0]=60  expected: 1
>
>