Re: [OMPI devel] btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 255' failed

2015-01-20 Thread George Bosilca
On Tue, Jan 20, 2015 at 10:01 PM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:

> 2) the mpi_test_suite uses a weird type (i.e. it artificially sends 20k
> integers over the wire when sending one would produce the very same result).
> I briefly checked the mpi_test_suite source code, and the weird type is used
> for send/recv with buffers whose size is one element.
> I can only guess the authors wanted to send a large message over the wire
> (i.e. to create traffic) without a pointlessly large memory allocation.
> At this stage, I am tempted to conclude the authors did what they intended.
>

Receiving with such a datatype is illegal in MPI (sending is allowed, since
the send buffer is only read during the operation). In fact, any datatype
that spans the same memory region twice is illegal to use in a receive
operation. The reason is simple: an MPI implementation may move the data in
any order it wants, and since MPI guarantees only FIFO ordering of the
matching, such a datatype would break the determinism of the application.
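
For instance (a minimal, standalone sketch; the variable names are mine, and
MPI_Type_create_hvector is just the non-deprecated spelling of the
MPI_Type_hvector call quoted below):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Datatype overlap;
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* count = 2, blocklength = 1, stride = 0 bytes: the two elements map
     * onto the very same int */
    MPI_Type_create_hvector(2, 1, 0, MPI_INT, &overlap);
    MPI_Type_commit(&overlap);

    if (nprocs >= 2) {
        if (rank == 0) {
            /* legal: the send buffer is only read, so reading the same
             * int twice is harmless */
            int value = 42;
            MPI_Send(&value, 1, overlap, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* legal: receive the two ints into distinct locations */
            int recvbuf[2];
            MPI_Recv(recvbuf, 2, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            /* illegal variant: MPI_Recv(recvbuf, 1, overlap, ...) --
             * the two overlapping elements may be written in any order,
             * so the final value is not well defined */
        }
    }

    MPI_Type_free(&overlap);
    MPI_Finalize();
    return 0;
}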

We should ping the authors of the test code to address this.

  George.



>
> Cheers,
>
> Gilles
>
> On 2015/01/21 3:00, Jeff Squyres (jsquyres) wrote:
> > George is right -- Gilles: was this the correct solution?
> >
> > Put differently: the extent of the 20K vector created below is 4 (bytes).
> >
> >
> >
> >> On Jan 19, 2015, at 2:39 AM, George Bosilca wrote:
> >>
> >> Btw,
> >>
> >> MPI_Type_hvector(2, 1, 0, MPI_INT, &type);
> >>
> >> is just a weird datatype. Because the stride is 0, this datatype describes
> >> a memory layout that includes the same int twice. I'm not sure this was
> >> indeed intended...
> >>
> >>   George.
> >>
> >>
> >> On Mon, Jan 19, 2015 at 12:17 AM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:
> >> Adrian,
> >>
> >> I just fixed this in master
> >> (https://github.com/open-mpi/ompi/commit/d14daf40d041f7a0a8e9d85b3bfd5eb570495fd2)
> >>
> >> the root cause is that a corner case was not handled correctly:
> >>
> >> MPI_Type_hvector(2, 1, 0, MPI_INT, &type);
> >>
> >> the type has extent = 4 *but* size = 8.
> >> ob1 used to test only the extent to determine whether the message should
> >> be sent inline or not: extent <= 256 meant "try to send the message
> >> inline". That meant a fragment larger than 65536 bytes (the default
> >> maximum send size for IB) was requested, and the allocation failed.
> >>
> >> Now both the extent and the size are tested, so the message is not sent
> >> inline, and it just works.
> >>
> >> Cheers,
> >>
> >> Gilles


Re: [OMPI devel] btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 255' failed

2015-01-20 Thread Gilles Gouaillardet
Jeff,

There are two things here:

1) ompi was crashing, and this was fixed (ob1 now uses the type size instead
of the type extent to figure out whether the btl should try to send the
message inline or not). And yes, George is right (i.e. use the size instead
of the extent, or both the size and the extent); see the sketch after point 2).

2) the mpi_test_suite uses a weird type (i.e. it artificially sends 20k
integers over the wire when sending one would produce the very same result).
I briefly checked the mpi_test_suite source code, and the weird type is used
for send/recv with buffers whose size is one element.
I can only guess the authors wanted to send a large message over the wire
(i.e. to create traffic) without a pointlessly large memory allocation; a
rough reconstruction of that pattern is sketched below.
At this stage, I am tempted to conclude the authors did what they intended.
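
To make both points concrete, here is a standalone sketch. This is not code
from mpi_test_suite: the count of 20000 and the variable names are my own
guesses, for illustration only. The first part prints the extent/size
mismatch from 1), the second part shows the kind of traffic pattern I suspect
the authors had in mind in 2):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Datatype type, traffic;
    MPI_Aint lb, extent;
    int size, rank, nprocs, one = 1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* 1) the reproducer type from this thread (MPI_Type_create_hvector is
     *    the non-deprecated spelling of MPI_Type_hvector):
     *    extent = 4 bytes (span in memory), size = 8 bytes (data on the wire) */
    MPI_Type_create_hvector(2, 1, 0, MPI_INT, &type);
    MPI_Type_commit(&type);
    MPI_Type_get_extent(type, &lb, &extent);
    MPI_Type_size(type, &size);
    if (rank == 0) {
        printf("extent = %ld, size = %d\n", (long)extent, size);
    }

    /* 2) presumed traffic pattern: 20000 blocks of one int, all with stride 0,
     *    so sending one element transmits 20000 ints while the send buffer is
     *    a single int */
    MPI_Type_create_hvector(20000, 1, 0, MPI_INT, &traffic);
    MPI_Type_commit(&traffic);
    if (nprocs >= 2) {
        if (rank == 0) {
            MPI_Send(&one, 1, traffic, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* receive into a contiguous buffer; receiving with the
             * overlapping type itself would be the illegal pattern George
             * pointed out */
            int sink[20000];
            MPI_Recv(sink, 20000, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
    }

    MPI_Type_free(&traffic);
    MPI_Type_free(&type);
    MPI_Finalize();
    return 0;
}

Run with at least 2 ranks, the second part puts 80000 bytes on the wire while
rank 0 only ever touches a single int.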

Cheers,

Gilles

On 2015/01/21 3:00, Jeff Squyres (jsquyres) wrote:
> George is right -- Gilles: was this the correct solution?
>
> Put differently: the extent of the 20K vector created below is 4 (bytes).
>
>
>
>> On Jan 19, 2015, at 2:39 AM, George Bosilca wrote:
>>
>> Btw,
>>
>> MPI_Type_hvector(2, 1, 0, MPI_INT, &type);
>>
>> is just a weird datatype. Because the stride is 0, this datatype describes a
>> memory layout that includes the same int twice. I'm not sure this was indeed
>> intended...
>>
>>   George.
>>
>>
>> On Mon, Jan 19, 2015 at 12:17 AM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:
>> Adrian,
>>
>> I just fixed this in master
>> (https://github.com/open-mpi/ompi/commit/d14daf40d041f7a0a8e9d85b3bfd5eb570495fd2)
>>
>> the root cause is that a corner case was not handled correctly:
>>
>> MPI_Type_hvector(2, 1, 0, MPI_INT, &type);
>>
>> the type has extent = 4 *but* size = 8.
>> ob1 used to test only the extent to determine whether the message should
>> be sent inline or not: extent <= 256 meant "try to send the message
>> inline". That meant a fragment larger than 65536 bytes (the default
>> maximum send size for IB) was requested, and the allocation failed.
>>
>> Now both the extent and the size are tested, so the message is not sent
>> inline, and it just works.
>>
>> Cheers,
>>
>> Gilles



Re: [OMPI devel] btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 255' failed

2015-01-20 Thread Adrian Reber
Using today's nightly snapshot (openmpi-dev-730-g06d3b57), both errors are
gone. Thanks!

On Mon, Jan 19, 2015 at 02:38:42PM +0900, Gilles Gouaillardet wrote:
> Adrian,
> 
> about the
> "[n050409][[36216,1],1][btl_openib_xrc.c:58:mca_btl_openib_xrc_check_api] XRC
> error: bad XRC API (require XRC from OFED pre 3.12). " message.
> 
> this means ompi was built on a system with OFED 3.12 or greater, and you
> are running on a system with an earlier OFED release.
> 
> please note that Jeff recently pushed a patch related to that, so this
> message might be a false positive.
> 
> Cheers,
> 
> Gilles
> 
> On 2015/01/19 14:17, Gilles Gouaillardet wrote:
> > Adrian,
> >
> > I just fixed this in master
> > (https://github.com/open-mpi/ompi/commit/d14daf40d041f7a0a8e9d85b3bfd5eb570495fd2)
> >
> > the root cause is that a corner case was not handled correctly:
> >
> > MPI_Type_hvector(2, 1, 0, MPI_INT, &type);
> >
> > the type has extent = 4 *but* size = 8.
> > ob1 used to test only the extent to determine whether the message should
> > be sent inline or not: extent <= 256 meant "try to send the message
> > inline". That meant a fragment larger than 65536 bytes (the default
> > maximum send size for IB) was requested, and the allocation failed.
> >
> > Now both the extent and the size are tested, so the message is not sent
> > inline, and it just works.
> >
> > Cheers,
> >
> > Gilles