Re: [OMPI devel] -mca coll "ml" cause segv or hangs with different command lines.

2014-03-04 Thread Rolf vandeVaart
I am still seeing the same issue: I get some type of segv unless I disable
the coll ml component.  This may be an issue at my end, but I just thought I
would double-check that we are sure this is fixed.
Thanks,
Rolf

>-----Original Message-----
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Hjelm,
>Nathan T
>Sent: Tuesday, March 04, 2014 2:34 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] -mca coll "ml" cause segv or hangs with different
>command lines.
>
>There was a rounding issue in basesmuma. If the control data happened to be
>smaller than a page, we were trying to allocate 0 bytes. It should be fixed on
>the trunk and has been CMR'ed to 1.7.5.
>
>-Nathan
>
>Please excuse the horrible Outlook-style quoting. OWA sucks.
>
>
>From: devel [devel-boun...@open-mpi.org] on behalf of Mike Dubman
>[mi...@dev.mellanox.co.il]
>Sent: Tuesday, March 04, 2014 7:04 AM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] -mca coll "ml" cause segv or hangs with different
>command lines.
>
>Hi,
>
>coll/hcoll is the Mellanox-driven collectives package.
>coll/ml is managed, supported, and developed by the ORNL folks.
>
>
>On Tue, Mar 4, 2014 at 1:06 PM, Ralph Castain <r...@open-mpi.org> wrote:
>Ummm...the "ml" stands for Mellanox. This is a component you folks
>contributed at some time. IIRC, the hcoll and/or bcol are meant to replace it,
>but you folks would know best what to do with it.
>
>
>
>On Tue, Mar 4, 2014 at 12:12 AM, Elena Elkina
><elena.elk...@itseez.com<mailto:elena.elk...@itseez.com>> wrote:
>Hi,
>
>I have recently been seeing frequent hangs and seg faults with different
>command lines, and there are "ml" functions in the stack trace.
>When I turn "ml" off with -mca coll ^ml, the problems disappear.
>For example,
>oshrun -np 4 --map-by node --display-map ./ring_oshmem
>fails with a seg fault, while
>oshrun -np 4 --map-by node --display-map -mca coll ^ml ./ring_oshmem
>passes.
>
>The "ml" priority is low (27), but it could have issues during comm_query
>(it does all of its initialization there).
>
>"ml" is an unreliable component, so it may be reasonable not to build it
>by default to avoid such problems.
>
>What do you think?
>
>Best regards,
>Elena
>
>___
>devel mailing list
>de...@open-mpi.org<mailto:de...@open-mpi.org>
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Searchable archives:
>http://www.open-mpi.org/community/lists/devel/2014/03/date.php


Re: [OMPI devel] -mca coll "ml" cause segv or hangs with different command lines.

2014-03-04 Thread Hjelm, Nathan T
There was a rounding issue in basesmuma. If the control data happened to be
smaller than a page, we were trying to allocate 0 bytes. It should be fixed on
the trunk and has been CMR'ed to 1.7.5.
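
To illustrate the kind of rounding bug described here, the following is a
minimal sketch of the arithmetic only, not the actual basesmuma code. It
assumes the size was truncated to a whole number of pages; the thread does
not say exactly how the rounding went wrong. The 4096-byte page size, the
1536-byte control-data size, and both function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; real systems may differ


def round_down_to_page(nbytes, page_size=PAGE_SIZE):
    """Truncate to a whole number of pages (the assumed buggy behaviour)."""
    return (nbytes // page_size) * page_size


def round_up_to_page(nbytes, page_size=PAGE_SIZE):
    """Round up to the next page boundary, so any nonzero request gets a page."""
    return ((nbytes + page_size - 1) // page_size) * page_size


ctl_bytes = 1536  # hypothetical control-data size, smaller than one page

print(round_down_to_page(ctl_bytes))  # 0: a zero-byte allocation request
print(round_up_to_page(ctl_bytes))    # 4096: one full page
```

With truncation, any request smaller than a page collapses to 0 bytes, which
matches the symptom described; rounding up avoids that case.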

-Nathan

Please excuse the horrible Outlook-style quoting. OWA sucks.


From: devel [devel-boun...@open-mpi.org] on behalf of Mike Dubman 
[mi...@dev.mellanox.co.il]
Sent: Tuesday, March 04, 2014 7:04 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] -mca coll "ml" cause segv or hangs with different 
command lines.

Hi,

coll/hcoll is the Mellanox-driven collectives package.
coll/ml is managed, supported, and developed by the ORNL folks.


On Tue, Mar 4, 2014 at 1:06 PM, Ralph Castain 
<r...@open-mpi.org<mailto:r...@open-mpi.org>> wrote:
Ummm...the "ml" stands for Mellanox. This is a component you folks contributed 
at some time. IIRC, the hcoll and/or bcol are meant to replace it, but you 
folks would know best what to do with it.



On Tue, Mar 4, 2014 at 12:12 AM, Elena Elkina 
<elena.elk...@itseez.com<mailto:elena.elk...@itseez.com>> wrote:
Hi,

I have recently been seeing frequent hangs and seg faults with different
command lines, and there are "ml" functions in the stack trace.
When I turn "ml" off with -mca coll ^ml, the problems disappear.
For example,
oshrun -np 4 --map-by node --display-map ./ring_oshmem
fails with a seg fault, while
oshrun -np 4 --map-by node --display-map -mca coll ^ml ./ring_oshmem
passes.

The "ml" priority is low (27), but it could have issues during comm_query
(it does all of its initialization there).

"ml" is an unreliable component, so it may be reasonable not to build it
by default to avoid such problems.

What do you think?

Best regards,
Elena



Re: [OMPI devel] -mca coll "ml" cause segv or hangs with different command lines.

2014-03-04 Thread Mike Dubman
Hi,

coll/hcoll is the Mellanox-driven collectives package.
coll/ml is managed, supported, and developed by the ORNL folks.


On Tue, Mar 4, 2014 at 1:06 PM, Ralph Castain  wrote:

> Ummm...the "ml" stands for Mellanox. This is a component you folks
> contributed at some time. IIRC, the hcoll and/or bcol are meant to replace
> it, but you folks would know best what to do with it.
>
>
>
> On Tue, Mar 4, 2014 at 12:12 AM, Elena Elkina wrote:
>
>> Hi,
>>
>> I have recently been seeing frequent hangs and seg faults with different
>> command lines, and there are "ml" functions in the stack trace.
>> When I turn "ml" off with -mca coll ^ml, the problems disappear.
>> For example,
>> oshrun -np 4 --map-by node --display-map ./ring_oshmem
>> fails with a seg fault, while
>> oshrun -np 4 --map-by node --display-map -mca coll ^ml ./ring_oshmem
>> passes.
>>
>> The "ml" priority is low (27), but it could have issues during comm_query
>> (it does all of its initialization there).
>>
>> "ml" is an unreliable component, so it may be reasonable not to build it
>> by default to avoid such problems.
>>
>> What do you think?
>>
>> Best regards,
>> Elena


Re: [OMPI devel] -mca coll "ml" cause segv or hangs with different command lines.

2014-03-04 Thread Ralph Castain
Ummm...the "ml" stands for Mellanox. This is a component you folks
contributed at some time. IIRC, the hcoll and/or bcol are meant to replace
it, but you folks would know best what to do with it.



On Tue, Mar 4, 2014 at 12:12 AM, Elena Elkina wrote:

> Hi,
>
> I have recently been seeing frequent hangs and seg faults with different
> command lines, and there are "ml" functions in the stack trace.
> When I turn "ml" off with -mca coll ^ml, the problems disappear.
> For example,
> oshrun -np 4 --map-by node --display-map ./ring_oshmem
> fails with a seg fault, while
> oshrun -np 4 --map-by node --display-map -mca coll ^ml ./ring_oshmem
> passes.
>
> The "ml" priority is low (27), but it could have issues during comm_query
> (it does all of its initialization there).
>
> "ml" is an unreliable component, so it may be reasonable not to build it
> by default to avoid such problems.
>
> What do you think?
>
> Best regards,
> Elena


[OMPI devel] -mca coll "ml" cause segv or hangs with different command lines.

2014-03-04 Thread Elena Elkina
Hi,

I have recently been seeing frequent hangs and seg faults with different
command lines, and there are "ml" functions in the stack trace.
When I turn "ml" off with -mca coll ^ml, the problems disappear.
For example,
oshrun -np 4 --map-by node --display-map ./ring_oshmem
fails with a seg fault, while
oshrun -np 4 --map-by node --display-map -mca coll ^ml ./ring_oshmem
passes.
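
For anyone who wants to repeat this comparison, here is a rough sketch of a
helper that runs both command lines and tells a hang apart from a failing
exit code. The command lines are the two given above; the helper names and
the 60-second timeout are assumptions for illustration, and the script does
nothing useful unless oshrun and ./ring_oshmem are available.

```python
import shutil
import subprocess

# The launcher options are taken from the two command lines above.
BASE = ["oshrun", "-np", "4", "--map-by", "node", "--display-map"]
PROGRAM = "./ring_oshmem"


def build_cmd(disable_ml):
    """Return the oshrun command line, optionally excluding coll/ml."""
    cmd = list(BASE)
    if disable_ml:
        # Passed as a list (no shell), so "^" needs no quoting here.
        cmd += ["-mca", "coll", "^ml"]
    return cmd + [PROGRAM]


def run_case(disable_ml, timeout_s=60):
    """Run one variant; return "hang" on timeout, else the exit code."""
    try:
        result = subprocess.run(build_cmd(disable_ml), timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "hang"
    return result.returncode


if __name__ == "__main__":
    if shutil.which("oshrun") is None:
        print("oshrun not found on PATH; nothing to run")
    else:
        for disable_ml in (False, True):
            print(" ".join(build_cmd(disable_ml)), "->", run_case(disable_ml))
```

A nonzero exit code from the first variant together with 0 from the second
would be consistent with the behaviour reported here.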

The "ml" priority is low (27), but it could have issues during comm_query
(it does all of its initialization there).

"ml" is an unreliable component, so it may be reasonable not to build it
by default to avoid such problems.

What do you think?

Best regards,
Elena