Re: [OMPI devel] RFC: Remove --without-hwloc configure option

2015-09-04 Thread Brice Goglin
Le 04/09/2015 00:36, Gilles Gouaillardet a écrit :
> Ralph,
>
> just to be clear, your proposal is to abort if openmpi is configured
> with --without-hwloc, right ?
> ( the --with-hwloc option is not removed because we want to keep the
> option of using an external hwloc library )
>
> if I understand correctly, Paul's point is that if openmpi is ported
> to a new architecture for which hwloc has not been ported yet
> (embedded hwloc or external hwloc), then the very first step is to
> port hwloc before ompi can be built.
>
> did I get it right Paul ?
>
> Brice, what would happen in such a case ?
> embedded hwloc cannot be built ?
> hwloc returns little or no information ?

If it's a new operating system that supports at least things like
sysconf, you will get a Machine object with one PU per logical processor.
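
To illustrate the kind of information such a fallback has to work with, here is a minimal sketch that asks sysconf for the number of online logical processors and prints a flat "Machine with N PUs" listing. This is not hwloc code; the function names are hypothetical, and _SC_NPROCESSORS_ONLN is assumed to be available (it is a common extension on Linux, the BSDs and OSX rather than a strict POSIX requirement).

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: number of logical processors the OS reports as
 * online, or 1 when sysconf cannot tell. */
long demo_count_logical_processors(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n > 0) ? n : 1;
}

/* Hypothetical helper: print a flat topology, one PU per logical
 * processor, with no package, core or cache levels. */
void demo_print_flat_topology(void)
{
    long n = demo_count_logical_processors();
    printf("Machine\n");
    for (long i = 0; i < n; i++) {
        printf("  PU #%ld\n", i);
    }
}
```

A flat listing like this says nothing about packages, cores or caches, which is why it only covers the "little information" case from the question above.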

If it's a new platform running Linux, they are supposed to tell Linux at
least package/core/thread information. That's what we have for ARM for
instance.

Missing topology detection can be worked around easily (with an XML or
synthetic description, which is what we did for BlueGene/Q before adding
manual support for that specific processor). Missing binding support can't.
And once you have binding, you get the x86 topology even if the operating
system isn't supported (using cpuid).

> for example, on Fujitsu FX10 node (single socket, 16 cores), hwloc
> reports 16 sockets with one core each and no cache. though this is not
> correct, that can be seen as equivalent to the real config by ompi, so
> this is not really an issue for ompi.

Can you help fix this?

The issue is indeed with supercomputers with uncommon architectures like
this one.

Brice


>
> Cheers,
>
> Gilles
>
> On Friday, September 4, 2015, Ralph Castain wrote:
>
> No - hwloc is embedded in OMPI anyway.
>
>> On Sep 3, 2015, at 11:09 AM, Paul Hargrove wrote:
>>
>>
>> On Thu, Sep 3, 2015 at 8:03 AM, Ralph Castain wrote:
>>
>> Does anyone know of a reason why we shouldn’t do this?
>>
>>
>>
>> Would doing this mean that a port to a new system would require
>> that one first perform a full hwloc port?
>>
>> -Paul
>>
>> -- 
>> Paul H. Hargrove  phhargr...@lbl.gov
>> 
>> Computer Languages & Systems Software (CLaSS) Group
>> Computer Science Department   Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> 
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2015/09/17942.php
>
>
>



Re: [OMPI devel] RFC: Remove --without-hwloc configure option

2015-09-04 Thread Gilles Gouaillardet

Thanks Brice,

Bottom line: even if hwloc is not fully ported, it should build and ompi 
should get something usable.
In that case, I have no objection to removing the --without-hwloc configure 
option.


You can contact me off-list regarding the FX10-specific issue.

Cheers,

Gilles





[OMPI devel] RFC: Remove the --enable-mpi-profile option

2015-09-04 Thread Gilles Gouaillardet

Folks,

Jeff and i have been discussing the possibility of removing the 
--enable-mpi-profile option from ompi.

(see https://github.com/open-mpi/ompi/pull/845 for the details)

Removing this option would simplify the build process and make it 
crystal clear that the Fortran bindings call
the C PMPI_* bindings. From a tool's point of view, that means a Fortran 
MPI call is wrapped only once, in Fortran.
Currently, a Fortran MPI call is wrapped twice, once in Fortran and once 
in C.


We do not see any reason why someone would want to build without the 
PMPI_* bindings for a production build.
That being said, the --disable-mpi-profile option can be useful to 
developers in order to build openmpi faster on a laptop running OSX. For 
example, on my MacBook (recent but low-voltage CPU with two cores and 
two threads per core), the full build process (from autogen.pl to make 
install) takes around 30 minutes, and not building the PMPI_* bindings 
can save around 5 minutes.


/* when weak symbols are not available (e.g. ompi was configured with 
--disable-weak-symbols, or weak symbols are not supported by the OS, OSX 
for example), the MPI bindings must be built twice:

- once to generate the MPI_* bindings
- a second time to generate the PMPI_* bindings */
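
The difference between the two cases can be sketched in a few lines of C. This is only an illustration of the technique, not how Open MPI's build system is written: PMY_Barrier and MY_Barrier are hypothetical stand-ins for a PMPI_*/MPI_* pair, and the weak-alias branch assumes a GCC-compatible compiler targeting ELF.

```c
/* Hypothetical stand-ins for one MPI routine; these are not Open MPI symbols. */

/* The "profiling" entry point carries the real implementation. */
int PMY_Barrier(int comm)
{
    return comm + 1;
}

#if defined(__GNUC__) && defined(__ELF__)
/* Weak symbols available: the body is compiled once, and MY_Barrier is
 * only a weak alias that resolves to PMY_Barrier. */
int MY_Barrier(int comm) __attribute__((weak, alias("PMY_Barrier")));
#else
/* No weak symbols: the same body has to be compiled a second time under
 * the MY_* name. */
int MY_Barrier(int comm)
{
    return comm + 1;
}
#endif
```

With the weak alias, a tool can supply its own strong MY_Barrier that does its bookkeeping and then calls PMY_Barrier; without weak symbols, every binding is compiled twice, which is where the extra build time comes from.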


any thoughts or objections to the removal of the --enable-mpi-profile 
configure option ?


Cheers,

Gilles


Re: [OMPI devel] RFC: Remove --without-hwloc configure option

2015-09-04 Thread Kawashima, Takahiro
Brice,

I'm a developer of Fujitsu MPI for K computer and Fujitsu
PRIMEHPC FX10/FX100 (SPARC-based CPU).

Though I'm not familiar with the hwloc code and didn't know
the issue reported by Gilles, I also would be able to help
you to fix the issue.

Takahiro Kawashima,
MPI development team,
Fujitsu

> Thanks Brice,
> 
> bottom line, even if hwloc is not fully ported, it should build and ompi 
> should get something usable.
> in this case, i have no objection removing the --without-hwloc configure 
> option.
> 
> you can contact me off-list regarding the FX10 specific issue
> 
> Cheers,
> 
> Gilles
> 

[OMPI devel] no more cast away const

2015-09-04 Thread Gilles Gouaillardet

Folks,

a bunch of C bindings have comments such as
/* XXX -- CONST -- do not cast away const -- update mca/coll */
and that has been there for a long time.

i made PR #839 https://github.com/open-mpi/ompi/pull/839 to fix this.
the change is quite massive (270 files) since :
- the C bindings had to be modified
- the MCA frameworks had to be modified
- the MCA components had to be modified

I did my best to update all the components, but I was not able to build 
all of them (mainly because I do not have the required libs).
If a component is not updated, the worst-case scenario should be a 
warning about function types.

Currently, 99% of the work is done.
Components based on Mellanox proprietary libraries (fca, mxm and hcoll) 
still issue some warnings;
the root cause is that the interfaces of the proprietary libs should be 
updated with the const keyword where needed.


I did not change MPI_Reduce_local (ompi/mpi/c/reduce_local.c).
The reason is that the change would have been half-baked anyway.
It could have been fully baked if the MPI_User_function type was
typedef void (MPI_User_function)(const void *, void *, int *, MPI_Datatype *);

instead of
typedef void (MPI_User_function)(void *, void *, int *, MPI_Datatype *);

fwiw, in MPI 2.2, the C++ binding has the const modifier, but not the C one.
Ticket #140 (https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/140) says:

/* NOT CHANGING the following API's to keep backward compatibility and 
ease of use */

so it seems it was intentional not to add the const modifier to 
MPI_User_function
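
To make the "half-baked" point concrete, here is a small sketch that builds without MPI. All names are hypothetical (demo_datatype stands in for MPI_Datatype, and the demo_* typedefs mirror the two shapes of MPI_User_function quoted above); it is not the code from the PR.

```c
/* Hypothetical stand-in for MPI_Datatype so the sketch builds without MPI. */
typedef int demo_datatype;

/* Shape of the current MPI_User_function: the input vector is not const. */
typedef void (demo_user_function)(void *, void *, int *, demo_datatype *);

/* Shape discussed above: const on the input vector only. */
typedef void (demo_user_function_const)(const void *, void *, int *, demo_datatype *);

/* User reduction matching the current, non-const shape. */
void demo_sum(void *invec, void *inoutvec, int *len, demo_datatype *dt)
{
    int *in = invec;
    int *inout = inoutvec;
    (void)dt;
    for (int i = 0; i < *len; i++) {
        inout[i] += in[i];
    }
}

/* The same reduction with a const input vector; it only reads invec. */
void demo_sum_const(const void *invec, void *inoutvec, int *len, demo_datatype *dt)
{
    const int *in = invec;
    int *inout = inoutvec;
    (void)dt;
    for (int i = 0; i < *len; i++) {
        inout[i] += in[i];
    }
}

/* With the non-const typedef, a const input buffer has to be cast before
 * the user function is called: const is cast away. */
void demo_reduce_local_cast(const void *inbuf, void *inoutbuf, int count,
                            demo_datatype dt, demo_user_function *op)
{
    op((void *)inbuf, inoutbuf, &count, &dt);
}

/* With the const typedef, const is preserved all the way down. */
void demo_reduce_local(const void *inbuf, void *inoutbuf, int count,
                       demo_datatype dt, demo_user_function_const *op)
{
    op(inbuf, inoutbuf, &count, &dt);
}
```

Only the second variant avoids the cast, and it depends on the user function type itself carrying const, which is exactly the part the ticket left unchanged.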


as i wrote earlier, the change is quite massive.
i plan to commit it by the end of next week, unless there are any 
objections.

(and then i will PR for v2.x, and v1.10 but only if there is a request)

Cheers,

Gilles


Re: [OMPI devel] RFC: Remove the --enable-mpi-profile option

2015-09-04 Thread Ralph Castain
While it would take longer on the laptop, we don’t often build on laptops from 
bottom-up - i.e., we only have to run the full build operation when configury 
is changed. So I’d say go ahead and remove it


> On Sep 4, 2015, at 12:49 AM, Gilles Gouaillardet  wrote:
> 
> Folks,
> 
> Jeff and i have been discussing the possibility of removing the 
> --enable-mpi-profile option from ompi.
> (see https://github.com/open-mpi/ompi/pull/845 for the details)
> 
> Removing this option would simplify the building process, and make it crystal 
> clear that Fortran bindings call
> the C PMPI_* bindings. From a tool point of view, that means a Fortran MPI 
> call is wrapped only once in Fortran.
> Currently, a Fortran MPI call is wrapped twice, once in Fortran and once in C.
> 
> We do not see any reason why someone would not want to build without the 
> PMPI_* bindings for a production build.
> That being said, the --disable-mpi-profile option can be useful to developers 
> in order to build openmpi faster on a laptop running OSX. For example, and on 
> my MacBook (recent but low voltage cpu with two core and two threads per 
> core), the full build process (from autogen.pl to make install) takes around 
> 30 minutes, and not building the PMPI_* bindings can save around 5 minutes.
> 
> /* when weak symbols are not available (e.g. ompi was configured with 
> --disable-weak-symbols or weak symbols are not available by the OS, OSX for 
> example), MPI bindings must be built twice:
> - once to generate the MPI_* bindings
> - an other time to generate the PMPI_* bindings */
> 
> 
> any thoughts or objections to the removal of the --enable-mpi-profile 
> configure option ?
> 
> Cheers,
> 
> Gilles



Re: [OMPI devel] RFC: Remove --without-hwloc configure option

2015-09-04 Thread Ralph Castain
It sounds, then, like removing --without-hwloc will do no harm. At worst, hwloc 
might report inaccurate info, but that won’t stop us from running with 
appropriate cmd line options (e.g., to set the #slots and bind-to none).

Unless there are any further concerns, I’ll prep the PR


> On Sep 4, 2015, at 1:08 AM, Kawashima, Takahiro  
> wrote:
> 
> Brice,
> 
> I'm a developer of Fujitsu MPI for K computer and Fujitsu
> PRIMEHPC FX10/FX100 (SPARC-based CPU).
> 
> Though I'm not familiar with the hwloc code and didn't know
> the issue reported by Gilles, I also would be able to help
> you to fix the issue.
> 
> Takahiro Kawashima,
> MPI development team,
> Fujitsu
> 


Re: [OMPI devel] 1.10.0 issue

2015-09-04 Thread Michal Schmidt
On 09/03/2015 03:47 PM, Ralph Castain wrote:
> I guess I didn’t make it clear in my prior comment, so let me try
> again. I understand about dlopen and the fix that George proposed -
> we had internally discussed this as well. However, the questions that
> raises are:
> 
> 1. how does the distro (Michal) decide which PSM module to disable by
> default in their package?

In the RHEL package I have disabled PSM2 by default in
openmpi-mca-params.conf:

# Disable the psm2 MTL by default.
# Workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1259835
# This avoids a conflict between libpsm2.so.2 and libpsm_infinipath.so.1.
mtl = ^psm2
# If psm2 is needed, comment out the above line and uncomment
# the following two lines. This will disable MCAs that are known to
# depend on libpsm_infinipath.so.1:
#   mtl = ^psm,ofi
#   btl = ^usnic

> 2. how does the user “discover” that their fabric has automatically
> been disabled, especially since this has never been the case before?

There will be a release note.
OmniPath was not previously supported in RHEL at all, so it's not like
I'm disabling something that used to work.

Regards,
Michal


Re: [OMPI devel] RFC: Remove --without-hwloc configure option

2015-09-04 Thread Gilles Gouaillardet
iirc, hwloc can read input from an xml file.
If that is not already the case, should we provide a simple mechanism to tell
hwloc to read the topology from a config file instead of detecting it from the os?
For example, when working on a new os and/or hardware, manually generate
the hwloc xml file on each node and do something like
mpirun --mca hwloc_file /etc/hwloc.xml ...

Does that make sense?

On Friday, September 4, 2015, Ralph Castain  wrote:

> It sounds, then, like removing --without-hwloc will do no harm. At worst,
> hwloc might report inaccurate info, but that won’t stop us from running
> with appropriate cmd line options (e.g., to set the #slots and bind-to
> none).
>
> Unless there are any further concerns, I’ll prep the PR
>
>


Re: [OMPI devel] 1.10.0 issue

2015-09-04 Thread Ralph Castain
Umm…why would USNIC depend on libpsm_infinipath?? Jeff or Dave - is that true?



> On Sep 4, 2015, at 5:57 AM, Michal Schmidt  wrote:
> 
> On 09/03/2015 03:47 PM, Ralph Castain wrote:
>> I guess I didn’t make it clear in my prior comment, so let me try
>> again. I understand about dlopen and the fix that George proposed -
>> we had internally discussed this as well. However, the questions that
>> raises are:
>> 
>> 1. how does the distro (Michal) decide which PSM module to disable by
>> default in their package?
> 
> In the RHEL package I have disabled PSM2 by default in
> openmpi-mca-params.conf:
> 
> # Disable the psm2 MTL by default.
> # Workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1259835
> # This avoids a conflict between libpsm2.so.2 and libpsm_infinipath.so.1.
> mtl = ^psm2
> # If psm2 is needed, comment out the above line and uncomment
> # the following two lines. This will disable MCAs that are known to
> # depend on libpsm_infinipath.so.1:
> #   mtl = ^psm,ofi
> #   btl = ^usnic
> 
>> 2. how does the user “discover” that their fabric has automatically
>> been disabled, especially since this has never been the case before?
> 
> There will be a release note.
> OmniPath was not previously supported in RHEL at all, so it's not like
> I'm disabling something that used to work.
> 
> Regards,
> Michal



Re: [OMPI devel] RFC: Remove --without-hwloc configure option

2015-09-04 Thread Ralph Castain
I think we already do, but I can check

> On Sep 4, 2015, at 6:06 AM, Gilles Gouaillardet 
>  wrote:
> 
> iirc, hwloc can read input from an xml file.
> if not already the case, should we provide a simple mechanism to tell hwloc 
> not to detect the topology from the os but from a config file.
> for example, if working on a new os and/or hardware, then manually generate 
> the hwloc xml file on each node and do something like
> mpirun --mca hwloc_file /etc/hwloc.xml ...
> 
> makes sense ?
> 



Re: [OMPI devel] 1.10.0 issue

2015-09-04 Thread Michal Schmidt
On 09/04/2015 03:29 PM, Ralph Castain wrote:
> Umm…why would USNIC depend on libpsm_infinipath?? Jeff or Dave - is that true?

Indirectly, via libfabric.

Michal




Re: [OMPI devel] 1.10.0 issue

2015-09-04 Thread Ralph Castain
A…thanks!


> On Sep 4, 2015, at 6:52 AM, Michal Schmidt  wrote:
> 
> On 09/04/2015 03:29 PM, Ralph Castain wrote:
>> Umm…why would USNIC depend on libpsm_infinipath?? Jeff or Dave - is that 
>> true?
> 
> Indirectly, via libfabric.
> 
> Michal
> 
> 



Re: [OMPI devel] 1.10.0 issue

2015-09-04 Thread Friedley, Andrew
The PSM2 MTL code I submitted to master, 1.10, and 2.x will auto-build if the 
library is detected in the system; I don't think that's been changed since.  
Feel free to disable the auto-build until we can get a PSM2 solution, but we'd 
much prefer not to outright remove the PSM2 MTL.

Andrew

> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph
> Castain
> Sent: Thursday, September 3, 2015 4:44 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] 1.10.0 issue
> 
> Yes, it actually is rather easy to do. I can check, but I think that should
> happen now (unless psm2 was set to auto-build if the lib was detected).
> Regardless, we can always have RH et al simply build with
> --enable-mca-no-build=mtl-psm2 and that will solve the problem.
> 
> Please keep us posted - and thanks!
> Ralph
> 
> > On Sep 3, 2015, at 3:44 PM, Friedley, Andrew wrote:
> >
> > Hi Ralph & crew,
> >
> > I'm representing the Intel PSM team to Open MPI.  They're aware of the
> > problem and have seen the comments on both this thread and in OFI, and
> > are working on solving the issue within PSM2.  Current estimate is that it
> > will take 3-4 weeks.
> >
> > If it comes to removing the PSM2 MTL from 1.10, would it instead be
> > possible to disable it from being configured/built by default, unless
> > specifically requested using --with-psm2?
> >
> > Andrew
> >
> >> -Original Message-
> >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph
> >> Castain
> >> Sent: Wednesday, September 2, 2015 6:22 PM
> >> To: Open MPI Developers
> >> Subject: [OMPI devel] 1.10.0 issue
> >>
> >> Hi folks
> >>
> >> I regret to say that 1.10.0 is hitting an issue with at least one upstream
> >> distro.
> >> Apparently, there is a symbol conflict between the PSM and PSM2
> >> libraries that precludes building both of those MTLs at the same
> >> time. This is leading the distro to push for release of two OMPI
> >> 1.10.0 builds - one with PSM and the other with PSM2.
> >>
> >> IMO, this is a very undesirable situation. I agree with the distro
> >> that delaying release for some significant time is not acceptable, as
> >> this would impact everyone else’s users.
> >> Therefore, assuming that the PSM team is unable to quickly resolve
> >> the problem in their libraries, my inclination is to release an
> >> immediate 1.10.1 with the PSM2 MTL removed.
> >>
> >> I’m soliciting input - any opinions?
> >> Ralph
> >>


Re: [OMPI devel] 1.10.0 issue

2015-09-04 Thread Ralph Castain
Understood - for now, it appears RH has an acceptable solution by using the MCA 
param, so I see no need for further action until you complete the fix


> On Sep 4, 2015, at 7:45 AM, Friedley, Andrew  
> wrote:
> 
> The PSM2 MTL code I submitted to master, 1.10, and 2.x will auto-build if the 
> library is detected in the system; I don't think that's been changed since.  
> Feel free to disable the auto-build until we can get a PSM2 solution, but 
> we'd much prefer not to outright remove the PSM2 MTL.
> 
> Andrew
> 
>> -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph
>> Castain
>> Sent: Thursday, September 3, 2015 4:44 PM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] 1.10.0 issue
>> 
>> Yes, it actually is rather easy to do. I can check, but I think that should
>> happen now (unless psm2 was set to auto-build if the lib was detected).
>> Regardless, we can always have RH et al simply build with
>> --enable-mca-no-build=mtl-psm2 and that will solve the problem.
>> 
>> Please keep us posted - and thanks!
>> Ralph
>> 
>>> On Sep 3, 2015, at 3:44 PM, Friedley, Andrew wrote:
>>>
>>> Hi Ralph & crew,
>>>
>>> I'm representing the Intel PSM team to Open MPI.  They're aware of the
>>> problem and have seen the comments on both this thread and in OFI, and
>>> are working on solving the issue within PSM2.  Current estimate is that
>>> it will take 3-4 weeks.
>>>
>>> If it comes to removing the PSM2 MTL from 1.10, would it instead be
>>> possible to disable it from being configured/built by default, unless
>>> specifically requested using --with-psm2?
>>> 
>>> Andrew
>>> 
>>>> -Original Message-
>>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph
>>>> Castain
>>>> Sent: Wednesday, September 2, 2015 6:22 PM
>>>> To: Open MPI Developers
>>>> Subject: [OMPI devel] 1.10.0 issue
>>>>
>>>> Hi folks
>>>>
>>>> I regret to say that 1.10.0 is hitting an issue with at least one upstream
>>>> distro. Apparently, there is a symbol conflict between the PSM and PSM2
>>>> libraries that precludes building both of those MTLs at the same
>>>> time. This is leading the distro to push for release of two OMPI
>>>> 1.10.0 builds - one with PSM and the other with PSM2.
>>>>
>>>> IMO, this is a very undesirable situation. I agree with the distro
>>>> that delaying release for some significant time as this would impact
>>>> everyone else’s users.
>>>> Therefore, assuming that the PSM team is unable to quickly resolve
>>>> the problem in their libraries, my inclination is to release an
>>>> immediate 1.10.1 with the PSM2 MTL removed.
>>>>
>>>> I’m soliciting input - any opinions?
>>>> Ralph
>>>>
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/devel/2015/09/17919.php
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2015/09/17953.php
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2015/09/17956.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/09/17971.php



Re: [OMPI devel] 1.10.0 issue

2015-09-04 Thread Jeff Squyres (jsquyres)
Ralph: you might want to just advise RH to use --without-psm2.  It's a little 
more direct / a little less convoluted than --enable-mca-no-build.


> On Sep 4, 2015, at 10:51 AM, Ralph Castain  wrote:
> 
> Understood - for now, it appears RH has an acceptable solution by using the 
> MCA param, so I see no need for further action until you complete the fix
> 
> 
>> On Sep 4, 2015, at 7:45 AM, Friedley, Andrew  
>> wrote:
>> 
>> The PSM2 MTL code I submitted to master, 1.10, and 2.x will auto-build if 
>> the library is detected in the system; I don't think that's been changed 
>> since.  Feel free to disable the auto-build until we can get a PSM2 
>> solution, but we'd much prefer not to outright remove the PSM2 MTL.
>> 
>> Andrew
>> 
>>> -Original Message-
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph
>>> Castain
>>> Sent: Thursday, September 3, 2015 4:44 PM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] 1.10.0 issue
>>> 
>>> Yes, it actually is rather easy to do. I can check, but I think that should
>>> happen now (unless psm2 was set to auto-build if the lib was detected).
>>> Regardless, we can always have RH et al simply build with
>>> --enable-mca-no-build=mtl-psm2 and that will solve the problem.
>>> 
>>> Please keep us posted - and thanks!
>>> Ralph
>>> 
>>>> On Sep 3, 2015, at 3:44 PM, Friedley, Andrew wrote:
>>>>
>>>> Hi Ralph & crew,
>>>>
>>>> I'm representing the Intel PSM team to Open MPI.  They're aware of the
>>>> problem and have seen the comments on both this thread and in OFI, and
>>>> are working on solving the issue within PSM2.  Current estimate is that
>>>> it will take 3-4 weeks.
>>>>
>>>> If it comes to removing the PSM2 MTL from 1.10, would it instead be
>>>> possible to disable it from being configured/built by default, unless
>>>> specifically requested using --with-psm2?
>>>>
>>>> Andrew
>>>>
>>>>> -Original Message-
>>>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph
>>>>> Castain
>>>>> Sent: Wednesday, September 2, 2015 6:22 PM
>>>>> To: Open MPI Developers
>>>>> Subject: [OMPI devel] 1.10.0 issue
>>>>>
>>>>> Hi folks
>>>>>
>>>>> I regret to say that 1.10.0 is hitting an issue with at least one upstream
>>>>> distro. Apparently, there is a symbol conflict between the PSM and PSM2
>>>>> libraries that precludes building both of those MTLs at the same
>>>>> time. This is leading the distro to push for release of two OMPI
>>>>> 1.10.0 builds - one with PSM and the other with PSM2.
>>>>>
>>>>> IMO, this is a very undesirable situation. I agree with the distro
>>>>> that delaying release for some significant time as this would impact
>>>>> everyone else’s users.
>>>>> Therefore, assuming that the PSM team is unable to quickly resolve
>>>>> the problem in their libraries, my inclination is to release an
>>>>> immediate 1.10.1 with the PSM2 MTL removed.
>>>>>
>>>>> I’m soliciting input - any opinions?
>>>>> Ralph
>>>>>
>>>>> ___
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/devel/2015/09/17919.php
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/devel/2015/09/17953.php
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2015/09/17956.php
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2015/09/17971.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/09/17972.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] 1.10.0 issue

2015-09-04 Thread Jeff Squyres (jsquyres)
Michal: Wait, why are you disabling usnic?

Please don't penalize usNIC because of Intel's PSM issues.



> On Sep 4, 2015, at 9:29 AM, Ralph Castain  wrote:
> 
> Umm…why would USNIC depend on libpsm_infinipath?? Jeff or Dave - is that true?
> 
> 
> 
>> On Sep 4, 2015, at 5:57 AM, Michal Schmidt  wrote:
>> 
>> On 09/03/2015 03:47 PM, Ralph Castain wrote:
>>> I guess I didn’t make it clear in my prior comment, so let me try
>>> again. I understand about dlopen and the fix that George proposed -
>>> we had internally discussed this as well. However, the questions that
>>> raises are:
>>> 
>>> 1. how does the distro (Michal) decide which PSM module to disable by
>>> default in their package?
>> 
>> In the RHEL package I have disabled PSM2 by default in
>> openmpi-mca-params.conf:
>> 
>> # Disable the psm2 MTL by default.
>> # Workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1259835
>> # This avoids a conflict between libpsm2.so.2 and libpsm_infinipath.so.1.
>> mtl = ^psm2
>> # If psm2 is needed, comment out the above line and uncomment
>> # the following two lines. This will disable MCAs that are known to
>> # depend on libpsm_infinipath.so.1:
>> #   mtl = ^psm,ofi
>> #   btl = ^usnic
>> 
>>> 2. how does the user “discover” that their fabric has automatically
>>> been disabled, especially since this has never been the case before?
>> 
>> There will be a release note.
>> OmniPath was not previously supported in RHEL at all, so it's not like
>> I'm disabling something that used to work.
>> 
>> Regards,
>> Michal
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2015/09/17965.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/09/17967.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] 1.10.0 issue

2015-09-04 Thread Jeff Squyres (jsquyres)
Ignore me; I read your email wrong.  You have "btl = ^usnic" commented out, and 
a line above it saying "if you need PSM2, then uncomment these...".

Makes perfect sense.  Sorry for the noise.
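
For anyone else skimming the quoted file, here is a minimal sketch (Python, not Open MPI code) of how that snippet reads. It assumes the usual conventions that lines starting with "#" in openmpi-mca-params.conf are comments, and that a leading "^" in a component list means "exclude these components". The helper names and the lists of available components are hypothetical, chosen only for illustration.

```python
# Illustrative model only; this is not how Open MPI itself parses the file.

CONF = """\
# Disable the psm2 MTL by default.
mtl = ^psm2
# If psm2 is needed, comment out the above line and uncomment
# the following two lines.
#   mtl = ^psm,ofi
#   btl = ^usnic
"""


def active_params(text):
    """Return {name: value} for lines that are neither blank nor comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # commented-out settings have no effect
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip()
    return params


def eligible(available, spec):
    """Apply a component list: '^a,b' excludes a and b; 'a,b' keeps only a and b."""
    if spec is None:
        return list(available)  # no setting: nothing is filtered out
    if spec.startswith("^"):
        excluded = set(spec[1:].split(","))
        return [c for c in available if c not in excluded]
    included = set(spec.split(","))
    return [c for c in available if c in included]


params = active_params(CONF)

# Hypothetical component lists, for illustration only.
mtls = ["ofi", "psm", "psm2"]
btls = ["self", "tcp", "usnic"]

print(params)                             # {'mtl': '^psm2'}
print(eligible(mtls, params.get("mtl")))  # ['ofi', 'psm']
print(eligible(btls, params.get("btl")))  # ['self', 'tcp', 'usnic']
```

With the file as shipped, only "mtl = ^psm2" is active, so usnic stays eligible; it would be excluded only if someone uncommented the alternative lines.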


> On Sep 4, 2015, at 12:00 PM, Jeff Squyres (jsquyres)  
> wrote:
> 
> Michal: Wait, why are you disabling usnic?
> 
> Please don't penalize usNIC because of Intel's PSM issues.
> 
> 
> 
>> On Sep 4, 2015, at 9:29 AM, Ralph Castain  wrote:
>> 
>> Umm…why would USNIC depend on libpsm_infinipath?? Jeff or Dave - is that 
>> true?
>> 
>> 
>> 
>>> On Sep 4, 2015, at 5:57 AM, Michal Schmidt  wrote:
>>> 
>>> On 09/03/2015 03:47 PM, Ralph Castain wrote:
>>>> I guess I didn’t make it clear in my prior comment, so let me try
>>>> again. I understand about dlopen and the fix that George proposed -
>>>> we had internally discussed this as well. However, the questions that
>>>> raises are:
>>>>
>>>> 1. how does the distro (Michal) decide which PSM module to disable by
>>>> default in their package?
>>> 
>>> In the RHEL package I have disabled PSM2 by default in
>>> openmpi-mca-params.conf:
>>> 
>>> # Disable the psm2 MTL by default.
>>> # Workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1259835
>>> # This avoids a conflict between libpsm2.so.2 and libpsm_infinipath.so.1.
>>> mtl = ^psm2
>>> # If psm2 is needed, comment out the above line and uncomment
>>> # the following two lines. This will disable MCAs that are known to
>>> # depend on libpsm_infinipath.so.1:
>>> #   mtl = ^psm,ofi
>>> #   btl = ^usnic
>>> 
>>>> 2. how does the user “discover” that their fabric has automatically
>>>> been disabled, especially since this has never been the case before?
>>> 
>>> There will be a release note.
>>> OmniPath was not previously supported in RHEL at all, so it's not like
>>> I'm disabling something that used to work.
>>> 
>>> Regards,
>>> Michal
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2015/09/17965.php
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2015/09/17967.php
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/09/17974.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] Annual Open MPI membership review, and Git repo permissions review

2015-09-04 Thread Jeff Squyres (jsquyres)
Still waiting on the following organizations to update their list of committers
in this spreadsheet:


  *   Mellanox
  *   U. Houston
  *   AMD
  *   ORNL
  *   Oracle
  *   HLRS
  *   HFT Stuttgart
  *   LANL
  *   Sandia
  *   Chelsio
  *   Oscar Vegis
  *   Craig Rasmussen

The deadline is today.

Folks: I don't like this administrivia any more than you do.  But I'm 
volunteering to do it, and you're wasting my time by not taking two minutes to 
fill out your part of the spreadsheet.  Please do it ASAP.

Thanks.





On Aug 26, 2015, at 4:12 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

It's that time of year again: it's our annual review of those with write access 
to our Git repos.

The purpose is simply to trim those who are no longer active, or otherwise no 
longer need write access to Open MPI repositories.  We don't want people moving 
to new jobs and still having write access to our code bases; this is our annual 
reminder to remove such accounts.

*** I've created a spreadsheet that every organization will need to examine by 
COB Fri, 4 Sep, 2015 ***

What you need to do:

1. Visit this Google spreadsheet

2. Find your organization.

3. Examine each person in your organization:
   - I've marked each Git repo that each person currently has WRITE access to
   *** If that person can be dropped from all Open MPI repos, color that
       person's row ORANGE
   *** If that person still needs WRITE git repo access, place an X in the
       corresponding "Need commit?" column
   *** Once you are done checking each person, color that person's row GREEN
       so that we know this row is done

### Please only mark off members who actually need write access in the 
foreseeable future.

If someone in your org *might* need access at some point in the future, we can 
add them in the future -- it's a quick/easy process to do so.  Meaning: please 
don't mark someone as needing access *now* if they only *might* need access in 
the future.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/


