Re: [OMPI devel] test/class/opal_fifo failure on ppc64

2015-01-09 Thread Adrian Reber
Thanks. MTT on my ppc64 system is happy again.
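
For reference, the ordering that the lwsync/eieio barriers quoted below are
meant to enforce can be illustrated with a classic producer/consumer pattern.
This is only a sketch with made-up variable names (not the opal_fifo code
itself), showing why a missing write barrier lets another thread observe
stores out of order on a weakly ordered machine like ppc64:

#define RMB() __asm__ __volatile__ ("lwsync" : : : "memory")
#define WMB() __asm__ __volatile__ ("eieio" : : : "memory")

static int payload;           /* data published by the producer */
static volatile int ready;    /* flag telling the consumer the data is valid */

void producer(void)
{
    payload = 42;
    WMB();   /* order the payload store before the flag store */
    ready = 1;
}

int consumer(void)
{
    while (!ready) {
        /* spin until the producer publishes */
    }
    RMB();   /* do not let the payload load be reordered before the flag load */
    return payload;
}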

On Thu, Jan 08, 2015 at 09:16:43AM -0700, Nathan Hjelm wrote:
> 
> Fixed on master. I forgot a write memory barrier in the 64-bit version
> of opal_fifo_pop_atomic.
> 
> -Nathan
> 
> On Thu, Jan 08, 2015 at 02:29:05PM +0100, Adrian Reber wrote:
> > I am trying to build OMPI git master on ppc64 (PPC970MP) and
> > test/class/opal_fifo fails during make check most of the time.
> > 
> > [adrian@bimini class]$ ./opal_fifo
> > Single thread test. Time: 0 s 99714 us 99 nsec/poppush
> > Atomics thread finished. Time: 0 s 347577 us 347 nsec/poppush
> > Atomics thread finished. Time: 11 s 490743 us 11490 nsec/poppush
> > Atomics thread finished. Time: 11 s 567542 us 11567 nsec/poppush
> > Atomics thread finished. Time: 11 s 655924 us 11655 nsec/poppush
> > Atomics thread finished. Time: 11 s 786925 us 11786 nsec/poppush
> > Atomics thread finished. Time: 11 s 931230 us 11931 nsec/poppush
> > Atomics thread finished. Time: 12 s 11617 us 12011 nsec/poppush
> > Atomics thread finished. Time: 12 s 63224 us 12063 nsec/poppush
> > Atomics thread finished. Time: 12 s 65844 us 12065 nsec/poppush
> >  Failure :  fifo push/pop multi-threaded with atomics
> > All threads finished. Thread count: 8 Time: 12 s 66103 us 1508 nsec/poppush
> > Exhaustive atomics thread finished. Popped 11982 items. Time: 3 s 700703 us 308855 nsec/poppush
> > Exhaustive atomics thread finished. Popped 12171 items. Time: 3 s 759974 us 308928 nsec/poppush
> > Exhaustive atomics thread finished. Popped 11593 items. Time: 3 s 787227 us 326682 nsec/poppush
> > Exhaustive atomics thread finished. Popped 11079 items. Time: 3 s 786468 us 341769 nsec/poppush
> > Exhaustive atomics thread finished. Popped 16467 items. Time: 4 s 7891 us 243389 nsec/poppush
> > Exhaustive atomics thread finished. Popped 11097 items. Time: 4 s 68897 us 36 nsec/poppush
> > Exhaustive atomics thread finished. Popped 25583 items. Time: 4 s 89074 us 159835 nsec/poppush
> > Exhaustive atomics thread finished. Popped 22092 items. Time: 4 s 82373 us 184789 nsec/poppush
> >  Failure :  fifo push/pop multi-threaded with atomics when there are insufficient items
> > All threads finished. Thread count: 8 Time: 4 s 93369 us 511 nsec/poppush
> >  Failure :  fifo pop all items
> > SUPPORT: OMPI Test failed: opal_fifo_t (3 of 8 failed)
> > 
> > I had a look at the memory barriers in opal/include/opal/sys/powerpc/atomic.h
> > and from what little I remember about PPC64, those look correct:
> > 
> > #define MB()  __asm__ __volatile__ ("sync" : : : "memory")
> > #define RMB() __asm__ __volatile__ ("lwsync" : : : "memory")
> > #define WMB() __asm__ __volatile__ ("eieio" : : : "memory")
> > 
> > The system is running Fedora 21 with gcc 4.9.2, and if this platform
> > is still relevant, I can provide SSH access to the machine
> > for further debugging.
> > 
> > Adrian




[OMPI devel] Changed behaviour with PSM on master

2015-01-09 Thread Adrian Reber
Running the mpi_test_suite on master used to work with no problems. At
some point it stopped working, however, and now I only get error
messages from PSM:

"""
n050301:3.0.In PSM version 1.14, it is not possible to open more than one context per process

[n050301:26526] Open MPI detected an unexpected PSM error in opening an endpoint: In PSM version 1.14, it is not possible to open more than one context per process
"""

I know that I do not have the newest version of the PSM library and
that I need to update it, but as this requires recompiling many
software packages, we are trying to avoid it on our CentOS6-based
system.

My main question (probably for Andrew) is whether this is expected
behaviour on master. It works on 1.8.x and it used to work on
master at least until 2014-12-08.

This is the last MTT entry with PSM working (using my older library version):
http://mtt.open-mpi.org/index.php?do_redir=2226

and for the past few days it has been failing on master:
http://mtt.open-mpi.org/index.php?do_redir=2225

On another system (RHEL7) with newer PSM libraries there is no such
error.

Adrian




Re: [OMPI devel] Changed behaviour with PSM on master

2015-01-09 Thread Friedley, Andrew
No, this is not expected behavior.

The PSM MTL code has not changed in two months, since I fixed that unused-variable
warning for you.  That suggests something above the PSM MTL broke things.  I
see no reason your older software install should suddenly stop working if
all you are updating is OMPI master -- at least with respect to PSM anyway.

The error message is right: it's not possible to open more than one context per
process.  This hasn't changed.  It does suggest that something may be causing
the MTL to be opened twice in each process.

Andrew



Re: [OMPI devel] Changed behaviour with PSM on master

2015-01-09 Thread Howard Pritchard
Hi Adrian and Andrew,

I'm able to reproduce your problem on one of our QLogic clusters.
We are using PSM 1.14 and Slurm.  I'm noticing that for some reason,
in our setup, the ORTE_MCA_orte_precondition_transports environment
variable is not being set.

Could you run your test with

--mca odls_base_verbose 100

and check whether that environment variable is indeed missing from the
list of passed environment variables?
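
In other words, something like this (the program name and process count are
just placeholders for whatever you normally run):

  mpirun --mca odls_base_verbose 100 -np 2 ./mpi_test_suite

and then look for ORTE_MCA_orte_precondition_transports in the verbose output.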

Would one of you mind opening an issue to track this problem?

Thanks,

Howard





Re: [OMPI devel] Changed behaviour with PSM on master

2015-01-09 Thread Howard Pritchard
Hi folks,

Sorry for my stupidity.  I now see the problem: the app is calling psm_init
twice because of the new ofiwg libfabric MTL.

You can try

mpirun blah blah blah --mca btl

and things should work.


Howard




Re: [OMPI devel] Changed behaviour with PSM on master

2015-01-09 Thread Howard Pritchard
Hi Adrian, Andrew,

Sorry, try again: both the libfabric PSM provider and the Open MPI PSM
MTL are trying to use psm_init.

So, to avoid this problem, add

--mca mtl psm

to your mpirun command line.
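
For example (the program name and process count here are just placeholders
for your actual job):

  mpirun --mca mtl psm -np 8 ./mpi_test_suite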

Sorry for the confusion.

Howard




Re: [OMPI devel] Changed behaviour with PSM on master

2015-01-09 Thread Adrian Reber
Should I still open a ticket? Will this be changed, or do I always have
to provide '--mca mtl psm' in the future?



Re: [OMPI devel] Changed behaviour with PSM on master

2015-01-09 Thread Ralph Castain
I suspect it will have to be fixed at some point.




Re: [OMPI devel] Changed behaviour with PSM on master

2015-01-09 Thread Howard Pritchard
Hi Adrian,

Please open an issue.  We don't want users to have to explicitly specify
the MTL just to get a job to run on an Intel/InfiniPath system.

Howard



Re: [OMPI devel] Changed behaviour with PSM on master

2015-01-09 Thread Jeff Squyres (jsquyres)
+1 -- someone should file a bug.

I think Intel needs to decide how they want to handle this (e.g., whether the 
PSM MTL or OFI MTL should be the default, and how the other can detect if it's 
not the default and therefore it's safe to call psm_init... or something like 
that).


On Jan 9, 2015, at 4:10 PM, Howard Pritchard  wrote:

> HI Adrian,
> 
> Please open an issue.  We don't want users having to explicitly specify
> the mtl to use just to get a job to run on a intel/infinipath system.
> 
> Howard
> 
> 2015-01-09 13:04 GMT-07:00 Adrian Reber :
> Should I still open a ticket? Will these be changed or do I always have
> to provide '--mca mtl psm' in the future?
> 
> On Fri, Jan 09, 2015 at 12:27:59PM -0700, Howard Pritchard wrote:
> > HI Adrian, Andrew,
> >
> > Sorry try again,  both the libfabric psm provider and the open mpi psm
> > mtl are trying to use psm_init.
> >
> > So, to avoid this problem, add
> >
> > --mca mtl psm
> >
> > to your mpirun command line.
> >
> > Sorry for the confusion.
> >
> > Howard
> >
> >
> > 2015-01-09 7:52 GMT-07:00 Friedley, Andrew :
> >
> > > No this is not expected behavior.
> > >
> > > The PSM MTL code has not changed in 2 months, when I fixed that unused
> > > variable warning for you.  That suggests something above the PSM MTL broke
> > > things.  I see no reason your older software install should suddenly
> > > stopping working if all you are updating is OMPI master -- at least with
> > > respect to PSM anyway.
> > >
> > > The error message is right, it's not possible to open more than one
> > > context per process.  This hasn't changed.  It does indicate that maybe
> > > something is causing the MTL to be opened twice in each process?
> > >
> > > Andrew
> > >
> > > > -Original Message-
> > > > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Adrian
> > > > Reber
> > > > Sent: Friday, January 9, 2015 4:13 AM
> > > > To: de...@open-mpi.org
> > > > Subject: [OMPI devel] Changed behaviour with PSM on master
> > > >
> > > > Running the mpi_test_suite on master used to work with no problems. At
> > > > some point in time it stopped working however and now I get only error
> > > > messages from PSM:
> > > >
> > > > """
> > > > n050301:3.0.In PSM version 1.14, it is not possible to open more than
> > > one
> > > > context per process
> > > >
> > > > [n050301:26526] Open MPI detected an unexpected PSM error in opening an
> > > > endpoint: In PSM version 1.14, it is not possible to open more than one
> > > > context per process """
> > > >
> > > > I know that I do not have the newest version of the PSM library and that
> > > I
> > > > need to update the library but as this requires many software packages
> > > to be
> > > > re-compiled we are trying to avoid it on our CentOS6 based system.
> > > >
> > > > My main question (probably for Andrew) is if this is an expected
> > > behaviour
> > > > on master. It works on 1.8.x and it used to work on master at least
> > > until 2014-
> > > > 12-08.
> > > >
> > > > This is the last MTT entry for working PSM (with my older version)
> > > > http://mtt.open-mpi.org/index.php?do_redir=2226
> > > >
> > > > and since a few days it fails on master
> > > > http://mtt.open-mpi.org/index.php?do_redir=2225
> > > >
> > > > On another system (RHEL7) with newer PSM libraries there is no such
> > > error.
> > > >
> > > >   Adrian
> > > ___
> > > devel mailing list
> > > de...@open-mpi.org
> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > > Link to this post:
> > > http://www.open-mpi.org/community/lists/devel/2015/01/16766.php
> > >
> 
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/devel/2015/01/16769.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/01/16770.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/01/16772.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI devel] #327

2015-01-09 Thread George Bosilca
I have some comments about this ticket and the corresponding patch.
Honestly, the patch lacks most of the things we have talked about during
our last developers meeting. However, my main concern in this particular
email is about the SIGNAL flag.

1. Currently there is little difference between this flag and PRIORITY,
and I would like to hear a justification for that.

2. Moreover, right now SIGNAL is purely a PML decision. Again, we talked
about this and decided that the upper layer (meaning whoever is using
the PML) was to define this policy. We specifically said that this should
not be a PML-level decision, because the PML lacks the means to make the
right decision about what should be signaled and what should not. The current
code signals most of the PML control logic, including some of the matching logic
(but not all, for some obscure reason). Based on my understanding of the
code, there was no need to pollute the PML code for this; it could have just
used the PRIORITY flag instead.

Additionally, if my memory serves, we decided that this should be
thoughtfully evaluated before pushing it into the trunk, and here
thoughtfully meant over multiple BTLs and so on. Obviously, I missed the
email thread about the evaluation of this flag on UGNI. I guess I might not
be the only one, so I would really appreciate it if someone could repost it.

  George.


Re: [OMPI devel] Changed behaviour with PSM on master

2015-01-09 Thread Burette, Yohann
Hi,

For those of you who don't know me, my name is Yohann Burette; I work for
Intel and I contributed the OFI MTL.

AFAIK, the PSM MTL should have priority over the OFI MTL.

Please excuse my ignorance, but is there a way to express this priority in
the MTLs?  Here is what is in ompi/mca/mtl/base/mtl_base_frame.c:

/*
 * Function for selecting one component from all those that are
 * available.
 *
 * For now, we take the first component that says it can run.  Might
 * need to reexamine this at a later time.
 */
int
ompi_mtl_base_select(bool enable_progress_threads,
                     bool enable_mpi_threads)

Am I missing anything?
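
To make the question concrete, here is roughly the kind of priority-based
selection I have in mind.  This is only a sketch with a made-up query
interface (mtl_component_sketch_t, select_highest_priority), not the actual
mtl_base_frame.c code:

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical component descriptor: query() reports whether the component
 * can run and, if so, how strongly it wants to be selected. */
typedef struct {
    const char *name;
    bool (*query)(bool enable_progress_threads, bool enable_mpi_threads,
                  int *priority);
} mtl_component_sketch_t;

/* Instead of taking the first component that says it can run, keep the one
 * that reports the highest priority (so PSM could outrank OFI, for example). */
static const mtl_component_sketch_t *
select_highest_priority(const mtl_component_sketch_t *components, size_t count,
                        bool enable_progress_threads, bool enable_mpi_threads)
{
    const mtl_component_sketch_t *best = NULL;
    int best_priority = -1;

    for (size_t i = 0; i < count; ++i) {
        int priority = -1;
        if (components[i].query(enable_progress_threads, enable_mpi_threads,
                                &priority) && priority > best_priority) {
            best = &components[i];
            best_priority = priority;
        }
    }
    return best;
}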

Thanks in advance,
Yohann
