Re: [OMPI devel] test/class/opal_fifo failure on ppc64
Thanks. mtt on my ppc64 system is happy again.

On Thu, Jan 08, 2015 at 09:16:43AM -0700, Nathan Hjelm wrote:
>
> Fixed on master. I forgot a write memory barrier in the 64-bit version
> of opal_fifo_pop_atomic.
>
> -Nathan
>
> On Thu, Jan 08, 2015 at 02:29:05PM +0100, Adrian Reber wrote:
> > I am trying to build OMPI git master on ppc64 (PPC970MP) and
> > test/class/opal_fifo fails during make check most of the time.
> >
> > [adrian@bimini class]$ ./opal_fifo
> > Single thread test. Time: 0 s 99714 us 99 nsec/poppush
> > Atomics thread finished. Time: 0 s 347577 us 347 nsec/poppush
> > Atomics thread finished. Time: 11 s 490743 us 11490 nsec/poppush
> > Atomics thread finished. Time: 11 s 567542 us 11567 nsec/poppush
> > Atomics thread finished. Time: 11 s 655924 us 11655 nsec/poppush
> > Atomics thread finished. Time: 11 s 786925 us 11786 nsec/poppush
> > Atomics thread finished. Time: 11 s 931230 us 11931 nsec/poppush
> > Atomics thread finished. Time: 12 s 11617 us 12011 nsec/poppush
> > Atomics thread finished. Time: 12 s 63224 us 12063 nsec/poppush
> > Atomics thread finished. Time: 12 s 65844 us 12065 nsec/poppush
> > Failure : fifo push/pop multi-threaded with atomics
> > All threads finished. Thread count: 8 Time: 12 s 66103 us 1508 nsec/poppush
> > Exhaustive atomics thread finished. Popped 11982 items. Time: 3 s 700703 us
> > 308855 nsec/poppush
> > Exhaustive atomics thread finished. Popped 12171 items. Time: 3 s 759974 us
> > 308928 nsec/poppush
> > Exhaustive atomics thread finished. Popped 11593 items. Time: 3 s 787227 us
> > 326682 nsec/poppush
> > Exhaustive atomics thread finished. Popped 11079 items. Time: 3 s 786468 us
> > 341769 nsec/poppush
> > Exhaustive atomics thread finished. Popped 16467 items. Time: 4 s 7891 us
> > 243389 nsec/poppush
> > Exhaustive atomics thread finished. Popped 11097 items. Time: 4 s 68897 us
> > 36 nsec/poppush
> > Exhaustive atomics thread finished. Popped 25583 items. Time: 4 s 89074 us
> > 159835 nsec/poppush
> > Exhaustive atomics thread finished. Popped 22092 items. Time: 4 s 82373 us
> > 184789 nsec/poppush
> > Failure : fifo push/pop multi-threaded with atomics when there are
> > insufficient items
> > All threads finished. Thread count: 8 Time: 4 s 93369 us 511 nsec/poppush
> > Failure : fifo pop all items
> > SUPPORT: OMPI Test failed: opal_fifo_t (3 of 8 failed)
> >
> > I had a look at the memory barriers in
> > opal/include/opal/sys/powerpc/atomic.h
> > and from what little I remember about PPC64 those look correct:
> >
> > #define MB() __asm__ __volatile__ ("sync" : : : "memory")
> > #define RMB() __asm__ __volatile__ ("lwsync" : : : "memory")
> > #define WMB() __asm__ __volatile__ ("eieio" : : : "memory")
> >
> > The system is running Fedora 21 with gcc 4.9.2 and if this platform
> > is still relevant I can provide SSH access to the machine
> > for further debugging.
> >
> > Adrian
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2015/01/16760.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/01/16762.php
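For readers following the fix Nathan describes, here is a minimal sketch of why a missing write barrier matters when one thread publishes data for another. It is not the opal_fifo code. It uses portable C11 fences as stand-ins for the WMB()/RMB() macros quoted above, and slot_t, slot_init, slot_publish and slot_try_consume are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-slot mailbox; not an Open MPI data structure. */
typedef struct {
    int payload;        /* plain data written by the producer */
    atomic_bool ready;  /* flag that publishes the payload */
} slot_t;

void slot_init(slot_t *s)
{
    assert(s != NULL);
    s->payload = 0;
    atomic_init(&s->ready, false);
}

/* Producer: write the data, issue a release fence, then raise the flag.
 * Without the fence, a weakly ordered CPU may make the flag visible
 * before the payload, so a consumer could read stale data. */
void slot_publish(slot_t *s, int value)
{
    assert(s != NULL);
    s->payload = value;
    atomic_thread_fence(memory_order_release);   /* plays the write-barrier role */
    atomic_store_explicit(&s->ready, true, memory_order_relaxed);
}

/* Consumer: returns true and fills *out only if the flag has been seen. */
bool slot_try_consume(slot_t *s, int *out)
{
    assert(s != NULL && out != NULL);
    if (!atomic_load_explicit(&s->ready, memory_order_relaxed)) {
        return false;
    }
    atomic_thread_fence(memory_order_acquire);   /* plays the read-barrier role */
    *out = s->payload;
    return true;
}
```

On strongly ordered hardware the fence is often cheap or free, which is one reason a bug of this kind can go unnoticed until the code runs on a weakly ordered platform such as POWER.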
[OMPI devel] Changed behaviour with PSM on master
Running the mpi_test_suite on master used to work with no problems. At
some point, however, it stopped working, and now I only get error
messages from PSM:

"""
n050301:3.0.In PSM version 1.14, it is not possible to open more than one
context per process

[n050301:26526] Open MPI detected an unexpected PSM error in opening an
endpoint: In PSM version 1.14, it is not possible to open more than one
context per process
"""

I know that I do not have the newest version of the PSM library and that
I need to update it, but as this requires many software packages to be
recompiled, we are trying to avoid it on our CentOS6 based system.

My main question (probably for Andrew) is whether this is expected
behaviour on master. It works on 1.8.x and it used to work on master at
least until 2014-12-08.

This is the last MTT entry for working PSM (with my older version):
http://mtt.open-mpi.org/index.php?do_redir=2226

For the past few days it has been failing on master:
http://mtt.open-mpi.org/index.php?do_redir=2225

On another system (RHEL7) with newer PSM libraries there is no such error.

Adrian
Re: [OMPI devel] Changed behaviour with PSM on master
No, this is not expected behavior.

The PSM MTL code has not changed in 2 months, since I fixed that unused
variable warning for you. That suggests something above the PSM MTL broke
things. I see no reason your older software install should suddenly stop
working if all you are updating is OMPI master -- at least with respect
to PSM anyway.

The error message is right: it's not possible to open more than one
context per process. This hasn't changed. It does indicate that maybe
something is causing the MTL to be opened twice in each process?

Andrew

> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Adrian
> Reber
> Sent: Friday, January 9, 2015 4:13 AM
> To: de...@open-mpi.org
> Subject: [OMPI devel] Changed behaviour with PSM on master
>
> Running the mpi_test_suite on master used to work with no problems. At
> some point, however, it stopped working, and now I only get error
> messages from PSM:
>
> """
> n050301:3.0.In PSM version 1.14, it is not possible to open more than one
> context per process
>
> [n050301:26526] Open MPI detected an unexpected PSM error in opening an
> endpoint: In PSM version 1.14, it is not possible to open more than one
> context per process
> """
>
> I know that I do not have the newest version of the PSM library and that
> I need to update it, but as this requires many software packages to be
> recompiled, we are trying to avoid it on our CentOS6 based system.
>
> My main question (probably for Andrew) is whether this is expected
> behaviour on master. It works on 1.8.x and it used to work on master at
> least until 2014-12-08.
>
> This is the last MTT entry for working PSM (with my older version):
> http://mtt.open-mpi.org/index.php?do_redir=2226
>
> For the past few days it has been failing on master:
> http://mtt.open-mpi.org/index.php?do_redir=2225
>
> On another system (RHEL7) with newer PSM libraries there is no such error.
>
> Adrian
Re: [OMPI devel] Changed behaviour with PSM on master
Hi Adrian and Andrew,

I'm able to reproduce your problem on one of our QLogic clusters. We are
using PSM 1.14 and Slurm.

I'm noticing that, for some reason, the
ORTE_MCA_orte_precondition_transports environment variable is not being
set in our setup.

Could you run your test with

--mca odls_base_verbose 100

and check whether that environment variable is in fact missing from the
list of passed environment variables?

Would one of you mind opening an issue to track this problem?

Thanks,

Howard

2015-01-09 7:52 GMT-07:00 Friedley, Andrew :

> No, this is not expected behavior.
>
> The PSM MTL code has not changed in 2 months, since I fixed that unused
> variable warning for you. That suggests something above the PSM MTL broke
> things. I see no reason your older software install should suddenly stop
> working if all you are updating is OMPI master -- at least with respect
> to PSM anyway.
>
> The error message is right: it's not possible to open more than one
> context per process. This hasn't changed. It does indicate that maybe
> something is causing the MTL to be opened twice in each process?
>
> Andrew
Re: [OMPI devel] Changed behaviour with PSM on master
Hi Folks,

Sorry for my stupidity. I now see the problem: the app is calling
pmi_init twice because of the new ofiwg libfabric MTL.

You can try

mpirun blah blah blah --mca btl

and things should work.

Howard

2015-01-09 7:52 GMT-07:00 Friedley, Andrew :

> No, this is not expected behavior.
>
> The PSM MTL code has not changed in 2 months, since I fixed that unused
> variable warning for you. That suggests something above the PSM MTL broke
> things. I see no reason your older software install should suddenly stop
> working if all you are updating is OMPI master -- at least with respect
> to PSM anyway.
>
> The error message is right: it's not possible to open more than one
> context per process. This hasn't changed. It does indicate that maybe
> something is causing the MTL to be opened twice in each process?
>
> Andrew
Re: [OMPI devel] Changed behaviour with PSM on master
Hi Adrian, Andrew,

Sorry, let me try again: both the libfabric PSM provider and the Open MPI
PSM MTL are trying to use psm_init.

So, to avoid this problem, add

--mca mtl psm

to your mpirun command line.

Sorry for the confusion.

Howard

2015-01-09 7:52 GMT-07:00 Friedley, Andrew :

> No, this is not expected behavior.
>
> The PSM MTL code has not changed in 2 months, since I fixed that unused
> variable warning for you. That suggests something above the PSM MTL broke
> things. I see no reason your older software install should suddenly stop
> working if all you are updating is OMPI master -- at least with respect
> to PSM anyway.
>
> The error message is right: it's not possible to open more than one
> context per process. This hasn't changed. It does indicate that maybe
> something is causing the MTL to be opened twice in each process?
>
> Andrew
Re: [OMPI devel] Changed behaviour with PSM on master
Should I still open a ticket? Will this be changed, or will I always have
to provide '--mca mtl psm' in the future?

On Fri, Jan 09, 2015 at 12:27:59PM -0700, Howard Pritchard wrote:
> Hi Adrian, Andrew,
>
> Sorry, let me try again: both the libfabric PSM provider and the Open MPI
> PSM MTL are trying to use psm_init.
>
> So, to avoid this problem, add
>
> --mca mtl psm
>
> to your mpirun command line.
>
> Sorry for the confusion.
>
> Howard
Re: [OMPI devel] Changed behaviour with PSM on master
I suspect it will have to be fixed at some point.

> On Jan 9, 2015, at 12:04 PM, Adrian Reber wrote:
>
> Should I still open a ticket? Will this be changed, or will I always have
> to provide '--mca mtl psm' in the future?
Re: [OMPI devel] Changed behaviour with PSM on master
Hi Adrian,

Please open an issue. We don't want users to have to explicitly specify
the MTL just to get a job to run on an Intel/InfiniPath system.

Howard

2015-01-09 13:04 GMT-07:00 Adrian Reber :

> Should I still open a ticket? Will this be changed, or will I always have
> to provide '--mca mtl psm' in the future?
Re: [OMPI devel] Changed behaviour with PSM on master
+1 -- someone should file a bug.

I think Intel needs to decide how they want to handle this (e.g., whether
the PSM MTL or OFI MTL should be the default, and how the other can detect
if it's not the default and therefore it's safe to call psm_init... or
something like that).

On Jan 9, 2015, at 4:10 PM, Howard Pritchard wrote:

> Hi Adrian,
>
> Please open an issue. We don't want users to have to explicitly specify
> the MTL just to get a job to run on an Intel/InfiniPath system.
>
> Howard

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
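One way to picture the detection Jeff is asking for is a process-wide ownership record that each component consults before initializing the shared library. The sketch below is purely illustrative and makes several assumptions: endpoint_claim and endpoint_current_owner are hypothetical helpers, not Open MPI, libfabric or PSM functions, and the code is not thread-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical process-wide record of which component owns the endpoint. */
static const char *endpoint_owner = NULL;

/* Try to claim the endpoint for `component`.
 * Returns true if the caller owns it after the call (first caller wins),
 * false if another component claimed it earlier and the caller should
 * decline instead of initializing the library a second time. */
bool endpoint_claim(const char *component)
{
    assert(component != NULL);
    if (endpoint_owner == NULL) {
        endpoint_owner = component;
        return true;
    }
    return strcmp(endpoint_owner, component) == 0;
}

/* Name of the current owner, or NULL if nobody has claimed the endpoint. */
const char *endpoint_current_owner(void)
{
    return endpoint_owner;
}
```

With a guard of this shape, whichever component is selected first would initialize the library, and the other would see the claim and step aside rather than trigger the "more than one context per process" error.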
[OMPI devel] #327
I have some comments about this ticket and the corresponding patch.
Honestly, the patch lacks most of the things we talked about during
our last developers meeting. However, my main concern in this particular
email is the SIGNAL flag.

1. Currently there is little difference between this flag and PRIORITY,
   and I would like to hear a justification for that.

2. Moreover, right now SIGNAL is purely a PML decision. Again, we talked
   about this and decided that the upper layer (meaning whoever is using
   the PML) was to define this policy. We specifically said that this
   should not be a PML-level decision, because the PML lacks the means to
   decide correctly what should be signaled and what should not. The
   current code signals most of the PML control logic, including some of
   the matching logic (but not all of it, for some obscure reason). Based
   on my understanding of the code, there was no need to pollute the PML
   code for this; it could have just used the PRIORITY flag instead.

Additionally, if I remember correctly, we decided that this should be
thoughtfully evaluated before pushing it into the trunk, and here
"thoughtfully" meant over multiple BTLs and so on. Obviously, I missed the
email thread about the evaluation of this flag on UGNI. I guess I might
not be the only one, so I would really appreciate it if someone could
repost it.

George.
Re: [OMPI devel] Changed behaviour with PSM on master
Hi,

For those of you who don't know me, my name is Yohann Burette. I work for
Intel and I contributed the OFI MTL.

AFAIK, the PSM MTL should have priority over the OFI MTL. Please excuse
my ignorance, but is there a way to express this priority in the MTLs?

Here is what is in ompi/mca/mtl/base/mtl_base_frame.c:

    /*
     * Function for selecting one component from all those that are
     * available.
     *
     * For now, we take the first component that says it can run. Might
     * need to reexamine this at a later time.
     */
    int
    ompi_mtl_base_select(bool enable_progress_threads,
                         bool enable_mpi_threads)

Am I missing anything?

Thanks in advance,
Yohann

-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
Sent: Friday, January 09, 2015 1:27 PM
To: Open MPI Developers List
Subject: Re: [OMPI devel] Changed behaviour with PSM on master

+1 -- someone should file a bug.

I think Intel needs to decide how they want to handle this (e.g., whether
the PSM MTL or OFI MTL should be the default, and how the other can detect
if it's not the default and therefore it's safe to call psm_init... or
something like that).
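To make Yohann's question concrete, the difference between "take the first component that says it can run" and "take the highest-priority component that can run" could be sketched as follows. This is not the ompi_mtl_base_select implementation. The component_t struct, its fields and select_by_priority are hypothetical, and any priority values a caller passes in are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical component descriptor; not the MCA component structure. */
typedef struct {
    const char *name;  /* component name, e.g. "psm" or "ofi" */
    int priority;      /* higher value means preferred */
    bool can_run;      /* result of the component's own availability check */
} component_t;

/* Return the runnable component with the highest priority, or NULL if
 * none can run. On a tie, the component listed first is kept. */
const component_t *select_by_priority(const component_t *list, size_t n)
{
    const component_t *best = NULL;
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!list[i].can_run) {
            continue;  /* skip components that cannot run here */
        }
        if (best == NULL || list[i].priority > best->priority) {
            best = &list[i];
        }
    }
    return best;
}
```

Under a scheme like this, listing order would stop mattering: giving the PSM entry a higher priority than the OFI entry would select PSM whenever both report that they can run.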