Re: [OMPI devel] "maybe" issue in 1.8.5rc[23]

2015-04-24 Thread Nysal Jan K A
Yeah, I remember this one. It's a bug in that specific version of the
compiler. I had reported it to the compiler team a couple of years back.
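
For context: judging from the signature and the lwarx / cmpw / stwcx.
sequence in the listing quoted below, the routine in question is a 32-bit
compare-and-set. A minimal sketch of those semantics in plain C follows. It
uses the GCC/Clang __sync_bool_compare_and_swap builtin instead of inline
asm, the name cmpset_32_sketch is made up for illustration, and it is not the
Open MPI implementation.

```c
#include <stdint.h>

/* Hypothetical stand-in, for illustration only.
 * If *addr equals oldval, atomically store newval and return 1.
 * Otherwise leave *addr untouched and return 0. */
int cmpset_32_sketch(volatile int32_t *addr, int32_t oldval, int32_t newval)
{
    /* GCC/Clang builtin: full-barrier compare-and-swap, true on success. */
    return __sync_bool_compare_and_swap(addr, oldval, newval) ? 1 : 0;
}
```

A compiler that handles the builtin correctly should give the same observable
behaviour that the inline asm is meant to provide, which is what makes a
miscompiled asm block stand out.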

Quoting from the email I sent them:
The "stw r0,0(r31)" probably overwrites the previous stack pointer ?

static inline int opal_atomic_cmpset_32(volatile int32_t *addr,
1580:   94 21 ff c0 stwu r1,-64(r1)
1584:   93 e1 00 3c stw r31,60(r1)
1588:   7c 3f 0b 78 mr  r31,r1
158c:   90 7f 00 24 stw r3,36(r31)
1590:   90 9f 00 28 stw r4,40(r31)
1594:   90 bf 00 2c stw r5,44(r31)
int32_t oldval, int32_t newval)
{
   int32_t ret;

   __asm__ __volatile__ (
1598:   80 9f 00 28 lwz r4,40(r31)
159c:   80 7f 00 2c lwz r3,44(r31)
15a0:   80 1f 00 24 lwz r0,36(r31)
*15a4:   90 1f 00 00 stw r0,0(r31)*
15a8:   90 1f 00 04 stw r0,4(r31)
15ac:   90 9f 00 08 stw r4,8(r31)
15b0:   90 7f 00 0c stw r3,12(r31)
15b4:   90 1f 00 10 stw r0,16(r31)
15b8:   80 7f 00 04 lwz r3,4(r31)
15bc:   7c 80 18 28 lwarx   r4,0,r3
15c0:   80 1f 00 08 lwz r0,8(r31)
15c4:   7c 04 00 00 cmpw r4,r0
15c8:   90 9f 00 14 stw r4,20(r31)
15cc:   90 7f 00 04 stw r3,4(r31)
15d0:   90 1f 00 08 stw r0,8(r31)
15d4:   40 82 00 1c bne- 15f0

15d8:   80 1f 00 0c lwz r0,12(r31)
15dc:   80 7f 00 04 lwz r3,4(r31)
15e0:   7c 00 19 2d stwcx.  r0,0,r3
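
To make the suspicion about "stw r0,0(r31)" concrete, here is a toy replay of
the first few instructions of the listing in plain C. It is only a sketch: it
assumes the 32-bit frame layout implied by the stwu/stw instructions, models
the stack as a small word array, and every name in it (toy_cpu, replay_frame)
is hypothetical. The point it illustrates is that stwu leaves the caller's
stack pointer (the back chain) at 0(r1), and since r31 is a copy of r1, a
store to 0(r31) replaces that word.

```c
#include <stdint.h>

/* Toy model of a 32-bit stack: the word at byte address A is mem[A / 4]. */
enum { STACK_WORDS = 64 };

typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t r0, r1, r3, r31;
} toy_cpu;

static void store_word(toy_cpu *c, uint32_t addr, uint32_t val)
{
    c->mem[addr / 4] = val;
}

static uint32_t load_word(const toy_cpu *c, uint32_t addr)
{
    return c->mem[addr / 4];
}

/* Replays the prologue and returns the word left in the back-chain slot.
 * caller_sp must be a multiple of 4 in [64, 256); otherwise 0 is returned.
 * clobber != 0 also replays the suspect "stw r0,0(r31)". */
uint32_t replay_frame(uint32_t caller_sp, uint32_t addr_arg, int clobber)
{
    toy_cpu c = {0};

    if (caller_sp < 64 || caller_sp >= 4 * STACK_WORDS || caller_sp % 4 != 0)
        return 0;

    c.r1 = caller_sp;
    c.r3 = addr_arg;

    store_word(&c, c.r1 - 64, c.r1);      /* stwu r1,-64(r1): save old r1 ... */
    c.r1 -= 64;                           /* ... then update r1               */
    c.r31 = c.r1;                         /* mr   r31,r1                      */
    store_word(&c, c.r31 + 36, c.r3);     /* stw  r3,36(r31)                  */
    c.r0 = load_word(&c, c.r31 + 36);     /* lwz  r0,36(r31)                  */
    if (clobber)
        store_word(&c, c.r31 + 0, c.r0);  /* stw  r0,0(r31)                   */

    return load_word(&c, c.r31 + 0);      /* what the back-chain slot holds   */
}
```

With a caller stack pointer of 128 and an addr argument of 0x1000,
replay_frame(128, 0x1000, 0) returns 128 (the saved caller stack pointer),
while replay_frame(128, 0x1000, 1) returns 0x1000, i.e. the back chain has
been replaced by the value of addr.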

Regards
--Nysal

On Fri, Apr 24, 2015 at 5:06 AM, Paul Hargrove  wrote:

> Exhibit 1: the smoking gun
>
> Program terminated with signal 11, Segmentation fault.
> #0  0x0fffa4d6f184 in opal_atomic_cmpset_acq_32 (addr=Cannot access
> memory at address 0xd8
> )
> at
> /home/hargrov1/OMPI/openmpi-1.8.5rc3-linux-ppc64-xlc-11.1/openmpi-1.8.5rc3/opal/include/opal/sys/powerpc/atomic.h:158
>
>
> So, this is a new symptom of the known inability of this compiler to get
> the inline asm right.
>
> Sorry for the false alarm,
> -Paul
>
> On Thu, Apr 23, 2015 at 4:09 PM, Paul Hargrove  wrote:
>
>> I have a system w/ xlc-11.1.
>> It has essentially always failed "make check" in a LP64 build due to xlc
>> botching the atomics.
>> So, when it failed with 1.8.5rc2 I didn't look closely.
>>
>> Today it has failed with rc3 and I *did* look closely and here is what I
>> see:
>>
>> PASS: predefined_gap_test
>> /bin/sh: line 5: 39766 Segmentation fault  ${dir}$tst
>> FAIL: dlopen_test
>> 
>> 1 of 2 tests failed
>> Please report to http://www.open-mpi.org/community/help/
>> 
>>
>> I also see the same in the rc2 results I hadn't examined closely before.
>> Meanwhile the rc1 failure was the known atomics-related one.
>>
>> So, UNLESS I find that the dlopen_test failure is related to the atomics
>> or some other problem specific to xlc, this may be a new issue related to
>> the elimination of the built-in libltdl.  Note that this system.
>>
>> Here's hoping this is a new symptom, and not a new problem.
>>
>> -Paul
>>
>> --
>> Paul H. Hargrove  phhargr...@lbl.gov
>> Computer Languages & Systems Software (CLaSS) Group
>> Computer Science Department   Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/04/17352.php
>


[OMPI devel] Dead code in opal_config_asm.m4

2015-04-24 Thread Paul Hargrove
There is a block of code near the start of the OPAL_CONFIG_ASM macro which begins:
  # OS X Leopard ld bus errors if ...

However, Leopard is OS X 10.5 and the minimum supported by Open MPI is 10.6.
So, that code should be unreachable at this time (and since Jan 2014
http://www.open-mpi.org/community/lists/devel/2014/01/13697.php)

-Paul



Re: [OMPI devel] Dead code in opal_config_asm.m4

2015-04-24 Thread Jeff Squyres (jsquyres)
Sweet -- thanks:


https://github.com/open-mpi/ompi/commit/0afda878a2017b1d50c58cccecc5334fbb81b7dc



> On Apr 24, 2015, at 6:34 AM, Paul Hargrove  wrote:
> 
> There is a block of code near the start of the OPAL_CONFIG_ASM macro which begins:
>   # OS X Leopard ld bus errors if ...
> 
> However, Leopard is OS X 10.5 and the minimum supported by Open MPI is 10.6.
> So, that code should be unreachable at this time (and since Jan 2014 
> http://www.open-mpi.org/community/lists/devel/2014/01/13697.php)
> 
> -Paul
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/04/17356.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI devel] 1.8.5....going once...going twice...

2015-04-24 Thread Ralph Castain
Any last minute issues people need to report? Otherwise, this baby is going to 
ship

Paul: I will include your README suggestions as they relate to 1.8.5. Thanks, 
as always!
Ralph



Re: [OMPI devel] 1.8.5....going once...going twice...

2015-04-24 Thread Paul Hargrove
5 of the 6 MIPS and ARM testers that were still running last night have
completed successfully.
No reason to think the last one won't pass on rc3 as it did on rc2, if
given another 2 or 3 hours to complete.

-Paul

On Fri, Apr 24, 2015 at 9:52 AM, Ralph Castain  wrote:

> Any last minute issues people need to report? Otherwise, this baby is
> going to ship
>
> Paul: I will include your README suggestions as they relate to 1.8.5.
> Thanks, as always!
> Ralph
>
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/04/17358.php
>





[OMPI devel] Next developer face-to-face meeting: Doodle

2015-04-24 Thread Jeff Squyres (jsquyres)
A bunch of people have filled out the Doodle -- thanks!

If you haven't done so, please fill it out by the teleconf next Tuesday, thanks:

 http://doodle.com/4arc4ciiby2ve222




Re: [OMPI devel] Suggested README changes

2015-04-24 Thread Jeff Squyres (jsquyres)
Sandia --

Are you ok with this change?  If so, please change it on master and file a PR...

(one quibble: it's "verbs", not "VERBS")


> On Apr 23, 2015, at 6:25 PM, Paul Hargrove  wrote:
> 
> More suggestions for README:
> 
> On master I suggest  s/openib/verbs/  to catch two lingering instances of the 
> old BTL name.
> Only the first of those two can optionally be updated in v1.8 too.
> 
> As for Portals4, here is a suggested replacement stolen directly from 
> portals4's google code site:
> 
> -  Portals4 is the support library for Cray interconnects, but is also
> -  available on other platforms (e.g., there is a Portals4 library
> -  implemented over regular TCP).
> +  Portals is a low-level network API for high-performance networking
> +  on high-performance computing systems developed by Sandia National
> +  Laboratories, Intel Corporation, and the University of New Mexico.
> +  The Portals 4 Reference Implementation is a complete implementation
> +  of Portals 4, with transport over InfiniBand VERBS and UDP.
> 
> -Paul
> 
> 
> On Thu, Apr 23, 2015 at 2:01 PM, Jeff Squyres (jsquyres)  
> wrote:
> Applied -- thank you!
> 
> -Jeff [accepting patches from Paul since 2002]
> 
> ;-p
> 
> > On Apr 23, 2015, at 2:29 PM, Paul Hargrove  wrote:
> >
> > I have attached a patch (against master) that fixes some typos and makes an 
> > update.
> > It applies almost cleanly to v1.8, requiring "-C2" if applying with "git 
> > apply" due to context changes.
> >
> > I also noted the following which I believe is just plain false, but don't 
> > have an alternative for.
> >   Portals4 is the support library for Cray interconnects, but is also
> >   available on other platforms (e.g., there is a Portals4 library
> >   implemented over regular TCP).
> >
> > It seems to be based on a dated description of Portals3.3.  Cray does not 
> > (to the best of my knowledge) have an implementation of Portals4, and the 
> > reference implementation of Portals4 is over IB rather than over TCP.  
> > Perhaps @regrant can offer a re-write?
> >
> > -Paul [generating more work for @jsquyres since 1999]
> >
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/devel/2015/04/17346.php
> 
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/04/17349.php
> 
> 
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/04/17350.php





[OMPI devel] My 1.8.5rc3 testing report

2015-04-24 Thread Paul Hargrove
All done!  All good!

Summary:
 3  Unavailable
 1  Known bad[*] configuration
70  PASS

-Paul

[*] Compiler bug confirmed by Nysal

On Thu, Apr 23, 2015 at 7:29 PM, Paul Hargrove  wrote:

> Sorry my ARMv6, ARMv8 and PowerPC64LE systems were dedicated to other
> purposes today.
> So, I was only able to test 1.8.5rc3 against 71 distinct configurations.
>  ;-)
>
> There was 1 known-failure (xlc-11.1 LP64) that scared me by changing
> failure modes between rc1 and rc2.
> Several slow tests (Linux on MIPS and ARM, with multiple ABIs/ISAs, all
> emulated w/ QEMU) are still running and will likely be ready by Fri noon
> PDT.
> Besides those I have only clean PASS results.
>
> In the unlikely event the ARM or MIPS testers fail, I will of course
> report it.
> However, I doubt anything has regressed since they passed rc2.
>
> Current summary:
>  3  Unavailable
>  1  Known bad[*] configuration
>  6  Still pending (QEMU)
> 64  PASS
>
> -Paul
>
> [*] Albert Einstein is said to have defined Insanity as "doing the same
> thing over and over again and expecting different results"
>
>





Re: [OMPI devel] My 1.8.5rc3 testing report

2015-04-24 Thread Ralph Castain
Thanks Paul !!

> On Apr 24, 2015, at 12:40 PM, Paul Hargrove  wrote:
> 
> All done!  All good!
> 
> Summary:
>  3  Unavailable
>  1  Known bad[*] configuration
> 70  PASS
> 
> -Paul
> 
> [*] Compiler bug confirmed by Nysal
> 
> On Thu, Apr 23, 2015 at 7:29 PM, Paul Hargrove  wrote:
> Sorry my ARMv6, ARMv8 and PowerPC64LE systems were dedicated to other 
> purposes today.
> So, I was only able to test 1.8.5rc3 against 71 distinct configurations.  ;-)
> 
> There was 1 known-failure (xlc-11.1 LP64) that scared me by changing failure 
> modes between rc1 and rc2.
> Several slow tests (Linux on MIPS and ARM, with multiple ABIs/ISAs, all 
> emulated w/ QEMU) are still running and will likely be ready by Fri noon PDT.
> Besides those I have only clean PASS results.
> 
> In the unlikely event the ARM or MIPS testers fail, I will of course report 
> it.
> However, I doubt anything has regressed since they passed rc2.
> 
> Current summary:
>  3  Unavailable
>  1  Known bad[*] configuration
>  6  Still pending (QEMU) 
> 64  PASS
> 
> -Paul
> 
> [*] Albert Einstein is said to have defined Insanity as "doing the same thing 
> over and over again and expecting different results"
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/04/17362.php



Re: [OMPI devel] powerpc64le support [1-line patch]

2015-04-24 Thread Troy Benjegerdes
On Wed, Apr 22, 2015 at 02:19:07PM -0700, Paul Hargrove wrote:
> I had an opportunity to try the 1.8.5rc2 tarball on a little-endian POWER8
> (aka ppc64el or powerpc64le).
> The good news is that things "just worked" as they did when I tried ARMv8
> (aka aarch64).
> 

This is a little off-topic, but where do I buy one of these beasts, and 
can I at least load new firmware to put it back in proper ppc64 mode? ;)

ttile leaidnop npreww ,c taht siw sihdlromoc  gni\n?ot

-- 

Troy Benjegerdes 'da hozer'  ho...@hozed.org
7 elements  earth::water::air::fire::mind::spirit::soul  grid.coop

  Never pick a fight with someone who buys ink by the barrel,
 nor try buy a hacker who makes money by the megahash



Re: [OMPI devel] powerpc64le support [1-line patch]

2015-04-24 Thread Paul Hargrove
Troy,

My POWER8 is actually a QEMU emulator (same goes for my AARCH64).

However, the real thing (with 2 sockets * 10 cores * 8 threads) exists in the
GCC compile farm as gcc112.
It is my understanding that with IBM's hypervisor one can run both LE and BE
partitions simultaneously.  So, probably no reloading of firmware is
involved.

-Paul

On Fri, Apr 24, 2015 at 3:04 PM, Troy Benjegerdes  wrote:

> On Wed, Apr 22, 2015 at 02:19:07PM -0700, Paul Hargrove wrote:
> > I had an opportunity to try the 1.8.5rc2 tarball on a little-endian
> POWER8
> > (aka ppc64el or powerpc64le).
> > The good news is that things "just worked" as they did when I tried ARMv8
> > (aka aarch64).
> >
>
> This is a little off-topic, but where do I buy one of these beasts, and
> can I at least load new firmware to put it back in proper ppc64 mode? ;)
>
> ttile leaidnop npreww ,c taht siw sihdlromoc  gni\n?ot
>
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/04/17364.php
>


