Re: [OMPI devel] [2.0.1rc2] CRITICAL error in README

2016-08-31 Thread Jeff Squyres (jsquyres)
Good catch; thanks.  I'll go fix right now...

> On Aug 30, 2016, at 10:41 PM, Paul Hargrove wrote:
> 
> I believe that both the addresses and subscription URLs for the mailing lists 
> are out-of-date in the README as shown in red below.
> I don't know if the list addresses might be forwarding, but those 
> subscription URLs are definitely 404.
> 
> -Paul
> 
> The best way to report bugs, send comments, or ask questions is to
> sign up on the user's and/or developer's mailing list (for user-level
> and developer-level questions; when in doubt, send to the user's
> list):
> 
> us...@open-mpi.org
> de...@open-mpi.org
> 
> Because of spam, only subscribers are allowed to post to these lists
> (ensure that you subscribe with and post from exactly the same e-mail
> address -- j...@example.com is considered different than
> j...@mycomputer.example.com!).  Visit these pages to subscribe to the
> lists:
> 
>  http://www.open-mpi.org/mailman/listinfo.cgi/users
>  http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> Thanks for your time.
> 
> 
> 
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] 2.0.1rc2 released

2016-08-31 Thread Paul Hargrove
On Tue, Aug 30, 2016 at 1:38 PM, Jeff Squyres (jsquyres) wrote:

> On Aug 30, 2016, at 4:06 PM, Paul Hargrove wrote:
> >
> > I will report my findings as they come in from my testers.
> > However, NERSC is down for quarterly maintenance which means I am w/o
> > Intel compilers today.
> >
> > I am proud to have been verb-ified, but could I get some clarification
> > on which "Hargroved" items are fixed?
> >
> > I *am* expecting that the following are included:
> > + sec/native on Solaris (PR 1336)
> > + pmix use of strnlen() [requires unsupported Mac OS X 10.6 to verify]
> > + README updates [which I will not be proof reading]
> >
> > I am currently assuming these are *not* fixed for this rc (all have
> > 2.0.2 milestone):
> > + Support for NAG Fortran (PR 1215)
> > + xlc-12.1 inline atomics (PR 1344)
> > + The memory/patcher issues with xlc (PR 1347)
>
> Correct on all counts.
>
> We talked today on the webex about releasing today/tomorrow, but Ralph
> just identified a second part to the stdin wireup issue, and we're still
> testing a COMM_SPAWN issue.  So we might push back a little further... :-\



With NERSC back in operation late last night I was able to get the Intel
compiler tests taken care of.
Also overnight my slow ARM and MIPS emulators finished.

Overall everything I had tested previously is as expected - things listed
as fixed are fixed.
No previously-unknown problems were seen on the platforms on which I had
tested 2.0.0rc1.

HOWEVER, I was able to get a full run on an emulated SPARCv9 platform for
the first time.
It FAILED with a SIGBUS, which I will report shortly in a new thread.

-Paul


[OMPI devel] [2.0.1rc2] SIGBUS on Linux/SPARC

2016-08-31 Thread Paul Hargrove
On an emulated UltraSPARC system running Linux (and using V9 ABI) I was
able to build the RC, but get a SIGBUS when running ring_c.
The problem is an unaligned 64-bit access, as shown by the gdb session
below.

I have not tried, but it *might* be possible to reproduce on PPC64 via
"prctl --unaligned=signal".

-Paul
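
For anyone who wants to try the prctl idea from inside a test program rather than via the command-line tool, here is a minimal sketch using the Linux prctl(2) call with PR_SET_UNALIGN and PR_UNALIGN_SIGBUS. The function name request_sigbus_on_unaligned is made up for illustration. Nothing in this thread confirms that a given kernel or architecture honors the request (many, x86 included, simply reject it), so treat a -1 return as "not available here".

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#ifdef __linux__
#include <sys/prctl.h>
#endif

/* Hypothetical helper: ask the kernel to deliver SIGBUS on unaligned
 * user-space accesses instead of silently fixing them up.
 * Returns 0 if the request was accepted, -1 otherwise. */
int request_sigbus_on_unaligned(void)
{
#if defined(__linux__) && defined(PR_SET_UNALIGN) && defined(PR_UNALIGN_SIGBUS)
    if (0 == prctl(PR_SET_UNALIGN, PR_UNALIGN_SIGBUS, 0, 0, 0)) {
        return 0;
    }
    /* Architectures without this control typically fail with EINVAL. */
    fprintf(stderr, "PR_SET_UNALIGN not honored here: %s\n", strerror(errno));
    return -1;
#else
    /* Not Linux, or the headers do not define the option. */
    return -1;
#endif
}
```

Calling this at the top of main() in a reproducer would, where supported, make an unaligned load fail loudly the way it does on SPARC.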


Core was generated by `examples/ring_c'.
Program terminated with signal 10, Bus error.
#0  0xf630ed64 in component_set_addr (peer=0xf6bb7114, uris=0x90ec8)
    at /home/phargrov/OMPI/openmpi-2.0.1rc2-linux-sparcv9/openmpi-2.0.1rc2/orte/mca/oob/usock/oob_usock_component.c:318
318             if (OPAL_SUCCESS != opal_hash_table_get_value_uint64(&mca_oob_usock_module.peers,

(gdb) l
313     if (ORTE_PROC_IS_APP) {
314         /* if this is my daemon, then take it - otherwise, ignore */
315         if (ORTE_PROC_MY_DAEMON->jobid == peer->jobid &&
316             ORTE_PROC_MY_DAEMON->vpid == peer->vpid) {
317             ui64 = (uint64_t*)peer;
318             if (OPAL_SUCCESS != opal_hash_table_get_value_uint64(&mca_oob_usock_module.peers,
319                     (*ui64), (void**)&pr) || NULL == pr) {
320                 pr = OBJ_NEW(mca_oob_usock_peer_t);
321                 pr->name = *peer;
322                 opal_hash_table_set_value_uint64(&mca_oob_usock_module.peers, (*ui64), pr);

(gdb) print ui64
$1 = (uint64_t *) 0xf6bb7114
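
To spell out why this faults: 0xf6bb7114 is a multiple of 4 but not of 8, so dereferencing it as a uint64_t is an unaligned 64-bit load. The sketch below shows one common way to avoid that, copying the bytes with memcpy instead of casting the pointer. It is only an illustration and not necessarily what any eventual fix does. The type name_t and both helper names are hypothetical, and the layout (two 32-bit fields) is an assumption based on the jobid/vpid fields visible in the listing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the process name: two 32-bit fields (an
 * assumption), so the struct itself only needs 4-byte alignment. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} name_t;

/* Returns nonzero if p sits on an 8-byte boundary. */
static int is_aligned_8(const void *p)
{
    return 0 == ((uintptr_t)p % sizeof(uint64_t));
}

/* Build a 64-bit hash key by copying bytes rather than dereferencing a
 * (uint64_t *) cast; memcpy has no alignment requirement on its source. */
static uint64_t name_to_key(const name_t *name)
{
    uint64_t key;

    assert(NULL != name);
    assert(sizeof(*name) == sizeof(key));
    memcpy(&key, name, sizeof(key));
    return key;
}
```

With this approach the key has the same bytes the cast would have produced, so lookups and insertions stay consistent with each other, and the load is safe on strict-alignment platforms.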


Re: [OMPI devel] Off-topic re: supporting old systems

2016-08-31 Thread Christopher Samuel
On 31/08/16 14:01, Paul Hargrove wrote:

> So, the sparc platform is a bit more orphaned than it already was when
> support stopped at Wheezy.

Ah sorry, I didn't realise you were on a non-LTS Wheezy architecture.

-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


Re: [OMPI devel] [2.0.1rc2] SIGBUS on Linux/SPARC

2016-08-31 Thread Gilles Gouaillardet

Thanks Paul,

Can you please try the patch available at
https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1357.patch ?

Cheers,

Gilles
