On Feb 20, 2014, at 7:10 AM, Jeff Squyres (jsquyres) <[email protected]> wrote:
> For all of these, I'm using the openshmem test suite that is now committed to
> the ompi-svn SVN repo. I don't know if the errors are with the tests or with
> oshmem itself.
>
> 1. I'm running the oshmem test suite at 32 processes across 2 16-core
> servers. I'm seeing a segv in "examples/shmem_2dheat.x 10 10". It seems to
> run fine at lower np values such as 2, 4, and 8; I didn't try to determine
> where the crossover to badness occurs.
My memory is bad and my notes are on a machine I no longer have access to, but
I made this change to the test suite when running it for Portals SHMEM:
Index: shmem_2dheat.c
===================================================================
--- shmem_2dheat.c (revision 270)
+++ shmem_2dheat.c (revision 271)
@@ -129,6 +129,11 @@
   p = _num_pes ();
   my_rank = _my_pe ();
+  if (p > 8) {
+    fprintf(stderr, "Ignoring test when run with more than 8 pes\n");
+    return 77;
+  }
+
   /* argument processing done by everyone */
   int c, errflg;
   extern char *optarg;
The commit comment said there was a scaling issue in the code itself; I just
wish I could remember exactly what it was.
> 2. "examples/adjacent_32bit_amo.x 10 10" seems to hang with both tcp and
> usnic BTLs, even when running at np=2 (I let it run for several minutes
> before killing it).
If atomics aren't fast, this test can run for a very long time (also, it takes
no arguments, so the "10 10" is being ignored). It's essentially looking for a
race by blasting 32-bit atomic ops at both halves of a 64-bit word.
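
For illustration only, here is a rough single-node analogue of that idea,
using C11 atomics and POSIX threads rather than OpenSHMEM calls. It is not the
code from adjacent_32bit_amo.c; the names adjacent_pair_t, hammer_half, and
run_adjacent_amo are made up for this sketch, and it assumes a platform where
an _Atomic uint32_t is 4 bytes wide. Each thread hammers its own 32-bit half of
one aligned 64-bit word, and the check at the end fails if an update to one
half ever disturbed its neighbor.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Two adjacent 32-bit counters packed into one aligned 64-bit word. */
typedef struct {
    _Alignas(8) _Atomic uint32_t half[2];
} adjacent_pair_t;

/* Assumption of this sketch: the two halves really do share 8 bytes. */
static_assert(sizeof(adjacent_pair_t) == 8, "halves must share one 64-bit word");

typedef struct {
    adjacent_pair_t *pair;
    int which;            /* which half this thread updates: 0 or 1 */
    uint32_t iterations;  /* number of atomic increments to issue */
} worker_arg_t;

/* Thread body: issue back-to-back 32-bit atomic increments on one half. */
static void *hammer_half(void *p)
{
    worker_arg_t *arg = p;
    for (uint32_t i = 0; i < arg->iterations; i++) {
        atomic_fetch_add(&arg->pair->half[arg->which], 1u);
    }
    return NULL;
}

/* Returns 0 if both halves end at exactly `iterations`, 1 if an update was
 * lost or corrupted, and -1 if the threads could not be started. */
int run_adjacent_amo(uint32_t iterations)
{
    adjacent_pair_t pair;
    worker_arg_t args[2];
    pthread_t tid[2];
    int started = 0;

    for (int i = 0; i < 2; i++) {
        atomic_init(&pair.half[i], 0);
        args[i].pair = &pair;
        args[i].which = i;
        args[i].iterations = iterations;
    }

    for (int i = 0; i < 2; i++) {
        if (pthread_create(&tid[i], NULL, hammer_half, &args[i]) != 0) {
            break;
        }
        started++;
    }

    /* Always join whatever was started before `pair` goes out of scope. */
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
    }
    if (started != 2) {
        return -1;
    }

    return (atomic_load(&pair.half[0]) == iterations &&
            atomic_load(&pair.half[1]) == iterations) ? 0 : 1;
}
```

Calling run_adjacent_amo(1000000) from a small main() and checking for a zero
return is enough to exercise it; build with something like
"cc -std=c11 -pthread". If the underlying atomics are slow, the run time grows
with the iteration count, which is the same effect described above.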
> 3. Ditto for "example/ptp.x 10 10".
>
> 4. "examples/shmem_matrix.x 10 10" seems to run fine at np=32 on usnic, but
> hangs with TCP (i.e., I let it run for 8+ minutes before killing it --
> perhaps it would have finished eventually?).
>
> ...there's more results (more timeouts and more failures), but they're not
> yet complete, and I've got to keep working on my own features for v1.7.5, so
> I need to move to other things right now.
These start to sound like issues in the code; those last two are pretty decent
tests.
> I think I have oshmem running well enough to add these to Cisco's nightly MTT
> runs now, so the results will start showing up there without needing my
> manual attention.
Woot.
Brian
--
Brian Barrett
There is an art . . . to flying. The knack lies in learning how to
throw yourself at the ground and miss.
Douglas Adams, 'The Hitchhiker's Guide to the Galaxy'