On Feb 20, 2014, at 7:10 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> For all of these, I'm using the openshmem test suite that is now committed to 
> the ompi-svn SVN repo.  I don't know if the errors are with the tests or with 
> oshmem itself.
> 
> 1. I'm running the oshmem test suite at 32 processes across 2 16-core 
> servers.  I'm seeing a segv in "examples/shmem_2dheat.x 10 10".  It seems to 
> run fine at lower np values such as 2, 4, and 8; I didn't try to determine 
> where the crossover to badness occurs.

My memory is bad and my notes are on a machine I no longer have access to, but 
I made this change to the test suite when running it for Portals SHMEM:

Index: shmem_2dheat.c
===================================================================
--- shmem_2dheat.c      (revision 270)
+++ shmem_2dheat.c      (revision 271)
@@ -129,6 +129,11 @@
   p = _num_pes ();
   my_rank = _my_pe ();
 
+  if (p > 8) {
+      fprintf(stderr, "Ignoring test when run with more than 8 pes\n");
+      return 77;
+  }
+
   /* argument processing done by everyone */
   int c, errflg;
   extern char *optarg;

The commit comment said there was a scaling issue in the code itself; I just 
wish I could remember exactly what it was.

> 2. "examples/adjacent_32bit_amo.x 10 10" seems to hang with both tcp and 
> usnic BTLs, even when running at np=2 (I let it run for several minutes 
> before killing it).

If atomics aren't fast, this test can run for a very long time, so it may be 
slow rather than hung.  (It also takes no arguments, so the "10 10" is being 
ignored.)  It's essentially looking for a race by blasting 32-bit atomic ops 
at both halves of a 64-bit word.
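
To make the idea concrete, here is a rough shared-memory sketch of that kind 
of check. It is not the code from the test suite: it uses C11 atomics and 
pthreads in place of OpenSHMEM remote atomics, and the names ITERS, halves, 
hammer, and run_adjacent_amo_check are made up for illustration. Two threads 
each hammer one 32-bit half of an 8-byte-aligned word, and the check passes 
only if neither half lost or picked up an update from its neighbor.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical iteration count; the real test's count is not known here. */
#define ITERS 100000

/* Two 32-bit atomics laid out back to back inside one 8-byte-aligned
 * 64-bit word, so updates to one half sit right next to the other. */
static _Alignas(8) _Atomic uint32_t halves[2];

/* Each thread increments only its own 32-bit half, ITERS times. */
static void *hammer(void *arg)
{
    _Atomic uint32_t *half = arg;
    for (int i = 0; i < ITERS; i++)
        atomic_fetch_add(half, 1);
    return NULL;
}

/* Returns 0 if both halves hold exactly ITERS afterwards, 1 if either
 * half was corrupted, and -1 if a thread could not be started. */
int run_adjacent_amo_check(void)
{
    pthread_t t[2];
    int started = 0;

    atomic_store(&halves[0], 0);
    atomic_store(&halves[1], 0);

    for (int i = 0; i < 2; i++) {
        if (pthread_create(&t[i], NULL, hammer, &halves[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);

    if (started != 2)
        return -1;

    return !(atomic_load(&halves[0]) == ITERS &&
             atomic_load(&halves[1]) == ITERS);
}
```

Build with -pthread and call run_adjacent_amo_check() from main. If the 
32-bit atomics were implemented as a non-atomic read-modify-write of the 
whole 64-bit word, one half would eventually clobber the other and the check 
would return 1. With slow atomics the loop just takes a long time, which is 
the behavior described above.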

> 3. Ditto for "example/ptp.x 10 10".
> 
> 4. "examples/shmem_matrix.x 10 10" seems to run fine at np=32 on usnic, but 
> hangs with TCP (i.e., I let it run for 8+ minutes before killing it -- 
> perhaps it would have finished eventually?).
> 
> ...there's more results (more timeouts and more failures), but they're not 
> yet complete, and I've got to keep working on my own features for v1.7.5, so 
> I need to move to other things right now.

These start to sound like issues in the code; those last two are pretty decent 
tests.

> I think I have oshmem running well enough to add these to Cisco's nightly MTT 
> runs now, so the results will start showing up there without needing my 
> manual attention.

Woot.

Brian

-- 
 Brian Barrett

 There is an art . . . to flying. The knack lies in learning how to
 throw yourself at the ground and miss.
     Douglas Adams, 'The Hitchhiker's Guide to the Galaxy'
