On Feb 20, 2014, at 7:10 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> For all of these, I'm using the openshmem test suite that is now committed to
> the ompi-svn SVN repo. I don't know if the errors are with the tests or with
> oshmem itself.
>
> 1. I'm running the oshmem test suite at 32 processes across 2 16-core
> servers. I'm seeing a segv in "examples/shmem_2dheat.x 10 10". It seems to
> run fine at lower np values such as 2, 4, and 8; I didn't try to determine
> where the crossover to badness occurs.

My memory is bad and my notes are on a machine I no longer have access to, but I did this to the test suite run for Portals SHMEM:

Index: shmem_2dheat.c
===================================================================
--- shmem_2dheat.c (revision 270)
+++ shmem_2dheat.c (revision 271)
@@ -129,6 +129,11 @@
   p = _num_pes ();
   my_rank = _my_pe ();
 
+  if (p > 8) {
+      fprintf(stderr, "Ignoring test when run with more than 8 pes\n");
+      return 77;
+  }
+
   /* argument processing done by everyone */
   int c, errflg;
   extern char *optarg;

The commit comment was that there was a scaling issue in the code itself, I just wish I could remember exactly what it was.

> 2. "examples/adjacent_32bit_amo.x 10 10" seems to hang with both tcp and
> usnic BTLs, even when running at np=2 (I let it run for several minutes
> before killing it).

If atomics aren't fast, this test can run for a very long time (also, it takes no arguments, so the 10 10 is being ignored). It's essentially looking for a race by blasting 32-bit atomic ops at both parts of a 64 bit word.

> 3. Ditto for "example/ptp.x 10 10".
>
> 4. "examples/shmem_matrix.x 10 10" seems to run fine at np=32 on usnic, but
> hangs with TCP (i.e., I let it run for 8+ minutes before killing it --
> perhaps it would have finished eventually?).
>
> ...there's more results (more timeouts and more failures), but they're not
> yet complete, and I've got to keep working on my own features for v1.7.5, so
> I need to move to other things right now.

These start to sound like issues in the code; those last two are pretty decent tests.

> I think I have oshmem running well enough to add these to Cisco's nightly MTT
> runs now, so the results will start showing up there without needing my
> manual attention.

Woot.

Brian

--
  Brian Barrett

  There is an art . . . to flying. The knack lies in learning how to
  throw yourself at the ground and miss.
      Douglas Adams, 'The Hitchhikers Guide to the Galaxy'