Hi Phil:

If you review my results from much earlier this year that had a screen
shot attached in the next e-mail of my cpumeter results, the test
being run was for a case with absolutely no text data being written to
wxPLViewer.  So your concern (expressed off list to me) about the
efficiency of IPC of text may be well taken, but they are not relevant
to this particular result of huge idle times.

Your other thought that inefficient IPC method may not be the direct
cause of this issue is likely correct.  For why, read on.

One working hypothesis I could think of to generate these
excessive idle times on Linux is that shared memory is not being
shm_unlink'd properly so we have a shared memory leak that in a
heavy-duty testing environment (such as when building the
test_c_wxwidgets target) means the Linux kernel is soon right up
against it (since it probably only allows a rather small proportion of
actual memory to be used for the special purpose of shared memory). So
the extraordinary measures that would be needed to supply shared memory ==>
huge idle times.

To investigate that possibility further I introduced (as of commit
dcae24e) rudimentary but proper debug reporting of when the creation
of shared memory is a success.  I also did that for the shm_unlink
call (but only for the case when there is success for
PLMemoryMap::create followed by a call to PLMemoryMap::close).  The
results from building the test_c_wxwidgets target (before I ^C'd out
of it) are as follows:

Testing subset of C examples for device wxwidgets
x01c
PLMemoryMap::create: shm_open was a success for plplotMemoryMapIFWDQVRLBF
PLMemoryMap::create: shm_open was a success for plplotMemoryMapIFWDQVRLBF
PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapIFWDQVRLBF
x04c
PLMemoryMap::create: shm_open was a success for plplotMemoryMapBTDCASTECJ
PLMemoryMap::create: shm_open was a success for plplotMemoryMapBTDCASTECJ
PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapBTDCASTECJ
PLMemoryMap::close: shm_unlink error
PLMemoryMap::close: No such file or directory
x08c
PLMemoryMap::create: shm_open was a success for plplotMemoryMapYKNFFXNKGJ
PLMemoryMap::create: shm_open was a success for plplotMemoryMapYKNFFXNKGJ
PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapYKNFFXNKGJ
PLMemoryMap::close: shm_unlink error
PLMemoryMap::close: No such file or directory
x14c
PLMemoryMap::create: shm_open was a success for plplotMemoryMapNYBYKUHKXR
PLMemoryMap::create: shm_open was a success for plplotMemoryMapNYBYKUHKXR
PLMemoryMap::create: shm_open was a success for plplotMemoryMapIJIAMCSBUP
PLMemoryMap::create: shm_open was a success for plplotMemoryMapIJIAMCSBUP
PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapIJIAMCSBUP
PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapNYBYKUHKXR
x16c
PLMemoryMap::create: shm_open was a success for plplotMemoryMapQPXDPGRVTF
PLMemoryMap::create: shm_open was a success for plplotMemoryMapQPXDPGRVTF
PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapQPXDPGRVTF
PLMemoryMap::close: shm_unlink error
PLMemoryMap::close: No such file or directory
x17c
^C

So these results completely shoot down that hypothesis because for every
successful shm_open of a particular name there is also a successful
shm_unlink for that same name ==> no shared memory leak!  Oh well, on
to the next idea.  :-)

For a bit I was excited about the above shm_unlink error
messages that are also emitted for all but the first and 14th
examples, but I now believe that pattern is the expected one, i.e.,
-dev wxwidgets finishes so it calls the PLMemoryMap destructor 
or PLMemoryMap::close directly which
destroys the shared memory.  Then wxPLViewer finishes so when it calls
the PLMemoryMap destructor or PLMemoryMap::close directly those error messages 
must be emitted.
But I frankly do not understand why no such error messages were emitted
for the first and 14th examples.  Yet for the first example
experiments below, the error messages are always emitted.  I have
rerun both types of tests any number of times and these patterns
always repeat.  But I cannot figure out why different error messages are
emitted for x01c results
in the two contexts.  Do you have some intelligent guess or even
definitive response about that?

Note the above form of results always take a very long time to create so the
excessive idle time is still with us.  In fact to see that directly,
I ran x01c twice in a row as follows:

software@raven> time examples/c/x01c -dev wxwidgets -np; echo "done x01c"; time 
examples/c/x01c -dev wxwidgets -np;echo "done x01c"
PLplot library version: 5.11.1
PLMemoryMap::create: shm_open was a success for plplotMemoryMapRZZQSYHRAO
PLMemoryMap::create: shm_open was a success for plplotMemoryMapRZZQSYHRAO
PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapRZZQSYHRAO

real    0m0.258s
user    0m0.040s
sys     0m0.016s
done x01c
PLplot library version: 5.11.1
PLMemoryMap::close: shm_unlink error
PLMemoryMap::close: No such file or directory
[~10 second pause]
PLMemoryMap::create: shm_open was a success for plplotMemoryMapOLMHNIFOPT
PLMemoryMap::create: shm_open was a success for plplotMemoryMapOLMHNIFOPT
PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapOLMHNIFOPT

real    0m10.263s
user    0m0.052s
sys     0m0.004s
done x01c
software@raven> PLMemoryMap::close: shm_unlink error
PLMemoryMap::close: No such file or directory

I then did the same experiment only using -dev psc for the second
invocation of the example.

software@raven> time examples/c/x01c -dev wxwidgets -np; echo "done x01c"; time 
examples/c/x01c -dev psc -o test.ps ;echo "done x01c"
PLplot library version: 5.11.1
PLMemoryMap::create: shm_open was a success for plplotMemoryMapMAMYNDBJMT
PLMemoryMap::create: shm_open was a success for plplotMemoryMapMAMYNDBJMT
PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapMAMYNDBJMT

real    0m0.255s
user    0m0.052s
sys     0m0.000s
done x01c
PLplot library version: 5.11.1

real    0m0.006s
user    0m0.004s
sys     0m0.000s
done x01c
software@raven> PLMemoryMap::close: shm_unlink error
PLMemoryMap::close: No such file or directory

So absolutely no delay of that second example run in this case (as
expected), but a ~10 second delay (sigh) when -dev wxwidgets is used
twice in a row.

This ~10 second pause of the second example run for -dev wxwidgets occurred 
when no
wxPLViewer GUI was showing.  And from the placement of that pause
it is now a certainty that the delay is due to a long
wait (likely for some system resources) by the -dev wxwidgets code on startup, 
and the
most likely candidate for that is that delay is occurring somewhere in
PLMemoryMap::create.  So further debug output is going to have to be
inserted to confirm that, but if that turns out to be the case, this
obviously has nothing to do with IPC methods but everything to do with
setting those up.
Note also this issue is exactly the one I posted about a long time
ago, i.e., it also occurs when no text is being written to the device.

Unfortunately, I have now run out of any further time to pursue this
right now so the answer to the above question about whether or not the
issue is somewhere in PLMemoryMap::create and if so finding the exact
system call that is intermittently inserting huge delays will have to
be a mystery until post-release (unless someone else steps in).

I have now made a subsequent commit (1627eec) to drop this debug
information for the release.  But output of that useful debug
information can trivially be restored by #defining PLMemoryMap_DEBUG
in drivers/wxwidgets_comms.h.

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state
implementation for stellar interiors (freeeos.sf.net); the Time
Ephemerides project (timeephem.sf.net); PLplot scientific plotting
software package (plplot.sf.net); the libLASi project
(unifont.org/lasi); the Loads of Linux Links project (loll.sf.net);
and the Linux Brochure Project (lbproject.sf.net).
__________________________

Linux-powered Science
__________________________

------------------------------------------------------------------------------
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/xeonphi
_______________________________________________
Plplot-devel mailing list
Plplot-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/plplot-devel

Reply via email to