Many thanks Alan for pushing this and getting the debug statements into the
code. You have confirmed that i was looking in the wrong place for the problem.
I was mulling things over this morning and wondered about the same thing as
you, some sort of memory leak with the shared memory. You have obviously shown
that not to be the case.
The error messages are a bit of a surprise. I would have thought that the
memory map would count the number of system wide open calls that had been made
and only close when all those open calls was matched with a close. I will have
to check the documentation to be sure. Maybe this is related to the long delay
somehow.
I think the next step is probably to add some timing code to the viewer. My
plan is to try to sort the other bugs i mentioned this evening. Then if there
is still time I may try that.
If you do anything else in the meantime then let me know so we can avoid
duplicated effort.
Thanks again
Phil
Sent from my Windows 10 phone
From: Alan W. Irwin
Sent: 10 December 2016 13:00
To: Phil Rosenberg
Cc: PLplot development list
Subject: Re: More on IPC
Hi Phil:
If you review my results from much earlier this year that had a screen
shot attached in the next e-mail of my cpumeter results, the test
being run was for a case with absolutely no text data being written to
wxPLViewer. So your concern (expressed off list to me) about the
efficiency of IPC of text may be well taken, but they are not relevant
to this particular result of huge idle times.
Your other thought that inefficient IPC method may not be the direct
cause of this issue is likely correct. For why, read on.
One working hypothesis I could think of to generate these
excessive idle times on Linux is that shared memory is not being
shm_unlink'd properly so we have a shared memory leak that in a
heavy-duty testing environment (such as when building the
test_c_wxwidgets target) means the Linux kernel is soon right up
against it (since it probably only allows a rather small proportion of
actual memory to be used for the special purpose of shared memory). So
the extraordinary measures that would be needed to supply shared memory ==>
huge idle times.
To investigate that possibility further I introduced (as of commit
dcae24e) rudimentary but proper debug reporting of when the creation
of shared memory is a success. I also did that for the shm_unlink
call (but only for the case when there is success for
PLMemoryMap::create followed by a call to PLMemoryMap::close). The
results from building the test_c_wxwidgets target (before I ^C'd out
of it) are as follows:
Testing subset of C examples for device wxwidgets
x01c
PLMemoryMap::create: shm_open was a success for plplotMemoryMapIFWDQVRLBF
PLMemoryMap::create: shm_open was a success for plplotMemoryMapIFWDQVRLBF
PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapIFWDQVRLBF
x04c
PLMemoryMap::create: shm_open was a success for plplotMemoryMapBTDCASTECJ
PLMemoryMap::create: shm_open was a success for plplotMemoryMapBTDCASTECJ
PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapBTDCASTECJ
PLMemoryMap::close: shm_unlink error
PLMemoryMap::close: No such file or directory
x08c
PLMemoryMap::create: shm_open was a success for plplotMemoryMapYKNFFXNKGJ
PLMemoryMap::create: shm_open was a success for plplotMemoryMapYKNFFXNKGJ
PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapYKNFFXNKGJ
PLMemoryMap::close: shm_unlink error
PLMemoryMap::close: No such file or directory
x14c
PLMemoryMap::create: shm_open was a success for plplotMemoryMapNYBYKUHKXR
PLMemoryMap::create: shm_open was a success for plplotMemoryMapNYBYKUHKXR
PLMemoryMap::create: shm_open was a success for plplotMemoryMapIJIAMCSBUP
PLMemoryMap::create: shm_open was a success for plplotMemoryMapIJIAMCSBUP
PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapIJIAMCSBUP
PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapNYBYKUHKXR
x16c
PLMemoryMap::create: shm_open was a success for plplotMemoryMapQPXDPGRVTF
PLMemoryMap::create: shm_open was a success for plplotMemoryMapQPXDPGRVTF
PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapQPXDPGRVTF
PLMemoryMap::close: shm_unlink error
PLMemoryMap::close: No such file or directory
x17c
^C
So these results completely shoot down that hypothesis because for every
successful shm_open of a particular name there is also a successful
shm_unlink for that same name ==> no shared memory leak! Oh well, on
to the next idea. :-)
For a bit I was excited about the above shm_unlink error
messages that are also emitted for all but the first and 14th
examples, but I now believe that pattern is the expected one, i.e.,
-dev wxwidgets finishes so it calls the PLMemoryMap destructor
or PLMemoryMap::close directly which
destroys the shared memory. Then wxPLViewer finishes so when it calls
the PLMemoryMap destructor or PLMemoryMap::close directly those error messages
must be emitted.
But I frankly do not understand why no such error messages were emitted
for the first and 14th examples. Yet for the first example
experiments below, the error messages are always emitted. I have
rerun both types of tests any number of times and these patterns
always repeat. But I cannot figure out why different error messages are
emitted for x01c results
in the two contexts. Do you have some intelligent guess or even
definitive response about that?
Note the above form of results always take a very long time to create so the
excessive idle time is still with us. In fact to see that directly,
I ran x01c twice in a row as follows:
software@raven> time examples/c/x01c -dev wxwidgets -np; echo "done x01c"; time
examples/c/x01c -dev wxwidgets -np;echo "done x01c"
PLplot library version: 5.11.1
PLMemoryMap::create: shm_open was a success for plplotMemoryMapRZZQSYHRAO
PLMemoryMap::create: shm_open was a success for plplotMemoryMapRZZQSYHRAO
PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapRZZQSYHRAO
real 0m0.258s
user 0m0.040s
sys 0m0.016s
done x01c
PLplot library version: 5.11.1
PLMemoryMap::close: shm_unlink error
PLMemoryMap::close: No such file or directory
[~10 second pause]
PLMemoryMap::create: shm_open was a success for plplotMemoryMapOLMHNIFOPT
PLMemoryMap::create: shm_open was a success for plplotMemoryMapOLMHNIFOPT
PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapOLMHNIFOPT
real 0m10.263s
user 0m0.052s
sys 0m0.004s
done x01c
software@raven> PLMemoryMap::close: shm_unlink error
PLMemoryMap::close: No such file or directory
I then did the same experiment only using -dev psc for the second
invocation of the example.
software@raven> time examples/c/x01c -dev wxwidgets -np; echo "done x01c"; time
examples/c/x01c -dev psc -o test.ps ;echo "done x01c"
PLplot library version: 5.11.1
PLMemoryMap::create: shm_open was a success for plplotMemoryMapMAMYNDBJMT
PLMemoryMap::create: shm_open was a success for plplotMemoryMapMAMYNDBJMT
PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapMAMYNDBJMT
real 0m0.255s
user 0m0.052s
sys 0m0.000s
done x01c
PLplot library version: 5.11.1
real 0m0.006s
user 0m0.004s
sys 0m0.000s
done x01c
software@raven> PLMemoryMap::close: shm_unlink error
PLMemoryMap::close: No such file or directory
So absolutely no delay of that second example run in this case (as
expected), but a ~10 second delay (sigh) when -dev wxwidgets is used
twice in a row.
This ~10 second pause of the second example run for -dev wxwidgets occurred
when no
wxPLViewer GUI was showing. And from the placement of that pause
it is now a certainty that the delay is due to a long
wait (likely for some system resources) by the -dev wxwidgets code on startup,
and the
most likely candidate for that is that delay is occurring somewhere in
PLMemoryMap::create. So further debug output is going to have to be
inserted to confirm that, but if that turns out to be the case, this
obviously has nothing to do with IPC methods but everything to do with
setting those up.
Note also this issue is exactly the one I posted about a long time
ago, i.e., it also occurs when no text is being written to the device.
Unfortunately, I have now run out of any further time to pursue this
right now so the answer to the above question about whether or not the
issue is somewhere in PLMemoryMap::create and if so finding the exact
system call that is intermittently inserting huge delays will have to
be a mystery until post-release (unless someone else steps in).
I have now made a subsequent commit (1627eec) to drop this debug
information for the release. But output of that useful debug
information can trivially be restored by #defining PLMemoryMap_DEBUG
in drivers/wxwidgets_comms.h.
Alan
__________________________
Alan W. Irwin
Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).
Programming affiliations with the FreeEOS equation-of-state
implementation for stellar interiors (freeeos.sf.net); the Time
Ephemerides project (timeephem.sf.net); PLplot scientific plotting
software package (plplot.sf.net); the libLASi project
(unifont.org/lasi); the Loads of Linux Links project (loll.sf.net);
and the Linux Brochure Project (lbproject.sf.net).
__________________________
Linux-powered Science
__________________________
------------------------------------------------------------------------------
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/xeonphi
_______________________________________________
Plplot-devel mailing list
Plplot-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/plplot-devel