On 2017-10-04 12:11+0100 Phil Rosenberg wrote:

Note, I am including everything you wrote here (as opposed to dropping
parts of it) because the list has not seen what you wrote, owing to the
plplot-devel address that I flubbed.

I also respond in a few places below.

On 4 October 2017 at 05:54, Alan W. Irwin <ir...@beluga.phys.uvic.ca> wrote:
On 2017-10-03 23:44+0100 Phil Rosenberg wrote:

On Windows the fill test took 5 seconds using the old comms method and
12 with the new. That's with optimisations turned on and I just timed
it with my phone stopwatch from the point where I hit enter after
choosing the driver.
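
[Editor's sketch, not part of the original exchange.] A stopwatch timing like the one above could be made more repeatable with a small script. The following is a minimal Python sketch, assuming the test can be started non-interactively. The function name time_command and the placeholder command are hypothetical, and the measured time includes process start-up, unlike the stopwatch timing, which began after the driver was chosen.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Return the wall-clock seconds taken by each of `repeats` runs of argv."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails,
        # so a crashed run is never recorded as a (misleading) timing.
        subprocess.run(argv, check=True)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder command (an assumption): substitute the real invocation
    # of the plline or plfill test program here.
    placeholder = [sys.executable, "-c", "pass"]
    for seconds in time_command(placeholder):
        print(f"{seconds:.3f} s")
```

Running each case several times also gives a rough idea of the run-to-run scatter, which a single stopwatch reading cannot show.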

Interestingly, I ran the viewer in a profiler to see what causes the
differences. Running the 3sem version first, it spent almost all its
time in a GDI rendering function, so no reason to think that the
different comms would make any difference. However, when I profiled
the old comms method, the profiler showed that the viewer spent all
its time in a different GDI rendering function - this time called
NtGDIPolyPolyDraw. I saw something in wxWidgets the other day. This
was a function also called something like PolyPolygonFill and it said
that using this function plotting many polygons at once was faster
than plotting them all individually. So I am going to guess that GDI
maybe has some runtime optimisation or something and it was able to
better optimise the old comms than the new 3sem one. Maybe the
polygons arrive more rapidly?????


That's an interesting comparison, and it sure is a surprise that the
IPC method affects how the GDI rendering is optimized.  My bet is it
has nothing to do with specifically how the data are transmitted and
assembled, and instead that difference in GDI rendering optimization
is due to some "minor" difference in the code paths between IPC3 and
non-IPC3 case on the viewer side.  In other words, instead of looking
at transmitBytes and receiveBytes details, I think you should be
looking for IPC3 and non-IPC3 differences in utils/wxplframe.cpp
concerning how wxPlFrame::ReadTransmission() is called and also the
large number of IPC3 versus non-IPC3 code-path differences within that
routine.

Since the above is an interesting comparison I have decided to add
it to my results as well.

Just to be clear about nomenclature,

IPC3 wxwidgets is what I previously called default wxwidgets and which you
have called new comms.  You get that by default or by
specifying -DOLD_WXWIDGETS=OFF -DPL_WXWIDGETS_IPC3=ON

The non-IPC3 wxwidgets result I have added is what you have called
old comms.  You get that by specifying -DOLD_WXWIDGETS=OFF
-DPL_WXWIDGETS_IPC3=OFF

The old wxwidgets result corresponds to Werner's wxwidgets-related
software as updated by you until you decided to completely rewrite
that software.  You get that by specifying -DOLD_WXWIDGETS=ON

So here is my old timing result table with non-IPC3 wxwidgets timings added
where those added timings are defined in exactly the same way and with
the same compiler options as the others.

device              plline test    plfill test
IPC3 wxwidgets      26  seconds    32  seconds
non-IPC3 wxwidgets  27  seconds    32  seconds
old wxwidgets       18  seconds    30  seconds
xcairo              1.4 seconds    2.2 seconds
qtwidget            1.5 seconds    1.6 seconds
xwin                9.5 seconds    3.4 seconds

So on Linux there is no significant measured time difference between
what you call new comms (IPC3) and old comms (non-IPC3), contrary to
your results on MSVC Windows.

So just one timing comparison like you did on a given platform is
tricky to generalize, and to get a better idea of what is going on for
a given platform it is a good idea to get as many comparisons as
possible. Therefore, could you please fill out a similar table to the
above with the first 3 devices the same and the last two for wingcc
and wingdi?  For example, if the three wxwidgets variants are roughly
the same speed as wingcc and wingdi, then it is likely there is
some remaining efficiency issue that just occurs for the Linux case.
But if on your platform all wxwidgets variants are roughly an order of
magnitude slower than wingcc and wingdi, then we likely have a
cross-platform efficiency issue with -dev wxwidgets.

For some reason I cannot build wingcc or wingdi, they do not come up
as enabled on my system when I run cmake. I have never looked into why
as I don't use them.

I will say more on this topic separately, but for now
please fill in the first three rows since you do have access to
all those variations of wxwidgets.



I wonder why so slow on Linux?


I have been wondering about that issue forever.... :-)

More seriously though, it is certainly possible there is a unique
inefficiency issue on Linux that makes all IPC3 versus non-IPC3
comparisons look identical in (very slow) speed.  Also, as you know
such cross-platform time comparisons are notoriously unreliable since
we have different hardware, different underlying graphics systems
which wxwidgets necessarily wraps in extremely different ways,
different wxwidgets releases (probably), different compilers, and
different levels of optimizations of libraries and PLplot.  So I would
prefer to reserve judgement on MSVC Windows versus Linux comparisons
until you fill in the rest of the requested table, and probably only
pay attention to the relative results even then rather than the
absolute results.
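
[Editor's sketch, not part of the original exchange.] One way to look at relative rather than absolute results is to divide every timing by that of a baseline device measured on the same machine. The Python sketch below does this for the Linux table earlier in this message. The function name slowdown_vs and the choice of xcairo as the baseline are illustrative assumptions, not anything prescribed in the thread.

```python
# Timings in seconds, copied from the Linux table earlier in this message.
TIMINGS = {
    "IPC3 wxwidgets": {"plline": 26.0, "plfill": 32.0},
    "non-IPC3 wxwidgets": {"plline": 27.0, "plfill": 32.0},
    "old wxwidgets": {"plline": 18.0, "plfill": 30.0},
    "xcairo": {"plline": 1.4, "plfill": 2.2},
    "qtwidget": {"plline": 1.5, "plfill": 1.6},
    "xwin": {"plline": 9.5, "plfill": 3.4},
}


def slowdown_vs(timings, baseline):
    """Return each device's time divided by the baseline device's time."""
    base = timings[baseline]
    return {
        device: {test: seconds / base[test] for test, seconds in tests.items()}
        for device, tests in timings.items()
    }


if __name__ == "__main__":
    ratios = slowdown_vs(TIMINGS, "xcairo")
    for device, tests in ratios.items():
        # A ratio near 10 or above means "roughly an order of magnitude slower".
        print(f"{device:20s} plline x{tests['plline']:.1f}  plfill x{tests['plfill']:.1f}")
```

Ratios computed this way on two different platforms can be compared with each other more safely than the raw seconds, although they still depend on which baseline device is chosen.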

By the way, I should have mentioned the above table was created
with the current HEAD of master branch (commit 124a0c3) with
no local changes (other than the two different patches
to examples/c/x00c.c to produce the plline and plfill tests above).
So when you produce your table would you be sure to do the same?

Do you have a profiler you can use?
Again if you uncomment #define WXPLVIEWER_DEBUG, then set the example
running normally it will display the command line params that you can
use to execute wxPLViewer in a profiler to see where it is spending
its time. There really is no other good way to work out the timings
other than by using a profiler as there are so many unexpected
optimisations that can happen.


I have never done profiling, but I agree this is an excellent idea
both for core and viewer for the 6 simple examples (plline and
plfill for the three wxwidgets variants).

I am quite familiar with valgrind so I am thinking of using callgrind
<https://www.cs.cmu.edu/afs/cs.cmu.edu/project/cmt-40/Nice/RuleRefinement/bin/valgrind-3.2.0/docs/html/cl-manual.html>
to do the profiling.

What do you think of that callgrind description and have you heard any
caveats/kudos about it?

One caveat with valgrind (and presumably callgrind) is identification
of source code lines depends on the -g option symbols being available
for the library. For wxwidgets, Debian apparently provides those
symbols in separate packaged form, e.g., package libwxbase3.0-0-dbg,
and my extrapolation from some web discussion is such
wxWidgets-related *-dbg packages will automatically allow me to
profile (with source-code line identifications) the official wxWidgets
Debian libraries.  But I will see.

I think that is a feature of all profilers and debuggers. But I got
the impression that most linux libraries were distributed with the
debug information. Maybe I'm mistaken. I can't really comment on
callgrind. I chose to use a tool called very sleepy. It was
recommended as very easy to use. It is a graphical tool - I simply
enter a command to run or select a currently running process and it
repeatedly checks over and over which function the current execution
path is in and which line of code it is executing until either the
process ends or you tell it to stop. Then I can view a hierarchy of %
time spent in each function or load a page of code and see time spent
on each line.  I think some profilers work a bit like debuggers
tracking function calls. I don't know which is better. I've only used
very sleepy and found it perfect for my needs so never changed. Visual
Studio now has a profiler built in, but I haven't played with it
really other than that it shows diagnostics like CPU and memory usage
when I run stuff from the IDE.
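
[Editor's sketch, not part of the original exchange.] The sampling approach described above (repeatedly checking which function is currently executing) can be illustrated with a toy example. This is a minimal Python sketch of the general technique only, not of how very sleepy is implemented. It relies on sys._current_frames(), which is specific to CPython, and the names sample_function_names and busy_loop are hypothetical.

```python
import collections
import sys
import threading
import time


def sample_function_names(target, interval=0.001):
    """Run target() in this thread while a helper thread samples it.

    Returns a Counter mapping function name -> number of samples in
    which that function was the innermost Python frame.
    """
    counts = collections.Counter()
    thread_id = threading.get_ident()
    stop = threading.Event()

    def sampler():
        while not stop.is_set():
            # CPython-specific: snapshot of the current frame of every thread.
            frame = sys._current_frames().get(thread_id)
            if frame is not None:
                counts[frame.f_code.co_name] += 1
            time.sleep(interval)

    helper = threading.Thread(target=sampler, daemon=True)
    helper.start()
    try:
        target()
    finally:
        stop.set()
        helper.join()
    return counts


def busy_loop(duration=0.2):
    """Spin for roughly `duration` seconds so the sampler has work to observe."""
    end = time.perf_counter() + duration
    total = 0
    while time.perf_counter() < end:
        total += 1
    return total


if __name__ == "__main__":
    for name, hits in sample_function_names(busy_loop).most_common():
        print(f"{name}: {hits} samples")
```

The share of samples landing in each function approximates the share of time spent there. Because the sketch only records the innermost Python frame, time spent in called functions is not attributed to their callers.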

Of course beware - your optimiser will aggressively inline things, so I
would often find that whole classes had 0 execution time because they
had been totally optimised away. This is of course a good thing as it
is the optimiser doing its job, but it just means a little care must
be taken when interpreting profiler info.

OK.  Thanks for that profiling advice.

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state
implementation for stellar interiors (freeeos.sf.net); the Time
Ephemerides project (timeephem.sf.net); PLplot scientific plotting
software package (plplot.sf.net); the libLASi project
(unifont.org/lasi); the Loads of Linux Links project (loll.sf.net);
and the Linux Brochure Project (lbproject.sf.net).
__________________________

Linux-powered Science
__________________________
