Re: [Matplotlib-users] How to do million data-point plots with Matplotlib?

2011-12-16 Thread David Smith
I have experimented with path.simplify and can't see any appreciable
improvement.

Incidentally, I also experimented with other backends.  I found that all
the backends involving Agg behave similarly.  However, the 'GTK' backend
renders the whole 1 million points and does it very fast (about 5x faster
than the Agg backends).

I also found that gnuplot can render the whole million points very fast
using the 'x11' terminal.  I am guessing that both matplotlib's GTK
backend and gnuplot's 'x11' terminal use the hardware-accelerated
display driver.
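
A minimal way to reproduce the backend comparison (the backend must be
selected before pyplot is imported; this assumes a matplotlib build with
the GTK backend available):

    import matplotlib
    matplotlib.use('GTK')        # or 'GTKAgg', 'TkAgg', etc.
    import matplotlib.pyplot as plt
    import numpy as np

    n = 1000000
    x = np.random.rand(n)
    y = np.random.rand(n)
    plt.plot(x, y)
    plt.show()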

David


Re: [Matplotlib-users] How to do million data-point plots with Matplotlib?

2011-12-15 Thread Michael Droettboom
On 12/10/2011 01:12 PM, David Smith wrote:
 I have been working on a program that uses Matplotlib to plot data
 consisting of around one million points.  Sometimes the plots succeed, but
 often I get an exception: OverflowError: Agg rendering complexity exceeded.
Are you sure path simplification is running (i.e., the rcParam
path.simplify is True)?  That generally does a good job of removing
excess points on the fly.  You shouldn't need a development version for
this to work; 0.99.x or later should be adequate.  You're not going to
see a million points at typical screen resolutions anyway.
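
One quick way to check, and to see what simplification buys you, is to
toggle the rcParam and time a full off-screen draw.  A rough sketch;
with simplification off and a million points you may well hit the very
OverflowError this thread is about:

    import time
    import numpy as np
    import matplotlib
    matplotlib.use('Agg')                  # render off-screen
    import matplotlib.pyplot as plt

    x = np.random.rand(1000000)
    y = np.random.rand(1000000)

    for simplify in (True, False):
        # Set the rcParam before plotting and drawing so it takes effect.
        matplotlib.rcParams['path.simplify'] = simplify
        fig = plt.figure()
        plt.plot(x, y)
        t0 = time.time()
        try:
            fig.canvas.draw()              # force a full render
            print "path.simplify=%r: %.1f s" % (simplify, time.time() - t0)
        except OverflowError as e:         # Agg complexity exceeded
            print "path.simplify=%r: %s" % (simplify, e)
        plt.close(fig)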

 I can make this message go away by plotting the data in chunks as
 illustrated in the demo code below.  However, the extra code is a chore
 that I don't think should be necessary - I hope the developers will
 be able to fix this issue sometime soon.  I know that the development
 version has some modifications addressing this issue.  I wonder if they
 are expected to make the problem go away?

 By the way, this plot takes about 30 seconds to render on my i7 2600K.
 The main program reaches the show() statement quickly and prints
 "done plotting?".  Then the program sits at 100% usage on one CPU
 core (4 real cores, 8 virtual on the 2600K) until the plot is
 displayed.  I wonder if there is any way to persuade Matplotlib to run
 some of the chunks in parallel so as to use more CPU cores?
That would be great, but very difficult.  The Python parts of the
problem are tricky to parallelize due to the GIL.  The Agg part of the
problem will be difficult to parallelize unless there is a trivial way
to split the plotted lines into chunks before stroking -- each chunk
could be rendered to its own buffer and then blended together in a final
step.  But all that is academic at this point -- there's no code to do
such a thing now.
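
Just to illustrate the final blending step of that hypothetical scheme
(nothing like this exists in matplotlib; composite_over and the toy
buffers below are invented for illustration), Porter-Duff "over"
compositing of per-chunk RGBA buffers might look like:

    import numpy as np

    def composite_over(top, bottom):
        # Porter-Duff "over" for straight (non-premultiplied) alpha RGBA
        # buffers: float arrays of shape (h, w, 4) with values in [0, 1].
        a_top = top[..., 3:4]
        a_bot = bottom[..., 3:4]
        a_out = a_top + a_bot * (1.0 - a_top)
        rgb = top[..., :3] * a_top + bottom[..., :3] * a_bot * (1.0 - a_top)
        out = np.zeros_like(top)
        out[..., 3:4] = a_out
        # Un-multiply; leave fully transparent pixels at zero.
        out[..., :3] = np.where(a_out > 0, rgb / np.maximum(a_out, 1e-12), 0.0)
        return out

    # Toy stand-ins for two per-chunk renders of the same tiny canvas.
    chunk_a = np.zeros((4, 4, 4))
    chunk_a[1, :] = (0.0, 0.0, 1.0, 0.8)       # a horizontal blue stroke
    chunk_b = np.zeros((4, 4, 4))
    chunk_b[:, 2] = (1.0, 0.0, 0.0, 0.5)       # a vertical red stroke
    final = composite_over(chunk_a, chunk_b)   # blue blended over red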

 Plotting something other than random data, the plots run faster and
 the maximum chunk size is smaller.  The maximum chunk size
 also depends on the plot size - it is smaller for larger plots.  I am
 wondering if I could use this to plot coarse and fine versions of the
 plots.  The coarse plot would be a zoomed-in version of the small-sized
 raster.  That would be better than decimation, as all the points would
 at least be there.
I think what you're seeing is the effect of the path simplification 
algorithm.  The number of points that it removes depends on the density 
of the points and the resolution of the output image.  It's hard to 
predict exactly how many points it will remove.
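
You can see that dependence directly by counting the segments the
simplifier emits for the same data at two coordinate scales.  A rough
sketch; scaling the data up stands in for rendering the same curve at a
higher output resolution:

    import numpy as np
    from matplotlib.path import Path

    pts = np.random.rand(100000, 2)

    for scale in (1.0, 1000.0):
        # Simplification compares deviations in the path's coordinates
        # against the path.simplify_threshold rcParam, so larger
        # coordinates mean fewer points get removed.
        p = Path(pts * scale)
        kept = sum(1 for seg in p.iter_segments(simplify=True))
        print "scale %6.0f: kept %d of %d vertices" % (scale, kept, len(pts))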

Mike

 Thanks in advance,

 David

 --- start code -
 ## Demo program shows how to chunk plots to avoid the exception:
 ##
 ##    OverflowError: Agg rendering complexity exceeded.
 ##    Consider downsampling or decimating your data.
 ##
 ## David Smith December 2011.

 from pylab import *
 import numpy as np

 nPts = 600100
 x = np.random.rand(nPts)
 y = np.random.rand(nPts)

 ## This seems to always succeed if nPts <= 2e5, but fails
 ## for nPts > 3e5.  For point counts in between, it sometimes
 ## succeeds and sometimes fails.
 figure(1)
 plot(x, y)

 ## Chunking the plot always succeeds.
 figure(2)
 chunk_size = 200000
 ## Start indices of the full-sized chunks.
 iStarts = range(0, (x.size / chunk_size) * chunk_size, chunk_size)
 for iStart in iStarts:
     print "Plotting chunk starting at %d\n" % iStart
     plot(x[iStart:iStart+chunk_size], y[iStart:iStart+chunk_size], '-b')

 ## Plot any leftover points in red, overlapping the last blue chunk
 ## by one point so the line stays connected.
 left_overs = nPts % chunk_size
 if left_overs > 0:
     print "Leftovers %d points\n" % left_overs
     plot(x[-left_overs-1:], y[-left_overs-1:], '-r')

 print "done plotting?"
 show()
 -- end code 
 Please don't reply to this post that it is ridiculous to plot 1 million
 points on screen.  I am routinely capturing million-point traces from
 oscilloscopes and other test equipment, and I need to be able to spot
 features in the data (glitches, if you will) that may not show up when
 plotting decimated data.  I can then zoom the plot to inspect these
 features.
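
For that use case there is a middle ground between plotting every sample
and naive striding: min/max envelope decimation keeps the extremes of
each bin, so single-sample glitches survive the reduction.  A rough
sketch of the idea (minmax_decimate is an invented name, not a
matplotlib function):

    import numpy as np

    def minmax_decimate(t, v, n_bins):
        # Keep the min and max sample of each bin so narrow glitches
        # survive, unlike plain striding (v[::k]) which can skip them.
        n = (len(v) // n_bins) * n_bins        # drop the ragged tail
        vb = v[:n].reshape(n_bins, -1)
        tb = t[:n].reshape(n_bins, -1)
        lo = vb.argmin(axis=1)
        hi = vb.argmax(axis=1)
        rows = np.arange(n_bins)
        # Two points per bin, restored to time order within the bin.
        idx = np.sort(np.column_stack((lo, hi)), axis=1)
        tt = tb[rows[:, None], idx].ravel()
        vv = vb[rows[:, None], idx].ravel()
        return tt, vv

    # Example: a million-point trace with one single-sample glitch.
    t = np.linspace(0.0, 1.0, 1000000)
    v = np.sin(2 * np.pi * 5 * t)
    v[123456] = 10.0                           # the glitch
    td, vd = minmax_decimate(t, v, 2000)       # only 4000 points to plot
    print "glitch preserved:", vd.max() == 10.0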
