[matplotlib-devel] Large datasets performance....

2009-06-17 Thread vehemental

Hello,

I'm using matplotlib for various tasks beautifully...but on some occasions,
I have to visualize large datasets (in the range of 10M data points), using
imshow or regular plots...the system starts to choke a bit at that point...

I would like to be consistent somehow and not use different tools for
basically similar tasks...
so I'd like some pointers regarding rendering performance...and I would be
interested in getting involved in dev if there is something to be done.

To active developers: what's the general feeling, does matplotlib have room
to spare in its rendering performance?...
or is it pretty much tied down to the speed of Agg right now?
Is there something to gain from using the multiprocessing module now
included by default in 2.6?
or even going as far as using something like pyGPU for fast vectorized
computations...?

I've seen previous discussions around about OpenGL becoming a backend at
some point in the future...
would it really stand up compared to the current backends? are there clues
about that right now?

thanks for any inputs! :D
bye
-- 
View this message in context: 
http://www.nabble.com/Large-datasets-performance-tp24074329p24074329.html
Sent from the matplotlib - devel mailing list archive at Nabble.com.


--
Crystal Reports - New Free Runtime and 30 Day Trial
Check out the new simplified licensing option that enables unlimited
royalty-free distribution of the report engine for externally facing 
server and web deployment.
http://p.sf.net/sfu/businessobjects
___
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel


Re: [matplotlib-devel] Large datasets performance....

2009-06-17 Thread Nicolas Rougier

Hello,

To give you some hints on performance using OpenGL, you can have a look
at glumpy: http://www.loria.fr/~rougier/tmp/glumpy.tgz
(It requires pyglet for the OpenGL backend.)

It is not yet finished, but it is usable. The current version allows you to
visualize a static numpy float32 array up to 8000x8000 and a dynamic numpy
float32 array around 500x500, depending on GPU hardware (dynamic meaning
that you update the image at around 30 fps).

The idea behind glumpy is to directly translate a numpy array into a
texture and to use shaders to do the colormap transformation and
filtering (nearest, bilinear or bicubic).
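
The colormap-as-lookup idea described above can be mimicked on the CPU in a few lines of numpy (a hypothetical sketch, not glumpy's actual code; the `lut` table and function name are made up): normalize the values, then index an RGBA lookup table, which is exactly what a colormap fragment shader does per pixel on the GPU.

```python
import numpy as np

def colormap_lookup(Z, lut, vmin=None, vmax=None):
    """Map a float array to RGBA by indexing a lookup table.

    This mimics, on the CPU, what a colormap fragment shader does on
    the GPU: normalize each value to [0, 1], then fetch a color.
    """
    vmin = Z.min() if vmin is None else vmin
    vmax = Z.max() if vmax is None else vmax
    # Normalize to [0, 1], guarding against a flat array.
    span = max(vmax - vmin, 1e-12)
    norm = np.clip((Z - vmin) / span, 0.0, 1.0)
    # Nearest-neighbour lookup into the color table.
    idx = (norm * (len(lut) - 1)).astype(np.intp)
    return lut[idx]

# A tiny 3-entry blue -> white -> red table (RGBA, uint8).
lut = np.array([[0, 0, 255, 255],
                [255, 255, 255, 255],
                [255, 0, 0, 255]], dtype=np.uint8)
Z = np.linspace(-1.0, 1.0, 5).reshape(1, 5).astype(np.float32)
rgba = colormap_lookup(Z, lut)
print(rgba.shape)  # (1, 5, 4)
```

On the GPU the lookup table becomes a 1D texture, so the per-pixel cost is a single texture fetch regardless of array size.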

Nicolas







Re: [matplotlib-devel] Large datasets performance....

2009-06-17 Thread Michael Droettboom
vehemental wrote:
> Hello,
>
> I'm using matplotlib for various tasks beautifully...but on some occasions,
> I have to visualize large datasets (in the range of 10M data points) (using
> imshow or regular plots)...system start to choke a bit at that point...
>   
The first thing I would check is whether your system becomes starved for 
memory at this point and virtual memory swapping kicks in.

A common technique for faster plotting of image data is to downsample it 
before passing it to matplotlib.  Same with line plots -- they can be 
decimated.  There is newer/faster path simplification code in SVN trunk 
that may help with complex line plots (when the path.simplify rcParam is 
True).  I would suggest starting with that as a baseline to see how much 
performance it already gives over the released version.
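
The downsampling suggested above can be sketched as plain striding (names and the point budget are made up; a per-bucket min/max envelope would preserve narrow spikes better):

```python
import numpy as np

def decimate(y, max_points=200_000):
    """Keep at most max_points samples by striding the array."""
    step = max(1, -(-len(y) // max_points))  # ceiling division
    return y[::step]

y = np.random.randn(10_000_000).astype(np.float32)
y_small = decimate(y)
print(len(y_small))  # 200000

# The reduced data then goes to matplotlib as usual; the newer path
# simplification in SVN trunk is controlled by the path.simplify rcParam:
#   import matplotlib
#   matplotlib.rcParams['path.simplify'] = True
```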
> I would like to be consistent somehow and not use different tools for
> basically similar tasks...
> so I'd like some pointers regarding rendering performance...as I would be
> interested to be involved in dev is there is something to be done
>
> To active developers, what's the general feel does matplotlib have room to
> spare in its rendering performance?...
>   
I've spent a lot of time optimizing the Agg backend (which is already 
one of the fastest software-only approaches out there), and I'm out of 
obvious ideas.  But a fresh set of eyes may find new things.  An 
advantage of Agg that shouldn't be overlooked is that it works 
identically everywhere.
> or is it pretty tied down to the speed of Agg right now?
> Is there something to gain from using the multiprocessing module now
> included by default in 2.6?
>   
Probably not.  If the work of rendering were to be divided among cores, 
that would probably have to be done at the C++ level to see any gains.  
As it is, plotting many points tends to be limited by memory bandwidth 
anyway, not processor speed.
> or even go as far as using something like pyGPU for fast vectorized
> computations...?
>   
Perhaps.  But again, the computation isn't the bottleneck -- in my 
experience it's usually a memory bandwidth starvation issue.  Using a 
GPU may only make matters worse.  Note that I consider that approach 
distinct from just using OpenGL to colormap and render the image as a 
texture.  That approach may bear some fruit -- but only for image 
plots.  Vector graphics acceleration with GPUs is still difficult to do 
in high quality across platforms and chipsets while also beating 
software for speed.
> I've seen around previous discussions about OpenGL being a backend in some
> future...
>   
> would it really stand up compared to the current backends? is there clues
> about that right now?
>
> thanks for any inputs! :D
> bye
>   
Hope this helps,
Mike

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA




Re: [matplotlib-devel] Large datasets performance....

2009-06-17 Thread Jimmy Paillet
2009/6/17 Michael Droettboom 

> vehemental wrote:
>
>> Hello,
>>
>> I'm using matplotlib for various tasks beautifully...but on some
>> occasions,
>> I have to visualize large datasets (in the range of 10M data points)
>> (using
>> imshow or regular plots)...system start to choke a bit at that point...
>>
>>
> The first thing I would check is whether your system becomes starved for
> memory at this point and virtual memory swapping kicks in.


the python process is sitting at around 300 MB of memory consumption...there
should be plenty of memory left...
but I will look more closely at what's happening...
I would assume the memory bandwidth is not very high, given the cheapness
of the computer I'm using :D

>
>
> A common technique for faster plotting of image data is to downsample it
> before passing it to matplotlib.  Same with line plots -- they can be
> decimated.  There is newer/faster path simplification code in SVN trunk that
> may help with complex line plots (when the path.simplify rcParam is True).
>  I would suggest starting with that as a baseline to see how much
> performance it already gives over the released version.


yes, that totally makes sense...no need to visualize 3 million points if you
can only display 200,000
I'm already doing that to some extent, but it takes time of its own...at
least I have solutions to reduce this time if needed
I'll try the SVN version...see if I can extract some improvements


>
>  I would like to be consistent somehow and not use different tools for
>> basically similar tasks...
>> so I'd like some pointers regarding rendering performance...as I would be
>> interested to be involved in dev is there is something to be done
>>
>> To active developers, what's the general feel does matplotlib have room to
>> spare in its rendering performance?...
>>
>>
> I've spent a lot of time optimizing the Agg backend (which is already one
> of the fastest software-only approaches out there), and I'm out of obvious
> ideas.  But a fresh set of eyes may find new things.  An advantage of Agg
> that shouldn't be overlooked is that is works identically everywhere.
>
>> or is it pretty tied down to the speed of Agg right now?
>> Is there something to gain from using the multiprocessing module now
>> included by default in 2.6?
>>
>>
> Probably not.  If the work of rendering were to be divided among cores,
> that would probably be done at the C++ level anyway to see any gains.  As it
> is, the problem with plotting many points generally tends to be limited by
> memory bandwidth anyway, not processor speed.
>
>> or even go as far as using something like pyGPU for fast vectorized
>> computations...?
>>
>>
> Perhaps.  But again, the computation isn't the bottleneck -- it's usually a
> memory bandwidth starvation issue in my experience.  Using a GPU may only
> make matters worse.  Note that I consider that approach distinct from just
> using OpenGL to colormap and render the image as a texture.  That approach
> may bear some fruit -- but only for image plots.  Vector graphics
> acceleration with GPUs is still difficult to do in high quality across
> platforms and chipsets and beat software for speed.
>


So if I hear you correctly, the matplotlib/Agg combination is not terribly
much slower than a C plotting lib using Agg to render would be...
and we are talking more about hardware limitations, right?


>
>  I've seen around previous discussions about OpenGL being a backend in some
>> future...
>>  would it really stand up compared to the current backends? is there clues
>> about that right now?
>>
>
Thanks Nicolas, I'll take a closer look at glumpy
I can probably gather some info by comparing an imshow to the
equivalent in OpenGL



>
>> thanks for any inputs! :D
>> bye
>>
>>
> Hope this helps,


it did! thanks
jimmy


>


Re: [matplotlib-devel] Large datasets performance....

2009-06-17 Thread Gökhan SEVER

Nicolas,

How do you run the demo scripts in glumpy?

I get errors both with an IPython run and with python script_name.py

In [1]: run demo-simple.py
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)

/home/gsever/glumpy/demo-simple.py in ()
     20 #
     21 #
---> 22 import glumpy
     23 import numpy as np
     24 import pyglet, pyglet.gl as gl

/home/gsever/glumpy/glumpy/__init__.py in ()
     23 import colormap
     24 from color import Color
---> 25 from image import Image
     26 from trackball import Trackball
     27 from app import app, proxy

/home/gsever/glumpy/glumpy/image.py in ()
     25
     26
---> 27 class Image(object):
     28     ''' '''
     29     def __init__(self, Z, format=None, cmap=colormap.IceAndFire, vmin=None,

/home/gsever/glumpy/glumpy/image.py in Image()
    119         return self._cmap
    120
--> 121     @cmap.setter
    122     def cmap(self, cmap):
    123         ''' Colormap to be used to represent the array. '''

AttributeError: 'property' object has no attribute 'setter'
WARNING: Failure executing file: 





[gse...@ccn glumpy]$ python demo-cube.py
Traceback (most recent call last):
  File "demo-cube.py", line 22, in <module>
    import glumpy
  File "/home/gsever/glumpy/glumpy/__init__.py", line 25, in <module>
    from image import Image
  File "/home/gsever/glumpy/glumpy/image.py", line 27, in <module>
    class Image(object):
  File "/home/gsever/glumpy/glumpy/image.py", line 121, in Image
    @cmap.setter
AttributeError: 'property' object has no attribute 'setter'


I have Python 2.5.2...


Re: [matplotlib-devel] Large datasets performance....

2009-06-17 Thread Nicolas Rougier



I think the property setter syntax is available in Python 2.6 only. I
modified the sources and put them in the same place. It should be ok now.
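
For reference, the incompatibility comes from the `@prop.setter` decorator form being new in Python 2.6; code that must also run on 2.5 has to wire the property up explicitly. A minimal sketch (hypothetical, not glumpy's actual class):

```python
class Image(object):
    def __init__(self, cmap=None):
        self._cmap = cmap

    # Python 2.6+ only -- on 2.5 the second decorator fails with
    # AttributeError: 'property' object has no attribute 'setter':
    #     @property
    #     def cmap(self): return self._cmap
    #     @cmap.setter
    #     def cmap(self, value): self._cmap = value

    # Portable to Python 2.5 and later: build the property explicitly.
    def _get_cmap(self):
        '''Colormap used to represent the array.'''
        return self._cmap

    def _set_cmap(self, cmap):
        self._cmap = cmap

    cmap = property(_get_cmap, _set_cmap)

img = Image()
img.cmap = 'IceAndFire'
print(img.cmap)  # IceAndFire
```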

Nicolas






Re: [matplotlib-devel] Large datasets performance....

2009-06-17 Thread Jimmy Paillet
The demo-animation.py worked beautifully out of the box at 150 fps.
I upped the array size a bit to 1200x1200...still around 40 fps...

very interesting...

jimmy





Re: [matplotlib-devel] Large datasets performance....

2009-06-17 Thread Ludwig Schwardt
Hi,

On this subject, one program that has pretty impressive interactive
visualisation is the venerable snd
(http://ccrma.stanford.edu/software/snd/). It displays hours of audio
in a flash and allows you to pan and zoom the signal without a hitch. It
only plots an envelope of the audio signal at first, and shows more
and more detail as you zoom in.

Jimmy's comment that there's no need to visualize 3 million points if
you can only display 200 000 is even more true for time signals, where
you can typically only display 1000 to 2000 samples (i.e. the number
of horizontal pixels).
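
The envelope approach described above can be sketched as a per-pixel-column min/max reduction (names are made up; unlike plain striding, this cannot drop a narrow spike):

```python
import numpy as np

def minmax_envelope(y, n_columns):
    """Reduce y to one (min, max) pair per pixel column."""
    n = len(y) - len(y) % n_columns          # drop the ragged tail
    buckets = y[:n].reshape(n_columns, -1)
    return buckets.min(axis=1), buckets.max(axis=1)

# "Hours of audio": 10M samples containing one narrow spike.
y = np.zeros(10_000_000, dtype=np.float32)
y[1_234_567] = 1.0
lo, hi = minmax_envelope(y, n_columns=1500)
print(hi.max())  # 1.0 -- the spike survives in its column's max
```

Plotting the `lo` and `hi` curves (or filling between them) gives the familiar audio-editor waveform at screen resolution, and the full-resolution data is only touched again when the user zooms in.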

Does the new path simplification code use a similar approach to snd?
I've always wanted something like that in matplotlib... :-)

Regards,
Ludwig



Re: [matplotlib-devel] Large datasets performance....

2009-06-17 Thread Michael Droettboom
Ludwig Schwardt wrote:
> Does the new path simplification code use a similar approach to snd?
> I've always wanted something like that in matplotlib... :-)
>
>   
Not knowing the details of what snd is doing, I would say "probably".  
The general idea is to remove points on the fly that do not change the 
appearance of the plot at the given resolution.  Spending the time to do 
this up front speeds up the path stroking immensely, as it has fewer 
vertices and therefore fewer self-intersections to compute.  I suspect 
what matplotlib is doing is a little more general, and therefore not 
quite as efficient as snd, because it can't assume a 1-dimensional time 
series.
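
This is not matplotlib's actual implementation, but the core idea (drop any vertex whose removal moves the drawn path by less than some tolerance) can be sketched with a greedy perpendicular-distance test:

```python
import numpy as np

def simplify(xy, tol=0.5):
    """Greedy polyline simplification: keep a vertex only if it deviates
    from the chord between the last kept vertex and the next point by
    more than tol (same units as the coordinates, e.g. pixels)."""
    keep = [0]
    a = xy[0]
    for i in range(1, len(xy) - 1):
        b, p = xy[i + 1], xy[i]
        # Perpendicular distance of p from the line through a and b
        # (twice the triangle area divided by the base length).
        area2 = abs((b[0] - a[0]) * (p[1] - a[1])
                    - (b[1] - a[1]) * (p[0] - a[0]))
        d = area2 / max(np.hypot(b[0] - a[0], b[1] - a[1]), 1e-12)
        if d > tol:
            keep.append(i)
            a = xy[i]
    keep.append(len(xy) - 1)
    return xy[keep]

line = np.array([[0.0, 0.0], [1.0, 0.001], [2.0, -0.001], [3.0, 0.0]])
print(len(simplify(line)))  # 2 -- nearly collinear interior points removed
```

A vertex that does deviate, such as a sharp spike, is kept, which is why the simplified path looks identical at screen resolution while stroking far fewer segments.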

To give credit where it is due, the path simplification was originally 
written by Allan Haldane and has been in matplotlib for some time.  The 
recent work has been to fix some bugs in degenerate cases, to improve 
its performance, to greatly improve the clipping algorithm, and to make 
the tolerance user-configurable.

Mike

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

