Mike, Thanks for the quick response.
I was wrong as usual: the masked array overhead in your original version of the path initializer was actually small. I misinterpreted the kcachegrind display. Rats! I was hoping for a big gain. It looks like anything that makes a huge number of paths is going to be slow, no matter what we do to try to optimize the path initializer. A partial solution for pcolor is pcolormesh, using the quadmesh extension code, although that still has a bug. (Paul Kienzle was going to look into it.) Is the quadmesh extension compatible with your transforms branch? My impression is that the transforms branch is going to be a big step forward, with performance improvements in some areas, at worst minor penalties in others--except for some problems like pcolor that need to be solved. In order to replace matlab in my application, a very fast interactive pcolor-type capability is absolutely essential. I think this simply has to be done via extension code, like quadmesh and the image codes. (Pcolor in the trunk isn't fast enough, either.) Unfortunately, I have found those codes hard to understand. Only the regular-grid image code is fully integrated into the trunk, and even it has a long-standing bug revealed by extreme zooming. The irregular-grid image routine might be a big help, but it has never been integrated. I don't remember which bugs it shares with quadmesh and image, if any. Eric Michael Droettboom wrote: > Eric Firing wrote: >> Mike, >> >> I made a quick test and took a quick look, and I certainly see a ripe >> mango within reach. I don't know what your constraints and strategy >> are, but I thought I would give you the off-the-cuff idea before I >> forget what I did. >> >> The test was pcolortest.py, and the kcachegrind input is the .log file. >> >> The problem is the path initializer: it is converting everything to a >> masked array, which in the vast majority of cases is not needed, and is >> very costly. > > Thanks for finding this. I agree completely. I think that was > basically a typo that ended up "working", just suboptimally. The input > to the path constructor may be either a numpy array, an ma array or a > regular Python sequence. If it's the first two, it should be left alone > (if there is an array mask, it is dealt with later on in the > constructor), but if the latter, it should be converted to a numpy array. > > What I meant to type was: > > if not ma.isMaskedArray(vertices): > vertices = npy.asarray(vertices, npy.float_) > > The argument against just "npy.asarray(vertices, npy.float_)" is that > the mask needs to be preserved. > > If I understand correctly, that will be essentially a no-op when the > input is a numpy array, albeit with the overhead of some checks. > >> We need to think carefully about the levels of API, and what should be >> done at which levels. One possibility is that at the level of the path >> initializer, only ordinary ndarrays should be acceptable--any mask >> manipulations and compressions should already have been done. This >> would require a helper function to generate the codes for that case. >> Another is that the path initializer could get a flag telling it whether >> to check for masked arrays. And another is that a check for existance >> of a mask should be done at the start, and the mask processing done only >> if there is a mask. > > This option was the intent. > >> Yet another is that if a mask is needed, it be >> passed in as an optional 1-D array kwarg. An advantage of this is that >> the code that calls the path initializer may be in a better position to >> know what is needed to generate the 1-D mask (that is, a mask for each >> (x,y) point rather than for x and y separately)--that mask may already >> be sitting around. > > Many of these options I fear would significantly complicate the code. > One of the driving motivations for the refactoring is to allow > transformations to be combined more generally. Think of the case where > you have a polar plot with a logarithmic scale on the r-axis (this > wasn't ever possible in the trunk). The log scale means that there is > potential for negative masked values, but the polar part of the > transformation shouldn't have to know or care whether masked values are > being passed through. Requiring it to do so would need the same checks > currently performed in the Path constructor, but they would be copied > all over the code in every kind of new transformation. > > FWIW, there already is a deliberate "quarantining" of masked arrays -- > it happens where the logical elements of the plot hit the drawing > commands of the plot (the Path object). It could have been implemented > such that the backends must understand masked arrays and draw > accordingly, but it proved to be faster (based on the simple_plot_fps.py > benchmark) to convert to a non-masked array with MOVETO codes upfront > and reuse that. (Not surprising, given the overhead of masked arrays). > This means that masked arrays are not used at all during panning and > zooming operations where speed is perhaps the most crucial. > >> Masked arrays are pretty clunky and slow. The maskedarray >> implementation by Pierre GM is nicer, more complete, and faster for many >> operations than numpy.ma, but it still adds a lot of overhead, >> especially for small arrays. (It needs to have its core in C; so far I >> have failed dismally in trying to understand how to do that without >> repeating the bulk of the ndarray code.) >> >> A related point: can you (or is it OK if I do it) change all the "import >> numpy.ma as ma" or whatever to "from matplotlib.numerix import npyma as >> ma"? The advantage is that it makes it easy to test the new version >> with either maskedarray or ma. This should be temporary; I am still >> hoping and expecting that maskedarray will replace ma in the core numpy >> distribution. > > That sounds like a very good idea. I'll go ahead and do this (on the > branch only). > > Cheers, > Mike > ------------------------------------------------------------------------- This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now >> http://get.splunk.com/ _______________________________________________ Matplotlib-devel mailing list Matplotlib-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/matplotlib-devel