Thanks for the mini example and timings. Very instructive. By making
CMat an object encapsulated in CachedCMat, you avoided the
'self-destruct' problem, where the self object is doomed once it
reaches __dealloc__.
The only annoying thing about that is that one has to carry the
CachedCMat around along with the CMat object while it's in use, or
else rewrite the CMat class itself to do the extra indirection for
operations such as add, multiply, qr, etc.
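To make that indirection concrete, here's a minimal sketch; all the
names and the add() signature are invented, since the real CMat API
isn't shown in this thread:

    cdef class CMat:
        cdef double scale                 # stand-in for the real matrix state
        cdef CMat add(self, CMat other):
            cdef CMat r = CMat()
            r.scale = self.scale + other.scale
            return r

    cdef class CachedCMat:
        cdef CMat inner                   # the wrapped, recyclable object
        def __cinit__(self):
            self.inner = CMat()
        def add(self, CachedCMat other):
            cdef CachedCMat r = CachedCMat()
            r.inner = self.inner.add(other.inner)   # the extra hop per operation
            return r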
The only Python object in my cmat class is an ndarray, so just copying
a reference to that object, plus a couple of C pointers, into a
container class stored in the hash presumably shouldn't be that bad,
and it accomplishes nearly the same thing. It does add the overhead of
creating the container class whenever a cmat gets hashed, but my
Cython timings seem to suggest that's not such a big deal.
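Roughly, the container idea looks like the following sketch; 'block'
and 'view' are made-up stand-ins for the two vsipl C pointers, since
the real names aren't shown here:

    cimport numpy as np

    cdef class BlockHolder:
        # Holding the ndarray keeps its buffer (and hence the C
        # pointers into it) alive after the owning cmat is gone.
        cdef np.ndarray arr
        cdef void *block      # hypothetical vsipl block pointer
        cdef void *view       # hypothetical vsipl view pointer

    cdef dict _cache = {}

    cdef void stash(tuple shape, np.ndarray arr, void *block, void *view):
        # Called from cmat.__dealloc__ with copies of the doomed
        # object's fields; self itself is never stored, so it can
        # die normally.
        cdef BlockHolder h = BlockHolder()
        h.arr = arr
        h.block = block
        h.view = view
        _cache[shape] = h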
I did some more experiments, by the way, and discovered that my
cache-miss probability was 38%. That's still too high.
I either need to cache multiple matrices of the same size or move to a
different memory-management scheme. I've used a circular buffer for
temp matrices somewhat successfully in the past, but you have to size
the buffer pretty well to avoid overwriting live objects.
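Caching several matrices per size could be as simple as replacing the
single-slot stash above with a short free list per shape, so two live
temporaries of the same size don't evict each other. A rough sketch,
with a made-up holder object:

    from collections import defaultdict

    _MAX_PER_SHAPE = 4            # tuning knob; too small reintroduces misses
    _cache = defaultdict(list)    # shape -> list of spare holders

    def stash(shape, holder):
        spares = _cache[shape]
        if len(spares) < _MAX_PER_SHAPE:
            spares.append(holder)          # keep the spare for reuse
        # otherwise drop it and let normal deallocation happen

    def fetch(shape):
        spares = _cache[shape]
        return spares.pop() if spares else None   # None signals a miss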
I'm not certain, but it may be that Cython 0.12 runs the code a bit
faster as well. More tests on Monday, I guess.
-Matt
On 1/8/2010 2:07 PM, Robert Bradshaw wrote:
On Jan 7, 2010, at 2:04 PM, Matthew wrote:
OK, I'm going to give the __new__ hack a try from
http://trac.cython.org/cython_trac/ticket/238
I don't really need to overload __new__, do I, so I don't have to
change matrix.pxd?
Sorry, this is the ticket number that I meant to refer to:
http://trac.cython.org/cython_trac/ticket/443 , though that one takes
no arguments, so it may not apply to you.
The vsipl vendor tells me that the only really expensive operation is
the cblockbind() within __cinit__(). However, the __dealloc__()
routine is very expensive as well (given the number of times it's
being called). It would be nice if I could profile on a line-by-line
basis. I'm not sure whether the Python cProfile tool supports this
or not.
It does, but we don't have that implemented in Cython yet. Given that
it's a deterministic rather than (external) probabilistic profiler,
the profiling itself may significantly impact the speed and results.
Try commenting stuff out, or factoring it into an (inline) function.
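Something like the sketch below, say, assuming a Cython recent enough
to honor the profile directive; the _bind_block wrapper and the cmat
fields here are invented:

    # cython: profile=True
    # The directive above (it must be at the top of the .pyx) adds
    # cProfile hooks to the generated C code, so each factored-out
    # piece gets its own entry in the profile.

    cdef class cmat:
        cdef int rows, cols

        def __cinit__(self, int rows, int cols):
            self.rows = rows
            self.cols = cols
            self._bind_block()        # now timed as a separate entry

        cdef void _bind_block(self):
            # hypothetical wrapper around the expensive vsipl cblockbind()
            pass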
I can't just chalk this result up to the vsipl code, since the hash
routine isn't giving me any performance gain and seemed to be making
things worse. (Though I probably need to do some more debugging to see
whether I have a lot of cache misses, or some bug in my logic.)
Well, maybe hashing is slightly more expensive than the vsipl call. On
a completely unrelated note, getting data to/from a GPU can be a
bottleneck as well, and due to its asynchronous nature it may not show
up as obviously in the main CPU profiling results.
For the life of me I could not figure out how to just put the matrix
object itself into my hash-indexed memory cache. It seemed like my
Python objects were always being garbage collected once I hit the
__dealloc__ routine (the self.arr ndarray, for example). Later I
found out that the Cython class gets stripped of its attributes when
it's stored in a dictionary. Only those attributes written to the
class's internal dictionary in the __init__() method seem to get
saved, as far as I can tell from my experiments. Of course, I'd
actually like to avoid calling __init__(). I really didn't intend to
learn the internals of Python, or Cython for that matter, but I do
need to figure out how to optimize this code.
I think you're trying to make things way more complicated than
necessary. The easiest approach is to only expose wrapper classes, and
cache the expensive initialization in an internal class. See
http://sage.math.washington.edu/home/robertwb/cython/mat.html
http://sage.math.washington.edu/home/robertwb/cython/mat.pyx
(I'm sure there's some more room for optimization, and the caching
algorithm could be improved as well.) Also, note that creating the
numpy arrays is expensive as well.
In [1]: from mat import *
In [2]: %time make_np(10**5)
CPU times: user 0.56 s, sys: 0.43 s, total: 0.99 s
Wall time: 0.99 s
In [4]: %time make_CMat(10**5)
CPU times: user 0.68 s, sys: 0.45 s, total: 1.13 s
Wall time: 1.14 s
In [6]: %time make_CachedCMat(10**5)
CPU times: user 0.14 s, sys: 0.00 s, total: 0.14 s
Wall time: 0.14 s
In [8]: %time make_Empty(10**5)
CPU times: user 0.02 s, sys: 0.00 s, total: 0.02 s
Wall time: 0.02 s
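Roughly, the pattern looks like the following; this is a from-memory
sketch, not the actual mat.pyx linked above, with a placeholder for
the expensive setup:

    cimport numpy as np
    import numpy as np

    cdef list _pool = []                 # recycled _CMatData instances

    cdef class _CMatData:
        # Internal class: owns the expensive state (ndarray, bindings).
        cdef np.ndarray arr
        def __cinit__(self):
            self.arr = np.empty((100, 100))   # stand-in for real setup

    cdef class CachedCMat:
        # Public wrapper: construction and destruction just move one
        # reference to or from the pool, so it's cheap after warm-up.
        cdef _CMatData data
        def __cinit__(self):
            if _pool:
                self.data = _pool.pop()       # hit: reuse old state
            else:
                self.data = _CMatData()       # miss: pay the full cost
        def __dealloc__(self):
            _pool.append(self.data)           # self dies; data survives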
- Robert