OK, I'm going to give the __new__ hack a try from
http://trac.cython.org/cython_trac/ticket/238
I don't really need to overload __new__, do I? So I shouldn't have to change
matrix.pxd.
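Before I wire it into the real classes, here is how I understand the idea behind that ticket, as a pure-Python sketch: recycle freed instances from a pool in __new__ instead of allocating fresh ones. (The `Mat` class and `release()` method here are made-up stand-ins for illustration, not my actual matrix code.)

```python
class Mat:
    """Toy stand-in for a matrix type with an expensive constructor."""
    _pool = []  # freed instances waiting to be reused

    def __new__(cls, r, c):
        # Reuse a recycled instance when one is available, skipping allocation.
        if cls._pool:
            self = cls._pool.pop()
        else:
            self = super().__new__(cls)
        self.r, self.c = r, c
        return self

    def release(self):
        # Instead of letting the object die, park it for reuse.
        type(self)._pool.append(self)

a = Mat(4, 4)
a.release()
b = Mat(2, 2)
assert a is b  # the freed instance was recycled, not reallocated
```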
Just for completeness, here is a recent profile from a very fast machine. (All
the other machines take about 10 times as long; then again, they weren't
running Cython 0.12 either, but 0.11.3.)
1155781 function calls in 11.978 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
107520 2.928 0.000 3.009 0.000 matrix.pyx:569(__cinit__)
1024 2.223 0.002 2.632 0.003 matrix.pyx:1144(qr)
107516 2.009 0.000 2.091 0.000 matrix.pyx:604(__dealloc__)
40640 1.536 0.000 1.536 0.000 matrix.pyx:83(__dealloc__)
40640 1.170 0.000 1.176 0.000 matrix.pyx:37(__cinit__)
9920 0.337 0.000 0.394 0.000 matrix.pyx:1361(tomat)
3584 0.193 0.000 4.395 0.001 muddect.py:92(sigdelay)
19008 0.127 0.000 0.331 0.000 matrix.pyx:897(__mul__)
1792 0.076 0.000 3.669 0.002 muddect.py:151(despread)
### The calls to the hash table, implemented via a Python dict, are below:
### almost two orders of magnitude less expensive than the matrix.pyx
### __cinit__ on line 1. ####################
90304 0.074 0.000 0.074 0.000 memcache.pyx:56(hashcget)
91260 0.065 0.000 0.065 0.000 memcache.pyx:19(__cinit__)
3584 0.056 0.000 0.285 0.000 matrix.pyx:1050(fft)
3584 0.055 0.000 1.429 0.000 matrix.pyx:344(expj)
3584 0.054 0.000 0.285 0.000 matrix.pyx:1071(ifft)
7168 0.052 0.000 0.318 0.000 matrix.pyx:1324(r2cmat)
7168 0.052 0.000 0.620 0.000 matrix.pyx:861(__and__)
...
To give some perspective: my Matlab code performing the same operation runs
under Octave on a single CPU and takes about the same amount of time (13
seconds). This Cython application is farmed out to a GPU with at least 240
processors on it, so I'd like to see well over an order of magnitude
improvement. With a few exceptions, the lion's share of the Cython processing
time is being sucked away by lousy memory allocations. Even my QR
decomposition is getting killed by allocations and memory transfers, I believe.
Here are the __cinit__() and __dealloc__() that dominate the time:
cdef class cmat :
    """ Complex matrix class; binds to both vsipl and a numpy ndarray.
        The ndarray object is returned via getarr. The array is bound
        back to vsipl via admit """
    def __cinit__(self, int r, int c):
        cdef vsip_cblock_f *myb
        cdef int istrue
        cdef np.ndarray[cfloat, ndim=2] carr
        cdef tuple t = (r,c)
        cdef bool isinhash
        self.parent = None  # assume this is not a submatrix
        if r <= 0 :
            self.vm = NULL
            return
        isinhash = hashcget( r, c, self)
        if isinhash :
            # a recycled block was found in the cache; already done loading
            return
        # create an ndarray complex-float buffer
        carr = np.empty(shape=(r,c), dtype=np.complex64, order='Fortran')
        self.arr = <object> carr
        # bind the ndarray to a vsipl block
        myb = vsip_cblockbind_f( <vsip_scalar_f *> carr.data, NULL, r * c,
                                 VSIP_MEM_NONE)
        if myb == NULL:
            raise MemoryError("Could not allocate complex matrix block")
        # admit the data so that vsipl owns it
        istrue = vsip_cblockadmit_f( myb, 0)
        # create a view
        self.vm = vsip_cmbind_f( myb, 0, 1, r, r, c)
        if self.vm == NULL:
            raise MemoryError("Could not allocate complex matrix view")
        self.fftdata = NULL

    def __dealloc__(self):
        cdef hashcmat hc
        cdef bool isin = True
        if self.vm :
            if self.parent is None :
                # add this to the cache if it isn't already there
                hc = hashcmat(self)
                if not hc.isin :
                    # it was saved in the hash instead; don't destroy it
                    return
                vsip_cmalldestroy_f(self.vm)
            else :
                vsip_cmdestroy_f(self.vm)
        if self.fftdata :
            vsip_fft_destroy_f(self.fftdata)
The vsipl vendor tells me that the only really expensive operation is the
cblockbind() within __cinit__(). However, the __dealloc__() routine is very
expensive as well, given the number of times it's being called. It would be
nice if I could profile on a line-by-line basis; I'm not sure whether the
Python cProfile tool supports this or not. I can't just chalk this result up
to the vsipl code, since the hash routine is not giving me any performance
gain and seemed to be making things worse. (Though I probably need to do some
more debugging to see whether I have a lot of cache misses, or some bug in my
logic.)
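For reference, the recycle-on-dealloc scheme I'm attempting boils down to something like the following pure-Python sketch. (The names `hashcget`/`hashcput` here just mirror my memcache.pyx routines; the signatures are simplified placeholders, not the real code.)

```python
# Minimal sketch of a size-keyed free cache: on dealloc, blocks are parked
# under their (rows, cols) key; on init, a matching block is reused if present.
_cache = {}  # (r, c) -> list of freed blocks

def hashcput(r, c, block):
    # called from __dealloc__: keep the block instead of destroying it
    _cache.setdefault((r, c), []).append(block)

def hashcget(r, c):
    # called from __cinit__: return a recycled block, or None on a miss
    blocks = _cache.get((r, c))
    return blocks.pop() if blocks else None

hashcput(4, 4, "block-A")
assert hashcget(4, 4) == "block-A"   # hit: reuse the freed block
assert hashcget(4, 4) is None        # the cache entry is consumed
assert hashcget(8, 8) is None        # miss: caller must allocate fresh
```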
For the life of me I could not figure out how to just put the matrix object
itself into my hash-indexed memory cache. My Python objects always seemed to
be garbage collected once I hit the __dealloc__ routine (the self.arr ndarray,
for example). Later I found out that the Cython class gets stripped of its
attributes if it's stored in a dictionary. Only those attributes written to
the class's internal dictionary in the __init__() method seem to get saved, as
far as I can tell from my experiments. Of course, I'd actually like to avoid
calling __init__(). I really didn't intend to learn the internals of Python,
or Cython for that matter, but I do need to figure out how to optimize this
code.
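At the plain-Python level, at least, the reference-counting story is simple: as long as a container holds a strong reference, the finalizer never runs. A toy demonstration (this uses __del__ on an ordinary Python class, not Cython's __dealloc__ machinery, so it may not capture my cdef-attribute problem):

```python
deallocated = []

class Obj:
    def __del__(self):
        deallocated.append(id(self))

cache = {}
o = Obj()
cache["key"] = o    # the dict now holds a strong reference
del o               # drops one reference; the dict still keeps the object alive
assert deallocated == []   # finalizer has not run

cache.clear()       # last reference gone; in CPython the object dies here
assert len(deallocated) == 1
```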
Regards,
-Matt
Robert Bradshaw <rober...@...> writes:
>
> On Jan 7, 2010, at 8:59 AM, Matthew wrote:
>
> > Well I guess I outsmarted myself on this one. After implementing my
> > object hasher using dict,
> > my code slowed down by nearly a factor of 10 LOL.
> >
> > The hashing code itself shows up as a minimal contribution to the
> > overall time in the profiler.
> > However it does require a little extra logic in the initializer and
> > a copy of a few pointers.
> >
> > Basically I copy all the C device pointers to a new cython class
> > which then gets saved in a dictionary upon the deallocation
> > of a matrix object. If that object exists in the dictionary it is
> > given to the allocator instead of allocating a new block.
> >
> > I'm wondering just what kind of overhead I'm up against with regards
> > to the python initialization of my cython defined class?
> > This almost suggests that the overhead problem is actually in python/
> > cython and not just in my C code.
>
> This could probably be discovered via some profiling. If this is the
> case, take a look at
>
> http://trac.cython.org/cython_trac/ticket/238 (the modern version of
> the former PY_NEW trick). Maybe you're passing lots of (keyword?)
> arguments?
>
> >
> > -Matt
> >
> > Robert Bradshaw <rober...@...> writes:
> >
> >>
> >> On Jan 7, 2010, at 12:05 AM, Stefan Behnel wrote:
> >>
> >>> Matthew Bromberg, 06.01.2010 21:50:
> >>>> How does tuple or list compare speed wise with dict?
> >>>
> >>> Like apples and oranges, basically.
> >>
> >> If you're trying to index into it with an int, especially a c int,
> >> lists and tuples will be much faster.
> >>
> >>>> Ultimately I have to hash into my list using size information.
> >>>
> >>> Any specific reason why you /can't/ use a dict?
> >>
> >> Which will likely be just as fast, if not faster, than hashing into a
> >> list manually yourself.
> >>
> >>>> This also still does not address my confusion with regards to how
> >>>> to
> >>>> capture a python object before it get's destroyed.
> >>>
> >>> As long as there is a reference to it (e.g. in the hash table), it
> >>> won't
> >>> get deallocated. So: use a Python list for your hash table, stop
> >>> caring
> >>> about ref-counts and it will just work.
> >>
> >> +1. Ideally, you should never have to worry about reference counts
> >> when working with Cython at all.
> >>
> >> I'm still not quite sure exactly what you're trying to do, but if
> >> it's
> >> creating and deleting thousands of these objects a second and that's
> >> killing you (the actual allocation/deallocation, not the
> >> initialization) then what you might want to do is something like
> >>
> >> http://hg.sagemath.org/sage-main/file/21efb0b3fc47/sage/rings/real_double.pyx#l2260
> >>
> >> which is a bit hackish and will probably need to be adapted to your
> >> specific case. If initializing is expensive, then you can probably
> >> keep around a pool of initialized pointers/buffers/whatever, and have
> >> the object creation just set/unset these fields (much cleaner).
> >>
> >> - Robert
> >>
>
>
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev