David Goldsmith wrote:
> Tim Hochberg wrote:
>
>> I periodically need to perform various linalg operations on a large
>> number of small matrices. The normal numpy approach of stacking up the
>> data into an extra dimension and doing the operations in parallel
>> doesn't work because the linalg functions are only designed to work on
>> one matrix at a time. At first glance, this restriction seems harmless
>> enough since the underlying LAPACK functions are also designed to work
>> one matrix at a time and it seem that there would be little to be gained
>> by parallelizing the Python level code. However, this turns out to be
>> the case.
>>
>> I have some code, originally written for numarray, but I recently ported
>> it over to numpy, that rewrites the linalg function determinant, inverse
>> and solve_linear_equations[1] to operate of 2 or 3D arrays, treating 3D
>> arrays as stacks of 2D arrays. For operating on large numbers of small
>> arrays (for example 1000 2x2 arrays), this approach is over an order of
>> magnitude faster than the obvious map(inverse, threeDArray) approach.
>>
>> So, some questions:
>>
>> 1. Is there any interest in this code for numpy.linalg? I'd be
>> willing to clean it up and work on the other functions in linalg so that
>> they were all consistent. Since these changes should all be backwards
>> compatible (unless your counting on catching an error from one of the
>> linalg functions), I don't see this as a problem to put in after 1.0, so
>> there's really no rush on this. I'm only bringing it up now since I just
>> ported it over to numpy and it's fresh in my mind.
>>
>>
> Say "no" to someone offering to make a Python module more feature rich?
> Blasphemy! :-)
>
> But I am very curious as to the source/basis of the performance
> improvement - did you figure this out?
>
Yes and no. Here's what the begining of linalg.det looks lik:
a = asarray(a)
_assertRank2(a)
_assertSquareness(a)
t, result_t = _commonType(a)
a = _fastCopyAndTranspose(t, a)
n = a.shape[0]
if isComplexType(t):
lapack_routine = lapack_lite.zgetrf
else:
lapack_routine = lapack_lite.dgetrf
# ...
There's quite a few function calls here, since the actual call to dgetrf
probably takes neglible time for a 2x2 array, the cost of computation
here is dominated by function call overhead, creating the return arrays
and such not. Which is the biggest culprit, I'm not sure; by vectorizing
it, much of that overhead is only incurred once rather than hundreds or
thousands of times, so it all becomes negligible at that point. I didn't
bother to track down the worst offender.
-tim
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Numpy-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/numpy-discussion