Re: Speed up suggestion for task #7807.

Edward d'Auvergne Tue, 10 Jun 2014 07:30:42 -0700

Here is an example of avoiding automatic numpy data structure creation
and then garbage collection:


"""
from numpy import add, ones, zeros

a = zeros((5, 4))
a[1] = 1
a[:,1] = 2

b = ones((5, 4))

# Write the sum directly into a, via the out argument.
add(a, b, out=a)
print(a)
"""

The result is:

[[ 1.  3.  1.  1.]
 [ 2.  3.  2.  2.]
 [ 1.  3.  1.  1.]
 [ 1.  3.  1.  1.]
 [ 1.  3.  1.  1.]]

The out argument for numpy.add() is used here to write the result
directly into the existing array a, much like the Python "+="
operation.  Unlike "a = a + b", it avoids creating a temporary numpy
data structure that must later be garbage collected.  This should
save a lot of time in the dispersion code.
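
As a rough check of the point above, the following sketch compares the
out-of-place form with the out form.  The variable names are only for
illustration and are not taken from the relax code:

```python
import numpy as np

a = np.zeros((5, 4))
b = np.ones((5, 4))

# Out-of-place: "a + b" allocates a brand new array for the result.
c = a + b
allocates_new = c is not a and not np.shares_memory(c, a)

# In-place: the result is written into the existing buffer of a,
# and add() returns that same array object.
result = np.add(a, b, out=a)
reuses_buffer = result is a

print(allocates_new, reuses_buffer)
```

Both flags should come out as True: the first form produces a separate
array, whereas the second reuses the memory that a already owns.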

Regards,

Edward


On 10 June 2014 15:56, Edward d'Auvergne <[email protected]> wrote:
> Hi Troels,
>
> Here is one suggestion, of many that I have, for significantly
> improving the speed of the analytic dispersion models in your
> 'disp_spin_speed' branch.  The speed ups you have currently achieved
> for spin clusters are huge and very impressive.  But now that you have
> the infrastructure in place, you can advance this much more!
>
> The suggestion has to do with the R20, R20A, and R20B numpy data
> structures.  The way they are currently handled is relatively
> inefficient, in that they are created de novo for each function call.
> This means that memory allocation and Python garbage collection
> happen for every single function call - something which should be
> avoided at almost all costs.
>
> A better way to do this would be to have a self.R20_struct,
> self.R20A_struct, and self.R20B_struct created in __init__(), and then
> to pack in the values from the parameter vector into these structures.
> You could create a special structure in __init__() for this.  It would
> have the dimensions [r20_index][ei][si][mi][oi], where the first
> dimension corresponds to the different R20 parameters.  And for each
> r20_index element, you would have ones at the [ei][si][mi][oi]
> positions where you would like R20 to be, and zeros elsewhere.  The
> key is that this is created at the target function start up, and not
> for each function call.
>
> This would be combined with the very powerful 'out' argument set to
> self.R20_struct with the numpy.add() and numpy.multiply() functions to
> prevent all memory allocations and garbage collection.  Masks could be
> used, but I think that that would be much slower than having special
> numpy structures with ones where R20 should be and zeros elsewhere.
> For just filling these structures, a single loop over r20_index
> which multiplies each parameter value by its special
> [r20_index][ei][si][mi][oi] one/zero structure, using numpy.add()
> and numpy.multiply() with out arguments, would be much, much faster
> than masks or the current R20_axis logic.  It will also simplify
> the code.
>
> Regards,
>
> Edward
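
A minimal sketch of the idea in the quoted message, under stated
assumptions: the dimension sizes, the names r20_mask, r20_struct,
r20_temp and pack_r20(), and the choice of one R20 parameter per mi
index are all hypothetical and are not taken from the relax code.

```python
import numpy as np

# Hypothetical sizes for the r20_index, ei, si, mi and oi dimensions.
n_r20, n_ei, n_si, n_mi, n_oi = 2, 1, 2, 2, 3
shape = (n_ei, n_si, n_mi, n_oi)

# Built once at target function start up (e.g. in __init__()).
# Ones mark where each R20 parameter belongs, zeros elsewhere.
r20_mask = np.zeros((n_r20,) + shape)
r20_mask[0, :, :, 0, :] = 1.0   # assumption: parameter 0 applies to mi = 0
r20_mask[1, :, :, 1, :] = 1.0   # assumption: parameter 1 applies to mi = 1

# Preallocated storage, reused on every function call.
r20_struct = np.zeros(shape)
r20_temp = np.zeros(shape)


def pack_r20(params):
    """Pack the parameter values into r20_struct, reusing its buffer."""
    r20_struct.fill(0.0)
    for i in range(n_r20):
        # r20_mask[i] is a view, so no array data is copied here.
        np.multiply(r20_mask[i], params[i], out=r20_temp)
        np.add(r20_struct, r20_temp, out=r20_struct)
    return r20_struct


packed = pack_r20([10.0, 20.0])
print(packed[0, 0])
```

The one/zero structure and the two work arrays are created a single
time, so each call to pack_r20() only overwrites existing memory.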

_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
[email protected]

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel
