Here is an example of avoiding the automatic creation, and subsequent garbage collection, of numpy data structures:
""" from numpy import add, ones, zeros a = zeros((5, 4)) a[1] = 1 a[:,1] = 2 b = ones((5, 4)) add(a, b, a) print(a) """ The result is: [[ 1. 3. 1. 1.] [ 2. 3. 2. 2.] [ 1. 3. 1. 1.] [ 1. 3. 1. 1.] [ 1. 3. 1. 1.]] The out argument for numpy.add() is used here to operate in a similar way to the Python "+=" operation. But it avoids the temporary numpy data structures that the Python "+=" operation will create. This will save a lot of time in the dispersion code. Regards, Edward On 10 June 2014 15:56, Edward d'Auvergne <[email protected]> wrote: > Hi Troels, > > Here is one suggestion, of many that I have, for significantly > improving the speed of the analytic dispersion models in your > 'disp_spin_speed' branch. The speed ups you have currently achieved > for spin clusters are huge and very impressive. But now that you have > the infrastructure in place, you can advance this much more! > > The suggestion has to do with the R20, R20A, and R20B numpy data > structures. They way they are currently handled is relatively > inefficient, in that they are created de novo for each function call. > This means that memory allocation and Python garbage collection > happens for every single function call - something which should be > avoided at almost all costs. > > A better way to do this would be to have a self.R20_struct, > self.R20A_struct, and self.R20B_struct created in __init__(), and then > to pack in the values from the parameter vector into these structures. > You could create a special structure in __init__() for this. It would > have the dimensions [r20_index][ei][si][mi][oi], where the first > dimension corresponds to the different R20 parameters. And for each > r20_index element, you would have ones at the [ei][si][mi][oi] > positions where you would like R20 to be, and zeros elsewhere. The > key is that this is created at the target function start up, and not > for each function call. 
>
> This would be combined with the very powerful 'out' argument set to
> self.R20_struct with the numpy.add() and numpy.multiply() functions to
> prevent all memory allocations and garbage collection. Masks could be
> used, but I think that that would be much slower than having special
> numpy structures with ones where R20 should be and zeros elsewhere.
> For just creating these structures, looping over a single r20_index
> loop and multiplying by the special [r20_index][ei][si][mi][oi]
> one/zero structure and using numpy.add() and numpy.multiply() with out
> arguments would be much, much faster than masks or the current
> R20_axis logic. It will also simplify the code.
>
> Regards,
>
> Edward

_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
[email protected]

To unsubscribe from this list, get a password reminder, or change your
subscription options, visit the list information page at
https://mail.gna.org/listinfo/relax-devel
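
The R20 packing idea described in the quoted message could look roughly
like the following. This is only a minimal sketch under stated
assumptions, not the relax code: the class name TargetFunctionSketch,
the pack_r20() method, the scratch buffer, and the mapping
r20_index = si * n_fields + mi (one R20 value per spin and per field)
are all hypothetical and chosen for illustration.

```python
import numpy as np


class TargetFunctionSketch:
    """Hypothetical stand-in for a dispersion target function class."""

    def __init__(self, n_exp, n_spins, n_fields, n_offsets):
        # Dimensions [ei][si][mi][oi] of the R20 structure.
        shape = (n_exp, n_spins, n_fields, n_offsets)

        # Assumption: one R20 parameter per spin and per field.
        self.n_r20 = n_spins * n_fields

        # One/zero structure with dimensions [r20_index][ei][si][mi][oi],
        # built once at start-up: ones where each R20 value belongs.
        self.r20_mask = np.zeros((self.n_r20,) + shape)
        for si in range(n_spins):
            for mi in range(n_fields):
                self.r20_mask[si * n_fields + mi, :, si, mi, :] = 1.0

        # Buffers created once here and reused for every function call.
        self.R20_struct = np.zeros(shape)
        self.scratch = np.zeros(shape)

    def pack_r20(self, params):
        """Pack the R20 values from params into self.R20_struct in place."""
        self.R20_struct.fill(0.0)
        for r20_index in range(self.n_r20):
            # scratch = mask * value, written into the existing buffer.
            np.multiply(self.r20_mask[r20_index], params[r20_index],
                        out=self.scratch)
            # R20_struct = R20_struct + scratch, again without a new array.
            np.add(self.R20_struct, self.scratch, out=self.R20_struct)
        return self.R20_struct


target = TargetFunctionSketch(n_exp=1, n_spins=2, n_fields=2, n_offsets=3)
r20 = target.pack_r20(np.array([1.0, 2.0, 3.0, 4.0]))
print(r20[0, 0, 1])  # spin 0, field 1: every element is the second value, 2.0
```

In this sketch the only per-call work is a single loop over r20_index,
and both numpy calls write into arrays that were allocated in
__init__(), so no new data arrays are created when the target function
is called repeatedly.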

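
To check that the out argument really reuses an existing array, a small
test along the following lines may help. It is a sketch only; the
variable names c and result are arbitrary, and numpy.shares_memory() is
used here purely as a diagnostic.

```python
import numpy as np

a = np.zeros((5, 4))
b = np.ones((5, 4))

# A plain binary expression allocates a new array to hold the result.
c = a + b
print(np.shares_memory(c, a))  # False

# With out=a the sum is written into a's existing buffer, and the
# function returns that same array object.
result = np.add(a, b, out=a)
print(result is a)  # True
```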
