Note that you can find even more savings if you use back_calc as
temporary storage higher up in the lib.dispersion module functions.
Anywhere that the {NE, NS, NM, NO, ND} structures are created and
used, this trick will save calculation time, though you probably
cannot use it everywhere!
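
For illustration, here is a minimal, self-contained sketch of the idea,
mirroring the CR72 patch below (the array names are made-up stand-ins for
the {NE, NS, NM, NO, ND} structures).  The ufunc 'out' argument packs the
result straight into back_calc instead of creating an intermediate R2eff
array:

"""
from numpy import add, arccosh, multiply, zeros

# Hypothetical pre-allocated structures of shape (NE, NS, NM, NO, ND).
shape = (1, 2, 2, 1, 8)
back_calc = zeros(shape)
r20_kex = zeros(shape) + 10.0
cpmg_frqs = zeros(shape) + 100.0
fact = zeros(shape) + 2.0

# All intermediate results are written into back_calc - no temporary
# R2eff array is ever created.
arccosh(fact, back_calc)                   # back_calc = arccosh(fact)
multiply(cpmg_frqs, back_calc, back_calc)  # back_calc *= cpmg_frqs
multiply(back_calc, -1.0, back_calc)       # negate in place
add(r20_kex, back_calc, back_calc)         # back_calc = r20_kex - cpmg_frqs*arccosh(fact)
"""
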
Regards,
Edward
On 11 June 2014 12:15, Edward d'Auvergne <[email protected]> wrote:
> And here is how to shave a few percent off the lib.dispersion code
> with the numpy ufuncs:
>
> Index: lib/dispersion/cr72.py
> ===================================================================
> --- lib/dispersion/cr72.py (revision 23825)
> +++ lib/dispersion/cr72.py (working copy)
> @@ -92,7 +92,7 @@
> """
>
> # Python module imports.
> -from numpy import allclose, arccosh, array, cos, cosh, isfinite, isnan, min, max, ndarray, ones, sqrt, sum, zeros
> +from numpy import add, allclose, arccosh, array, cos, cosh, isfinite, isnan, min, max, multiply, ndarray, ones, sqrt, sum, zeros
> from numpy.ma import masked_greater_equal
>
> # Repetitive calculations (to speed up calculations).
> @@ -211,17 +211,16 @@
> return
>
> # Calculate R2eff.
> - R2eff = r20_kex - cpmg_frqs * arccosh( fact )
> + multiply(cpmg_frqs, arccosh(fact), back_calc)
> + add(r20_kex, -back_calc, back_calc)
>
> # Replace data in array.
> if t_max_etapos:
> - R2eff[mask_max_etapos.mask] = r20a[mask_max_etapos.mask]
> + back_calc[mask_max_etapos.mask] = r20a[mask_max_etapos.mask]
>
> # Catch errors, taking a sum over array is the fastest way to check for
> # +/- inf (infinity) and nan (not a number).
> - if not isfinite(sum(R2eff)):
> + if not isfinite(sum(back_calc)):
> # Find the data mask which has nan values, and replace.
> - mask = isnan(R2eff)
> - R2eff[mask] = 1e100
> -
> - back_calc[:] = R2eff
> + mask = isnan(back_calc)
> + back_calc[mask] = 1e100
>
>
> Regards,
>
> Edward
>
>
> On 11 June 2014 12:12, Edward d'Auvergne <[email protected]> wrote:
>> And if you want to take this to the extreme, in __init__() define:
>>
>> self.dw_shape = (1, 1, self.NM, self.NO, self.ND)
>>
>> and then in the target function:
>>
>> self.dw_struct[:] = 1.0
>> multiply(self.dw_struct, tile(asarray(dw).reshape(self.NE, self.NS)[:,:,None,None,None], self.dw_shape), self.dw_struct)
>> multiply(self.dw_struct, self.frqs_a2, self.dw_struct)
>>
>> These will speed things up by a few percent. It's a pity the
>> numpy.tile() function does not use the 'out' argument.
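>>
>> A possible workaround (just a sketch, untested against the branch) is to
>> let numpy broadcasting do the tiling implicitly, which avoids both the
>> tile() call and its temporary array:
>>
>> """
>> from numpy import asarray, multiply, ones, zeros
>>
>> # Hypothetical dimensions and pre-allocated structures.
>> NE, NS, NM, NO, ND = 1, 2, 2, 1, 8
>> dw = [1.847792726895652, 0.193719379085542]
>> frqs_a2 = ones((NE, NS, NM, NO, ND))
>> dw_struct = zeros((NE, NS, NM, NO, ND))
>>
>> # The (NE, NS, 1, 1, 1) shaped view broadcasts against frqs_a2, so no
>> # explicit tiling is needed.
>> multiply(asarray(dw).reshape(NE, NS, 1, 1, 1), frqs_a2, dw_struct)
>> """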
>>
>> Regards,
>>
>> Edward
>>
>>
>> On 11 June 2014 12:09, Edward d'Auvergne <[email protected]> wrote:
>>> Hi,
>>>
>>> Even faster is to use:
>>>
>>> """
>>> self.dw_struct[:] = 1.0
>>> multiply(self.dw_struct, tile(asarray(dw).reshape(self.NE, self.NS)[:,:,None,None,None], (1, 1, self.NM, self.NO, self.ND)), self.dw_struct)
>>> multiply(self.dw_struct, self.frqs_a2, self.dw_struct)
>>> """
>>>
>>> Where disp_struct and frqs_a are pre-multiplied in the __init__()
>>> function, as that maths operation does not need to happen for each
>>> function call:
>>>
>>> self.frqs_a2 = self.disp_struct * self.frqs_a
>>>
>>> Regards,
>>>
>>> Edward
>>>
>>>
>>> On 11 June 2014 12:00, Edward d'Auvergne <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> Oh well, I can see you now have an implementation (new = False)
>>>> that beats mine when clustered :)  You can nevertheless use some of
>>>> the ideas, such as the out ufunc argument and temporary storage, to
>>>> your advantage.  For example, you can use the out argument of these
>>>> ufuncs even more, replacing:
>>>>
>>>> """
>>>> self.dw_struct[:] = 1.0
>>>> self.dw_struct[:] = multiply(self.dw_struct, tile(asarray(dw).reshape(self.NE, self.NS)[:,:,None,None,None], (1, 1, self.NM, self.NO, self.ND)), ) * self.disp_struct * self.frqs_a
>>>> """
>>>>
>>>>
>>>> with:
>>>>
>>>> """
>>>> self.dw_struct[:] = 1.0
>>>> multiply(self.dw_struct, tile(asarray(dw).reshape(self.NE, self.NS)[:,:,None,None,None], (1, 1, self.NM, self.NO, self.ND)), self.dw_struct)
>>>> multiply(self.dw_struct, self.disp_struct, self.dw_struct)
>>>> multiply(self.dw_struct, self.frqs_a, self.dw_struct)
>>>> """
>>>>
>>>>
>>>> That shaves off a few milliseconds by avoiding automatic array
>>>> creation and destruction, with before:
>>>>
>>>> """
>>>> ('sfrq: ', 600000000.0, 'number of cpmg frq', 15, array([ 2., 6.,
>>>> 10., 14., 18., 22., 26., 30., 34., 38., 42.,
>>>> 46., 50., 54., 58.]))
>>>> ('sfrq: ', 800000000.0, 'number of cpmg frq', 20, array([ 2., 6.,
>>>> 10., 14., 18., 22., 26., 30., 34., 38., 42.,
>>>> 46., 50., 54., 58., 62., 66., 70., 74., 78.]))
>>>> ('sfrq: ', 900000000.0, 'number of cpmg frq', 22, array([ 2., 6.,
>>>> 10., 14., 18., 22., 26., 30., 34., 38., 42.,
>>>> 46., 50., 54., 58., 62., 66., 70., 74., 78., 82., 86.]))
>>>> ('chi2 cluster:', 0.0)
>>>> Wed Jun 11 11:45:42 2014 /tmp/tmpwkhLSr
>>>>
>>>> 198252 function calls (197150 primitive calls) in 1.499 seconds
>>>>
>>>> Ordered by: cumulative time
>>>>
>>>> ncalls tottime percall cumtime percall filename:lineno(function)
>>>> 1 0.000 0.000 1.499 1.499 <string>:1(<module>)
>>>> 1 0.001 0.001 1.499 1.499 profiling_cr72.py:449(cluster)
>>>> 1000 0.001 0.000 1.427 0.001 profiling_cr72.py:413(calc)
>>>> 1000 0.009 0.000 1.425 0.001 relax_disp.py:1020(func_CR72_full)
>>>> 1000 0.066 0.000 1.409 0.001 relax_disp.py:544(calc_CR72_chi2)
>>>> 1300 0.903 0.001 1.180 0.001 cr72.py:101(r2eff_CR72)
>>>> 2300 0.100 0.000 0.222 0.000 numeric.py:2056(allclose)
>>>> 3000 0.032 0.000 0.150 0.000 shape_base.py:761(tile)
>>>> 4000 0.104 0.000 0.104 0.000 {method 'repeat' of 'numpy.ndarray' objects}
>>>> 11828 0.091 0.000 0.091 0.000 {method 'reduce' of 'numpy.ufunc' objects}
>>>> 1 0.000 0.000 0.071 0.071 profiling_cr72.py:106(__init__)
>>>> 1 0.010 0.010 0.056 0.056 profiling_cr72.py:173(return_r2eff_arrays)
>>>> 1000 0.032 0.000 0.048 0.000 chi2.py:72(chi2_rankN)
>>>> 4609 0.005 0.000 0.045 0.000 fromnumeric.py:1762(any)
>>>> 2300 0.004 0.000 0.036 0.000 fromnumeric.py:1621(sum)
>>>> """
>>>>
>>>>
>>>> And after:
>>>>
>>>> """
>>>> ('sfrq: ', 600000000.0, 'number of cpmg frq', 15, array([ 2., 6.,
>>>> 10., 14., 18., 22., 26., 30., 34., 38., 42.,
>>>> 46., 50., 54., 58.]))
>>>> ('sfrq: ', 800000000.0, 'number of cpmg frq', 20, array([ 2., 6.,
>>>> 10., 14., 18., 22., 26., 30., 34., 38., 42.,
>>>> 46., 50., 54., 58., 62., 66., 70., 74., 78.]))
>>>> ('sfrq: ', 900000000.0, 'number of cpmg frq', 22, array([ 2., 6.,
>>>> 10., 14., 18., 22., 26., 30., 34., 38., 42.,
>>>> 46., 50., 54., 58., 62., 66., 70., 74., 78., 82., 86.]))
>>>> ('chi2 cluster:', 0.0)
>>>> Wed Jun 11 11:49:29 2014 /tmp/tmpML9Lx5
>>>>
>>>> 198252 function calls (197150 primitive calls) in 1.462 seconds
>>>>
>>>> Ordered by: cumulative time
>>>>
>>>> ncalls tottime percall cumtime percall filename:lineno(function)
>>>> 1 0.000 0.000 1.462 1.462 <string>:1(<module>)
>>>> 1 0.001 0.001 1.462 1.462 profiling_cr72.py:449(cluster)
>>>> 1000 0.001 0.000 1.393 0.001 profiling_cr72.py:413(calc)
>>>> 1000 0.009 0.000 1.392 0.001 relax_disp.py:1022(func_CR72_full)
>>>> 1000 0.056 0.000 1.376 0.001 relax_disp.py:544(calc_CR72_chi2)
>>>> 1300 0.887 0.001 1.158 0.001 cr72.py:101(r2eff_CR72)
>>>> 2300 0.097 0.000 0.217 0.000 numeric.py:2056(allclose)
>>>> 3000 0.031 0.000 0.148 0.000 shape_base.py:761(tile)
>>>> 4000 0.103 0.000 0.103 0.000 {method 'repeat' of 'numpy.ndarray' objects}
>>>> 11828 0.090 0.000 0.090 0.000 {method 'reduce' of 'numpy.ufunc' objects}
>>>> 1 0.000 0.000 0.068 0.068 profiling_cr72.py:106(__init__)
>>>> 1 0.010 0.010 0.053 0.053 profiling_cr72.py:173(return_r2eff_arrays)
>>>> 1000 0.031 0.000 0.047 0.000 chi2.py:72(chi2_rankN)
>>>> 4609 0.006 0.000 0.044 0.000 fromnumeric.py:1762(any)
>>>> 2300 0.004 0.000 0.036 0.000 fromnumeric.py:1621(sum)
>>>> """
>>>>
>>>>
>>>> The additional suggestion I didn't specify before was to use these
>>>> ufuncs with the out argument in the lib.dispersion modules themselves.
>>>> You don't need to create R2eff there - just pack it into back_calc!
>>>>
>>>> Regards,
>>>>
>>>> Edward
>>>>
>>>> On 11 June 2014 11:55, Troels Emtekær Linnet <[email protected]> wrote:
>>>>> Hi Edward.
>>>>>
>>>>> Some timings.
>>>>> Per spin, you have a faster method.
>>>>> But I win per cluster.
>>>>>
>>>>> 1000 iterations
>>>>> 1 / 100 spins
>>>>>
>>>>> Edward
>>>>> ncalls tottime percall cumtime percall filename:lineno(function)
>>>>> 1 0.000 0.000 0.523 0.523 <string>:1(<module>)
>>>>> ncalls tottime percall cumtime percall filename:lineno(function)
>>>>> 1 0.000 0.000 3.875 3.875 <string>:1(<module>)
>>>>>
>>>>> Troels Tile
>>>>> ncalls tottime percall cumtime percall filename:lineno(function)
>>>>> 1 0.000 0.000 0.563 0.563 <string>:1(<module>)
>>>>> ncalls tottime percall cumtime percall filename:lineno(function)
>>>>> 1 0.000 0.000 2.102 2.102 <string>:1(<module>)
>>>>>
>>>>> Troels Outer
>>>>> ncalls tottime percall cumtime percall filename:lineno(function)
>>>>> 1 0.000 0.000 0.546 0.546 <string>:1(<module>)
>>>>> ncalls tottime percall cumtime percall filename:lineno(function)
>>>>> 1 0.000 0.000 1.974 1.974 <string>:1(<module>)
>>>>>
>>>>> 2014-06-11 11:46 GMT+02:00 Troels Emtekær Linnet <[email protected]>:
>>>>>> Hi Edward.
>>>>>>
>>>>>> This is a really good page!
>>>>>> http://docs.scipy.org/doc/numpy/reference/ufuncs.html
>>>>>>
>>>>>> ""
>>>>>> Tip
>>>>>> The optional output arguments can be used to help you save memory for
>>>>>> large calculations. If your arrays are large, complicated expressions
>>>>>> can take longer than absolutely necessary due to the creation and
>>>>>> (later) destruction of temporary calculation spaces. For example, the
>>>>>> expression G = a * b + c is equivalent to t1 = A * B; G = T1 + C; del
>>>>>> t1. It will be more quickly executed as G = A * B; add(G, C, G) which
>>>>>> is the same as G = A * B; G += C.
>>>>>> ""
>>>>>>
>>>>>> 2014-06-10 23:08 GMT+02:00 Edward d'Auvergne <[email protected]>:
>>>>>>> Note that masks and numpy.ma.multiply() and numpy.ma.add() may speed
>>>>>>> this up even more.  However, due to overheads in the numpy masking,
>>>>>>> there is a chance that this also makes the dw and R20 data structure
>>>>>>> construction slower.
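>>>>>>>
>>>>>>> For reference, a minimal sketch of what the masked variant could look
>>>>>>> like (untested, and the masking overhead mentioned above may well eat
>>>>>>> up the gain):
>>>>>>>
>>>>>>> """
>>>>>>> from numpy import ones, zeros
>>>>>>> from numpy.ma import masked_array, multiply
>>>>>>>
>>>>>>> # Hypothetical (NE, NS, NM, NO, ND) structure with some missing points.
>>>>>>> shape = (1, 2, 2, 1, 8)
>>>>>>> data = ones(shape)
>>>>>>> mask = zeros(shape, dtype=bool)
>>>>>>> mask[0, 1, 1, 0, 4:] = True
>>>>>>>
>>>>>>> # Masked elements stay masked in the result, so the invalid points are
>>>>>>> # carried along rather than corrupting the maths.
>>>>>>> dw_masked = masked_array(data, mask=mask)
>>>>>>> result = multiply(dw_masked, 2.0)
>>>>>>> print(result)
>>>>>>> """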
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Edward
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 10 June 2014 22:36, Edward d'Auvergne <[email protected]> wrote:
>>>>>>>> Hi Troels,
>>>>>>>>
>>>>>>>> To make things even simpler, here is what needs to be done for R20,
>>>>>>>> R20A and R20B:
>>>>>>>>
>>>>>>>> """
>>>>>>>> from numpy import abs, add, array, float64, multiply, ones, sum, zeros
>>>>>>>>
>>>>>>>> # Init mimic.
>>>>>>>> #############
>>>>>>>>
>>>>>>>> # Values from Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster.
>>>>>>>> NE = 1
>>>>>>>> NS = 2
>>>>>>>> NM = 2
>>>>>>>> NO = 1
>>>>>>>> ND = 8
>>>>>>>> R20A = array([ 9.984626320294867, 11.495327724693091,
>>>>>>>> 12.991028416082928, 14.498419290021163])
>>>>>>>> shape = (NE, NS, NM, NO, ND)
>>>>>>>>
>>>>>>>> # Final structure for lib.dispersion.
>>>>>>>> R20A_struct = zeros(shape, float64)
>>>>>>>>
>>>>>>>> # Temporary storage to avoid memory allocations and garbage collection.
>>>>>>>> R20A_temp = zeros(shape, float64)
>>>>>>>>
>>>>>>>> # The structure for multiplication with R20A to piecewise build up the full R20A structure.
>>>>>>>> R20A_mask = zeros((NS*NM,) + shape, float64)
>>>>>>>> for si in range(NS):
>>>>>>>>     for mi in range(NM):
>>>>>>>>         R20A_mask[si*NM+mi, :, si, mi] = 1.0
>>>>>>>> print(R20A_mask)
>>>>>>>> print("\n\n")
>>>>>>>>
>>>>>>>> # Values to be found (again taken directly from Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster - as a printout of dw_frq_a).
>>>>>>>> R20A_final = array([[[[[ 9.984626320294867, 9.984626320294867,
>>>>>>>> 9.984626320294867,
>>>>>>>> 9.984626320294867, 9.984626320294867,
>>>>>>>> 9.984626320294867,
>>>>>>>> 9.984626320294867, 9.984626320294867]],
>>>>>>>>
>>>>>>>> [[ 11.495327724693091, 11.495327724693091,
>>>>>>>> 11.495327724693091,
>>>>>>>> 11.495327724693091, 11.495327724693091,
>>>>>>>> 11.495327724693091,
>>>>>>>> 11.495327724693091, 11.495327724693091]]],
>>>>>>>>
>>>>>>>>
>>>>>>>> [[[ 12.991028416082928, 12.991028416082928,
>>>>>>>> 12.991028416082928,
>>>>>>>> 12.991028416082928, 12.991028416082928,
>>>>>>>> 12.991028416082928,
>>>>>>>> 12.991028416082928, 12.991028416082928]],
>>>>>>>>
>>>>>>>> [[ 14.498419290021163, 14.498419290021163,
>>>>>>>> 14.498419290021163,
>>>>>>>> 14.498419290021163, 14.498419290021163,
>>>>>>>> 14.498419290021163,
>>>>>>>> 14.498419290021163, 14.498419290021163]]]]])
>>>>>>>>
>>>>>>>>
>>>>>>>> # Target function.
>>>>>>>> ##################
>>>>>>>>
>>>>>>>> # Loop over the R20A elements (one per spin).
>>>>>>>> for r20_index in range(NS*NM):
>>>>>>>>     # First multiply the spin specific R20A with the spin specific frequency mask, using temporary storage.
>>>>>>>>     multiply(R20A[r20_index], R20A_mask[r20_index], R20A_temp)
>>>>>>>>
>>>>>>>>     # Then add to the total.
>>>>>>>>     add(R20A_struct, R20A_temp, R20A_struct)
>>>>>>>>
>>>>>>>> # Show that the structure is reproduced perfectly.
>>>>>>>> print(R20A_struct)
>>>>>>>> print(R20A_struct - R20A_final)
>>>>>>>> print(sum(abs(R20A_struct - R20A_final)))
>>>>>>>> """
>>>>>>>>
>>>>>>>>
>>>>>>>> You may notice one simplification compared to my previous example for
>>>>>>>> the dw parameter
>>>>>>>> (http://thread.gmane.org/gmane.science.nmr.relax.devel/6135/focus=6154).
>>>>>>>> The values here too come from the
>>>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster system test.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Edward
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10 June 2014 21:31, Edward d'Auvergne <[email protected]> wrote:
>>>>>>>>> Hi Troels,
>>>>>>>>>
>>>>>>>>> No need for an example. Here is the code to add to your
>>>>>>>>> infrastructure which will make the analytic dispersion models insanely
>>>>>>>>> fast:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> """
>>>>>>>>> from numpy import add, array, float64, multiply, ones, zeros
>>>>>>>>>
>>>>>>>>> # Init mimic.
>>>>>>>>> #############
>>>>>>>>>
>>>>>>>>> # Values from Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster.
>>>>>>>>> NE = 1
>>>>>>>>> NS = 2
>>>>>>>>> NM = 2
>>>>>>>>> NO = 1
>>>>>>>>> ND = 8
>>>>>>>>> dw = array([ 1.847792726895652, 0.193719379085542])
>>>>>>>>> frqs = [-382.188861036982701, -318.479128911056137]
>>>>>>>>> shape = (NE, NS, NM, NO, ND)
>>>>>>>>>
>>>>>>>>> # Final structure for lib.dispersion.
>>>>>>>>> dw_struct = zeros(shape, float64)
>>>>>>>>>
>>>>>>>>> # Temporary storage to avoid memory allocations and garbage collection.
>>>>>>>>> dw_temp = zeros((NS,) + shape, float64)
>>>>>>>>>
>>>>>>>>> # The structure for multiplication with dw to piecewise build up the full dw structure.
>>>>>>>>> dw_mask = zeros((NS,) + shape, float64)
>>>>>>>>> for si in range(NS):
>>>>>>>>>     for mi in range(NM):
>>>>>>>>>         dw_mask[si, :, si, mi] = frqs[mi]
>>>>>>>>> print(dw_mask)
>>>>>>>>>
>>>>>>>>> # Values to be found (again taken directly from Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster - as a printout of dw_frq_a).
>>>>>>>>> dw_final = array([[[[[-706.205797724669765, -706.205797724669765,
>>>>>>>>> -706.205797724669765, -706.205797724669765,
>>>>>>>>> -706.205797724669765, -706.205797724669765,
>>>>>>>>> -706.205797724669765, -706.205797724669765]],
>>>>>>>>>
>>>>>>>>> [[-588.483418069912318, -588.483418069912318,
>>>>>>>>> -588.483418069912318, -588.483418069912318,
>>>>>>>>> -588.483418069912318, -588.483418069912318,
>>>>>>>>> -588.483418069912318, -588.483418069912318]]],
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [[[ -74.03738885349469 , -74.03738885349469 ,
>>>>>>>>> -74.03738885349469 , -74.03738885349469 ,
>>>>>>>>> -74.03738885349469 , -74.03738885349469 ,
>>>>>>>>> -74.03738885349469 , -74.03738885349469 ]],
>>>>>>>>>
>>>>>>>>> [[ -61.69557910435401 , -61.69557910435401 ,
>>>>>>>>> -61.69557910435401 , -61.69557910435401 ,
>>>>>>>>> -61.69557910435401 , -61.69557910435401 ,
>>>>>>>>> -61.69557910435401 , -61.69557910435401 ]]]]])
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> # Target function.
>>>>>>>>> ##################
>>>>>>>>>
>>>>>>>>> # Loop over the dw elements (one per spin).
>>>>>>>>> for si in range(NS):
>>>>>>>>>     # First multiply the spin specific dw with the spin specific frequency mask, using temporary storage.
>>>>>>>>>     multiply(dw[si], dw_mask[si], dw_temp[si])
>>>>>>>>>
>>>>>>>>>     # Then add to the total.
>>>>>>>>>     add(dw_struct, dw_temp[si], dw_struct)
>>>>>>>>>
>>>>>>>>> # Show that the structure is reproduced perfectly.
>>>>>>>>> print(dw_struct - dw_final)
>>>>>>>>> """
>>>>>>>>>
>>>>>>>>> As mentioned in the comments, the structures come from the
>>>>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster. I just
>>>>>>>>> added a check of "if len(dw) > 1: asdfasd" to kill the test, and added
>>>>>>>>> printouts to obtain dw, frq_a, dw_frq_a, etc. This is exactly the
>>>>>>>>> implementation I described. Although there might be an even faster
>>>>>>>>> way, this will eliminate all numpy array creation and deletion via
>>>>>>>>> Python garbage collection in the target functions (when used for R20
>>>>>>>>> as well).
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Edward
>>>>>>>>>
>>>>>>>>> On 10 June 2014 21:09, Edward d'Auvergne <[email protected]> wrote:
>>>>>>>>>> If you have a really complicated example of your current 'dw_frq_a'
>>>>>>>>>> data structure for multiple spins and multiple fields, that could
>>>>>>>>>> help to construct an example.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> Edward
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 10 June 2014 20:57, Edward d'Auvergne <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I'll have a look tomorrow but, as you've probably seen, some of the
>>>>>>>>>>> fine details such as indices to be used need to be sorted out when
>>>>>>>>>>> implementing this.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Edward
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 10 June 2014 20:49, Troels Emtekær Linnet
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>> Whatever I do, I cannot get this to work.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you show an example?
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-06-10 16:29 GMT+02:00 Edward d'Auvergne
>>>>>>>>>>>> <[email protected]>:
>>>>>>>>>>>>> Here is an example of avoiding automatic numpy data structure
>>>>>>>>>>>>> creation and then garbage collection:
>>>>>>>>>>>>>
>>>>>>>>>>>>> """
>>>>>>>>>>>>> from numpy import add, ones, zeros
>>>>>>>>>>>>>
>>>>>>>>>>>>> a = zeros((5, 4))
>>>>>>>>>>>>> a[1] = 1
>>>>>>>>>>>>> a[:,1] = 2
>>>>>>>>>>>>>
>>>>>>>>>>>>> b = ones((5, 4))
>>>>>>>>>>>>>
>>>>>>>>>>>>> add(a, b, a)
>>>>>>>>>>>>> print(a)
>>>>>>>>>>>>> """
>>>>>>>>>>>>>
>>>>>>>>>>>>> The result is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> [[ 1. 3. 1. 1.]
>>>>>>>>>>>>> [ 2. 3. 2. 2.]
>>>>>>>>>>>>> [ 1. 3. 1. 1.]
>>>>>>>>>>>>> [ 1. 3. 1. 1.]
>>>>>>>>>>>>> [ 1. 3. 1. 1.]]
>>>>>>>>>>>>>
>>>>>>>>>>>>> The out argument for numpy.add() is used here to operate in a
>>>>>>>>>>>>> similar way to the Python "+=" operation, but it avoids the
>>>>>>>>>>>>> temporary numpy data structures that an expression like
>>>>>>>>>>>>> "a = a + b" would create.  This will save a lot of time in the
>>>>>>>>>>>>> dispersion code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Edward
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10 June 2014 15:56, Edward d'Auvergne <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> Hi Troels,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is one suggestion, of many that I have, for significantly
>>>>>>>>>>>>>> improving the speed of the analytic dispersion models in your
>>>>>>>>>>>>>> 'disp_spin_speed' branch.  The speed ups you have currently
>>>>>>>>>>>>>> achieved for spin clusters are huge and very impressive.  But now
>>>>>>>>>>>>>> that you have the infrastructure in place, you can advance this
>>>>>>>>>>>>>> much more!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The suggestion has to do with the R20, R20A, and R20B numpy data
>>>>>>>>>>>>>> structures.  The way they are currently handled is relatively
>>>>>>>>>>>>>> inefficient, in that they are created de novo for each function
>>>>>>>>>>>>>> call.  This means that memory allocation and Python garbage
>>>>>>>>>>>>>> collection happens for every single function call - something
>>>>>>>>>>>>>> which should be avoided at almost all costs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A better way to do this would be to have a self.R20_struct,
>>>>>>>>>>>>>> self.R20A_struct, and self.R20B_struct created in __init__(), and
>>>>>>>>>>>>>> then to pack the values from the parameter vector into these
>>>>>>>>>>>>>> structures.  You could create a special structure in __init__()
>>>>>>>>>>>>>> for this.  It would have the dimensions [r20_index][ei][si][mi][oi],
>>>>>>>>>>>>>> where the first dimension corresponds to the different R20
>>>>>>>>>>>>>> parameters.  And for each r20_index element, you would have ones
>>>>>>>>>>>>>> at the [ei][si][mi][oi] positions where you would like R20 to be,
>>>>>>>>>>>>>> and zeros elsewhere.  The key is that this is created at target
>>>>>>>>>>>>>> function start up, and not for each function call.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This would be combined with the very powerful 'out' argument set
>>>>>>>>>>>>>> to self.R20_struct with the numpy.add() and numpy.multiply()
>>>>>>>>>>>>>> functions to prevent all memory allocations and garbage
>>>>>>>>>>>>>> collection.  Masks could be used, but I think that would be much
>>>>>>>>>>>>>> slower than having special numpy structures with ones where R20
>>>>>>>>>>>>>> should be and zeros elsewhere.  For just creating these
>>>>>>>>>>>>>> structures, looping over a single r20_index loop and multiplying
>>>>>>>>>>>>>> by the special [r20_index][ei][si][mi][oi] one/zero structure,
>>>>>>>>>>>>>> using numpy.add() and numpy.multiply() with out arguments, would
>>>>>>>>>>>>>> be much, much faster than masks or the current R20_axis logic.
>>>>>>>>>>>>>> It will also simplify the code.
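>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A bare-bones sketch of such a one/zero mask structure (this is
>>>>>>>>>>>>>> essentially what the worked examples further up this thread
>>>>>>>>>>>>>> implement; the dimension sizes here are made up):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> """
>>>>>>>>>>>>>> from numpy import add, multiply, zeros
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # In __init__(): one/zero masks, one per R20 parameter.
>>>>>>>>>>>>>> NE, NS, NM, NO, ND = 1, 2, 2, 1, 8
>>>>>>>>>>>>>> n_r20 = NS * NM
>>>>>>>>>>>>>> r20_mask = zeros((n_r20, NE, NS, NM, NO, ND))
>>>>>>>>>>>>>> for si in range(NS):
>>>>>>>>>>>>>>     for mi in range(NM):
>>>>>>>>>>>>>>         r20_mask[si*NM + mi, :, si, mi] = 1.0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # Pre-allocated output and temporary storage.
>>>>>>>>>>>>>> r20_struct = zeros((NE, NS, NM, NO, ND))
>>>>>>>>>>>>>> r20_temp = zeros((NE, NS, NM, NO, ND))
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # In func(): pack the parameter vector values into r20_struct,
>>>>>>>>>>>>>> # with no new arrays created per call.
>>>>>>>>>>>>>> r20_params = [9.98, 11.50, 12.99, 14.50]
>>>>>>>>>>>>>> r20_struct[:] = 0.0
>>>>>>>>>>>>>> for r20_index in range(n_r20):
>>>>>>>>>>>>>>     multiply(r20_params[r20_index], r20_mask[r20_index], r20_temp)
>>>>>>>>>>>>>>     add(r20_struct, r20_temp, r20_struct)
>>>>>>>>>>>>>> """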
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Edward
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> relax (http://www.nmr-relax.com)
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is the relax-devel mailing list
>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>
>>>>>>>>>>>>> To unsubscribe from this list, get a password
>>>>>>>>>>>>> reminder, or change your subscription options,
>>>>>>>>>>>>> visit the list information page at
>>>>>>>>>>>>> https://mail.gna.org/listinfo/relax-devel
_______________________________________________
relax (http://www.nmr-relax.com)
This is the relax-devel mailing list
[email protected]
To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel