Re: Speed up suggestion for task #7807.

Edward d'Auvergne Wed, 11 Jun 2014 03:14:22 -0700

And if you want to take this to the extreme, in __init__() define:

            self.dw_shape = (1, 1, self.NM, self.NO, self.ND)


and then in the target function:

            self.dw_struct[:] = 1.0
            multiply(self.dw_struct, tile(asarray(dw).reshape(self.NE,
self.NS)[:,:,None,None,None], self.dw_shape), self.dw_struct)
            multiply(self.dw_struct, self.frqs_a2, self.dw_struct)

These will speed things up by a few percent.  It's a pity the
numpy.tile() function does not use the 'out' argument.

Regards,

Edward


On 11 June 2014 12:09, Edward d'Auvergne <[email protected]> wrote:
> Hi,
>
> Even faster is to use:
>
> """
>             self.dw_struct[:] = 1.0
>             multiply(self.dw_struct, tile(asarray(dw).reshape(self.NE,
> self.NS)[:,:,None,None,None], (1, 1, self.NM, self.NO, self.ND)),
> self.dw_struct)
>             multiply(self.dw_struct, self.frqs_a2, self.dw_struct)
> """
>
> Where disp_struct and frqs_a are pre-multipled in the __init__()
> function, as that maths operation does not need to happen for each
> function call:
>
>             self.frqs_a2 = self.disp_struct * self.frqs_a
>
> Regards,
>
> Edward
>
>
> On 11 June 2014 12:00, Edward d'Auvergne <[email protected]> wrote:
>> Hi,
>>
>> Oh well, I can see you've now have an implementation (new = False)
>> that beats mine when clustered :)  You can use some of the ideas such
>> as the out ufunc argument and temporary storage to your advantage
>> nevertheless.  For example you can use the out argument of these
>> ufuncs even more, replacing:
>>
>> """
>>             self.dw_struct[:] = 1.0
>>             self.dw_struct[:] = multiply(self.dw_struct,
>> tile(asarray(dw).reshape(self.NE, self.NS)[:,:,None,None,None], (1, 1,
>> self.NM, self.NO, self.ND)), ) * self.disp_struct * self.frqs_a
>> """
>>
>>
>> with:
>>
>> """
>>             self.dw_struct[:] = 1.0
>>             multiply(self.dw_struct, tile(asarray(dw).reshape(self.NE,
>> self.NS)[:,:,None,None,None], (1, 1, self.NM, self.NO, self.ND)),
>> self.dw_struct)
>>             multiply(self.dw_struct, self.disp_struct, self.dw_struct)
>>             multiply(self.dw_struct, self.frqs_a, self.dw_struct)
>> """
>>
>>
>> That shaves off a few milliseconds by avoiding automatic array
>> creation and destruction, with before:
>>
>> """
>> ('sfrq: ', 600000000.0, 'number of cpmg frq', 15, array([  2.,   6.,
>> 10.,  14.,  18.,  22.,  26.,  30.,  34.,  38.,  42.,
>>         46.,  50.,  54.,  58.]))
>> ('sfrq: ', 800000000.0, 'number of cpmg frq', 20, array([  2.,   6.,
>> 10.,  14.,  18.,  22.,  26.,  30.,  34.,  38.,  42.,
>>         46.,  50.,  54.,  58.,  62.,  66.,  70.,  74.,  78.]))
>> ('sfrq: ', 900000000.0, 'number of cpmg frq', 22, array([  2.,   6.,
>> 10.,  14.,  18.,  22.,  26.,  30.,  34.,  38.,  42.,
>>         46.,  50.,  54.,  58.,  62.,  66.,  70.,  74.,  78.,  82.,  86.]))
>> ('chi2 cluster:', 0.0)
>> Wed Jun 11 11:45:42 2014    /tmp/tmpwkhLSr
>>
>>          198252 function calls (197150 primitive calls) in 1.499 seconds
>>
>>    Ordered by: cumulative time
>>
>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>         1    0.000    0.000    1.499    1.499 <string>:1(<module>)
>>         1    0.001    0.001    1.499    1.499 profiling_cr72.py:449(cluster)
>>      1000    0.001    0.000    1.427    0.001 profiling_cr72.py:413(calc)
>>      1000    0.009    0.000    1.425    0.001 
>> relax_disp.py:1020(func_CR72_full)
>>      1000    0.066    0.000    1.409    0.001 
>> relax_disp.py:544(calc_CR72_chi2)
>>      1300    0.903    0.001    1.180    0.001 cr72.py:101(r2eff_CR72)
>>      2300    0.100    0.000    0.222    0.000 numeric.py:2056(allclose)
>>      3000    0.032    0.000    0.150    0.000 shape_base.py:761(tile)
>>      4000    0.104    0.000    0.104    0.000 {method 'repeat' of
>> 'numpy.ndarray' objects}
>>     11828    0.091    0.000    0.091    0.000 {method 'reduce' of
>> 'numpy.ufunc' objects}
>>         1    0.000    0.000    0.071    0.071 profiling_cr72.py:106(__init__)
>>         1    0.010    0.010    0.056    0.056
>> profiling_cr72.py:173(return_r2eff_arrays)
>>      1000    0.032    0.000    0.048    0.000 chi2.py:72(chi2_rankN)
>>      4609    0.005    0.000    0.045    0.000 fromnumeric.py:1762(any)
>>      2300    0.004    0.000    0.036    0.000 fromnumeric.py:1621(sum)
>> """
>>
>>
>> And after:
>>
>> """
>> ('sfrq: ', 600000000.0, 'number of cpmg frq', 15, array([  2.,   6.,
>> 10.,  14.,  18.,  22.,  26.,  30.,  34.,  38.,  42.,
>>         46.,  50.,  54.,  58.]))
>> ('sfrq: ', 800000000.0, 'number of cpmg frq', 20, array([  2.,   6.,
>> 10.,  14.,  18.,  22.,  26.,  30.,  34.,  38.,  42.,
>>         46.,  50.,  54.,  58.,  62.,  66.,  70.,  74.,  78.]))
>> ('sfrq: ', 900000000.0, 'number of cpmg frq', 22, array([  2.,   6.,
>> 10.,  14.,  18.,  22.,  26.,  30.,  34.,  38.,  42.,
>>         46.,  50.,  54.,  58.,  62.,  66.,  70.,  74.,  78.,  82.,  86.]))
>> ('chi2 cluster:', 0.0)
>> Wed Jun 11 11:49:29 2014    /tmp/tmpML9Lx5
>>
>>          198252 function calls (197150 primitive calls) in 1.462 seconds
>>
>>    Ordered by: cumulative time
>>
>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>         1    0.000    0.000    1.462    1.462 <string>:1(<module>)
>>         1    0.001    0.001    1.462    1.462 profiling_cr72.py:449(cluster)
>>      1000    0.001    0.000    1.393    0.001 profiling_cr72.py:413(calc)
>>      1000    0.009    0.000    1.392    0.001 
>> relax_disp.py:1022(func_CR72_full)
>>      1000    0.056    0.000    1.376    0.001 
>> relax_disp.py:544(calc_CR72_chi2)
>>      1300    0.887    0.001    1.158    0.001 cr72.py:101(r2eff_CR72)
>>      2300    0.097    0.000    0.217    0.000 numeric.py:2056(allclose)
>>      3000    0.031    0.000    0.148    0.000 shape_base.py:761(tile)
>>      4000    0.103    0.000    0.103    0.000 {method 'repeat' of
>> 'numpy.ndarray' objects}
>>     11828    0.090    0.000    0.090    0.000 {method 'reduce' of
>> 'numpy.ufunc' objects}
>>         1    0.000    0.000    0.068    0.068 profiling_cr72.py:106(__init__)
>>         1    0.010    0.010    0.053    0.053
>> profiling_cr72.py:173(return_r2eff_arrays)
>>      1000    0.031    0.000    0.047    0.000 chi2.py:72(chi2_rankN)
>>      4609    0.006    0.000    0.044    0.000 fromnumeric.py:1762(any)
>>      2300    0.004    0.000    0.036    0.000 fromnumeric.py:1621(sum)
>> """
>>
>>
>> The additional suggestions I didn't specify before was to use these
>> ufuncs with the out argument in the lib.dispersion modules themselves.
>> You don't need to create R2eff here, just pack it into back_calc!
>>
>> Regards,
>>
>> Edward
>>
>> On 11 June 2014 11:55, Troels Emtekær Linnet <[email protected]> wrote:
>>> Hi Edward.
>>>
>>> Some timings.
>>> Per spin, you have a faster method.
>>> But I win per cluster.
>>>
>>> 1000 iterations
>>> 1 / 100 spins
>>>
>>> Edward
>>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>>         1    0.000    0.000    0.523    0.523 <string>:1(<module>)
>>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>>         1    0.000    0.000    3.875    3.875 <string>:1(<module>)
>>>
>>> Troels Tile
>>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>>         1    0.000    0.000    0.563    0.563 <string>:1(<module>)
>>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>>         1    0.000    0.000    2.102    2.102 <string>:1(<module>)
>>>
>>> Troels Outer
>>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>>         1    0.000    0.000    0.546    0.546 <string>:1(<module>)
>>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>>         1    0.000    0.000    1.974    1.974 <string>:1(<module>)
>>>
>>> 2014-06-11 11:46 GMT+02:00 Troels Emtekær Linnet <[email protected]>:
>>>> Hi Edward.
>>>>
>>>> This is a really god page!
>>>> http://docs.scipy.org/doc/numpy/reference/ufuncs.html
>>>>
>>>> ""
>>>> Tip
>>>> The optional output arguments can be used to help you save memory for
>>>> large calculations. If your arrays are large, complicated expressions
>>>> can take longer than absolutely necessary due to the creation and
>>>> (later) destruction of temporary calculation spaces. For example, the
>>>> expression G = a * b + c is equivalent to t1 = A * B; G = T1 + C; del
>>>> t1. It will be more quickly executed as G = A * B; add(G, C, G) which
>>>> is the same as G = A * B; G += C.
>>>> ""
>>>>
>>>> 2014-06-10 23:08 GMT+02:00 Edward d'Auvergne <[email protected]>:
>>>>> Note that masks and numpy.ma.multiply() and numpy.ma.add() may speed
>>>>> this up even more.  However due to overheads in the numpy masking,
>>>>> there is a chance that this also makes the dw and R20 data structure
>>>>> construction slower.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Edward
>>>>>
>>>>>
>>>>>
>>>>> On 10 June 2014 22:36, Edward d'Auvergne <[email protected]> wrote:
>>>>>> Hi Troels,
>>>>>>
>>>>>> To make things even simpler, here is what needs to be done for R20,
>>>>>> R20A and R20B:
>>>>>>
>>>>>> """
>>>>>> from numpy import abs, add, array, float64, multiply, ones, sum, zeros
>>>>>>
>>>>>> # Init mimic.
>>>>>> #############
>>>>>>
>>>>>> # Values from Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster.
>>>>>> NE = 1
>>>>>> NS = 2
>>>>>> NM = 2
>>>>>> NO = 1
>>>>>> ND = 8
>>>>>> R20A = array([  9.984626320294867,  11.495327724693091,
>>>>>> 12.991028416082928, 14.498419290021163])
>>>>>> shape = (NE, NS, NM, NO, ND)
>>>>>>
>>>>>> # Final structure for lib.dispersion.
>>>>>> R20A_struct = zeros(shape, float64)
>>>>>>
>>>>>> # Temporary storage to avoid memory allocations and garbage collection.
>>>>>> R20A_temp = zeros(shape, float64)
>>>>>>
>>>>>> # The structure for multiplication with R20A to piecewise build up the
>>>>>> full R20A structure.
>>>>>> R20A_mask = zeros((NS*NM,) + shape, float64)
>>>>>> for si in range(NS):
>>>>>>     for mi in range(NM):
>>>>>>         R20A_mask[si*NM+mi, :, si, mi] = 1.0
>>>>>> print(R20A_mask)
>>>>>> print("\n\n")
>>>>>>
>>>>>> # Values to be found (again taken directly from
>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster - as a
>>>>>> printout of dw_frq_a).
>>>>>> R20A_final = array([[[[[  9.984626320294867,   9.984626320294867,
>>>>>> 9.984626320294867,
>>>>>>                           9.984626320294867,   9.984626320294867,
>>>>>> 9.984626320294867,
>>>>>>                           9.984626320294867,   9.984626320294867]],
>>>>>>
>>>>>>                       [[ 11.495327724693091,  11.495327724693091,
>>>>>> 11.495327724693091,
>>>>>>                          11.495327724693091,  11.495327724693091,
>>>>>> 11.495327724693091,
>>>>>>                          11.495327724693091,  11.495327724693091]]],
>>>>>>
>>>>>>
>>>>>>                      [[[ 12.991028416082928,  12.991028416082928,
>>>>>> 12.991028416082928,
>>>>>>                          12.991028416082928,  12.991028416082928,
>>>>>> 12.991028416082928,
>>>>>>                          12.991028416082928,  12.991028416082928]],
>>>>>>
>>>>>>                       [[ 14.498419290021163,  14.498419290021163,
>>>>>> 14.498419290021163,
>>>>>>                          14.498419290021163,  14.498419290021163,
>>>>>> 14.498419290021163,
>>>>>>                          14.498419290021163,  14.498419290021163]]]]])
>>>>>>
>>>>>>
>>>>>> # Target function.
>>>>>> ##################
>>>>>>
>>>>>> # Loop over the R20A elements (one per spin).
>>>>>> for r20_index in range(NS*NM):
>>>>>>     # First multiply the spin specific R20A with the spin specific
>>>>>> frequency mask, using temporary storage.
>>>>>>     multiply(R20A[r20_index], R20A_mask[r20_index], R20A_temp)
>>>>>>
>>>>>>     # The add to the total.
>>>>>>     add(R20A_struct, R20A_temp, R20A_struct)
>>>>>>
>>>>>> # Show that the structure is reproduced perfectly.
>>>>>> print(R20A_struct)
>>>>>> print(R20A_struct - R20A_final)
>>>>>> print(sum(abs(R20A_struct - R20A_final)))
>>>>>> """
>>>>>>
>>>>>>
>>>>>> You may notice one simplification compared to my previous example for
>>>>>> the dw parameter
>>>>>> (http://thread.gmane.org/gmane.science.nmr.relax.devel/6135/focus=6154).
>>>>>> The values here too come from the
>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster system test.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Edward
>>>>>>
>>>>>>
>>>>>> On 10 June 2014 21:31, Edward d'Auvergne <[email protected]> wrote:
>>>>>>> Hi Troels,
>>>>>>>
>>>>>>> No need for an example.  Here is the code to add to your
>>>>>>> infrastructure which will make the analytic dispersion models insanely
>>>>>>> fast:
>>>>>>>
>>>>>>>
>>>>>>> """
>>>>>>> from numpy import add, array, float64, multiply, ones, zeros
>>>>>>>
>>>>>>> # Init mimic.
>>>>>>> #############
>>>>>>>
>>>>>>> # Values from Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster.
>>>>>>> NE = 1
>>>>>>> NS = 2
>>>>>>> NM = 2
>>>>>>> NO = 1
>>>>>>> ND = 8
>>>>>>> dw = array([ 1.847792726895652,  0.193719379085542])
>>>>>>> frqs = [-382.188861036982701, -318.479128911056137]
>>>>>>> shape = (NE, NS, NM, NO, ND)
>>>>>>>
>>>>>>> # Final structure for lib.dispersion.
>>>>>>> dw_struct = zeros(shape, float64)
>>>>>>>
>>>>>>> # Temporary storage to avoid memory allocations and garbage collection.
>>>>>>> dw_temp = zeros((NS,) + shape, float64)
>>>>>>>
>>>>>>> # The structure for multiplication with dw to piecewise build up the
>>>>>>> full dw structure.
>>>>>>> dw_mask = zeros((NS,) + shape, float64)
>>>>>>> for si in range(NS):
>>>>>>>     for mi in range(NM):
>>>>>>>         dw_mask[si, :, si, mi] = frqs[mi]
>>>>>>> print(dw_mask)
>>>>>>>
>>>>>>> # Values to be found (again taken directly from
>>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster - as a
>>>>>>> printout of dw_frq_a).
>>>>>>> dw_final = array([[[[[-706.205797724669765, -706.205797724669765,
>>>>>>>                       -706.205797724669765, -706.205797724669765,
>>>>>>>                       -706.205797724669765, -706.205797724669765,
>>>>>>>                       -706.205797724669765, -706.205797724669765]],
>>>>>>>
>>>>>>>                     [[-588.483418069912318, -588.483418069912318,
>>>>>>>                       -588.483418069912318, -588.483418069912318,
>>>>>>>                       -588.483418069912318, -588.483418069912318,
>>>>>>>                       -588.483418069912318, -588.483418069912318]]],
>>>>>>>
>>>>>>>
>>>>>>>                    [[[ -74.03738885349469 ,  -74.03738885349469 ,
>>>>>>>                        -74.03738885349469 ,  -74.03738885349469 ,
>>>>>>>                        -74.03738885349469 ,  -74.03738885349469 ,
>>>>>>>                        -74.03738885349469 ,  -74.03738885349469 ]],
>>>>>>>
>>>>>>>                     [[ -61.69557910435401 ,  -61.69557910435401 ,
>>>>>>>                        -61.69557910435401 ,  -61.69557910435401 ,
>>>>>>>                        -61.69557910435401 ,  -61.69557910435401 ,
>>>>>>>                        -61.69557910435401 ,  -61.69557910435401 ]]]]])
>>>>>>>
>>>>>>>
>>>>>>> # Target function.
>>>>>>> ##################
>>>>>>>
>>>>>>> # Loop over the dw elements (one per spin).
>>>>>>> for si in range(NS):
>>>>>>>     # First multiply the spin specific dw with the spin specific
>>>>>>> frequency mask, using temporary storage.
>>>>>>>     multiply(dw[si], dw_mask[si], dw_temp[si])
>>>>>>>
>>>>>>>     # The add to the total.
>>>>>>>     add(dw_struct, dw_temp[si], dw_struct)
>>>>>>>
>>>>>>> # Show that the structure is reproduced perfectly.
>>>>>>> print(dw_struct - dw_final)
>>>>>>> """
>>>>>>>
>>>>>>> As mentioned in the comments, the structures come from the
>>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster.  I just
>>>>>>> added a check of "if len(dw) > 1: asdfasd" to kill the test, and added
>>>>>>> printouts to obtain dw, frq_a, dw_frq_a, etc.  This is exactly the
>>>>>>> implementation I described.  Although there might be an even faster
>>>>>>> way, this will eliminate all numpy array creation and deletion via
>>>>>>> Python garbage collection in the target functions (when used for R20
>>>>>>> as well).
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Edward
>>>>>>>
>>>>>>> On 10 June 2014 21:09, Edward d'Auvergne <[email protected]> wrote:
>>>>>>>> If you have a really complicated example of your current 'dw_frq_a'
>>>>>>>> data structure for multiple spins and multiple fields, that could help
>>>>>>>> to construct an example.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Edward
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10 June 2014 20:57, Edward d'Auvergne <[email protected]> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'll have a look tomorrow but, as you've probably seen, some of the
>>>>>>>>> fine details such as indices to be used need to be sorted out when
>>>>>>>>> implementing this.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Edward
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10 June 2014 20:49, Troels Emtekær Linnet <[email protected]> 
>>>>>>>>> wrote:
>>>>>>>>>> What ever I do, I cannot get this to work?
>>>>>>>>>>
>>>>>>>>>> Can you show an example ?
>>>>>>>>>>
>>>>>>>>>> 2014-06-10 16:29 GMT+02:00 Edward d'Auvergne <[email protected]>:
>>>>>>>>>>> Here is an example of avoiding automatic numpy data structure 
>>>>>>>>>>> creation
>>>>>>>>>>> and then garbage collection:
>>>>>>>>>>>
>>>>>>>>>>> """
>>>>>>>>>>> from numpy import add, ones, zeros
>>>>>>>>>>>
>>>>>>>>>>> a = zeros((5, 4))
>>>>>>>>>>> a[1] = 1
>>>>>>>>>>> a[:,1] = 2
>>>>>>>>>>>
>>>>>>>>>>> b = ones((5, 4))
>>>>>>>>>>>
>>>>>>>>>>> add(a, b, a)
>>>>>>>>>>> print(a)
>>>>>>>>>>> """
>>>>>>>>>>>
>>>>>>>>>>> The result is:
>>>>>>>>>>>
>>>>>>>>>>> [[ 1.  3.  1.  1.]
>>>>>>>>>>>  [ 2.  3.  2.  2.]
>>>>>>>>>>>  [ 1.  3.  1.  1.]
>>>>>>>>>>>  [ 1.  3.  1.  1.]
>>>>>>>>>>>  [ 1.  3.  1.  1.]]
>>>>>>>>>>>
>>>>>>>>>>> The out argument for numpy.add() is used here to operate in a 
>>>>>>>>>>> similar
>>>>>>>>>>> way to the Python "+=" operation.  But it avoids the temporary numpy
>>>>>>>>>>> data structures that the Python "+=" operation will create.  This 
>>>>>>>>>>> will
>>>>>>>>>>> save a lot of time in the dispersion code.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Edward
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 10 June 2014 15:56, Edward d'Auvergne <[email protected]> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>> Hi Troels,
>>>>>>>>>>>>
>>>>>>>>>>>> Here is one suggestion, of many that I have, for significantly
>>>>>>>>>>>> improving the speed of the analytic dispersion models in your
>>>>>>>>>>>> 'disp_spin_speed' branch.  The speed ups you have currently 
>>>>>>>>>>>> achieved
>>>>>>>>>>>> for spin clusters are huge and very impressive.  But now that you 
>>>>>>>>>>>> have
>>>>>>>>>>>> the infrastructure in place, you can advance this much more!
>>>>>>>>>>>>
>>>>>>>>>>>> The suggestion has to do with the R20, R20A, and R20B numpy data
>>>>>>>>>>>> structures.  They way they are currently handled is relatively
>>>>>>>>>>>> inefficient, in that they are created de novo for each function 
>>>>>>>>>>>> call.
>>>>>>>>>>>> This means that memory allocation and Python garbage collection
>>>>>>>>>>>> happens for every single function call - something which should be
>>>>>>>>>>>> avoided at almost all costs.
>>>>>>>>>>>>
>>>>>>>>>>>> A better way to do this would be to have a self.R20_struct,
>>>>>>>>>>>> self.R20A_struct, and self.R20B_struct created in __init__(), and 
>>>>>>>>>>>> then
>>>>>>>>>>>> to pack in the values from the parameter vector into these 
>>>>>>>>>>>> structures.
>>>>>>>>>>>> You could create a special structure in __init__() for this.  It 
>>>>>>>>>>>> would
>>>>>>>>>>>> have the dimensions [r20_index][ei][si][mi][oi], where the first
>>>>>>>>>>>> dimension corresponds to the different R20 parameters.  And for 
>>>>>>>>>>>> each
>>>>>>>>>>>> r20_index element, you would have ones at the [ei][si][mi][oi]
>>>>>>>>>>>> positions where you would like R20 to be, and zeros elsewhere.  The
>>>>>>>>>>>> key is that this is created at the target function start up, and 
>>>>>>>>>>>> not
>>>>>>>>>>>> for each function call.
>>>>>>>>>>>>
>>>>>>>>>>>> This would be combined with the very powerful 'out' argument set to
>>>>>>>>>>>> self.R20_struct with the numpy.add() and numpy.multiply() 
>>>>>>>>>>>> functions to
>>>>>>>>>>>> prevent all memory allocations and garbage collection.  Masks 
>>>>>>>>>>>> could be
>>>>>>>>>>>> used, but I think that that would be much slower than having 
>>>>>>>>>>>> special
>>>>>>>>>>>> numpy structures with ones where R20 should be and zeros elsewhere.
>>>>>>>>>>>> For just creating these structures, looping over a single r20_index
>>>>>>>>>>>> loop and multiplying by the special [r20_index][ei][si][mi][oi]
>>>>>>>>>>>> one/zero structure and using numpy.add() and numpy.multiply() with 
>>>>>>>>>>>> out
>>>>>>>>>>>> arguments would be much, much faster than masks or the current
>>>>>>>>>>>> R20_axis logic.  It will also simplify the code.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Edward
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> relax (http://www.nmr-relax.com)
>>>>>>>>>>>
>>>>>>>>>>> This is the relax-devel mailing list
>>>>>>>>>>> [email protected]
>>>>>>>>>>>
>>>>>>>>>>> To unsubscribe from this list, get a password
>>>>>>>>>>> reminder, or change your subscription options,
>>>>>>>>>>> visit the list information page at
>>>>>>>>>>> https://mail.gna.org/listinfo/relax-devel

_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
[email protected]

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel

Re: Speed up suggestion for task #7807.

Reply via email to