Hi,
Even faster is to use:
"""
self.dw_struct[:] = 1.0
multiply(self.dw_struct, tile(asarray(dw).reshape(self.NE,
self.NS)[:,:,None,None,None], (1, 1, self.NM, self.NO, self.ND)),
self.dw_struct)
multiply(self.dw_struct, self.frqs_a2, self.dw_struct)
"""
Where disp_struct and frqs_a are pre-multiplied in the __init__()
function, as that maths operation does not need to happen for each
function call:
self.frqs_a2 = self.disp_struct * self.frqs_a
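For what it is worth, here is a small standalone sketch of that pre-multiplication idea. It is not relax code: the shapes are borrowed from the examples further down in this thread, the disp_struct and frqs_a values are random placeholders, and the function names expand_dw(), build_per_call() and build_premultiplied() are invented for the illustration.

```python
from numpy import allclose, asarray, float64, multiply, ones, tile, zeros
from numpy.random import RandomState

# Hypothetical dimensions (experiments, spins, fields, offsets, points).
NE, NS, NM, NO, ND = 1, 2, 2, 1, 8
shape = (NE, NS, NM, NO, ND)

# Placeholder constant structures, standing in for what __init__() would hold.
rng = RandomState(0)
disp_struct = ones(shape, float64)
disp_struct[0, 1, 1, 0, 6:] = 0.0    # Pretend the last two points are missing.
frqs_a = rng.uniform(-400.0, -300.0, shape)

# Done once, outside of the target function.
frqs_a2 = disp_struct * frqs_a

# Preallocated storage reused by every call.
dw_struct = zeros(shape, float64)


def expand_dw(dw):
    """Expand the per-spin dw values to the full 5D shape (still uses tile())."""
    return tile(asarray(dw).reshape(NE, NS)[:, :, None, None, None],
                (1, 1, NM, NO, ND))


def build_per_call(dw):
    """Three multiplications per call, all written into dw_struct."""
    dw_struct[:] = 1.0
    multiply(dw_struct, expand_dw(dw), dw_struct)
    multiply(dw_struct, disp_struct, dw_struct)
    multiply(dw_struct, frqs_a, dw_struct)
    return dw_struct


def build_premultiplied(dw):
    """Two multiplications per call, using the pre-multiplied frqs_a2."""
    dw_struct[:] = 1.0
    multiply(dw_struct, expand_dw(dw), dw_struct)
    multiply(dw_struct, frqs_a2, dw_struct)
    return dw_struct


dw = [1.847792726895652, 0.193719379085542]
old = build_per_call(dw).copy()    # Copy, as the storage is reused below.
new = build_premultiplied(dw)
print(allclose(old, new))
```

Both variants fill the same preallocated dw_struct, so the only difference per call is one fewer full-size multiplication.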
Regards,
Edward
On 11 June 2014 12:00, Edward d'Auvergne <[email protected]> wrote:
> Hi,
>
> Oh well, I can see you now have an implementation (new = False)
> that beats mine when clustered :) You can use some of the ideas such
> as the out ufunc argument and temporary storage to your advantage
> nevertheless. For example you can use the out argument of these
> ufuncs even more, replacing:
>
> """
> self.dw_struct[:] = 1.0
> self.dw_struct[:] = multiply(self.dw_struct,
> tile(asarray(dw).reshape(self.NE, self.NS)[:,:,None,None,None], (1, 1,
> self.NM, self.NO, self.ND)), ) * self.disp_struct * self.frqs_a
> """
>
>
> with:
>
> """
> self.dw_struct[:] = 1.0
> multiply(self.dw_struct, tile(asarray(dw).reshape(self.NE,
> self.NS)[:,:,None,None,None], (1, 1, self.NM, self.NO, self.ND)),
> self.dw_struct)
> multiply(self.dw_struct, self.disp_struct, self.dw_struct)
> multiply(self.dw_struct, self.frqs_a, self.dw_struct)
> """
>
>
> That shaves off a few milliseconds by avoiding automatic array
> creation and destruction, with before:
>
> """
> ('sfrq: ', 600000000.0, 'number of cpmg frq', 15, array([ 2., 6.,
> 10., 14., 18., 22., 26., 30., 34., 38., 42.,
> 46., 50., 54., 58.]))
> ('sfrq: ', 800000000.0, 'number of cpmg frq', 20, array([ 2., 6.,
> 10., 14., 18., 22., 26., 30., 34., 38., 42.,
> 46., 50., 54., 58., 62., 66., 70., 74., 78.]))
> ('sfrq: ', 900000000.0, 'number of cpmg frq', 22, array([ 2., 6.,
> 10., 14., 18., 22., 26., 30., 34., 38., 42.,
> 46., 50., 54., 58., 62., 66., 70., 74., 78., 82., 86.]))
> ('chi2 cluster:', 0.0)
> Wed Jun 11 11:45:42 2014 /tmp/tmpwkhLSr
>
> 198252 function calls (197150 primitive calls) in 1.499 seconds
>
> Ordered by: cumulative time
>
> ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>      1    0.000    0.000    1.499    1.499 <string>:1(<module>)
>      1    0.001    0.001    1.499    1.499 profiling_cr72.py:449(cluster)
>   1000    0.001    0.000    1.427    0.001 profiling_cr72.py:413(calc)
>   1000    0.009    0.000    1.425    0.001 relax_disp.py:1020(func_CR72_full)
>   1000    0.066    0.000    1.409    0.001 relax_disp.py:544(calc_CR72_chi2)
>   1300    0.903    0.001    1.180    0.001 cr72.py:101(r2eff_CR72)
>   2300    0.100    0.000    0.222    0.000 numeric.py:2056(allclose)
>   3000    0.032    0.000    0.150    0.000 shape_base.py:761(tile)
>   4000    0.104    0.000    0.104    0.000 {method 'repeat' of 'numpy.ndarray' objects}
>  11828    0.091    0.000    0.091    0.000 {method 'reduce' of 'numpy.ufunc' objects}
>      1    0.000    0.000    0.071    0.071 profiling_cr72.py:106(__init__)
>      1    0.010    0.010    0.056    0.056 profiling_cr72.py:173(return_r2eff_arrays)
>   1000    0.032    0.000    0.048    0.000 chi2.py:72(chi2_rankN)
>   4609    0.005    0.000    0.045    0.000 fromnumeric.py:1762(any)
>   2300    0.004    0.000    0.036    0.000 fromnumeric.py:1621(sum)
> """
>
>
> And after:
>
> """
> ('sfrq: ', 600000000.0, 'number of cpmg frq', 15, array([ 2., 6.,
> 10., 14., 18., 22., 26., 30., 34., 38., 42.,
> 46., 50., 54., 58.]))
> ('sfrq: ', 800000000.0, 'number of cpmg frq', 20, array([ 2., 6.,
> 10., 14., 18., 22., 26., 30., 34., 38., 42.,
> 46., 50., 54., 58., 62., 66., 70., 74., 78.]))
> ('sfrq: ', 900000000.0, 'number of cpmg frq', 22, array([ 2., 6.,
> 10., 14., 18., 22., 26., 30., 34., 38., 42.,
> 46., 50., 54., 58., 62., 66., 70., 74., 78., 82., 86.]))
> ('chi2 cluster:', 0.0)
> Wed Jun 11 11:49:29 2014 /tmp/tmpML9Lx5
>
> 198252 function calls (197150 primitive calls) in 1.462 seconds
>
> Ordered by: cumulative time
>
> ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>      1    0.000    0.000    1.462    1.462 <string>:1(<module>)
>      1    0.001    0.001    1.462    1.462 profiling_cr72.py:449(cluster)
>   1000    0.001    0.000    1.393    0.001 profiling_cr72.py:413(calc)
>   1000    0.009    0.000    1.392    0.001 relax_disp.py:1022(func_CR72_full)
>   1000    0.056    0.000    1.376    0.001 relax_disp.py:544(calc_CR72_chi2)
>   1300    0.887    0.001    1.158    0.001 cr72.py:101(r2eff_CR72)
>   2300    0.097    0.000    0.217    0.000 numeric.py:2056(allclose)
>   3000    0.031    0.000    0.148    0.000 shape_base.py:761(tile)
>   4000    0.103    0.000    0.103    0.000 {method 'repeat' of 'numpy.ndarray' objects}
>  11828    0.090    0.000    0.090    0.000 {method 'reduce' of 'numpy.ufunc' objects}
>      1    0.000    0.000    0.068    0.068 profiling_cr72.py:106(__init__)
>      1    0.010    0.010    0.053    0.053 profiling_cr72.py:173(return_r2eff_arrays)
>   1000    0.031    0.000    0.047    0.000 chi2.py:72(chi2_rankN)
>   4609    0.006    0.000    0.044    0.000 fromnumeric.py:1762(any)
>   2300    0.004    0.000    0.036    0.000 fromnumeric.py:1621(sum)
> """
>
>
> The additional suggestion I didn't specify before was to use these
> ufuncs with the out argument in the lib.dispersion modules themselves.
> You don't need to create R2eff here, just pack it into back_calc!
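To make the "pack it into back_calc" idea concrete, here is a minimal sketch under loud assumptions. The function toy_r2eff() and its formula r20 + kex / (1 + cpmg_frqs) are invented purely for illustration and are not the CR72 equations or the real lib.dispersion API. The point is only that each ufunc writes into the preallocated back_calc array, so no separate R2eff array is created.

```python
from numpy import add, allclose, array, divide, float64, zeros


def toy_r2eff(r20, kex, cpmg_frqs, back_calc):
    """Toy stand-in for a lib.dispersion function (not the CR72 equations).

    Evaluates r20 + kex / (1 + cpmg_frqs), writing every intermediate
    result into the preallocated back_calc array.
    """
    add(cpmg_frqs, 1.0, back_calc)       # back_calc = 1 + cpmg_frqs
    divide(kex, back_calc, back_calc)    # back_calc = kex / back_calc
    add(back_calc, r20, back_calc)       # back_calc = back_calc + r20


# Hypothetical input values.
cpmg_frqs = array([2.0, 6.0, 10.0, 14.0], float64)
back_calc = zeros(cpmg_frqs.shape, float64)    # Allocated once, reused per call.

toy_r2eff(10.0, 1000.0, cpmg_frqs, back_calc)
print(back_calc)
```

The function returns nothing on purpose: the caller owns back_calc and reads the result from it after the call.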
>
> Regards,
>
> Edward
>
> On 11 June 2014 11:55, Troels Emtekær Linnet <[email protected]> wrote:
>> Hi Edward.
>>
>> Some timings.
>> Per spin, you have a faster method.
>> But I win per cluster.
>>
>> 1000 iterations
>> 1 / 100 spins
>>
>> Edward
>> ncalls tottime percall cumtime percall filename:lineno(function)
>> 1 0.000 0.000 0.523 0.523 <string>:1(<module>)
>> ncalls tottime percall cumtime percall filename:lineno(function)
>> 1 0.000 0.000 3.875 3.875 <string>:1(<module>)
>>
>> Troels Tile
>> ncalls tottime percall cumtime percall filename:lineno(function)
>> 1 0.000 0.000 0.563 0.563 <string>:1(<module>)
>> ncalls tottime percall cumtime percall filename:lineno(function)
>> 1 0.000 0.000 2.102 2.102 <string>:1(<module>)
>>
>> Troels Outer
>> ncalls tottime percall cumtime percall filename:lineno(function)
>> 1 0.000 0.000 0.546 0.546 <string>:1(<module>)
>> ncalls tottime percall cumtime percall filename:lineno(function)
>> 1 0.000 0.000 1.974 1.974 <string>:1(<module>)
>>
>> 2014-06-11 11:46 GMT+02:00 Troels Emtekær Linnet <[email protected]>:
>>> Hi Edward.
>>>
>>> This is a really good page!
>>> http://docs.scipy.org/doc/numpy/reference/ufuncs.html
>>>
>>> ""
>>> Tip
>>> The optional output arguments can be used to help you save memory for
>>> large calculations. If your arrays are large, complicated expressions
>>> can take longer than absolutely necessary due to the creation and
>>> (later) destruction of temporary calculation spaces. For example, the
>>> expression G = a * b + c is equivalent to t1 = A * B; G = T1 + C; del
>>> t1. It will be more quickly executed as G = A * B; add(G, C, G) which
>>> is the same as G = A * B; G += C.
>>> ""
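A quick sketch of what the tip describes, with made-up input arrays. The last variant, which preallocates the storage for the first product as well, is an extension and not part of the quoted tip.

```python
from numpy import add, allclose, arange, float64, multiply, zeros

# Made-up input data.
a = arange(12, dtype=float64).reshape(3, 4)
b = a + 1.0
c = a * 0.5

# Naive form: the product a * b lives in a temporary array before c is added.
g_naive = a * b + c

# Form from the tip: the sum is written straight back into g.
g = a * b
result = add(g, c, g)

# One step further: with preallocated storage, even the first product
# needs no new array.
g_pre = zeros(a.shape, float64)
multiply(a, b, g_pre)
add(g_pre, c, g_pre)

print(result is g)
print(allclose(g, g_naive), allclose(g_pre, g_naive))
```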
>>>
>>> 2014-06-10 23:08 GMT+02:00 Edward d'Auvergne <[email protected]>:
>>>> Note that masks and numpy.ma.multiply() and numpy.ma.add() may speed
>>>> this up even more. However due to overheads in the numpy masking,
>>>> there is a chance that this also makes the dw and R20 data structure
>>>> construction slower.
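For reference, a minimal sketch of what the numpy.ma route looks like, using made-up values. It only shows the mechanics of ma.multiply() and ma.add(); whether it would be faster or slower for the dw and R20 structures is not tested here.

```python
from numpy import array, float64
from numpy import ma

# Made-up data, with True marking the elements to be skipped.
values = array([1.0, 2.0, 3.0, 4.0], float64)
skip = array([False, True, False, True])

# Masked elements are carried along but excluded from the results.
masked = ma.array(values, mask=skip)
doubled = ma.multiply(masked, 2.0)
shifted = ma.add(doubled, 1.0)

# Replace the masked positions with zeros to obtain a plain numpy array.
print(shifted.filled(0.0))
```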
>>>>
>>>> Regards,
>>>>
>>>> Edward
>>>>
>>>>
>>>>
>>>> On 10 June 2014 22:36, Edward d'Auvergne <[email protected]> wrote:
>>>>> Hi Troels,
>>>>>
>>>>> To make things even simpler, here is what needs to be done for R20,
>>>>> R20A and R20B:
>>>>>
>>>>> """
>>>>> from numpy import abs, add, array, float64, multiply, ones, sum, zeros
>>>>>
>>>>> # Init mimic.
>>>>> #############
>>>>>
>>>>> # Values from Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster.
>>>>> NE = 1
>>>>> NS = 2
>>>>> NM = 2
>>>>> NO = 1
>>>>> ND = 8
>>>>> R20A = array([ 9.984626320294867, 11.495327724693091,
>>>>> 12.991028416082928, 14.498419290021163])
>>>>> shape = (NE, NS, NM, NO, ND)
>>>>>
>>>>> # Final structure for lib.dispersion.
>>>>> R20A_struct = zeros(shape, float64)
>>>>>
>>>>> # Temporary storage to avoid memory allocations and garbage collection.
>>>>> R20A_temp = zeros(shape, float64)
>>>>>
>>>>> # The structure for multiplication with R20A to piecewise build up the
>>>>> # full R20A structure.
>>>>> R20A_mask = zeros((NS*NM,) + shape, float64)
>>>>> for si in range(NS):
>>>>>     for mi in range(NM):
>>>>>         R20A_mask[si*NM+mi, :, si, mi] = 1.0
>>>>> print(R20A_mask)
>>>>> print("\n\n")
>>>>>
>>>>> # Values to be found (again taken directly from
>>>>> # Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster - as a
>>>>> # printout of dw_frq_a).
>>>>> R20A_final = array([[[[[ 9.984626320294867, 9.984626320294867,
>>>>> 9.984626320294867,
>>>>> 9.984626320294867, 9.984626320294867,
>>>>> 9.984626320294867,
>>>>> 9.984626320294867, 9.984626320294867]],
>>>>>
>>>>> [[ 11.495327724693091, 11.495327724693091,
>>>>> 11.495327724693091,
>>>>> 11.495327724693091, 11.495327724693091,
>>>>> 11.495327724693091,
>>>>> 11.495327724693091, 11.495327724693091]]],
>>>>>
>>>>>
>>>>> [[[ 12.991028416082928, 12.991028416082928,
>>>>> 12.991028416082928,
>>>>> 12.991028416082928, 12.991028416082928,
>>>>> 12.991028416082928,
>>>>> 12.991028416082928, 12.991028416082928]],
>>>>>
>>>>> [[ 14.498419290021163, 14.498419290021163,
>>>>> 14.498419290021163,
>>>>> 14.498419290021163, 14.498419290021163,
>>>>> 14.498419290021163,
>>>>> 14.498419290021163, 14.498419290021163]]]]])
>>>>>
>>>>>
>>>>> # Target function.
>>>>> ##################
>>>>>
>>>>> # Loop over the R20A elements (one per spin).
>>>>> for r20_index in range(NS*NM):
>>>>>     # First multiply the spin specific R20A with the spin specific
>>>>>     # frequency mask, using temporary storage.
>>>>>     multiply(R20A[r20_index], R20A_mask[r20_index], R20A_temp)
>>>>>
>>>>>     # Then add to the total.
>>>>>     add(R20A_struct, R20A_temp, R20A_struct)
>>>>>
>>>>> # Show that the structure is reproduced perfectly.
>>>>> print(R20A_struct)
>>>>> print(R20A_struct - R20A_final)
>>>>> print(sum(abs(R20A_struct - R20A_final)))
>>>>> """
>>>>>
>>>>>
>>>>> You may notice one simplification compared to my previous example for
>>>>> the dw parameter
>>>>> (http://thread.gmane.org/gmane.science.nmr.relax.devel/6135/focus=6154).
>>>>> The values here too come from the
>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster system test.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Edward
>>>>>
>>>>>
>>>>> On 10 June 2014 21:31, Edward d'Auvergne <[email protected]> wrote:
>>>>>> Hi Troels,
>>>>>>
>>>>>> No need for an example. Here is the code to add to your
>>>>>> infrastructure which will make the analytic dispersion models insanely
>>>>>> fast:
>>>>>>
>>>>>>
>>>>>> """
>>>>>> from numpy import add, array, float64, multiply, ones, zeros
>>>>>>
>>>>>> # Init mimic.
>>>>>> #############
>>>>>>
>>>>>> # Values from Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster.
>>>>>> NE = 1
>>>>>> NS = 2
>>>>>> NM = 2
>>>>>> NO = 1
>>>>>> ND = 8
>>>>>> dw = array([ 1.847792726895652, 0.193719379085542])
>>>>>> frqs = [-382.188861036982701, -318.479128911056137]
>>>>>> shape = (NE, NS, NM, NO, ND)
>>>>>>
>>>>>> # Final structure for lib.dispersion.
>>>>>> dw_struct = zeros(shape, float64)
>>>>>>
>>>>>> # Temporary storage to avoid memory allocations and garbage collection.
>>>>>> dw_temp = zeros((NS,) + shape, float64)
>>>>>>
>>>>>> # The structure for multiplication with dw to piecewise build up the
>>>>>> # full dw structure.
>>>>>> dw_mask = zeros((NS,) + shape, float64)
>>>>>> for si in range(NS):
>>>>>>     for mi in range(NM):
>>>>>>         dw_mask[si, :, si, mi] = frqs[mi]
>>>>>> print(dw_mask)
>>>>>>
>>>>>> # Values to be found (again taken directly from
>>>>>> # Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster - as a
>>>>>> # printout of dw_frq_a).
>>>>>> dw_final = array([[[[[-706.205797724669765, -706.205797724669765,
>>>>>> -706.205797724669765, -706.205797724669765,
>>>>>> -706.205797724669765, -706.205797724669765,
>>>>>> -706.205797724669765, -706.205797724669765]],
>>>>>>
>>>>>> [[-588.483418069912318, -588.483418069912318,
>>>>>> -588.483418069912318, -588.483418069912318,
>>>>>> -588.483418069912318, -588.483418069912318,
>>>>>> -588.483418069912318, -588.483418069912318]]],
>>>>>>
>>>>>>
>>>>>> [[[ -74.03738885349469 , -74.03738885349469 ,
>>>>>> -74.03738885349469 , -74.03738885349469 ,
>>>>>> -74.03738885349469 , -74.03738885349469 ,
>>>>>> -74.03738885349469 , -74.03738885349469 ]],
>>>>>>
>>>>>> [[ -61.69557910435401 , -61.69557910435401 ,
>>>>>> -61.69557910435401 , -61.69557910435401 ,
>>>>>> -61.69557910435401 , -61.69557910435401 ,
>>>>>> -61.69557910435401 , -61.69557910435401 ]]]]])
>>>>>>
>>>>>>
>>>>>> # Target function.
>>>>>> ##################
>>>>>>
>>>>>> # Loop over the dw elements (one per spin).
>>>>>> for si in range(NS):
>>>>>>     # First multiply the spin specific dw with the spin specific
>>>>>>     # frequency mask, using temporary storage.
>>>>>>     multiply(dw[si], dw_mask[si], dw_temp[si])
>>>>>>
>>>>>>     # Then add to the total.
>>>>>>     add(dw_struct, dw_temp[si], dw_struct)
>>>>>>
>>>>>> # Show that the structure is reproduced perfectly.
>>>>>> print(dw_struct - dw_final)
>>>>>> """
>>>>>>
>>>>>> As mentioned in the comments, the structures come from the
>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster. I just
>>>>>> added a check of "if len(dw) > 1: asdfasd" to kill the test, and added
>>>>>> printouts to obtain dw, frq_a, dw_frq_a, etc. This is exactly the
>>>>>> implementation I described. Although there might be an even faster
>>>>>> way, this will eliminate all numpy array creation and deletion via
>>>>>> Python garbage collection in the target functions (when used for R20
>>>>>> as well).
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Edward
>>>>>>
>>>>>> On 10 June 2014 21:09, Edward d'Auvergne <[email protected]> wrote:
>>>>>>> If you have a really complicated example of your current 'dw_frq_a'
>>>>>>> data structure for multiple spins and multiple fields, that could help
>>>>>>> to construct an example.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Edward
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 10 June 2014 20:57, Edward d'Auvergne <[email protected]> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'll have a look tomorrow but, as you've probably seen, some of the
>>>>>>>> fine details such as indices to be used need to be sorted out when
>>>>>>>> implementing this.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Edward
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10 June 2014 20:49, Troels Emtekær Linnet <[email protected]>
>>>>>>>> wrote:
>>>>>>>>> Whatever I do, I cannot get this to work.
>>>>>>>>>
>>>>>>>>> Can you show an example?
>>>>>>>>>
>>>>>>>>> 2014-06-10 16:29 GMT+02:00 Edward d'Auvergne <[email protected]>:
>>>>>>>>>> Here is an example of avoiding automatic numpy data structure
>>>>>>>>>> creation and then garbage collection:
>>>>>>>>>>
>>>>>>>>>> """
>>>>>>>>>> from numpy import add, ones, zeros
>>>>>>>>>>
>>>>>>>>>> a = zeros((5, 4))
>>>>>>>>>> a[1] = 1
>>>>>>>>>> a[:,1] = 2
>>>>>>>>>>
>>>>>>>>>> b = ones((5, 4))
>>>>>>>>>>
>>>>>>>>>> add(a, b, a)
>>>>>>>>>> print(a)
>>>>>>>>>> """
>>>>>>>>>>
>>>>>>>>>> The result is:
>>>>>>>>>>
>>>>>>>>>> [[ 1. 3. 1. 1.]
>>>>>>>>>> [ 2. 3. 2. 2.]
>>>>>>>>>> [ 1. 3. 1. 1.]
>>>>>>>>>> [ 1. 3. 1. 1.]
>>>>>>>>>> [ 1. 3. 1. 1.]]
>>>>>>>>>>
>>>>>>>>>> The out argument for numpy.add() is used here to operate in the same
>>>>>>>>>> way as the Python "+=" operation, writing the result directly into
>>>>>>>>>> 'a'. It avoids the temporary numpy data structure that "a = a + b"
>>>>>>>>>> would create. This will save a lot of time in the dispersion code.
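One way to check that no new array is allocated is to compare object identity and memory, as in this small sketch that reuses the arrays from the example above. The names c and out are just for the illustration.

```python
from numpy import add, array_equal, ones, shares_memory, zeros

a = zeros((5, 4))
a[1] = 1
a[:, 1] = 2
b = ones((5, 4))

# a + b allocates a brand new array for the result.
c = a + b
print(c is a, shares_memory(c, a))

# add() with a as the out argument returns a itself, so nothing is allocated.
out = add(a, b, a)
print(out is a)
```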
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Edward
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 10 June 2014 15:56, Edward d'Auvergne <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>> Hi Troels,
>>>>>>>>>>>
>>>>>>>>>>> Here is one suggestion, of many that I have, for significantly
>>>>>>>>>>> improving the speed of the analytic dispersion models in your
>>>>>>>>>>> 'disp_spin_speed' branch. The speed ups you have currently achieved
>>>>>>>>>>> for spin clusters are huge and very impressive. But now that you
>>>>>>>>>>> have the infrastructure in place, you can advance this much more!
>>>>>>>>>>>
>>>>>>>>>>> The suggestion has to do with the R20, R20A, and R20B numpy data
>>>>>>>>>>> structures. The way they are currently handled is relatively
>>>>>>>>>>> inefficient, in that they are created de novo for each function
>>>>>>>>>>> call. This means that memory allocation and Python garbage
>>>>>>>>>>> collection happen for every single function call - something
>>>>>>>>>>> which should be avoided at almost all costs.
>>>>>>>>>>>
>>>>>>>>>>> A better way to do this would be to have a self.R20_struct,
>>>>>>>>>>> self.R20A_struct, and self.R20B_struct created in __init__(), and
>>>>>>>>>>> then to pack in the values from the parameter vector into these
>>>>>>>>>>> structures. You could create a special structure in __init__() for
>>>>>>>>>>> this. It would have the dimensions [r20_index][ei][si][mi][oi],
>>>>>>>>>>> where the first dimension corresponds to the different R20
>>>>>>>>>>> parameters. And for each r20_index element, you would have ones at
>>>>>>>>>>> the [ei][si][mi][oi] positions where you would like R20 to be, and
>>>>>>>>>>> zeros elsewhere. The key is that this is created at the target
>>>>>>>>>>> function start up, and not for each function call.
>>>>>>>>>>>
>>>>>>>>>>> This would be combined with the very powerful 'out' argument set to
>>>>>>>>>>> self.R20_struct with the numpy.add() and numpy.multiply() functions
>>>>>>>>>>> to prevent all memory allocations and garbage collection. Masks
>>>>>>>>>>> could be used, but I think that that would be much slower than
>>>>>>>>>>> having special numpy structures with ones where R20 should be and
>>>>>>>>>>> zeros elsewhere. For just creating these structures, looping over a
>>>>>>>>>>> single r20_index loop and multiplying by the special
>>>>>>>>>>> [r20_index][ei][si][mi][oi] one/zero structure and using
>>>>>>>>>>> numpy.add() and numpy.multiply() with out arguments would be much,
>>>>>>>>>>> much faster than masks or the current R20_axis logic. It will also
>>>>>>>>>>> simplify the code.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Edward
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> relax (http://www.nmr-relax.com)
>>>>>>>>>>
>>>>>>>>>> This is the relax-devel mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>>
>>>>>>>>>> To unsubscribe from this list, get a password
>>>>>>>>>> reminder, or change your subscription options,
>>>>>>>>>> visit the list information page at
>>>>>>>>>> https://mail.gna.org/listinfo/relax-devel