Re: Speed up suggestion for task #7807.

Edward d'Auvergne Tue, 10 Jun 2014 14:09:12 -0700

Note that masks and numpy.ma.multiply() and numpy.ma.add() may speed
this up even more.  However due to overheads in the numpy masking,
there is a chance that this also makes the dw and R20 data structure
construction slower.


Regards,

Edward



On 10 June 2014 22:36, Edward d'Auvergne <[email protected]> wrote:
> Hi Troels,
>
> To make things even simpler, here is what needs to be done for R20,
> R20A and R20B:
>
> """
> from numpy import abs, add, array, float64, multiply, ones, sum, zeros
>
> # Init mimic.
> #############
>
> # Values from Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster.
> NE = 1
> NS = 2
> NM = 2
> NO = 1
> ND = 8
> R20A = array([  9.984626320294867,  11.495327724693091,
> 12.991028416082928, 14.498419290021163])
> shape = (NE, NS, NM, NO, ND)
>
> # Final structure for lib.dispersion.
> R20A_struct = zeros(shape, float64)
>
> # Temporary storage to avoid memory allocations and garbage collection.
> R20A_temp = zeros(shape, float64)
>
> # The structure for multiplication with R20A to piecewise build up the
> full R20A structure.
> R20A_mask = zeros((NS*NM,) + shape, float64)
> for si in range(NS):
>     for mi in range(NM):
>         R20A_mask[si*NM+mi, :, si, mi] = 1.0
> print(R20A_mask)
> print("\n\n")
>
> # Values to be found (again taken directly from
> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster - as a
> printout of dw_frq_a).
> R20A_final = array([[[[[  9.984626320294867,   9.984626320294867,
> 9.984626320294867,
>                           9.984626320294867,   9.984626320294867,
> 9.984626320294867,
>                           9.984626320294867,   9.984626320294867]],
>
>                       [[ 11.495327724693091,  11.495327724693091,
> 11.495327724693091,
>                          11.495327724693091,  11.495327724693091,
> 11.495327724693091,
>                          11.495327724693091,  11.495327724693091]]],
>
>
>                      [[[ 12.991028416082928,  12.991028416082928,
> 12.991028416082928,
>                          12.991028416082928,  12.991028416082928,
> 12.991028416082928,
>                          12.991028416082928,  12.991028416082928]],
>
>                       [[ 14.498419290021163,  14.498419290021163,
> 14.498419290021163,
>                          14.498419290021163,  14.498419290021163,
> 14.498419290021163,
>                          14.498419290021163,  14.498419290021163]]]]])
>
>
> # Target function.
> ##################
>
> # Loop over the R20A elements (one per spin).
> for r20_index in range(NS*NM):
>     # First multiply the spin specific R20A with the spin specific
> frequency mask, using temporary storage.
>     multiply(R20A[r20_index], R20A_mask[r20_index], R20A_temp)
>
>     # The add to the total.
>     add(R20A_struct, R20A_temp, R20A_struct)
>
> # Show that the structure is reproduced perfectly.
> print(R20A_struct)
> print(R20A_struct - R20A_final)
> print(sum(abs(R20A_struct - R20A_final)))
> """
>
>
> You may notice one simplification compared to my previous example for
> the dw parameter
> (http://thread.gmane.org/gmane.science.nmr.relax.devel/6135/focus=6154).
> The values here too come from the
> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster system test.
>
> Regards,
>
> Edward
>
>
> On 10 June 2014 21:31, Edward d'Auvergne <[email protected]> wrote:
>> Hi Troels,
>>
>> No need for an example.  Here is the code to add to your
>> infrastructure which will make the analytic dispersion models insanely
>> fast:
>>
>>
>> """
>> from numpy import add, array, float64, multiply, ones, zeros
>>
>> # Init mimic.
>> #############
>>
>> # Values from Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster.
>> NE = 1
>> NS = 2
>> NM = 2
>> NO = 1
>> ND = 8
>> dw = array([ 1.847792726895652,  0.193719379085542])
>> frqs = [-382.188861036982701, -318.479128911056137]
>> shape = (NE, NS, NM, NO, ND)
>>
>> # Final structure for lib.dispersion.
>> dw_struct = zeros(shape, float64)
>>
>> # Temporary storage to avoid memory allocations and garbage collection.
>> dw_temp = zeros((NS,) + shape, float64)
>>
>> # The structure for multiplication with dw to piecewise build up the
>> full dw structure.
>> dw_mask = zeros((NS,) + shape, float64)
>> for si in range(NS):
>>     for mi in range(NM):
>>         dw_mask[si, :, si, mi] = frqs[mi]
>> print(dw_mask)
>>
>> # Values to be found (again taken directly from
>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster - as a
>> printout of dw_frq_a).
>> dw_final = array([[[[[-706.205797724669765, -706.205797724669765,
>>                       -706.205797724669765, -706.205797724669765,
>>                       -706.205797724669765, -706.205797724669765,
>>                       -706.205797724669765, -706.205797724669765]],
>>
>>                     [[-588.483418069912318, -588.483418069912318,
>>                       -588.483418069912318, -588.483418069912318,
>>                       -588.483418069912318, -588.483418069912318,
>>                       -588.483418069912318, -588.483418069912318]]],
>>
>>
>>                    [[[ -74.03738885349469 ,  -74.03738885349469 ,
>>                        -74.03738885349469 ,  -74.03738885349469 ,
>>                        -74.03738885349469 ,  -74.03738885349469 ,
>>                        -74.03738885349469 ,  -74.03738885349469 ]],
>>
>>                     [[ -61.69557910435401 ,  -61.69557910435401 ,
>>                        -61.69557910435401 ,  -61.69557910435401 ,
>>                        -61.69557910435401 ,  -61.69557910435401 ,
>>                        -61.69557910435401 ,  -61.69557910435401 ]]]]])
>>
>>
>> # Target function.
>> ##################
>>
>> # Loop over the dw elements (one per spin).
>> for si in range(NS):
>>     # First multiply the spin specific dw with the spin specific
>> frequency mask, using temporary storage.
>>     multiply(dw[si], dw_mask[si], dw_temp[si])
>>
>>     # The add to the total.
>>     add(dw_struct, dw_temp[si], dw_struct)
>>
>> # Show that the structure is reproduced perfectly.
>> print(dw_struct - dw_final)
>> """
>>
>> As mentioned in the comments, the structures come from the
>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster.  I just
>> added a check of "if len(dw) > 1: asdfasd" to kill the test, and added
>> printouts to obtain dw, frq_a, dw_frq_a, etc.  This is exactly the
>> implementation I described.  Although there might be an even faster
>> way, this will eliminate all numpy array creation and deletion via
>> Python garbage collection in the target functions (when used for R20
>> as well).
>>
>> Regards,
>>
>> Edward
>>
>> On 10 June 2014 21:09, Edward d'Auvergne <[email protected]> wrote:
>>> If you have a really complicated example of your current 'dw_frq_a'
>>> data structure for multiple spins and multiple fields, that could help
>>> to construct an example.
>>>
>>> Cheers,
>>>
>>> Edward
>>>
>>>
>>>
>>> On 10 June 2014 20:57, Edward d'Auvergne <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> I'll have a look tomorrow but, as you've probably seen, some of the
>>>> fine details such as indices to be used need to be sorted out when
>>>> implementing this.
>>>>
>>>> Regards,
>>>>
>>>> Edward
>>>>
>>>>
>>>> On 10 June 2014 20:49, Troels Emtekær Linnet <[email protected]> wrote:
>>>>> What ever I do, I cannot get this to work?
>>>>>
>>>>> Can you show an example ?
>>>>>
>>>>> 2014-06-10 16:29 GMT+02:00 Edward d'Auvergne <[email protected]>:
>>>>>> Here is an example of avoiding automatic numpy data structure creation
>>>>>> and then garbage collection:
>>>>>>
>>>>>> """
>>>>>> from numpy import add, ones, zeros
>>>>>>
>>>>>> a = zeros((5, 4))
>>>>>> a[1] = 1
>>>>>> a[:,1] = 2
>>>>>>
>>>>>> b = ones((5, 4))
>>>>>>
>>>>>> add(a, b, a)
>>>>>> print(a)
>>>>>> """
>>>>>>
>>>>>> The result is:
>>>>>>
>>>>>> [[ 1.  3.  1.  1.]
>>>>>>  [ 2.  3.  2.  2.]
>>>>>>  [ 1.  3.  1.  1.]
>>>>>>  [ 1.  3.  1.  1.]
>>>>>>  [ 1.  3.  1.  1.]]
>>>>>>
>>>>>> The out argument for numpy.add() is used here to operate in a similar
>>>>>> way to the Python "+=" operation.  But it avoids the temporary numpy
>>>>>> data structures that the Python "+=" operation will create.  This will
>>>>>> save a lot of time in the dispersion code.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Edward
>>>>>>
>>>>>>
>>>>>> On 10 June 2014 15:56, Edward d'Auvergne <[email protected]> wrote:
>>>>>>> Hi Troels,
>>>>>>>
>>>>>>> Here is one suggestion, of many that I have, for significantly
>>>>>>> improving the speed of the analytic dispersion models in your
>>>>>>> 'disp_spin_speed' branch.  The speed ups you have currently achieved
>>>>>>> for spin clusters are huge and very impressive.  But now that you have
>>>>>>> the infrastructure in place, you can advance this much more!
>>>>>>>
>>>>>>> The suggestion has to do with the R20, R20A, and R20B numpy data
>>>>>>> structures.  They way they are currently handled is relatively
>>>>>>> inefficient, in that they are created de novo for each function call.
>>>>>>> This means that memory allocation and Python garbage collection
>>>>>>> happens for every single function call - something which should be
>>>>>>> avoided at almost all costs.
>>>>>>>
>>>>>>> A better way to do this would be to have a self.R20_struct,
>>>>>>> self.R20A_struct, and self.R20B_struct created in __init__(), and then
>>>>>>> to pack in the values from the parameter vector into these structures.
>>>>>>> You could create a special structure in __init__() for this.  It would
>>>>>>> have the dimensions [r20_index][ei][si][mi][oi], where the first
>>>>>>> dimension corresponds to the different R20 parameters.  And for each
>>>>>>> r20_index element, you would have ones at the [ei][si][mi][oi]
>>>>>>> positions where you would like R20 to be, and zeros elsewhere.  The
>>>>>>> key is that this is created at the target function start up, and not
>>>>>>> for each function call.
>>>>>>>
>>>>>>> This would be combined with the very powerful 'out' argument set to
>>>>>>> self.R20_struct with the numpy.add() and numpy.multiply() functions to
>>>>>>> prevent all memory allocations and garbage collection.  Masks could be
>>>>>>> used, but I think that that would be much slower than having special
>>>>>>> numpy structures with ones where R20 should be and zeros elsewhere.
>>>>>>> For just creating these structures, looping over a single r20_index
>>>>>>> loop and multiplying by the special [r20_index][ei][si][mi][oi]
>>>>>>> one/zero structure and using numpy.add() and numpy.multiply() with out
>>>>>>> arguments would be much, much faster than masks or the current
>>>>>>> R20_axis logic.  It will also simplify the code.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Edward
>>>>>>
>>>>>> _______________________________________________
>>>>>> relax (http://www.nmr-relax.com)
>>>>>>
>>>>>> This is the relax-devel mailing list
>>>>>> [email protected]
>>>>>>
>>>>>> To unsubscribe from this list, get a password
>>>>>> reminder, or change your subscription options,
>>>>>> visit the list information page at
>>>>>> https://mail.gna.org/listinfo/relax-devel

_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
[email protected]

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel

Re: Speed up suggestion for task #7807.

Reply via email to