Re: Speed up suggestion for task #7807.

Edward d'Auvergne Tue, 10 Jun 2014 13:38:07 -0700

Hi Troels,

To make things even simpler, here is what needs to be done for R20,
R20A and R20B:


"""
from numpy import abs, add, array, float64, multiply, ones, sum, zeros

# Init mimic.
#############

# Values from Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster.
NE = 1
NS = 2
NM = 2
NO = 1
ND = 8
R20A = array([  9.984626320294867,  11.495327724693091,
12.991028416082928, 14.498419290021163])
shape = (NE, NS, NM, NO, ND)

# Final structure for lib.dispersion.
R20A_struct = zeros(shape, float64)

# Temporary storage to avoid memory allocations and garbage collection.
R20A_temp = zeros(shape, float64)

# The structure for multiplication with R20A to piecewise build up the
full R20A structure.
R20A_mask = zeros((NS*NM,) + shape, float64)
for si in range(NS):
    for mi in range(NM):
        R20A_mask[si*NM+mi, :, si, mi] = 1.0
print(R20A_mask)
print("\n\n")

# Values to be found (again taken directly from
Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster - as a
printout of dw_frq_a).
R20A_final = array([[[[[  9.984626320294867,   9.984626320294867,
9.984626320294867,
                          9.984626320294867,   9.984626320294867,
9.984626320294867,
                          9.984626320294867,   9.984626320294867]],

                      [[ 11.495327724693091,  11.495327724693091,
11.495327724693091,
                         11.495327724693091,  11.495327724693091,
11.495327724693091,
                         11.495327724693091,  11.495327724693091]]],


                     [[[ 12.991028416082928,  12.991028416082928,
12.991028416082928,
                         12.991028416082928,  12.991028416082928,
12.991028416082928,
                         12.991028416082928,  12.991028416082928]],

                      [[ 14.498419290021163,  14.498419290021163,
14.498419290021163,
                         14.498419290021163,  14.498419290021163,
14.498419290021163,
                         14.498419290021163,  14.498419290021163]]]]])


# Target function.
##################

# Loop over the R20A elements (one per spin).
for r20_index in range(NS*NM):
    # First multiply the spin specific R20A with the spin specific
frequency mask, using temporary storage.
    multiply(R20A[r20_index], R20A_mask[r20_index], R20A_temp)

    # The add to the total.
    add(R20A_struct, R20A_temp, R20A_struct)

# Show that the structure is reproduced perfectly.
print(R20A_struct)
print(R20A_struct - R20A_final)
print(sum(abs(R20A_struct - R20A_final)))
"""


You may notice one simplification compared to my previous example for
the dw parameter
(http://thread.gmane.org/gmane.science.nmr.relax.devel/6135/focus=6154).
The values here too come from the
Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster system test.

Regards,

Edward


On 10 June 2014 21:31, Edward d'Auvergne <[email protected]> wrote:
> Hi Troels,
>
> No need for an example.  Here is the code to add to your
> infrastructure which will make the analytic dispersion models insanely
> fast:
>
>
> """
> from numpy import add, array, float64, multiply, ones, zeros
>
> # Init mimic.
> #############
>
> # Values from Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster.
> NE = 1
> NS = 2
> NM = 2
> NO = 1
> ND = 8
> dw = array([ 1.847792726895652,  0.193719379085542])
> frqs = [-382.188861036982701, -318.479128911056137]
> shape = (NE, NS, NM, NO, ND)
>
> # Final structure for lib.dispersion.
> dw_struct = zeros(shape, float64)
>
> # Temporary storage to avoid memory allocations and garbage collection.
> dw_temp = zeros((NS,) + shape, float64)
>
> # The structure for multiplication with dw to piecewise build up the
> full dw structure.
> dw_mask = zeros((NS,) + shape, float64)
> for si in range(NS):
>     for mi in range(NM):
>         dw_mask[si, :, si, mi] = frqs[mi]
> print(dw_mask)
>
> # Values to be found (again taken directly from
> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster - as a
> printout of dw_frq_a).
> dw_final = array([[[[[-706.205797724669765, -706.205797724669765,
>                       -706.205797724669765, -706.205797724669765,
>                       -706.205797724669765, -706.205797724669765,
>                       -706.205797724669765, -706.205797724669765]],
>
>                     [[-588.483418069912318, -588.483418069912318,
>                       -588.483418069912318, -588.483418069912318,
>                       -588.483418069912318, -588.483418069912318,
>                       -588.483418069912318, -588.483418069912318]]],
>
>
>                    [[[ -74.03738885349469 ,  -74.03738885349469 ,
>                        -74.03738885349469 ,  -74.03738885349469 ,
>                        -74.03738885349469 ,  -74.03738885349469 ,
>                        -74.03738885349469 ,  -74.03738885349469 ]],
>
>                     [[ -61.69557910435401 ,  -61.69557910435401 ,
>                        -61.69557910435401 ,  -61.69557910435401 ,
>                        -61.69557910435401 ,  -61.69557910435401 ,
>                        -61.69557910435401 ,  -61.69557910435401 ]]]]])
>
>
> # Target function.
> ##################
>
> # Loop over the dw elements (one per spin).
> for si in range(NS):
>     # First multiply the spin specific dw with the spin specific
> frequency mask, using temporary storage.
>     multiply(dw[si], dw_mask[si], dw_temp[si])
>
>     # The add to the total.
>     add(dw_struct, dw_temp[si], dw_struct)
>
> # Show that the structure is reproduced perfectly.
> print(dw_struct - dw_final)
> """
>
> As mentioned in the comments, the structures come from the
> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster.  I just
> added a check of "if len(dw) > 1: asdfasd" to kill the test, and added
> printouts to obtain dw, frq_a, dw_frq_a, etc.  This is exactly the
> implementation I described.  Although there might be an even faster
> way, this will eliminate all numpy array creation and deletion via
> Python garbage collection in the target functions (when used for R20
> as well).
>
> Regards,
>
> Edward
>
> On 10 June 2014 21:09, Edward d'Auvergne <[email protected]> wrote:
>> If you have a really complicated example of your current 'dw_frq_a'
>> data structure for multiple spins and multiple fields, that could help
>> to construct an example.
>>
>> Cheers,
>>
>> Edward
>>
>>
>>
>> On 10 June 2014 20:57, Edward d'Auvergne <[email protected]> wrote:
>>> Hi,
>>>
>>> I'll have a look tomorrow but, as you've probably seen, some of the
>>> fine details such as indices to be used need to be sorted out when
>>> implementing this.
>>>
>>> Regards,
>>>
>>> Edward
>>>
>>>
>>> On 10 June 2014 20:49, Troels Emtekær Linnet <[email protected]> wrote:
>>>> What ever I do, I cannot get this to work?
>>>>
>>>> Can you show an example ?
>>>>
>>>> 2014-06-10 16:29 GMT+02:00 Edward d'Auvergne <[email protected]>:
>>>>> Here is an example of avoiding automatic numpy data structure creation
>>>>> and then garbage collection:
>>>>>
>>>>> """
>>>>> from numpy import add, ones, zeros
>>>>>
>>>>> a = zeros((5, 4))
>>>>> a[1] = 1
>>>>> a[:,1] = 2
>>>>>
>>>>> b = ones((5, 4))
>>>>>
>>>>> add(a, b, a)
>>>>> print(a)
>>>>> """
>>>>>
>>>>> The result is:
>>>>>
>>>>> [[ 1.  3.  1.  1.]
>>>>>  [ 2.  3.  2.  2.]
>>>>>  [ 1.  3.  1.  1.]
>>>>>  [ 1.  3.  1.  1.]
>>>>>  [ 1.  3.  1.  1.]]
>>>>>
>>>>> The out argument for numpy.add() is used here to operate in a similar
>>>>> way to the Python "+=" operation.  But it avoids the temporary numpy
>>>>> data structures that the Python "+=" operation will create.  This will
>>>>> save a lot of time in the dispersion code.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Edward
>>>>>
>>>>>
>>>>> On 10 June 2014 15:56, Edward d'Auvergne <[email protected]> wrote:
>>>>>> Hi Troels,
>>>>>>
>>>>>> Here is one suggestion, of many that I have, for significantly
>>>>>> improving the speed of the analytic dispersion models in your
>>>>>> 'disp_spin_speed' branch.  The speed ups you have currently achieved
>>>>>> for spin clusters are huge and very impressive.  But now that you have
>>>>>> the infrastructure in place, you can advance this much more!
>>>>>>
>>>>>> The suggestion has to do with the R20, R20A, and R20B numpy data
>>>>>> structures.  They way they are currently handled is relatively
>>>>>> inefficient, in that they are created de novo for each function call.
>>>>>> This means that memory allocation and Python garbage collection
>>>>>> happens for every single function call - something which should be
>>>>>> avoided at almost all costs.
>>>>>>
>>>>>> A better way to do this would be to have a self.R20_struct,
>>>>>> self.R20A_struct, and self.R20B_struct created in __init__(), and then
>>>>>> to pack in the values from the parameter vector into these structures.
>>>>>> You could create a special structure in __init__() for this.  It would
>>>>>> have the dimensions [r20_index][ei][si][mi][oi], where the first
>>>>>> dimension corresponds to the different R20 parameters.  And for each
>>>>>> r20_index element, you would have ones at the [ei][si][mi][oi]
>>>>>> positions where you would like R20 to be, and zeros elsewhere.  The
>>>>>> key is that this is created at the target function start up, and not
>>>>>> for each function call.
>>>>>>
>>>>>> This would be combined with the very powerful 'out' argument set to
>>>>>> self.R20_struct with the numpy.add() and numpy.multiply() functions to
>>>>>> prevent all memory allocations and garbage collection.  Masks could be
>>>>>> used, but I think that that would be much slower than having special
>>>>>> numpy structures with ones where R20 should be and zeros elsewhere.
>>>>>> For just creating these structures, looping over a single r20_index
>>>>>> loop and multiplying by the special [r20_index][ei][si][mi][oi]
>>>>>> one/zero structure and using numpy.add() and numpy.multiply() with out
>>>>>> arguments would be much, much faster than masks or the current
>>>>>> R20_axis logic.  It will also simplify the code.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Edward
>>>>>
>>>>> _______________________________________________
>>>>> relax (http://www.nmr-relax.com)
>>>>>
>>>>> This is the relax-devel mailing list
>>>>> [email protected]
>>>>>
>>>>> To unsubscribe from this list, get a password
>>>>> reminder, or change your subscription options,
>>>>> visit the list information page at
>>>>> https://mail.gna.org/listinfo/relax-devel

_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
[email protected]

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel

Re: Speed up suggestion for task #7807.

Reply via email to