Note that masks and numpy.ma.multiply() and numpy.ma.add() may speed this up even more. However due to overheads in the numpy masking, there is a chance that this also makes the dw and R20 data structure construction slower.
Regards, Edward On 10 June 2014 22:36, Edward d'Auvergne <[email protected]> wrote: > Hi Troels, > > To make things even simpler, here is what needs to be done for R20, > R20A and R20B: > > """ > from numpy import abs, add, array, float64, multiply, ones, sum, zeros > > # Init mimic. > ############# > > # Values from Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster. > NE = 1 > NS = 2 > NM = 2 > NO = 1 > ND = 8 > R20A = array([ 9.984626320294867, 11.495327724693091, > 12.991028416082928, 14.498419290021163]) > shape = (NE, NS, NM, NO, ND) > > # Final structure for lib.dispersion. > R20A_struct = zeros(shape, float64) > > # Temporary storage to avoid memory allocations and garbage collection. > R20A_temp = zeros(shape, float64) > > # The structure for multiplication with R20A to piecewise build up the > full R20A structure. > R20A_mask = zeros((NS*NM,) + shape, float64) > for si in range(NS): > for mi in range(NM): > R20A_mask[si*NM+mi, :, si, mi] = 1.0 > print(R20A_mask) > print("\n\n") > > # Values to be found (again taken directly from > Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster - as a > printout of dw_frq_a). > R20A_final = array([[[[[ 9.984626320294867, 9.984626320294867, > 9.984626320294867, > 9.984626320294867, 9.984626320294867, > 9.984626320294867, > 9.984626320294867, 9.984626320294867]], > > [[ 11.495327724693091, 11.495327724693091, > 11.495327724693091, > 11.495327724693091, 11.495327724693091, > 11.495327724693091, > 11.495327724693091, 11.495327724693091]]], > > > [[[ 12.991028416082928, 12.991028416082928, > 12.991028416082928, > 12.991028416082928, 12.991028416082928, > 12.991028416082928, > 12.991028416082928, 12.991028416082928]], > > [[ 14.498419290021163, 14.498419290021163, > 14.498419290021163, > 14.498419290021163, 14.498419290021163, > 14.498419290021163, > 14.498419290021163, 14.498419290021163]]]]]) > > > # Target function. > ################## > > # Loop over the R20A elements (one per spin). > for r20_index in range(NS*NM): > # First multiply the spin specific R20A with the spin specific > frequency mask, using temporary storage. > multiply(R20A[r20_index], R20A_mask[r20_index], R20A_temp) > > # The add to the total. > add(R20A_struct, R20A_temp, R20A_struct) > > # Show that the structure is reproduced perfectly. > print(R20A_struct) > print(R20A_struct - R20A_final) > print(sum(abs(R20A_struct - R20A_final))) > """ > > > You may notice one simplification compared to my previous example for > the dw parameter > (http://thread.gmane.org/gmane.science.nmr.relax.devel/6135/focus=6154). > The values here too come from the > Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster system test. > > Regards, > > Edward > > > On 10 June 2014 21:31, Edward d'Auvergne <[email protected]> wrote: >> Hi Troels, >> >> No need for an example. Here is the code to add to your >> infrastructure which will make the analytic dispersion models insanely >> fast: >> >> >> """ >> from numpy import add, array, float64, multiply, ones, zeros >> >> # Init mimic. >> ############# >> >> # Values from Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster. >> NE = 1 >> NS = 2 >> NM = 2 >> NO = 1 >> ND = 8 >> dw = array([ 1.847792726895652, 0.193719379085542]) >> frqs = [-382.188861036982701, -318.479128911056137] >> shape = (NE, NS, NM, NO, ND) >> >> # Final structure for lib.dispersion. >> dw_struct = zeros(shape, float64) >> >> # Temporary storage to avoid memory allocations and garbage collection. >> dw_temp = zeros((NS,) + shape, float64) >> >> # The structure for multiplication with dw to piecewise build up the >> full dw structure. >> dw_mask = zeros((NS,) + shape, float64) >> for si in range(NS): >> for mi in range(NM): >> dw_mask[si, :, si, mi] = frqs[mi] >> print(dw_mask) >> >> # Values to be found (again taken directly from >> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster - as a >> printout of dw_frq_a). >> dw_final = array([[[[[-706.205797724669765, -706.205797724669765, >> -706.205797724669765, -706.205797724669765, >> -706.205797724669765, -706.205797724669765, >> -706.205797724669765, -706.205797724669765]], >> >> [[-588.483418069912318, -588.483418069912318, >> -588.483418069912318, -588.483418069912318, >> -588.483418069912318, -588.483418069912318, >> -588.483418069912318, -588.483418069912318]]], >> >> >> [[[ -74.03738885349469 , -74.03738885349469 , >> -74.03738885349469 , -74.03738885349469 , >> -74.03738885349469 , -74.03738885349469 , >> -74.03738885349469 , -74.03738885349469 ]], >> >> [[ -61.69557910435401 , -61.69557910435401 , >> -61.69557910435401 , -61.69557910435401 , >> -61.69557910435401 , -61.69557910435401 , >> -61.69557910435401 , -61.69557910435401 ]]]]]) >> >> >> # Target function. >> ################## >> >> # Loop over the dw elements (one per spin). >> for si in range(NS): >> # First multiply the spin specific dw with the spin specific >> frequency mask, using temporary storage. >> multiply(dw[si], dw_mask[si], dw_temp[si]) >> >> # The add to the total. >> add(dw_struct, dw_temp[si], dw_struct) >> >> # Show that the structure is reproduced perfectly. >> print(dw_struct - dw_final) >> """ >> >> As mentioned in the comments, the structures come from the >> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster. I just >> added a check of "if len(dw) > 1: asdfasd" to kill the test, and added >> printouts to obtain dw, frq_a, dw_frq_a, etc. This is exactly the >> implementation I described. Although there might be an even faster >> way, this will eliminate all numpy array creation and deletion via >> Python garbage collection in the target functions (when used for R20 >> as well). >> >> Regards, >> >> Edward >> >> On 10 June 2014 21:09, Edward d'Auvergne <[email protected]> wrote: >>> If you have a really complicated example of your current 'dw_frq_a' >>> data structure for multiple spins and multiple fields, that could help >>> to construct an example. >>> >>> Cheers, >>> >>> Edward >>> >>> >>> >>> On 10 June 2014 20:57, Edward d'Auvergne <[email protected]> wrote: >>>> Hi, >>>> >>>> I'll have a look tomorrow but, as you've probably seen, some of the >>>> fine details such as indices to be used need to be sorted out when >>>> implementing this. >>>> >>>> Regards, >>>> >>>> Edward >>>> >>>> >>>> On 10 June 2014 20:49, Troels Emtekær Linnet <[email protected]> wrote: >>>>> What ever I do, I cannot get this to work? >>>>> >>>>> Can you show an example ? >>>>> >>>>> 2014-06-10 16:29 GMT+02:00 Edward d'Auvergne <[email protected]>: >>>>>> Here is an example of avoiding automatic numpy data structure creation >>>>>> and then garbage collection: >>>>>> >>>>>> """ >>>>>> from numpy import add, ones, zeros >>>>>> >>>>>> a = zeros((5, 4)) >>>>>> a[1] = 1 >>>>>> a[:,1] = 2 >>>>>> >>>>>> b = ones((5, 4)) >>>>>> >>>>>> add(a, b, a) >>>>>> print(a) >>>>>> """ >>>>>> >>>>>> The result is: >>>>>> >>>>>> [[ 1. 3. 1. 1.] >>>>>> [ 2. 3. 2. 2.] >>>>>> [ 1. 3. 1. 1.] >>>>>> [ 1. 3. 1. 1.] >>>>>> [ 1. 3. 1. 1.]] >>>>>> >>>>>> The out argument for numpy.add() is used here to operate in a similar >>>>>> way to the Python "+=" operation. But it avoids the temporary numpy >>>>>> data structures that the Python "+=" operation will create. This will >>>>>> save a lot of time in the dispersion code. >>>>>> >>>>>> Regards, >>>>>> >>>>>> Edward >>>>>> >>>>>> >>>>>> On 10 June 2014 15:56, Edward d'Auvergne <[email protected]> wrote: >>>>>>> Hi Troels, >>>>>>> >>>>>>> Here is one suggestion, of many that I have, for significantly >>>>>>> improving the speed of the analytic dispersion models in your >>>>>>> 'disp_spin_speed' branch. The speed ups you have currently achieved >>>>>>> for spin clusters are huge and very impressive. But now that you have >>>>>>> the infrastructure in place, you can advance this much more! >>>>>>> >>>>>>> The suggestion has to do with the R20, R20A, and R20B numpy data >>>>>>> structures. They way they are currently handled is relatively >>>>>>> inefficient, in that they are created de novo for each function call. >>>>>>> This means that memory allocation and Python garbage collection >>>>>>> happens for every single function call - something which should be >>>>>>> avoided at almost all costs. >>>>>>> >>>>>>> A better way to do this would be to have a self.R20_struct, >>>>>>> self.R20A_struct, and self.R20B_struct created in __init__(), and then >>>>>>> to pack in the values from the parameter vector into these structures. >>>>>>> You could create a special structure in __init__() for this. It would >>>>>>> have the dimensions [r20_index][ei][si][mi][oi], where the first >>>>>>> dimension corresponds to the different R20 parameters. And for each >>>>>>> r20_index element, you would have ones at the [ei][si][mi][oi] >>>>>>> positions where you would like R20 to be, and zeros elsewhere. The >>>>>>> key is that this is created at the target function start up, and not >>>>>>> for each function call. >>>>>>> >>>>>>> This would be combined with the very powerful 'out' argument set to >>>>>>> self.R20_struct with the numpy.add() and numpy.multiply() functions to >>>>>>> prevent all memory allocations and garbage collection. Masks could be >>>>>>> used, but I think that that would be much slower than having special >>>>>>> numpy structures with ones where R20 should be and zeros elsewhere. >>>>>>> For just creating these structures, looping over a single r20_index >>>>>>> loop and multiplying by the special [r20_index][ei][si][mi][oi] >>>>>>> one/zero structure and using numpy.add() and numpy.multiply() with out >>>>>>> arguments would be much, much faster than masks or the current >>>>>>> R20_axis logic. It will also simplify the code. >>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> Edward >>>>>> >>>>>> _______________________________________________ >>>>>> relax (http://www.nmr-relax.com) >>>>>> >>>>>> This is the relax-devel mailing list >>>>>> [email protected] >>>>>> >>>>>> To unsubscribe from this list, get a password >>>>>> reminder, or change your subscription options, >>>>>> visit the list information page at >>>>>> https://mail.gna.org/listinfo/relax-devel _______________________________________________ relax (http://www.nmr-relax.com) This is the relax-devel mailing list [email protected] To unsubscribe from this list, get a password reminder, or change your subscription options, visit the list information page at https://mail.gna.org/listinfo/relax-devel

