The problem is that if you multiply a numpy array with a [300] shaped
array, numpy broadcasts it along the last axis.

Reshaping that into a usable form is the problem.
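
A minimal sketch of this broadcasting behaviour (the shapes here are
only for illustration):

--------- Broadcasting sketch.
import numpy as np

a = np.ones((1, 100, 3, 1, 22))  # full [ei][si][mi][oi][di] structure
b = np.arange(22.0)              # shape [22]

# Broadcasting aligns from the last axis, so b is applied along di.
print((a * b).shape)             # (1, 100, 3, 1, 22)

# A [100] array cannot broadcast against the last axis of size 22:
c = np.arange(100.0)
# a * c  raises "ValueError: operands could not be broadcast together"
-------------------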

I spent many, many hours to get to this:
# Reshape dw to per experiment and number of spins, or for R20A to per
# experiment, spin and frequency.
[300] -> [1, 300]                          :    [300] -> [1, 100, 3]

# Expand with new axes.
[1, 300] -> [1, 300, 1, 1, 1]              :    [1, 100, 3] -> [1, 100, 3, 1, 1]

# Then tile to the full dimensions.
[1, 300, 1, 1, 1] -> [1, 300, 3, 1, 22]    :    [1, 100, 3, 1, 1] -> [1, 100, 3, 1, 22]

This is a very logical build-up of the shapes.
An add or multiply then broadcasts along the last axis.

The only thing I could do was to create new axes,
and then tile up!

--------- Example code to play with.
import numpy as np

# Target shape [ei][si][mi][oi][di]: 1 experiment, 100 spins, 3 magnetic
# fields, 1 offset, 22 dispersion points.
s = [1, 100, 3, 1, 22]

print("dw")
# One dw value per spin.
dw = np.arange(100)
print(dw.shape)
# Reshape to per experiment and spin.
dw1 = dw.reshape((s[0], s[1]))
print(dw1.shape)
# Expand with new axes.
dw2 = dw1[:, :, None, None, None]
print(dw2.shape)
# Tile to the full dimensions.
dw3 = np.tile(dw2, (1, 1, s[2], s[3], s[4]))
print(dw3.shape)

print("r20")
# One R20 value per spin and frequency.
r = np.arange(300)
print(r.shape)
# Reshape to per experiment, spin and frequency.
r1 = r.reshape((s[0], s[1], s[2]))
print(r1.shape)
# Expand with new axes.
r2 = r1[:, :, :, None, None]
print(r2.shape)
# Tile to the full dimensions.
r3 = np.tile(r2, (1, 1, 1, s[3], s[4]))
print(r3.shape)
-------------------


With this, I think I am done!

I really would like to wrap this up and continue with my own stuff.
So, what needs to be done before you can accept this?

I will:
- Expand to the other CPMG models which are analytical.
- Include R1rho.
I cannot do this for the numerical models, due to the looping.

Should I add unit tests for each?


------------ No time saved on one spin, but a 21X speed-up on clustering!
3 frq,
1000 iterations
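(From the profiles below: 32.180 s for the TRUNK cluster run versus
1.507 s here, i.e. 32.180 / 1.507 ~ 21X, while the single-spin run is
0.351 s in both.)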


   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

1 spin
        1    0.000    0.000    0.351    0.351 <string>:1(<module>)
        1    0.001    0.001    0.351    0.351 profiling_cr72.py:427(single)
     1000    0.001    0.000    0.341    0.000 profiling_cr72.py:413(calc)
     1000    0.009    0.000    0.340    0.000 relax_disp.py:989(func_CR72_full)
     1000    0.024    0.000    0.327    0.000 relax_disp.py:523(calc_CR72_chi2)
     1003    0.088    0.000    0.237    0.000 cr72.py:101(r2eff_CR72)
     2003    0.035    0.000    0.122    0.000 numeric.py:2056(allclose)
    10046    0.051    0.000    0.051    0.000 {method 'reduce' of 'numpy.ufunc' objects}
     3000    0.029    0.000    0.047    0.000 shape_base.py:761(tile)
     4015    0.005    0.000    0.035    0.000 fromnumeric.py:1762(any)

100 spin
        1    0.000    0.000    1.507    1.507 <string>:1(<module>)
        1    0.001    0.001    1.507    1.507 profiling_cr72.py:449(cluster)
     1000    0.001    0.000    1.433    0.001 profiling_cr72.py:413(calc)
     1000    0.010    0.000    1.432    0.001 relax_disp.py:989(func_CR72_full)
     1000    0.052    0.000    1.413    0.001 relax_disp.py:523(calc_CR72_chi2)
     1300    0.917    0.001    1.193    0.001 cr72.py:101(r2eff_CR72)
     2300    0.101    0.000    0.224    0.000 numeric.py:2056(allclose)
     3000    0.034    0.000    0.157    0.000 shape_base.py:761(tile)
     4000    0.108    0.000    0.108    0.000 {method 'repeat' of 'numpy.ndarray' objects}
    11828    0.085    0.000    0.085    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.000    0.000    0.073    0.073 profiling_cr72.py:106(__init__)
        1    0.013    0.013    0.058    0.058 profiling_cr72.py:173(return_r2eff_arrays)
     1000    0.030    0.000    0.046    0.000 chi2.py:72(chi2_rankN)
     4609    0.006    0.000    0.045    0.000 fromnumeric.py:1762(any)
     2300    0.004    0.000    0.035    0.000 fromnumeric.py:1621(sum)


------------------TRUNK
1 spin
        1    0.000    0.000    0.351    0.351 <string>:1(<module>)
        1    0.001    0.001    0.351    0.351 profiling_cr72.py:427(single)
     1000    0.001    0.000    0.343    0.000 profiling_cr72.py:413(calc)
     1000    0.008    0.000    0.342    0.000 relax_disp.py:911(func_CR72_full)
     1000    0.032    0.000    0.330    0.000 relax_disp.py:456(calc_CR72_chi2)
     3003    0.177    0.000    0.246    0.000 cr72.py:100(r2eff_CR72)
    12036    0.061    0.000    0.061    0.000 {method 'reduce' of 'numpy.ufunc' objects}
     3000    0.025    0.000    0.049    0.000 chi2.py:32(chi2)
     6003    0.008    0.000    0.049    0.000 fromnumeric.py:1621(sum)


100 spin
        1    0.000    0.000   32.180   32.180 <string>:1(<module>)
        1    0.001    0.001   32.180   32.180 profiling_cr72.py:449(cluster)
     1000    0.001    0.000   32.132    0.032 profiling_cr72.py:413(calc)
     1000    0.009    0.000   32.131    0.032 relax_disp.py:911(func_CR72_full)
     1000    3.121    0.003   32.112    0.032 relax_disp.py:456(calc_CR72_chi2)
   300300   17.356    0.000   23.903    0.000 cr72.py:100(r2eff_CR72)
  1200927    5.845    0.000    5.845    0.000 {method 'reduce' of 'numpy.ufunc' objects}
   300000    2.503    0.000    4.842    0.000 chi2.py:32(chi2)
   600300    0.762    0.000    4.705    0.000 fromnumeric.py:1621(sum)
   600300    0.608    0.000    3.460    0.000 _methods.py:23(_sum)
   300300    0.287    0.000    2.191    0.000 fromnumeric.py:2048(amax)
   300300    0.281    0.000    1.990    0.000 fromnumeric.py:2132(amin)
   300300    0.325    0.000    1.904    0.000 _methods.py:15(_amax)
   300300    0.296    0.000    1.709    0.000 _methods.py:19(_amin)




2014-06-10 20:57 GMT+02:00 Edward d'Auvergne <[email protected]>:
> Hi,
>
> I'll have a look tomorrow but, as you've probably seen, some of the
> fine details such as indices to be used need to be sorted out when
> implementing this.
>
> Regards,
>
> Edward
>
>
> On 10 June 2014 20:49, Troels Emtekær Linnet <[email protected]> wrote:
>> Whatever I do, I cannot get this to work.
>>
>> Can you show an example ?
>>
>> 2014-06-10 16:29 GMT+02:00 Edward d'Auvergne <[email protected]>:
>>> Here is an example of avoiding automatic numpy data structure creation
>>> and then garbage collection:
>>>
>>> """
>>> from numpy import add, ones, zeros
>>>
>>> a = zeros((5, 4))
>>> a[1] = 1
>>> a[:,1] = 2
>>>
>>> b = ones((5, 4))
>>>
>>> add(a, b, a)
>>> print(a)
>>> """
>>>
>>> The result is:
>>>
>>> [[ 1.  3.  1.  1.]
>>>  [ 2.  3.  2.  2.]
>>>  [ 1.  3.  1.  1.]
>>>  [ 1.  3.  1.  1.]
>>>  [ 1.  3.  1.  1.]]
>>>
>>> The out argument for numpy.add() is used here to operate in a similar
>>> way to the Python "+=" operation.  But it avoids the temporary numpy
>>> data structures that the Python "+=" operation will create.  This will
>>> save a lot of time in the dispersion code.
>>>
>>> Regards,
>>>
>>> Edward
>>>
>>>
>>> On 10 June 2014 15:56, Edward d'Auvergne <[email protected]> wrote:
>>>> Hi Troels,
>>>>
>>>> Here is one suggestion, of many that I have, for significantly
>>>> improving the speed of the analytic dispersion models in your
>>>> 'disp_spin_speed' branch.  The speed ups you have currently achieved
>>>> for spin clusters are huge and very impressive.  But now that you have
>>>> the infrastructure in place, you can advance this much more!
>>>>
>>>> The suggestion has to do with the R20, R20A, and R20B numpy data
>>>> structures.  The way they are currently handled is relatively
>>>> inefficient, in that they are created de novo for each function call.
>>>> This means that memory allocation and Python garbage collection
>>>> happens for every single function call - something which should be
>>>> avoided at almost all costs.
>>>>
>>>> A better way to do this would be to have a self.R20_struct,
>>>> self.R20A_struct, and self.R20B_struct created in __init__(), and then
>>>> to pack in the values from the parameter vector into these structures.
>>>> You could create a special structure in __init__() for this.  It would
>>>> have the dimensions [r20_index][ei][si][mi][oi], where the first
>>>> dimension corresponds to the different R20 parameters.  And for each
>>>> r20_index element, you would have ones at the [ei][si][mi][oi]
>>>> positions where you would like R20 to be, and zeros elsewhere.  The
>>>> key is that this is created at the target function start up, and not
>>>> for each function call.
>>>>
>>>> This would be combined with the very powerful 'out' argument set to
>>>> self.R20_struct with the numpy.add() and numpy.multiply() functions to
>>>> prevent all memory allocations and garbage collection.  Masks could be
>>>> used, but I think that that would be much slower than having special
>>>> numpy structures with ones where R20 should be and zeros elsewhere.
>>>> For just creating these structures, looping over a single r20_index
>>>> loop and multiplying by the special [r20_index][ei][si][mi][oi]
>>>> one/zero structure and using numpy.add() and numpy.multiply() with out
>>>> arguments would be much, much faster than masks or the current
>>>> R20_axis logic.  It will also simplify the code.
>>>>
>>>> Regards,
>>>>
>>>> Edward
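For reference, a minimal sketch of the one/zero multiplier structure
described above; all names and dimensions (back_calc_shape, r20_mask,
pack_r20, 100 spins x 3 frequencies, the extra di axis) are illustrative
assumptions, not the actual relax code:

--------- R20 packing sketch.
import numpy as np

# Illustrative dimensions [ei][si][mi][oi][di]: 1 experiment, 100 spins,
# 3 magnetic fields, 1 offset, 22 dispersion points (hypothetical).
back_calc_shape = (1, 100, 3, 1, 22)
n_r20 = 100 * 3  # one R20 value per spin and frequency

# Created once in __init__(): one/zero structures with dimensions
# [r20_index][ei][si][mi][oi][di], ones where each R20 value belongs.
r20_mask = np.zeros((n_r20,) + back_calc_shape)
for r20_index in range(n_r20):
    si, mi = divmod(r20_index, 3)
    r20_mask[r20_index, :, si, mi] = 1.0

# Pre-allocated structures, also created once and never reallocated.
r20_struct = np.zeros(back_calc_shape)
scratch = np.zeros(back_calc_shape)

def pack_r20(params):
    # Pack the parameter vector into the pre-allocated structure with
    # the 'out' argument, avoiding temporaries and garbage collection.
    r20_struct[:] = 0.0
    for r20_index in range(n_r20):
        np.multiply(r20_mask[r20_index], params[r20_index], out=scratch)
        np.add(r20_struct, scratch, out=r20_struct)
-------------------

Calling pack_r20(params) on each function call then only writes into
memory allocated once at target function start-up.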

_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
[email protected]

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel
